public inbox for gcc-bugs@sourceware.org
* [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code
@ 2007-10-28 1:46 lucier at math dot purdue dot edu
2007-10-28 1:49 ` [Bug regression/33928] " lucier at math dot purdue dot edu
` (115 more replies)
0 siblings, 116 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2007-10-28 1:46 UTC (permalink / raw)
To: gcc-bugs
With these compile options
-Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
With this compiler:
euler-44% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2
--with-mpfr=/pkgs/gmp-4.2.2
Thread model: posix
gcc version 4.3.0 20071026 (experimental) [trunk revision 129664] (GCC)
With the following routine compiled with gcc-4.2.2 you get
(time (direct-fft-recursive-4 a table))
366 ms real time
366 ms cpu time (366 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
while with today's mainline you get
(time (direct-fft-recursive-4 a table))
448 ms real time
448 ms cpu time (448 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
I've isolated that one routine and I'll add it at the end of an attachment;
unfortunately there are a lot of declarations and global data that are
difficult to winnow.
There is really only one main loop in the routine, the one that begins at
___L19_direct_2d_fft_2d_recursive_2d_4. This loop was scheduled in 102 cycles
(sched2) on 4.2.2 and in 134 cycles on mainline.
--
Summary: 33% performance slowdown from 4.2.2 in floating-point
code
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: lucier at math dot purdue dot edu
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug regression/33928] 33% performance slowdown from 4.2.2 in floating-point code
From: lucier at math dot purdue dot edu @ 2007-10-28 1:49 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from lucier at math dot purdue dot edu 2007-10-28 01:49 -------
Created an attachment (id=14418)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14418&action=view)
.i file for fft routine
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: rguenth at gcc dot gnu dot org @ 2007-10-28 12:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2007-10-28 12:05 -------
Can you attach assembler files? What happens if you use -O2? Why do you need
-fno-strict-aliasing? Does -fno-ivopts help?
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: lucier at math dot purdue dot edu @ 2007-10-28 15:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from lucier at math dot purdue dot edu 2007-10-28 15:41 -------
Created an attachment (id=14423)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14423&action=view)
Assembly from 4.2.2
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: lucier at math dot purdue dot edu @ 2007-10-28 15:42 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from lucier at math dot purdue dot edu 2007-10-28 15:42 -------
Created an attachment (id=14424)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14424&action=view)
assembly from 4.3.0
I had to remove the "static" from the declaration of direct-fft-recursive to
get assembly. (In the larger file the address of direct-fft-recursive is
eventually put into an array.)
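A short sketch of why the "static" mattered, under the assumption that the isolated file contains no other reference to the routine: when optimizing, GCC may drop a static function that nothing in the translation unit calls or takes the address of, so no assembly appears for it. Storing the address in a table, as the larger file does, counts as a reference. The names host_procedure, host_table, and call_through_table are made up for illustration.

```c
/* Hypothetical stand-in for the FFT routine. */
static int host_procedure(int x)
{
    return 2 * x + 1;
}

typedef int (*host_fn)(int);

/* Taking the address keeps the static function alive. Without this table
   (or a direct call), an optimizing GCC may emit no code for it at all. */
host_fn host_table[] = { host_procedure };

/* Call the routine indirectly, the way a runtime system would. */
int call_through_table(int x)
{
    return host_table[0](x);
}
```

Compiling this with gcc -O1 -S should show a body for host_procedure; deleting host_table and call_through_table would likely make it disappear.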
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: lucier at math dot purdue dot edu @ 2007-10-28 15:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from lucier at math dot purdue dot edu 2007-10-28 15:45 -------
Created an attachment (id=14425)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14425&action=view)
assembly after replacing -O1 with -O2
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: lucier at math dot purdue dot edu @ 2007-10-28 15:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from lucier at math dot purdue dot edu 2007-10-28 15:45 -------
Created an attachment (id=14426)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14426&action=view)
assembly after replacing -O1 with -O2
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: lucier at math dot purdue dot edu @ 2007-10-28 16:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from lucier at math dot purdue dot edu 2007-10-28 16:05 -------
time with -O2 instead of -O1:
with 4.2.2:
(time (direct-fft-recursive-4 a table))
426 ms real time
426 ms cpu time (425 user, 1 system)
no collections
64 bytes allocated
no minor faults
no major faults
with 4.3.0:
(time (direct-fft-recursive-4 a table))
433 ms real time
433 ms cpu time (433 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
With -O1 -fno-ivopts:
with 4.2.2:
(time (direct-fft-recursive-4 a table))
374 ms real time
374 ms cpu time (374 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
with 4.3.0:
(time (direct-fft-recursive-4 a table))
443 ms real time
443 ms cpu time (443 user, 0 system)
no collections
64 bytes allocated
1 minor fault
no major faults
Why -fno-strict-aliasing: I don't need it for this particular routine, but the
rest of the file is part of a bignum library that accesses the bignum digits as
arrays of 8-, 32-, or 64-bit unsigned ints, and it hasn't been rewritten to use
unions of arrays. (This is part of the runtime system of a Scheme
implementation, and there are other places that just cast pointers to achieve
low-level things.)
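For readers unfamiliar with the distinction, here is a minimal sketch of what "unions of arrays" could look like. It is not the Gambit bignum code; the type bignum_digits, the size BIGNUM_WORDS, and the two helper functions are invented for illustration. GCC documents that reading a union member other than the one last written is permitted even with -fstrict-aliasing, as long as the access goes through the union type.

```c
#include <stdint.h>

#define BIGNUM_WORDS 2  /* hypothetical size, in 64-bit digits */

/* The same storage viewed at three digit widths. */
typedef union {
    uint64_t d64[BIGNUM_WORDS];
    uint32_t d32[BIGNUM_WORDS * 2];
    uint8_t  d8[BIGNUM_WORDS * 8];
} bignum_digits;

/* Sum the storage viewed as 8-bit digits. */
unsigned sum_d8(const bignum_digits *b)
{
    unsigned total = 0;
    int i;
    for (i = 0; i < BIGNUM_WORDS * 8; i++)
        total += b->d8[i];
    return total;
}

/* Sum the storage viewed as 32-bit digits. */
uint64_t sum_d32(const bignum_digits *b)
{
    uint64_t total = 0;
    int i;
    for (i = 0; i < BIGNUM_WORDS * 2; i++)
        total += b->d32[i];
    return total;
}
```

The cast-based alternative, writing through a uint64_t pointer and reading the same bytes through a uint32_t pointer, is the pattern that needs -fno-strict-aliasing.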
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: lucier at math dot purdue dot edu @ 2007-10-28 16:09 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from lucier at math dot purdue dot edu 2007-10-28 16:08 -------
Subject: Re: 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point
code
On Oct 28, 2007, at 8:05 AM, rguenth at gcc dot gnu dot org wrote:
> ------- Comment #2 from rguenth at gcc dot gnu dot org 2007-10-28
> 12:05 -------
> Can you attach assembler files? What happens if you use -O2? Why
> do you need
> -fno-strict-aliasing? Does -fno-ivopts help?
I think I've answered your questions in the attachments and comments
to the PR.
Brad
* [Bug regression/33928] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: rguenth at gcc dot gnu dot org @ 2007-10-28 16:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2007-10-28 16:38 -------
The main difference I see is that 4.2 avoids re-use of %rax as index register:
.L34:
movq %r11, %rdi
addq 8(%r10), %rdi
movq 8(%r10), %rsi
movq 8(%r10), %rdx
movq 40(%r10), %rax
leaq 4(%r11), %rbx
addq %rdi, %rsi
leaq 4(%rdi), %r9
movq %rdi, -8(%r10)
addq %rsi, %rdx
leaq 4(%rsi), %r8
movq %rsi, -24(%r10)
leaq 4(%rdx), %rcx
movq %r9, -16(%r10)
movq %rdx, -40(%r10)
movq %r8, -32(%r10)
addq $7, %rax
movq %rcx, -48(%r10)
movsd (%rax,%rcx,2), %xmm12
leaq (%rbx,%rbx), %rcx
movsd (%rax,%rdx,2), %xmm3
leaq (%rax,%r11,2), %rdx
addq $8, %r11
movsd (%rax,%r8,2), %xmm14
cmpq %r11, %r13
movsd (%rax,%rsi,2), %xmm13
movsd (%rax,%r9,2), %xmm11
movsd (%rax,%rdi,2), %xmm10
movsd (%rax,%rcx), %xmm8
...
while 4.3 always re-loads %rax as index:
.L26:
leaq 4(%rdi), %rdx
movq %rdi, %rax
movq %rdx, -8(%rsp)
addq (%r8), %rax
movq %rax, (%r9)
addq $4, %rax
movq %rax, (%rbp)
movq (%r9), %rax
addq (%r8), %rax
movq %rax, (%r10)
addq $4, %rax
movq %rax, (%rbx)
movq (%r10), %rax
addq (%r8), %rax
movq %rax, (%r11)
movq -64(%rsp), %rcx
addq $4, %rax
movq %rax, (%rcx)
movq (%rsi), %rdx
movq -8(%rsp), %rcx
addq $7, %rdx
movsd (%rdx,%rax,2), %xmm13
movq (%r11), %rax
addq %rcx, %rcx
movsd (%rdx,%rcx), %xmm8
movsd (%rdx,%rax,2), %xmm3
movq (%rbx), %rax
movsd (%rdx,%rax,2), %xmm14
movq (%r10), %rax
movsd (%rdx,%rax,2), %xmm12
movq (%rbp), %rax
movsd (%rdx,%rax,2), %xmm11
movq (%r9), %rax
movsd (%rdx,%rax,2), %xmm10
movq (%r12), %rax
leaq (%rdx,%rdi,2), %rdx
...
The root cause still needs to be investigated.
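The two listings differ mainly in where the computed indices live when the movsd loads use them. As a rough source-level analogy (this is not the Gambit source; frame, data, and both function names are invented here), the two shapes look something like this:

```c
/* frame[] plays the role of the memory slots written in the loop,
   data[] the array being indexed. Both are hypothetical stand-ins. */

/* 4.2-like shape: each index is stored to its slot but also kept in a
   local, so the later loads use the local directly. */
double sum_keep_in_locals(const double *data, long *frame, long i, long stride)
{
    long a = i + stride;
    long b = a + stride;
    long c = b + stride;
    frame[0] = a;
    frame[1] = b;
    frame[2] = c;
    return data[a] + data[b] + data[c];
}

/* 4.3-like shape: each index is stored and then read back from its slot
   before every use. */
double sum_reload_from_slots(const double *data, long *frame, long i, long stride)
{
    frame[0] = i + stride;
    frame[1] = frame[0] + stride;
    frame[2] = frame[1] + stride;
    return data[frame[0]] + data[frame[1]] + data[frame[2]];
}
```

Both functions return the same value, and an optimizing compiler may well emit identical code for them. The sketch only spells out the two access patterns visible in the assembly; it says nothing about which pass is responsible.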
* [Bug regression/33928] [4.3 Regression] 33% performance slowdown from 4.2.2 to 4.3.0 in floating-point code with computed gotos
From: rguenth at gcc dot gnu dot org @ 2007-10-28 16:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from rguenth at gcc dot gnu dot org 2007-10-28 16:39 -------
So, confirmed.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Keywords| |missed-optimization
Last reconfirmed|0000-00-00 00:00:00 |2007-10-28 16:39:27
date| |
Summary|33% performance slowdown |[4.3 Regression] 33%
|from 4.2.2 to 4.3.0 in |performance slowdown from
|floating-point code with |4.2.2 to 4.3.0 in floating-
|computed gotos |point code with computed
| |gotos
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2007-11-12 21:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from lucier at math dot purdue dot edu 2007-11-12 21:50 -------
I suspected that the slowdown had nothing to do with computed gotos, so I
regenerated the C code using a switch instead of the computed gotos and got the
following:
For that same copy of mainline
gcc version 4.3.0 20071026 (experimental) [trunk revision 129664] (GCC)
:
(time (direct-fft-recursive-4 a table))
470 ms real time
470 ms cpu time (470 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
For 4.2.2:
(time (direct-fft-recursive-4 a table))
384 ms real time
384 ms cpu time (383 user, 1 system)
no collections
64 bytes allocated
no minor faults
no major faults
So that's almost exactly the same slowdown as with computed gotos.
I changed the subject line to use 22% instead of 33% (I don't know how I got
33% before, perhaps I just mistyped it) and removed the phrase "with computed
gotos".
I'll include the new .i and .s files as attachments.
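For reference, the percentages can be recomputed from the timings reported in this thread. A minimal sketch in C, where the function name slowdown_percent is made up for illustration:

```c
/* Relative slowdown of new_value over old_value, in percent.
   Works for any pair of positive measurements (ms, cycles, ...). */
double slowdown_percent(double old_value, double new_value)
{
    return (new_value - old_value) / old_value * 100.0;
}
```

With the computed-goto timings (366 ms and 448 ms) this gives about 22.4%, and with the switch timings (384 ms and 470 ms) also about 22.4%. Applied to the sched2 cycle counts from the original report (102 and 134) it gives about 31.4%, which is near the earlier 33% figure, though whether that is where the number came from is only a guess.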
--
lucier at math dot purdue dot edu changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.3 Regression] 33% |[4.3 Regression] 22%
|performance slowdown from |performance slowdown from
|4.2.2 to 4.3.0 in floating- |4.2.2 to 4.3.0 in floating-
|point code with computed |point code
|gotos |
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2007-11-12 21:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from lucier at math dot purdue dot edu 2007-11-12 21:51 -------
Created an attachment (id=14534)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14534&action=view)
.i file using a switch instead of computed gotos
This is the generated code with a switch instead of computed gotos.
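To illustrate the two dispatch styles being compared, here is a toy sketch, far smaller than the generated Gambit code and not derived from it. The function names and the countdown loop are invented. The first version uses GCC's labels-as-values extension (&&label and goto *), so it assumes GCC or a compatible compiler; the second expresses the same control flow with a switch on a state number.

```c
/* Dispatch through a table of label addresses (GNU C extension). */
long sum_to_goto(long n)
{
    static void *const targets[] = { &&loop, &&done };
    long acc = 0;

    goto *targets[n <= 0];
loop:
    acc += n;
    n -= 1;
    goto *targets[n <= 0];
done:
    return acc;
}

/* The same control flow dispatched through a switch on a state number. */
long sum_to_switch(long n)
{
    long acc = 0;
    int state = (n <= 0);

    for (;;) {
        switch (state) {
        case 0:             /* corresponds to the "loop" label */
            acc += n;
            n -= 1;
            state = (n <= 0);
            break;
        default:            /* corresponds to the "done" label */
            return acc;
        }
    }
}
```

Both functions compute the same sum, so any timing difference between the two styles comes from how the compiler handles the dispatch, not from the arithmetic.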
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2007-11-12 21:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from lucier at math dot purdue dot edu 2007-11-12 21:52 -------
Created an attachment (id=14535)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14535&action=view)
4.2.2 assembly for code using switch.
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2007-11-12 21:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from lucier at math dot purdue dot edu 2007-11-12 21:53 -------
Created an attachment (id=14536)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14536&action=view)
4.3.0 assembly for code using a switch
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: pinskia at gcc dot gnu dot org @ 2007-11-19 6:06 UTC (permalink / raw)
To: gcc-bugs
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pinskia at gcc dot gnu dot
| |org
Target Milestone|--- |4.3.0
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: mmitchel at gcc dot gnu dot org @ 2007-11-27 5:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from mmitchel at gcc dot gnu dot org 2007-11-27 05:53 -------
I've marked this P1 because I'd like to see us start to explain these kinds of
dramatic performance changes. If we can explain the issue coherently, we may
well decide that it's not important to fix it, but I think we ought to force
ourselves to figure out what's going on.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P1
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: bonzini at gnu dot org @ 2007-11-30 5:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from bonzini at gnu dot org 2007-11-30 05:39 -------
One suspect is fwprop. Can anyone confirm?
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bonzini at gnu dot org
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2007-11-30 14:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from lucier at math dot purdue dot edu 2007-11-30 14:47 -------
Subject: Re: [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in
floating-point code
On Nov 30, 2007, at 12:39 AM, bonzini at gnu dot org wrote:
> One suspect is fwprop. Can anyone confirm?
How does one turn off fwprop? It doesn't seem to like "-fno-fwprop".
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: bonzini at gnu dot org @ 2007-11-30 14:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from bonzini at gnu dot org 2007-11-30 14:58 -------
It would be -fno-forward-propagate, but what I meant is that the changes
*connected to* fwprop could be the culprit. One has to look at dumps to
understand if this is the case.
Would it be possible to put an asm around the problematic basic block, so that
one could plot the number of instructions in that basic block over time?
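One way to do this, sketched here as an assumption rather than a tested recipe for this bug, is to bracket the block with asm statements that contain only an assembler comment. The markers then show up in the gcc -S output, and the instructions between them can be counted for each compiler revision. The function below is a hypothetical stand-in for the hot loop, and the "#" comment syntax assumes the GNU assembler on x86-64.

```c
/* Hypothetical hot loop bracketed by assembler-comment markers. */
double sum_pairs(const double *v, long n)
{
    double acc = 0.0;
    long i;

    for (i = 0; i + 1 < n; i += 2) {
        __asm__ volatile ("# BEGIN hot block");
        acc += v[i] * v[i + 1];
        __asm__ volatile ("# END hot block");
    }
    return acc;
}
```

The markers are not free of side effects: a volatile asm can constrain how the compiler moves code around it, so the counts are best treated as a trend indicator and not as an exact measure of the unmarked code.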
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2007-12-01 18:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from lucier at math dot purdue dot edu 2007-12-01 18:59 -------
Subject: Re: [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in
floating-point code
On Nov 30, 2007, at 9:58 AM, bonzini at gnu dot org wrote:
> -fno-forward-propagate
I don't know how to debug this, that's clear enough, but adding
-fno-forward-propagate as an option doesn't change the code at all.
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: rguenth at gcc dot gnu dot org @ 2008-01-09 14:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from rguenth at gcc dot gnu dot org 2008-01-09 12:45 -------
Can we have updated measurements please? Also I don't think this bug should be
P1.
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2008-01-09 19:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from lucier at math dot purdue dot edu 2008-01-09 18:44 -------
The assembler is identical to that in the third attachment and the time is
basically the same (other things were going on at the same time):
(time (direct-fft-recursive-4 a table))
465 ms real time
466 ms cpu time (466 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
euler-86% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --enable-checking=release --with-gmp=/pkgs/gmp-4.2.2
--with-mpfr=/pkgs/gmp-4.2.2 --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.3.0 20080109 (experimental) [trunk revision 131427] (GCC)
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: rguenth at gcc dot gnu dot org @ 2008-01-12 18:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from rguenth at gcc dot gnu dot org 2008-01-12 17:56 -------
I'm downgrading this to P2.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P1 |P2
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: ubizjak at gmail dot com @ 2008-01-21 20:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from ubizjak at gmail dot com 2008-01-21 19:21 -------
It is not possible to create an executable from direct.i. My compilation fails:
(.text+0x20): undefined reference to `main'
/tmp/cc0VOLHm.o: In function `___H_direct_2d_fft_2d_recursive_2d_4':
_num.c:(.text+0xf1): undefined reference to `___gstate'
_num.c:(.text+0x18e): undefined reference to `___gstate'
_num.c:(.text+0x1c7): undefined reference to `___gstate'
_num.c:(.text+0x27b): undefined reference to `___gstate'
_num.c:(.text+0x2e0): undefined reference to `___gstate'
/tmp/cc0VOLHm.o:_num.c:(.text+0x6f0): more undefined references to `___gstate'
follow
Could you attach the source that can be used to create the executable? Or
perhaps detailed instructions on how to create one from the sources you have
already posted?
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
From: lucier at math dot purdue dot edu @ 2008-01-21 23:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from lucier at math dot purdue dot edu 2008-01-21 22:43 -------
Subject: Re: [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in
floating-point code
On Jan 21, 2008, at 2:21 PM, ubizjak at gmail dot com wrote:
> It is not possible to create an executable from direct.i.
That's correct, sorry.
> Could you attach the source that can be used to create the executable?
Here are instructions on how to build and test a modified version of
Gambit, from which I derived direct.i.
Download the file
http://www.math.purdue.edu/~lucier/gcc/test-files/bugzilla/33928/gambc-v4_1_2.tgz
Build it with the following commands:
> tar zxf gambc-v4_1_2.tgz
> cd gambc-v4_1_2
> ./configure CC='/pkgs/gcc-mainline/bin/gcc -save-temps'
> make -j
If you want to recompile the source after reconfiguring, do
> make mostlyclean
not 'make clean', unfortunately.
Then test it with
> gsi/gsi -e '(define a (time (expt 3 10000000)))(define b (time (* a a)))'
The output ends with something like
> (time (##bignum.make (##fixnum.quotient result-length (##fixnum.quotient ##bignum.adigit-width ##bignum.fdigit-width)) #f #f))
> 4 ms real time
> 5 ms cpu time (3 user, 2 system)
> no collections
> 3962448 bytes allocated
> 968 minor faults
> no major faults
> (time (##make-f64vector (##fixnum.* two^n 2)))
> 5 ms real time
> 5 ms cpu time (1 user, 4 system)
> 1 collection accounting for 5 ms real time (1 user, 4 system)
> 33554464 bytes allocated
> 59 minor faults
> no major faults
> (time (make-w (##fixnum.- log-two^n 1)))
> 30 ms real time
> 31 ms cpu time (17 user, 14 system)
> no collections
> 16810144 bytes allocated
> 4097 minor faults
> no major faults
> (time (make-w-rac log-two^n))
> 28 ms real time
> 28 ms cpu time (16 user, 12 system)
> no collections
> 16826272 bytes allocated
> 4097 minor faults
> no major faults
> (time (bignum->f64vector-rac x a))
> 45 ms real time
> 45 ms cpu time (20 user, 25 system)
> no collections
> -16 bytes allocated
> 8192 minor faults
> no major faults
> (time (componentwise-rac-multiply a rac-table))
> 26 ms real time
> 26 ms cpu time (26 user, 0 system)
> no collections
> -16 bytes allocated
> no minor faults
> no major faults
> (time (direct-fft-recursive-4 a table))
> 445 ms real time
> 445 ms cpu time (445 user, 0 system)
> no collections
> 64 bytes allocated
> no minor faults
> no major faults
> (time (componentwise-complex-multiply a a))
> 24 ms real time
> 24 ms cpu time (24 user, 0 system)
> no collections
> -16 bytes allocated
> no minor faults
> no major faults
> (time (inverse-fft-recursive-4 a table))
> 418 ms real time
> 418 ms cpu time (418 user, 0 system)
> no collections
> 64 bytes allocated
> no minor faults
> no major faults
> (time (componentwise-rac-multiply-conjugate a rac-table))
> 26 ms real time
> 26 ms cpu time (26 user, 0 system)
> no collections
> -16 bytes allocated
> no minor faults
> no major faults
> (time (bignum<-f64vector-rac a result result-length))
> 108 ms real time
> 108 ms cpu time (108 user, 0 system)
> no collections
> 112 bytes allocated
> no minor faults
> no major faults
> (time (* a a))
> 1170 ms real time
> 1170 ms cpu time (1105 user, 65 system)
> 1 collection accounting for 5 ms real time (1 user, 4 system)
> 71266896 bytes allocated
> 17413 minor faults
> no major faults
The time for the routine in direct.i is the time reported for
direct-fft-recursive-4:
> (time (direct-fft-recursive-4 a table))
> 445 ms real time
> 445 ms cpu time (445 user, 0 system)
> no collections
> 64 bytes allocated
> no minor faults
> no major faults
The name of the routine in the .i and .s files is
___H_direct_2d_fft_2d_recursive_2d_4.
By the way, ___H_inverse_2d_fft_2d_recursive_2d_4 is a similar
routine implementing the inverse fft, which, for some reason, goes
faster than the direct (forward) fft.
Brad
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug regression/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (24 preceding siblings ...)
2008-01-21 23:12 ` lucier at math dot purdue dot edu
@ 2008-01-22 12:23 ` ubizjak at gmail dot com
2008-01-22 12:29 ` [Bug target/33928] " pinskia at gcc dot gnu dot org
` (89 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-22 12:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from ubizjak at gmail dot com 2008-01-22 12:03 -------
Created an attachment (id=14996)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14996&action=view)
Much shorter testcase.
This testcase was used to track down problems with the fre pass. Stay tuned for
an analysis.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug target/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (25 preceding siblings ...)
2008-01-22 12:23 ` ubizjak at gmail dot com
@ 2008-01-22 12:29 ` pinskia at gcc dot gnu dot org
2008-01-22 12:38 ` ubizjak at gmail dot com
` (88 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-01-22 12:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from pinskia at gcc dot gnu dot org 2008-01-22 12:07 -------
Really I bet FRE is doing its job and the RA can't do its.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug target/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (26 preceding siblings ...)
2008-01-22 12:29 ` [Bug target/33928] " pinskia at gcc dot gnu dot org
@ 2008-01-22 12:38 ` ubizjak at gmail dot com
2008-01-22 13:24 ` rguenth at gcc dot gnu dot org
` (87 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-22 12:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from ubizjak at gmail dot com 2008-01-22 12:20 -------
As already noted by Richi in Comment #9, the difference is in usage of %rax.
gcc-4.2 generates:
...
addq $7, %rax
leaq (%rax,%rbp,2), %r10
leaq (%rax,%rdx,2), %rdx
leaq (%rax,%rdi,2), %rdi
movq (%rcx), %rsi
movq (%r13), %rcx
leaq (%rax,%r9,2), %r9
leaq (%rax,%r8,2), %r8
leaq (%rax,%r14,2), %r11
addq $8, %rbp
movsd (%rdx), %xmm3
leaq (%rax,%rsi,2), %rsi
leaq (%rax,%rcx,2), %rcx
...
movsd %xmm7, (%rcx)
subsd %xmm1, %xmm10
addsd %xmm1, %xmm0
movsd %xmm8, (%rsi)
movsd %xmm0, (%rdi)
movapd %xmm12, %xmm0
subsd %xmm3, %xmm12
addsd %xmm3, %xmm0
movsd %xmm0, (%r8)
movsd %xmm10, (%r9)
movsd %xmm12, (%rdx)
jg .L26
where gcc-4.3 limps along with:
...
leaq 7(%rax), %r9
movq %rbx, -64(%rsp)
movq -56(%rsp), %rcx
addq %r10, %r10
movsd 7(%rax,%rdx), %xmm3
movsd (%r9,%rbx,2), %xmm8
movq (%r11), %rbx
movsd 7(%rax,%r10), %xmm5
addq %r8, %r8
addq %rdi, %rdi
movsd 7(%rax,%r8), %xmm12
movsd 15(%rbx), %xmm2
leaq (%r9,%rbp,2), %r9
movsd 7(%rbx), %xmm1
...
movsd %xmm0, 7(%rax,%r9,2)
movapd %xmm10, %xmm0
movsd %xmm7, 7(%rax,%rcx)
subsd %xmm1, %xmm10
addsd %xmm1, %xmm0
movsd %xmm8, 7(%rax,%rsi)
movsd %xmm0, 7(%rax,%rdi)
movapd %xmm12, %xmm0
subsd %xmm3, %xmm12
addsd %xmm3, %xmm0
movsd %xmm0, 7(%rax,%r8)
movsd %xmm10, 7(%rax,%r10)
movsd %xmm12, 7(%rax,%rdx)
jg .L17
The difference is in the offset addresses. Looking at the tree dumps, it is
obvious that the problem is in the fre pass.
At the end of the loop (line 685+ in _.034.fre) gcc-4.2 transforms every
sequence of:
D.2013_432 = ___fp_256 + 40B;
D.2014_433 = *D.2013_432;
D.2068_434 = (long int *) D.2014_433;
D.2069_435 = D.2068_434 + 7B;
D.2070_436 = (long int) D.2069_435;
D.2094_437 = ___r3_35 << 1;
D.2095_438 = D.2070_436 + D.2094_437;
D.2096_439 = (double *) D.2095_438;
*D.2096_439 = ___F64V53_431;
D.2013_440 = ___fp_256 + 40B;
D.2014_441 = *D.2013_440;
D.2068_442 = (long int *) D.2014_441;
D.2069_443 = D.2068_442 + 7B;
D.2070_444 = (long int) D.2069_443;
D.2091_445 = ___r4_257 << 1;
D.2092_446 = D.2070_444 + D.2091_445;
D.2093_447 = (double *) D.2092_446;
*D.2093_447 = ___F64V52_430;
D.2013_448 = ___fp_256 + 40B;
D.2014_449 = *D.2013_448;
D.2068_450 = (long int *) D.2014_449;
D.2069_451 = D.2068_450 + 7B;
D.2070_452 = (long int) D.2069_451;
...
into:
D.2013_432 = D.2013_286;
D.2014_433 = D.2014_287;
D.2068_434 = D.2068_288;
D.2069_435 = D.2069_289;
D.2070_436 = D.2070_290;
D.2094_437 = D.2094_366;
D.2095_438 = D.2095_367;
D.2096_439 = D.2096_368;
*D.2096_439 = ___F64V53_431;
D.2013_440 = D.2013_286;
D.2014_441 = D.2014_287;
D.2068_442 = D.2068_288;
D.2069_443 = D.2069_289;
D.2070_444 = D.2070_290;
D.2091_445 = D.2091_357;
D.2092_446 = D.2092_358;
D.2093_447 = D.2093_359;
*D.2093_447 = ___F64V52_430;
D.2013_448 = D.2013_286;
D.2014_449 = D.2014_287;
D.2068_450 = D.2068_288;
D.2069_451 = D.2069_289;
D.2070_452 = D.2070_290;
D.1994_453 = D.1994_258;
D.2040_454 = D.2040_347;
D.2041_455 = D.2041_348;
D.2089_456 = D.2089_349;
D.2090_457 = D.2090_350;
...
and this is optimized in further passes into:
*D.2096 = ___F64V32 + ___F64V45;
*D.2093 = ___F64V31 + ___F64V42;
*D.2090 = ___F64V32 - ___F64V45;
*D.2088 = ___F64V31 - ___F64V42;
*D.2084 = ___F64V28 + ___F64V39;
*D.2081 = ___F64V27 + ___F64V36;
*D.2077 = ___F64V28 - ___F64V39;
*D.2074 = ___F64V27 - ___F64V36;
However, for some reason gcc-4.3 transforms only _some_ instructions (line 708+
in _.085t.fre dump), creating:
D.1683_428 = D.1683_282;
D.1684_429 = D.1684_283;
D.1738_430 = D.1738_284;
D.1739_431 = D.1739_285;
D.1740_432 = D.1740_286;
D.1764_433 = D.1764_362;
D.1765_434 = D.1765_363;
D.1766_435 = D.1766_364;
*D.1766_435 = ___F64V53_427;
D.1683_436 = D.1683_282;
D.1684_437 = *D.1683_436;
D.1738_438 = (long unsigned int) D.1684_437;
D.1739_439 = D.1738_438 + 7;
D.1740_440 = (long int) D.1739_439;
D.1761_441 = D.1761_353;
D.1762_442 = D.1740_440 + D.1761_441;
D.1763_443 = (double *) D.1762_442;
*D.1763_443 = ___F64V52_426;
D.1683_444 = D.1683_282;
D.1684_445 = *D.1683_444;
D.1738_446 = (long unsigned int) D.1684_445;
D.1739_447 = D.1738_446 + 7;
D.1740_448 = (long int) D.1739_447;
...
which leaves us with:
*D.1766 = ___F64V32 + ___F64V45;
*(double *) (D.1761 + (long int) ((long unsigned int) *pretmp.33 + 7)) = ___F64V31 + ___F64V42;
*(double *) ((long int) ((long unsigned int) *pretmp.33 + 7) + (*temp.65 << 1)) = ___F64V32 - ___F64V45;
*(double *) ((long int) ((long unsigned int) *pretmp.33 + 7) + (*D.1685 << 1)) = ___F64V31 - ___F64V42;
*(double *) ((long int) ((long unsigned int) *pretmp.33 + 7) + (*temp.61 << 1)) = ___F64V28 + ___F64V39;
*(double *) ((long int) ((long unsigned int) *pretmp.33 + 7) + (*pretmp.152 << 1)) = ___F64V27 + ___F64V36;
*(double *) ((long int) ((long unsigned int) *pretmp.33 + 7) + (*pretmp.147 << 1)) = ___F64V28 - ___F64V39;
*(double *) ((long int) ((long unsigned int) *pretmp.33 + 7) + (*___fp.47 << 1)) = ___F64V27 - ___F64V36;
and creates suboptimal asm as above.
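For readers not used to GIMPLE dumps, the repeated pattern above can be sketched in C. This is only an illustration under assumptions, not code from direct.i: the names elem_addr, store_pair_reload and store_pair_hoisted are made up, slot 5 stands in for the load from ___fp + 40B on an LP64 target, and the integer-to-pointer conversion relies on implementation-defined behaviour (as GCC documents it). The first function re-reads the frame slot before each store, as the generated source does; the second is the shape one would hope for once the redundant loads and additions are gone.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors  (double *) ((long) (*(fp + 40B) + 7B) + (r << 1))  from the dump.
   Hypothetical helper; fp[5] is byte offset 40 when intptr_t is 8 bytes. */
static double *elem_addr(const intptr_t *fp, intptr_t r)
{
    assert(r >= 0);                 /* keeps the shift well defined */
    intptr_t base = fp[5];          /* the load that should be redundant */
    return (double *)(base + 7 + (r << 1));
}

/* Shape of the generated code: every store recomputes the full address. */
void store_pair_reload(const intptr_t *fp, intptr_t r3, intptr_t r4,
                       double x, double y)
{
    *elem_addr(fp, r3) = x;
    *elem_addr(fp, r4) = y;         /* fp[5] is read again after a store */
}

/* Shape after redundancy elimination: one load and one "+ 7". */
void store_pair_hoisted(const intptr_t *fp, intptr_t r3, intptr_t r4,
                        double x, double y)
{
    intptr_t base = fp[5] + 7;
    *(double *)(base + (r3 << 1)) = x;
    *(double *)(base + (r4 << 1)) = y;
}
```

The two functions are only equivalent if the first store cannot modify fp[5]. With -fno-strict-aliasing (part of the options in the original report) the compiler cannot rule that out from the types alone, so the second load can be removed only when alias analysis establishes it some other way.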
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug target/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (27 preceding siblings ...)
2008-01-22 12:38 ` ubizjak at gmail dot com
@ 2008-01-22 13:24 ` rguenth at gcc dot gnu dot org
2008-01-22 13:25 ` [Bug tree-optimization/33928] " bonzini at gnu dot org
` (86 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-01-22 13:24 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from rguenth at gcc dot gnu dot org 2008-01-22 12:38 -------
This is an alias partitioning problem: with --param max-aliased-vops=10000 I
see the sequence optimized by FRE. Alternatively, with the alias-oracle patch
for FRE, --param max-fields-for-field-sensitive=1 does the job as well.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |alias
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (28 preceding siblings ...)
2008-01-22 13:24 ` rguenth at gcc dot gnu dot org
@ 2008-01-22 13:25 ` bonzini at gnu dot org
2008-01-22 13:29 ` ubizjak at gmail dot com
` (85 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2008-01-22 13:25 UTC (permalink / raw)
To: gcc-bugs
------- Comment #29 from bonzini at gnu dot org 2008-01-22 12:39 -------
target independent
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|target |tree-optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (29 preceding siblings ...)
2008-01-22 13:25 ` [Bug tree-optimization/33928] " bonzini at gnu dot org
@ 2008-01-22 13:29 ` ubizjak at gmail dot com
2008-01-22 13:30 ` rguenth at gcc dot gnu dot org
` (84 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-22 13:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #30 from ubizjak at gmail dot com 2008-01-22 12:52 -------
Please note that for the original testcase (direct.i), even '-O2 --param
max-aliased-vops=100000' doesn't generate expected code.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3 Regression] 22% performance slowdown from 4.2.2 to 4.3.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (30 preceding siblings ...)
2008-01-22 13:29 ` ubizjak at gmail dot com
@ 2008-01-22 13:30 ` rguenth at gcc dot gnu dot org
2008-03-14 17:04 ` [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 " rguenth at gcc dot gnu dot org
` (83 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-01-22 13:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #31 from rguenth at gcc dot gnu dot org 2008-01-22 13:06 -------
Created an attachment (id=14997)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14997&action=view)
asm with alias-oracle enabled FRE
This is the asm produced from direct.i with -O2 --param
max-fields-for-field-sensitive=1 (SFTs disabled, which is the goal for 4.4)
with the (ok, a modified) alias-oracle patch for FRE applied.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (31 preceding siblings ...)
2008-01-22 13:30 ` rguenth at gcc dot gnu dot org
@ 2008-03-14 17:04 ` rguenth at gcc dot gnu dot org
2008-05-30 16:02 ` lucier at math dot purdue dot edu
` (82 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-03-14 17:04 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to fail| |4.3.0
Target Milestone|4.3.0 |4.3.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (32 preceding siblings ...)
2008-03-14 17:04 ` [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 " rguenth at gcc dot gnu dot org
@ 2008-05-30 16:02 ` lucier at math dot purdue dot edu
2008-06-06 15:00 ` rguenth at gcc dot gnu dot org
` (81 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2008-05-30 16:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #32 from lucier at math dot purdue dot edu 2008-05-30 16:01 -------
I've decided to test the current ira branch with this problem. I used the
build instructions in comment 24.
With -fno-ira I get the same results as with 4.3.0 (no surprise there).
With -fira I get the time
(time (direct-fft-recursive-4 a table))
422 ms real time
421 ms cpu time (421 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
which is an improvement, and the code at the beginning of the loop is
.L7262:
movq %rdx, %rcx
addq (%rsi), %rcx
leaq 4(%rdx), %r15
movq %rcx, (%rbx)
addq $4, %rcx
movq %rcx, (%rbp)
movq (%rbx), %rcx
addq (%rsi), %rcx
movq %rcx, (%rdi)
addq $4, %rcx
movq %rcx, (%r8)
movq (%rdi), %rcx
addq (%rsi), %rcx
leaq 4(%rcx), %r10
movq %rcx, (%r9)
movq %r10, (%r13)
movq (%rax), %rcx
addq $7, %rcx
movsd (%rcx,%r10,2), %xmm4
movq (%r9), %r10
leaq (%rcx,%rdx,2), %r11
addq $8, %rdx
movsd (%r11), %xmm11
movsd (%rcx,%r10,2), %xmm5
movq (%r8), %r10
movsd (%rcx,%r10,2), %xmm6
movq (%rdi), %r10
movsd (%rcx,%r10,2), %xmm7
movq (%rbp), %r10
movsd (%rcx,%r10,2), %xmm8
movq (%rbx), %r10
movapd %xmm8, %xmm14
movsd (%rcx,%r10,2), %xmm9
leaq (%r15,%r15), %r10
movsd (%rcx,%r10), %xmm10
movq (%r12), %rcx
movapd %xmm9, %xmm15
movsd 15(%rcx), %xmm1
movsd 7(%rcx), %xmm2
movapd %xmm1, %xmm13
movsd 31(%rcx), %xmm3
movapd %xmm2, %xmm12
which is also an improvement, but it still is nowhere near the result for
4.2.2.
So, whatever is causing this problem, it appears the new register allocator
isn't going to fix it.
The code generated by today's mainline (136210) isn't better than 4.3.0; the
time is
(time (direct-fft-recursive-4 a table))
469 ms real time
469 ms cpu time (469 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
and code is essentially the same as for 4.3.0
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (33 preceding siblings ...)
2008-05-30 16:02 ` lucier at math dot purdue dot edu
@ 2008-06-06 15:00 ` rguenth at gcc dot gnu dot org
2008-07-09 16:06 ` lucier at math dot purdue dot edu
` (80 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-06-06 15:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #33 from rguenth at gcc dot gnu dot org 2008-06-06 14:58 -------
4.3.1 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.1 |4.3.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (34 preceding siblings ...)
2008-06-06 15:00 ` rguenth at gcc dot gnu dot org
@ 2008-07-09 16:06 ` lucier at math dot purdue dot edu
2008-08-27 22:10 ` jsm28 at gcc dot gnu dot org
` (79 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2008-07-09 16:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #34 from lucier at math dot purdue dot edu 2008-07-09 16:05 -------
Problem still exists with
euler-18% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--with-gmp=/pkgs/gmp-4.2.2/ --with-mpfr=/pkgs/gmp-4.2.2/
--prefix=/pkgs/gcc-mainline --enable-languages=c
--enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20080708 (experimental) [trunk revision 137644] (GCC)
Just checking whether recent changes happened to fix it.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (35 preceding siblings ...)
2008-07-09 16:06 ` lucier at math dot purdue dot edu
@ 2008-08-27 22:10 ` jsm28 at gcc dot gnu dot org
2008-09-04 20:40 ` lucier at math dot purdue dot edu
` (78 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-08-27 22:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #35 from jsm28 at gcc dot gnu dot org 2008-08-27 22:02 -------
4.3.2 is released, changing milestones to 4.3.3.
--
jsm28 at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.2 |4.3.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (36 preceding siblings ...)
2008-08-27 22:10 ` jsm28 at gcc dot gnu dot org
@ 2008-09-04 20:40 ` lucier at math dot purdue dot edu
2008-09-04 20:45 ` rguenth at gcc dot gnu dot org
` (77 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2008-09-04 20:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #36 from lucier at math dot purdue dot edu 2008-09-04 20:39 -------
I don't really understand the status of this bug.
Before 4.3.0, it was P1, and Mark said he'd "like to see us start to
explain these kinds of dramatic performance changes."
There was quite a bit of detective work that ended with "for some reason
gcc-4.3 transforms only _some_ instructions (line 708+ in _.085t.fre dump)
...".
Richard opined that it was an "alias partitioning problem", but Uros noted that
for the original code instead of the reduced testcase expanding some parameter
to its maximum still doesn't fix the problem.
So (a) we don't know what the current code is doing wrong, and (b) we don't
know why 4.2 got it right.
So I don't think Mark got what he wanted, and now it's P2, and each release the
target release for fixing it gets pushed back.
I've been testing mainline on this bug sporadically, especially when an entry
in gcc-patches mentions some words that also appear on this PR, to see if it's
fixed. I'm a bit concerned that the target of 4.3.* is becoming increasingly
out of reach, as changes committed to that branch seem to be more and more
conservative because it's a release branch.
I don't think the code for this bug is terribly atypical for machine-generated
code; it would be nice to be able to remove this performance regression.
Unfortunately, I'm in no position to do so.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (37 preceding siblings ...)
2008-09-04 20:40 ` lucier at math dot purdue dot edu
@ 2008-09-04 20:45 ` rguenth at gcc dot gnu dot org
2008-09-04 20:50 ` lucier at math dot purdue dot edu
` (76 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-04 20:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #37 from rguenth at gcc dot gnu dot org 2008-09-04 20:43 -------
We have to admit that this bug is unlikely to get fixed in the 4.3 series.
It still lacks proper analysis, as unfortunately the analysis done on the
shorter testcase was not valid. Analysis takes time, and honestly at this point
I'd rather spend time fixing wrong-code or ice-on-valid bugs.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (38 preceding siblings ...)
2008-09-04 20:45 ` rguenth at gcc dot gnu dot org
@ 2008-09-04 20:50 ` lucier at math dot purdue dot edu
2008-12-06 16:39 ` lucier at math dot purdue dot edu
` (75 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2008-09-04 20:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #38 from lucier at math dot purdue dot edu 2008-09-04 20:49 -------
OK, but I was moved to write because Jakub's latest 4.4 status report requests
"Please concentrate now on fixing bugs, especially the performance regressions."
and this is a definite 4.3/4.4 performance regression from 4.2. (How many of
the P1 PRs are performance regressions?)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (39 preceding siblings ...)
2008-09-04 20:50 ` lucier at math dot purdue dot edu
@ 2008-12-06 16:39 ` lucier at math dot purdue dot edu
2008-12-07 2:56 ` bonzini at gnu dot org
` (74 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2008-12-06 16:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #39 from lucier at math dot purdue dot edu 2008-12-06 16:37 -------
I may have narrowed down the problem a bit.
With this compiler (revision 118491):
pythagoras-277% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061105 (experimental)
one gets (on a faster machine than previous reports)
(time (direct-fft-recursive-4 a table))
133 ms real time
140 ms cpu time (140 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
With this compiler (revision 118474):
pythagoras-24% /tmp/lucier/install/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/tmp/lucier/install --enable-languages=c
Thread model: posix
gcc version 4.3.0 20061104 (experimental)
one gets
(time (direct-fft-recursive-4 a table))
116 ms real time
108 ms cpu time (108 user, 0 system)
no collections
64 bytes allocated
no minor faults
no major faults
and you see the typical problem with assembly code from direct.i with the later
compiler.
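As a quick check on the size of the regression between the two revisions timed above, the cpu times (108 ms at r118474, 140 ms at r118491) work out to a slowdown of roughly 30%. The helper below is just that arithmetic; the function name is made up for illustration.

```c
#include <assert.h>

/* Relative slowdown of after_ms with respect to before_ms, in percent. */
double slowdown_percent(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);        /* avoid dividing by zero */
    return (after_ms - before_ms) / before_ms * 100.0;
}
```

Calling slowdown_percent(108.0, 140.0) gives a value between 29 and 30.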
Paolo may have been right about fwprop; this patch was installed that day:
Author: bonzini
Date: Sat Nov 4 08:36:45 2006
New Revision: 118475
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=118475
Log:
2006-11-03 Paolo Bonzini <bonzini@gnu.org>
Steven Bosscher <stevenb.gcc@gmail.com>
* fwprop.c: New file.
* Makefile.in: Add fwprop.o.
* tree-pass.h (pass_rtl_fwprop, pass_rtl_fwprop_with_addr): New.
* passes.c (init_optimization_passes): Schedule forward propagation.
* rtlanal.c (loc_mentioned_in_p): Support NULL value of the second
parameter.
* timevar.def (TV_FWPROP): New.
* common.opt (-fforward-propagate): New.
* opts.c (decode_options): Enable forward propagation at -O2.
* gcse.c (one_cprop_pass): Do not run local cprop unless touching
jumps.
* cse.c (fold_rtx_subreg, fold_rtx_mem, fold_rtx_mem_1, find_best_addr,
canon_for_address, table_size): Remove.
(new_basic_block, insert, remove_from_table): Remove references to
table_size.
(fold_rtx): Process SUBREGs and MEMs with equiv_constant, make
simplification loop more straightforward by not calling fold_rtx
recursively.
(equiv_constant): Move here a small part of fold_rtx_subreg,
do not call fold_rtx. Call avoid_constant_pool_reference
to process MEMs.
* recog.c (canonicalize_change_group): New.
* recog.h (canonicalize_change_group): New.
* doc/invoke.texi (Optimization Options): Document fwprop.
* doc/passes.texi (RTL passes): Document fwprop.
Added:
trunk/gcc/fwprop.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/common.opt
trunk/gcc/cse.c
trunk/gcc/doc/invoke.texi
trunk/gcc/doc/passes.texi
trunk/gcc/gcse.c
trunk/gcc/opts.c
trunk/gcc/passes.c
trunk/gcc/recog.c
trunk/gcc/recog.h
trunk/gcc/rtlanal.c
trunk/gcc/timevar.def
trunk/gcc/tree-pass.h
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (40 preceding siblings ...)
2008-12-06 16:39 ` lucier at math dot purdue dot edu
@ 2008-12-07 2:56 ` bonzini at gnu dot org
2008-12-07 13:01 ` rguenth at gcc dot gnu dot org
` (73 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2008-12-07 2:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #40 from bonzini at gnu dot org 2008-12-07 02:55 -------
IIUC this is a typical case in which CSE was fixing something that earlier
passes messed up. Unfortunately fwprop does (better) what CSE was meant to do,
but does not do what I assumed was already done before CSE.
If the problem is aliasing/FRE, then I think Richi is the one who could fix it
for good in the tree passes. If there is more to it, however, I can take a
look at why fwprop is generating the ugly code.
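A rough picture of what forward propagation does to address arithmetic, written as C rather than RTL. This is a hand-made illustration, not the output of either pass, and the function names are hypothetical. In the first form the "+ 7" is computed once and shared (compare the addq $7, %rax followed by leaq in the gcc-4.2 code of comment #27); in the second the constant has been propagated into each memory reference (compare the 7(%rax,%rdx) operands in the gcc-4.3 code).

```c
#include <assert.h>
#include <stddef.h>

/* Shared base: "base + 7" is a single value reused by both loads. */
double sum_shared_base(const char *base, size_t i, size_t j)
{
    assert(base != NULL);
    const char *b = base + 7;
    return *(const double *)(b + i) + *(const double *)(b + j);
}

/* Propagated: the definition of b is substituted into each use. */
double sum_propagated(const char *base, size_t i, size_t j)
{
    assert(base != NULL);
    return *(const double *)(base + 7 + i) + *(const double *)(base + 7 + j);
}
```

Both forms compute the same result. Which one is cheaper depends on how many registers and addressing modes the surrounding loop needs, which is presumably where the two compilers diverge.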
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 22% performance slowdown from 4.2.2 to 4.3/4.4.0 in floating-point code
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (41 preceding siblings ...)
2008-12-07 2:56 ` bonzini at gnu dot org
@ 2008-12-07 13:01 ` rguenth at gcc dot gnu dot org
2008-12-07 19:40 ` [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475 lucier at math dot purdue dot edu
` (72 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-12-07 13:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #41 from rguenth at gcc dot gnu dot org 2008-12-07 13:00 -------
There's not much to be done for aliasing - everything points to global memory
and thus aliases. There may be some opportunities for offset-based
disambiguations via pointers, but I didn't investigate in detail. Whoever
wants someone to work on specific details needs to provide way shorter
testcases ;)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (42 preceding siblings ...)
2008-12-07 13:01 ` rguenth at gcc dot gnu dot org
@ 2008-12-07 19:40 ` lucier at math dot purdue dot edu
2009-01-24 10:28 ` rguenth at gcc dot gnu dot org
` (71 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2008-12-07 19:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #42 from lucier at math dot purdue dot edu 2008-12-07 19:39 -------
Just a comment that -fforward-propagate isn't enabled at -O1 (the main
optimization option in the test) while the cse code it replaces was enabled at
-O1. This is presumably why adding -fno-forward-propagate to the command line
in the test a year ago didn't affect the generated code.
Adding -fno-forward-propagate to the command line of the test case with
revision r118475 of gcc changes the generated code, but doesn't improve the
problem code in the main loop.
Updated the title to report the performance hit on
Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
as reported by /proc/cpuinfo
--
lucier at math dot purdue dot edu changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.3/4.4 Regression] 22% |[4.3/4.4 Regression] 30%
|performance slowdown from |performance slowdown in
|4.2.2 to 4.3/4.4.0 in |floating-point code caused
|floating-point code |by r118475
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (43 preceding siblings ...)
2008-12-07 19:40 ` [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475 lucier at math dot purdue dot edu
@ 2009-01-24 10:28 ` rguenth at gcc dot gnu dot org
2009-02-13 16:05 ` bonzini at gnu dot org
` (70 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-01-24 10:28 UTC (permalink / raw)
To: gcc-bugs
------- Comment #43 from rguenth at gcc dot gnu dot org 2009-01-24 10:19 -------
GCC 4.3.3 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.3 |4.3.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (44 preceding siblings ...)
2009-01-24 10:28 ` rguenth at gcc dot gnu dot org
@ 2009-02-13 16:05 ` bonzini at gnu dot org
2009-02-13 16:10 ` lucier at math dot purdue dot edu
` (69 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-02-13 16:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #44 from bonzini at gnu dot org 2009-02-13 16:05 -------
A simplified (local, noncascading) fwprop not using UD chains would not be hard
to do... Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of walking
the uses, keep a (regno, insn) map of pseudos (cleared at the beginning of
every basic block), and use that info instead of UD chains in
use_killed_between...
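
As a rough illustration of the idea above, here is a toy model in C. It is not
GCC code: the instruction representation, NREGS, and local_fwprop are all
hypothetical names made up for this sketch. It walks the instructions in order,
clears a (regno -> defining insn) map at every basic block boundary, and
replaces a use of a copy with the copy's source when that source has not been
redefined in between.

```c
#include <assert.h>

#define NREGS 8

enum op { OP_CONST, OP_COPY, OP_ADD };

/* A toy instruction: DEST = OP (SRC1, SRC2).  For OP_CONST, SRC1 holds
   the constant value rather than a register number.  */
struct insn
{
  enum op op;
  int dest;
  int src1, src2;
  int bb;   /* Number of the basic block containing this insn.  */
};

/* Local forward propagation of copies.  Returns the number of uses
   that were rewritten.  */
static int
local_fwprop (struct insn *insns, int n)
{
  int last_def[NREGS];  /* regno -> index of its def in this block, or -1.  */
  int cur_bb = -1;
  int changed = 0;

  for (int i = 0; i < n; i++)
    {
      struct insn *in = &insns[i];
      assert (in->dest >= 0 && in->dest < NREGS);

      if (in->bb != cur_bb)
        {
          /* Entering a new basic block: forget all definitions.  */
          for (int r = 0; r < NREGS; r++)
            last_def[r] = -1;
          cur_bb = in->bb;
        }

      if (in->op != OP_CONST)
        {
          int *srcs[2] = { &in->src1, &in->src2 };
          int nsrc = in->op == OP_ADD ? 2 : 1;

          for (int k = 0; k < nsrc; k++)
            {
              int d = last_def[*srcs[k]];
              if (d >= 0 && insns[d].op == OP_COPY)
                {
                  int from = insns[d].src1;
                  /* Only propagate if the copy's source was not
                     redefined between the copy and this use.  */
                  if (last_def[from] < d)
                    {
                      *srcs[k] = from;
                      changed++;
                    }
                }
            }
        }

      last_def[in->dest] = i;
    }
  return changed;
}
```

The map plays the role that UD chains would otherwise play, at the cost of
seeing only definitions within the current block.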
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (45 preceding siblings ...)
2009-02-13 16:05 ` bonzini at gnu dot org
@ 2009-02-13 16:10 ` lucier at math dot purdue dot edu
2009-02-13 16:32 ` bonzini at gnu dot org
` (68 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-02-13 16:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #45 from lucier at math dot purdue dot edu 2009-02-13 16:09 -------
Subject: Re: [4.3/4.4 Regression] 30%
performance slowdown in floating-point code caused by r118475
On Fri, 2009-02-13 at 16:05 +0000, bonzini at gnu dot org wrote:
> ------- Comment #44 from bonzini at gnu dot org 2009-02-13 16:05 -------
> A simplified (local, noncascading) fwprop not using UD chains would not be hard
> to do... Basically, at -O1 use FOR_EACH_BB/FOR_EACH_BB_INSN instead of walking
> the uses, keep a (regno, insn) map of pseudos (cleared at the beginning of
> every basic block), and use that info instead of UD chains in
> use_killed_between...
As noted in comment 42, enabling FWPROP on this test case does not fix
the performance problem.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (46 preceding siblings ...)
2009-02-13 16:10 ` lucier at math dot purdue dot edu
@ 2009-02-13 16:32 ` bonzini at gnu dot org
2009-02-13 17:23 ` lucier at math dot purdue dot edu
` (67 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-02-13 16:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #46 from bonzini at gnu dot org 2009-02-13 16:32 -------
Regarding your comment in bug 26854:
> address calculations are no longer optimized as much as they
> were before
Sometimes, actually, they are optimized better. It depends on the case.
In comment #42, also, you talked about -O1, where fwprop is not enabled. So
I'm failing to understand if the problem is at the tree or RTL level for this
bug.
My comment was related to something said in PR39517, i.e. that chains are very
expensive and a reason why fwprop should not be enabled at -O1. Following up
on my comment, alternatively, fwprop could compute its own dataflow instead of
using UD chains, since it only cares by design about uses with a single
definition. This looks much better.
You would use something like df_chain_create_bb and
df_chain_create_bb_process_use, with code like the following (cf.
df_chain_create_bb_process_use):
  /* Do not want to go through this for an uninitialized var. */
  int count = DF_DEFS_COUNT (uregno);
  if (count)
    {
      if (top_flag == (DF_REF_FLAGS (use) & DF_REF_AT_TOP))
        {
          unsigned int first_index = DF_DEFS_BEGIN (uregno);
          unsigned int last_index = first_index + count - 1;
          /* Uninitialized? Exit. */
          bmp_iter_set_init (&bi, local_rd, first_index, &def_index);
          if (!bmp_iter_set (&bi, &def_index) || def_index > last_index)
            continue;
          /* 2 or more defs for this use, exit. */
          bmp_iter_next (&bi, &def_index);
          if (!bmp_iter_set (&bi, &def_index) || def_index > last_index)
            SET_BIT (can_fwprop, DF_REF_ID (use));
        }
    }
With this change there would be no reason not to run fwprop at -O1.
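
The single-definition test can be sketched in a self-contained way as follows.
This is only a toy model under stated assumptions: the reaching definitions
are a plain 64-bit mask instead of a GCC bitmap, and next_set_bit and
single_reaching_def are hypothetical helpers, not functions from df.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the first set bit at position FROM or above,
   or -1 if there is none.  */
static int
next_set_bit (uint64_t bits, int from)
{
  for (int i = from; i < 64; i++)
    if (bits & (UINT64_C (1) << i))
      return i;
  return -1;
}

/* The definitions of one register occupy bit positions
   [FIRST_INDEX, FIRST_INDEX + COUNT) of REACHING.  Return 1 if exactly
   one of them reaches the use, 0 otherwise.  */
static int
single_reaching_def (uint64_t reaching, int first_index, int count)
{
  assert (count >= 0 && first_index >= 0 && first_index + count <= 64);

  /* No definitions at all: uninitialized variable.  */
  if (count == 0)
    return 0;

  int last_index = first_index + count - 1;

  /* Uninitialized at this use?  */
  int def = next_set_bit (reaching, first_index);
  if (def < 0 || def > last_index)
    return 0;

  /* A second definition in range means 2 or more defs reach the use.  */
  int second = next_set_bit (reaching, def + 1);
  return second < 0 || second > last_index;
}
```

A use would be marked as a forward-propagation candidate only when this
predicate holds.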
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (47 preceding siblings ...)
2009-02-13 16:32 ` bonzini at gnu dot org
@ 2009-02-13 17:23 ` lucier at math dot purdue dot edu
2009-02-13 20:10 ` bonzini at gnu dot org
` (66 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-02-13 17:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #47 from lucier at math dot purdue dot edu 2009-02-13 17:22 -------
Subject: Re: [4.3/4.4 Regression] 30%
performance slowdown in floating-point code caused by r118475
On Fri, 2009-02-13 at 16:32 +0000, bonzini at gnu dot org wrote:
>
>
> ------- Comment #46 from bonzini at gnu dot org 2009-02-13 16:32 -------
> Regarding your comment in bug 26854:
>
> > address calculations are no longer optimized as much as they
> > were before
>
> Sometimes, actually, they are optimized better. It depends on the case.
Yes. I don't see why the optimizations in CSE, which were relatively
cheap and which were effective for this case, needed to be disabled when
FWPROP was added without, evidently, understanding why FWPROP does not
do what CSE was already doing.
> In comment #42, also, you talked about -O1, where fwprop is not enabled. So
> I'm failing to understand if the problem is at the tree or RTL level for this
> bug.
When I add -fforward-propagate to the command line, then the assembly
code changes in some ways, but the performance problem remains the same.
Brad
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (48 preceding siblings ...)
2009-02-13 17:23 ` lucier at math dot purdue dot edu
@ 2009-02-13 20:10 ` bonzini at gnu dot org
2009-04-23 15:59 ` [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially " lucier at math dot purdue dot edu
` (65 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-02-13 20:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #48 from bonzini at gnu dot org 2009-02-13 20:09 -------
Subject: Re: [4.3/4.4 Regression] 30%
performance slowdown in floating-point code caused by r118475
> Yes. I don't see why the optimizations in CSE, which were relatively
> cheap and which were effective for this case, needed to be disabled when
> FWPROP was added without, evidently, understanding why FWPROP does not
> do what CSE was already doing.
Just to mention it, fwprop saved 3% of compile time. That's not
"cheap". It was also tested with SPEC and Nullstone on several
architectures.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (49 preceding siblings ...)
2009-02-13 20:10 ` bonzini at gnu dot org
@ 2009-04-23 15:59 ` lucier at math dot purdue dot edu
2009-04-23 16:01 ` lucier at math dot purdue dot edu
` (64 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-04-23 15:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #49 from lucier at math dot purdue dot edu 2009-04-23 15:58 -------
With 4.4.0 and with mainline this code now runs in 280 ms instead of in 156 ms
with 4.2.4.
Since 280/156 = 1.794871794871795 I changed the subject line (the slowdown is
now not completely caused by r118475).
I guess I'll post the assembly code generated by 4.4.0 in the next attachment.
Timings (best of three runs) for the last
(time (direct-fft-recursive-4 a table))
from
gsi/gsi -e '(define a (time (expt 3 10000000)))(define b (time (* a a)))'
With gcc-4.1.2:
188 ms cpu time (188 user, 0 system)
With gcc-4.2.4
156 ms cpu time (152 user, 4 system)
With gcc-4.3.3:
180 ms cpu time (180 user, 0 system)
With gcc-4.4.0
280 ms cpu time (280 user, 0 system)
With 4.5.0 20090423 (experimental) [trunk revision 146634]
280 ms cpu time (280 user, 0 system)
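
For reference, the percentage in the subject line follows from the cpu times
above. A minimal sketch of the arithmetic (slowdown_percent is just an
illustrative helper, not part of the test case):

```c
#include <assert.h>

/* Percentage slowdown of TIME_MS relative to BASELINE_MS.  */
static double
slowdown_percent (double time_ms, double baseline_ms)
{
  assert (baseline_ms > 0.0);
  return (time_ms / baseline_ms - 1.0) * 100.0;
}
```

Relative to the 156 ms of gcc-4.2.4, slowdown_percent (280, 156) is roughly
79.5, while the 4.1.2 and 4.3.3 timings come out at roughly 20.5 and 15.4.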
--
lucier at math dot purdue dot edu changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.3/4.4/4.5 Regression] 30%|[4.3/4.4/4.5 Regression] 79%
|performance slowdown in |performance slowdown in
|floating-point code caused |floating-point code
|by r118475 |partially caused by r118475
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (50 preceding siblings ...)
2009-04-23 15:59 ` [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially " lucier at math dot purdue dot edu
@ 2009-04-23 16:01 ` lucier at math dot purdue dot edu
2009-04-23 16:03 ` lucier at math dot purdue dot edu
` (63 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-04-23 16:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #50 from lucier at math dot purdue dot edu 2009-04-23 16:00 -------
Created an attachment (id=17685)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17685&action=view)
direct.s generated by 4.4.0
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 79% performance slowdown in floating-point code partially caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (51 preceding siblings ...)
2009-04-23 16:01 ` lucier at math dot purdue dot edu
@ 2009-04-23 16:03 ` lucier at math dot purdue dot edu
2009-04-26 18:27 ` [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code " lucier at math dot purdue dot edu
` (62 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-04-23 16:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #51 from lucier at math dot purdue dot edu 2009-04-23 16:03 -------
Forgot to mention, the main loop starts at .L2947.
This is on
model name : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
Brad
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (52 preceding siblings ...)
2009-04-23 16:03 ` lucier at math dot purdue dot edu
@ 2009-04-26 18:27 ` lucier at math dot purdue dot edu
2009-05-06 3:43 ` lucier at math dot purdue dot edu
` (61 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-04-26 18:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #52 from lucier at math dot purdue dot edu 2009-04-26 18:27 -------
I narrowed down the new performance regression to code added some time around
March 12, 2009, so I changed back the subject line of this PR to reflect the
performance regression caused only by the code added 2006-11-03 and added a new
PR
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39914
to reflect the effects of the March, 2009, code.
--
lucier at math dot purdue dot edu changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.3/4.4/4.5 Regression] 79%|[4.3/4.4/4.5 Regression] 30%
|performance slowdown in |performance slowdown in
|floating-point code |floating-point code caused
|partially caused by r118475|by r118475
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (53 preceding siblings ...)
2009-04-26 18:27 ` [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code " lucier at math dot purdue dot edu
@ 2009-05-06 3:43 ` lucier at math dot purdue dot edu
2009-05-06 3:50 ` lucier at math dot purdue dot edu
` (60 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-06 3:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #53 from lucier at math dot purdue dot edu 2009-05-06 03:43 -------
I posted a possible fix to gcc-patches with the subject line
Possible fix for 30% performance regression in PR 33928
Here's the assembly for the main loop after the changes I proposed:
.L4230:
movq %r11, %rdi
addq 8(%r10), %rdi
movq 8(%r10), %rsi
movq 8(%r10), %rdx
movq 40(%r10), %rax
leaq 4(%r11), %rbx
addq %rdi, %rsi
leaq 4(%rdi), %r9
movq %rdi, -8(%r10)
addq %rsi, %rdx
leaq 4(%rsi), %r8
movq %rsi, -24(%r10)
leaq 4(%rdx), %rcx
movq %r9, -16(%r10)
movq %rdx, -40(%r10)
movq %r8, -32(%r10)
addq $7, %rax
movq %rcx, -48(%r10)
movsd (%rax,%rcx,2), %xmm12
leaq (%rbx,%rbx), %rcx
movsd (%rax,%rdx,2), %xmm3
leaq (%rax,%r11,2), %rdx
addq $8, %r11
movsd (%rax,%r8,2), %xmm14
cmpq %r11, %r13
movsd (%rax,%rsi,2), %xmm13
movsd (%rax,%r9,2), %xmm11
movsd (%rax,%rdi,2), %xmm10
movsd (%rax,%rcx), %xmm8
movq 24(%r10), %rax
movsd (%rdx), %xmm7
movsd 15(%rax), %xmm2
movsd 7(%rax), %xmm1
movapd %xmm2, %xmm0
movsd 31(%rax), %xmm9
movapd %xmm1, %xmm6
mulsd %xmm3, %xmm0
movapd %xmm1, %xmm4
mulsd %xmm12, %xmm6
mulsd %xmm3, %xmm4
movapd %xmm1, %xmm3
mulsd %xmm13, %xmm1
mulsd %xmm14, %xmm3
addsd %xmm0, %xmm6
movapd %xmm2, %xmm0
movsd 23(%rax), %xmm5
mulsd %xmm12, %xmm0
movapd %xmm7, %xmm12
subsd %xmm0, %xmm4
movapd %xmm2, %xmm0
mulsd %xmm14, %xmm2
movapd %xmm8, %xmm14
mulsd %xmm13, %xmm0
movapd %xmm11, %xmm13
addsd %xmm6, %xmm11
subsd %xmm6, %xmm13
subsd %xmm2, %xmm1
movapd %xmm10, %xmm2
addsd %xmm0, %xmm3
movapd %xmm5, %xmm0
subsd %xmm4, %xmm2
addsd %xmm4, %xmm10
subsd %xmm1, %xmm12
addsd %xmm1, %xmm7
movapd %xmm9, %xmm1
subsd %xmm3, %xmm14
mulsd %xmm2, %xmm0
xorpd .LC5(%rip), %xmm1
addsd %xmm3, %xmm8
movapd %xmm1, %xmm3
mulsd %xmm2, %xmm1
movapd %xmm5, %xmm2
mulsd %xmm13, %xmm3
mulsd %xmm11, %xmm2
addsd %xmm0, %xmm3
movapd %xmm5, %xmm0
mulsd %xmm10, %xmm5
mulsd %xmm13, %xmm0
subsd %xmm0, %xmm1
movapd %xmm9, %xmm0
mulsd %xmm11, %xmm9
mulsd %xmm10, %xmm0
subsd %xmm9, %xmm5
addsd %xmm0, %xmm2
movapd %xmm7, %xmm0
addsd %xmm5, %xmm0
subsd %xmm5, %xmm7
movsd %xmm0, (%rdx)
movapd %xmm8, %xmm0
movq 40(%r10), %rax
subsd %xmm2, %xmm8
addsd %xmm2, %xmm0
movsd %xmm0, 7(%rcx,%rax)
movq -8(%r10), %rdx
movq 40(%r10), %rax
movapd %xmm12, %xmm0
subsd %xmm1, %xmm12
movsd %xmm7, 7(%rax,%rdx,2)
movq -16(%r10), %rdx
movq 40(%r10), %rax
addsd %xmm1, %xmm0
movsd %xmm8, 7(%rax,%rdx,2)
movq -24(%r10), %rdx
movq 40(%r10), %rax
movsd %xmm0, 7(%rax,%rdx,2)
movapd %xmm14, %xmm0
movq -32(%r10), %rdx
movq 40(%r10), %rax
subsd %xmm3, %xmm14
addsd %xmm3, %xmm0
movsd %xmm0, 7(%rax,%rdx,2)
movq -40(%r10), %rdx
movq 40(%r10), %rax
movsd %xmm12, 7(%rax,%rdx,2)
movq -48(%r10), %rdx
movq 40(%r10), %rax
movsd %xmm14, 7(%rax,%rdx,2)
jg .L4230
movq %rbx, %r13
.L4228:
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (54 preceding siblings ...)
2009-05-06 3:43 ` lucier at math dot purdue dot edu
@ 2009-05-06 3:50 ` lucier at math dot purdue dot edu
2009-05-06 9:21 ` bonzini at gnu dot org
` (59 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-06 3:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #54 from lucier at math dot purdue dot edu 2009-05-06 03:50 -------
Created an attachment (id=17805)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17805&action=view)
svn diff of cse.c to fix the performance regression
This partially reverts r118475 and adds code to call find_best_address for MEMs
in fold_rtx.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (55 preceding siblings ...)
2009-05-06 3:50 ` lucier at math dot purdue dot edu
@ 2009-05-06 9:21 ` bonzini at gnu dot org
2009-05-06 9:32 ` bonzini at gnu dot org
` (58 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-06 9:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #55 from bonzini at gnu dot org 2009-05-06 09:20 -------
Created an attachment (id=17807)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17807&action=view)
svn diff of cse.c to "fix" the performance regression (updated)
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #17805|0 |1
is obsolete| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (56 preceding siblings ...)
2009-05-06 9:21 ` bonzini at gnu dot org
@ 2009-05-06 9:32 ` bonzini at gnu dot org
2009-05-06 9:50 ` jakub at gcc dot gnu dot org
` (57 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-06 9:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #56 from bonzini at gnu dot org 2009-05-06 09:31 -------
Created an attachment (id=17808)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17808&action=view)
usable testcase
OK, I managed to produce reasonably readable source code (uninclude stdlib
files, remove unused gambit stuff and ___ prefixes, simplify some expressions),
find the heavy loops, annotate them with asm statements (see comment #18,
2007-11-30), and measure the length of the loops.
                  4.2     4.5     4.5 + patch
LOOP 1            ~190    ~230    ~190
INNER LOOP 1.1    ~120    ~130    ~120
LOOP 2            33      36      31
I am thus obsoleting (almost) everything that was posted and is not relevant
anymore. Let's start from scratch with the new testcase.
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #14418|0 |1
is obsolete| |
Attachment #14423|0 |1
is obsolete| |
Attachment #14424|0 |1
is obsolete| |
Attachment #14425|0 |1
is obsolete| |
Attachment #14426|0 |1
is obsolete| |
Attachment #14534|0 |1
is obsolete| |
Attachment #14535|0 |1
is obsolete| |
Attachment #14536|0 |1
is obsolete| |
Attachment #14997|0 |1
is obsolete| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (57 preceding siblings ...)
2009-05-06 9:32 ` bonzini at gnu dot org
@ 2009-05-06 9:50 ` jakub at gcc dot gnu dot org
2009-05-06 9:57 ` bonzini at gnu dot org
` (56 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-05-06 9:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #57 from jakub at gcc dot gnu dot org 2009-05-06 09:49 -------
Why do you need any #include lines at all in the reduced testcase? Compiles
just fine even without them...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (58 preceding siblings ...)
2009-05-06 9:50 ` jakub at gcc dot gnu dot org
@ 2009-05-06 9:57 ` bonzini at gnu dot org
2009-05-06 10:00 ` bonzini at gnu dot org
` (55 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-06 9:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #58 from bonzini at gnu dot org 2009-05-06 09:56 -------
Uhm, it's better to run unpatched 4.5 with -O1 -fforward-propagate to get a
fair comparison. Also, I was counting the loop headers, which are not part of
the hot code.
                  4.2 -O1    4.5 -O1 -ffw-prop    4.5 + patch -O1
LOOP 1            181        201                  180
INNER LOOP 1.1    117        118                  113
LOOP 2            27         27                   26
This shows that you should compare running the code (you can use direct.i) with
4.2/-O1 and 4.5/-O1 -fforward-propagate. This is very important, otherwise
you're comparing apples to oranges.
fwprop is creating too high register pressure by creating offsets like these in
the loop header:
leaq -8(%r12), %rsi
leaq 8(%r12), %r10
leaq -16(%r12), %r9
leaq -24(%r12), %rbx
leaq -32(%r12), %rbp
leaq -40(%r12), %rdi
leaq -48(%r12), %r11
leaq 40(%r12), %rdx
Then, the additional register pressure is causing the bad scheduling we have in
the fast assembly outputs:
movq (%rdx), %rax
movsd (%rax,%r15,2), %xmm7
movq (%rdi), %r15
movsd (%rax,%r15,2), %xmm10
movq (%rbp), %r15
movsd (%rax,%r15,2), %xmm5
movq (%rbx), %r15
movsd (%rax,%r15,2), %xmm6
movq (%r9), %r15
movsd (%rax,%r15,2), %xmm15
movq (%rsi), %r15
movsd (%rax,%r15,2), %xmm11
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (59 preceding siblings ...)
2009-05-06 9:57 ` bonzini at gnu dot org
@ 2009-05-06 10:00 ` bonzini at gnu dot org
2009-05-06 10:48 ` bonzini at gnu dot org
` (54 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-06 10:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #59 from bonzini at gnu dot org 2009-05-06 09:59 -------
Created an attachment (id=17809)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17809&action=view)
usable testcase
Without includes as Jakub suggested.
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #17808|0 |1
is obsolete| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug tree-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (60 preceding siblings ...)
2009-05-06 10:00 ` bonzini at gnu dot org
@ 2009-05-06 10:48 ` bonzini at gnu dot org
2009-05-06 13:06 ` [Bug rtl-optimization/33928] " jakub at gcc dot gnu dot org
` (53 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-06 10:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #60 from bonzini at gnu dot org 2009-05-06 10:47 -------
Actually those are created by -fmove-loop-invariants. With -O1
-fforward-propagate -fno-move-loop-invariants I get:
                  4.5 -O1 -ffw-prop -fno-move-loop-inv
LOOP 1            183
INNER LOOP 1.1    116
LOOP 2            25
You should be able to get performance close to 4.2 or better with options "-O1
-fforward-propagate -fno-move-loop-invariants -fschedule-insns2". If you do,
this means two things:
1) That the bug is in the register pressure estimations of
-fmove-loop-invariants, and merely exposed by the fwprop patch.
2) That maybe you should start from -O2 and go backwards, eliminating
optimizations that do not help you or cause high compilation time, instead of
using -O1.
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |WAITING
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (61 preceding siblings ...)
2009-05-06 10:48 ` bonzini at gnu dot org
@ 2009-05-06 13:06 ` jakub at gcc dot gnu dot org
2009-05-06 15:08 ` bonzini at gnu dot org
` (52 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-05-06 13:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #61 from jakub at gcc dot gnu dot org 2009-05-06 13:05 -------
Also see PR39871, maybe that's related (though on ARM).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (62 preceding siblings ...)
2009-05-06 13:06 ` [Bug rtl-optimization/33928] " jakub at gcc dot gnu dot org
@ 2009-05-06 15:08 ` bonzini at gnu dot org
2009-05-06 19:58 ` lucier at math dot purdue dot edu
` (51 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-06 15:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #62 from bonzini at gnu dot org 2009-05-06 15:07 -------
No, totally unrelated to PR39871
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (63 preceding siblings ...)
2009-05-06 15:08 ` bonzini at gnu dot org
@ 2009-05-06 19:58 ` lucier at math dot purdue dot edu
2009-05-06 20:44 ` lucier at math dot purdue dot edu
` (50 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-06 19:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #63 from lucier at math dot purdue dot edu 2009-05-06 19:57 -------
Was the patch in comment 55 meant for me to bootstrap and test with today's
mainline? It crashes at the gcc_assert at
/* Subroutine of canon_reg. Pass *XLOC through canon_reg, and validate
   the result if necessary. INSN is as for canon_reg. */
static void
validate_canon_reg (rtx *xloc, rtx insn)
{
  if (*xloc)
    {
      rtx new_rtx = canon_reg (*xloc, insn);
      /* If replacing pseudo with hard reg or vice versa, ensure the
         insn remains valid. Likewise if the insn has MATCH_DUPs. */
      gcc_assert (insn && new_rtx);
      validate_change (insn, xloc, new_rtx, 1);
    }
}
when building libgcc:
/tmp/lucier/gcc/objdirs/mainline/./gcc/xgcc
-B/tmp/lucier/gcc/objdirs/mainline/./gcc/
-B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/bin/
-B/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/lib/ -isystem
/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/include -isystem
/pkgs/gcc-mainline/x86_64-unknown-linux-gnu/sys-include -g -O2 -m32 -O2 -g -O2
-DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes
-Wcast-qual -Wold-style-definition -isystem ./include -fPIC -g
-DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I.
-I../../.././gcc -I../../../../../mainline/libgcc
-I../../../../../mainline/libgcc/. -I../../../../../mainline/libgcc/../gcc
-I../../../../../mainline/libgcc/../include
-I../../../../../mainline/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT
-DHAVE_CC_TLS -DUSE_TLS -o _moddi3.o -MT _moddi3.o -MD -MP -MF _moddi3.dep
-DL_moddi3 -c ../../../../../mainline/libgcc/../gcc/libgcc2.c \
-fexceptions -fnon-call-exceptions -fvisibility=hidden -DHIDE_EXPORTS
../../../../../mainline/libgcc/../gcc/libgcc2.c: In function '__moddi3':
../../../../../mainline/libgcc/../gcc/libgcc2.c:1121: internal compiler error:
in validate_canon_reg, at cse.c:2730
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (64 preceding siblings ...)
2009-05-06 19:58 ` lucier at math dot purdue dot edu
@ 2009-05-06 20:44 ` lucier at math dot purdue dot edu
2009-05-07 5:04 ` bonzini at gnu dot org
` (49 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-06 20:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #64 from lucier at math dot purdue dot edu 2009-05-06 20:43 -------
In answer to comment 60, here's the command line where I added
-fforward-propagate -fno-move-loop-invariants:
/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate
-fno-move-loop-invariants -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY
-D___GAMBCDIR="\"/usr/local/Gambit-C/v4.1.2\"" -D___SYS_TYPE_CPU="\"x86_64\""
-D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -c _num.c
here's the compiler:
/pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/lucier/gcc/mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c
Thread model: posix
gcc version 4.5.0 20090506 (experimental) [trunk revision 147199] (GCC)
and the runtime didn't change (substantially)
132 ms cpu time (132 user, 0 system)
and the loop looks pretty much just as bad (it's 117 instructions long, by my
count):
.L2752:
movq %rcx, %rdx
addq 8(%rax), %rdx
leaq 4(%rcx), %rdi
movq %rdx, -8(%rax)
leaq 4(%rdx), %rbx
addq 8(%rax), %rdx
movq %rbx, -16(%rax)
movq %rdx, -24(%rax)
leaq 4(%rdx), %rbx
addq 8(%rax), %rdx
movq %rbx, -32(%rax)
movq %rdx, -40(%rax)
leaq 4(%rdx), %rbx
movq 40(%rax), %rdx
movq %rbx, -48(%rax)
movsd 7(%rdx,%rbx,2), %xmm9
movq -40(%rax), %rbx
leaq 7(%rdx,%rcx,2), %r8
addq $8, %rcx
movsd (%r8), %xmm4
cmpq %rcx, %r13
movsd 7(%rdx,%rbx,2), %xmm11
movq -32(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm5
movq -24(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm7
movq -16(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm14
movq -8(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm6
leaq (%rdi,%rdi), %rbx
movsd 7(%rbx,%rdx), %xmm8
movq 24(%rax), %rdx
movapd %xmm6, %xmm13
movsd 15(%rdx), %xmm1
movsd 7(%rdx), %xmm2
movapd %xmm1, %xmm10
movsd 31(%rdx), %xmm3
movapd %xmm2, %xmm12
mulsd %xmm11, %xmm10
mulsd %xmm9, %xmm12
mulsd %xmm2, %xmm11
mulsd %xmm1, %xmm9
movsd 23(%rdx), %xmm0
addsd %xmm12, %xmm10
movapd %xmm2, %xmm12
mulsd %xmm7, %xmm2
subsd %xmm9, %xmm11
movapd %xmm1, %xmm9
mulsd %xmm5, %xmm12
mulsd %xmm5, %xmm1
movapd %xmm8, %xmm5
mulsd %xmm7, %xmm9
movapd %xmm4, %xmm7
subsd %xmm11, %xmm13
addsd %xmm6, %xmm11
movsd .LC5(%rip), %xmm6
subsd %xmm1, %xmm2
movapd %xmm0, %xmm1
addsd %xmm12, %xmm9
movapd %xmm14, %xmm12
xorpd %xmm3, %xmm6
subsd %xmm10, %xmm12
mulsd %xmm13, %xmm1
subsd %xmm2, %xmm7
addsd %xmm4, %xmm2
movapd %xmm6, %xmm4
addsd %xmm14, %xmm10
mulsd %xmm13, %xmm6
mulsd %xmm12, %xmm4
subsd %xmm9, %xmm5
mulsd %xmm0, %xmm12
addsd %xmm8, %xmm9
movapd %xmm0, %xmm8
mulsd %xmm11, %xmm0
addsd %xmm1, %xmm4
movapd %xmm3, %xmm1
mulsd %xmm10, %xmm3
subsd %xmm12, %xmm6
mulsd %xmm11, %xmm1
mulsd %xmm10, %xmm8
subsd %xmm3, %xmm0
addsd %xmm1, %xmm8
movapd %xmm2, %xmm1
addsd %xmm0, %xmm1
subsd %xmm0, %xmm2
movapd %xmm7, %xmm0
subsd %xmm6, %xmm7
addsd %xmm6, %xmm0
movsd %xmm1, (%r8)
movapd %xmm9, %xmm1
movq 40(%rax), %rdx
subsd %xmm8, %xmm9
addsd %xmm8, %xmm1
movsd %xmm1, 7(%rbx,%rdx)
movq -8(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm2, 7(%rdx,%rbx,2)
movq -16(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm9, 7(%rdx,%rbx,2)
movq -24(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm0, 7(%rdx,%rbx,2)
movapd %xmm5, %xmm0
movq -32(%rax), %rbx
movq 40(%rax), %rdx
subsd %xmm4, %xmm5
addsd %xmm4, %xmm0
movsd %xmm0, 7(%rdx,%rbx,2)
movq -40(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm7, 7(%rdx,%rbx,2)
movq -48(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm5, 7(%rdx,%rbx,2)
jg .L2752
movq %rdi, %r13
.L2751:
--
lucier at math dot purdue dot edu changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (65 preceding siblings ...)
2009-05-06 20:44 ` lucier at math dot purdue dot edu
@ 2009-05-07 5:04 ` bonzini at gnu dot org
2009-05-07 5:27 ` lucier at math dot purdue dot edu
` (48 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-07 5:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #65 from bonzini at gnu dot org 2009-05-07 05:03 -------
Subject: Re: [4.3/4.4/4.5 Regression] 30% performance
slowdown in floating-point code caused by r118475
lucier at math dot purdue dot edu wrote:
> ------- Comment #64 from lucier at math dot purdue dot edu 2009-05-06 20:43 -------
> In answer to comment 60, here's the command line where I added
> -fforward-propagate -fno-move-loop-invariants:
Hmm, can you try adding -frename-registers *or* -fweb (i.e. together
they get no benefit) too?
> and the loop looks pretty much just as bad (it's 117 instructions long, by my
> count):
116 actually: the movq here is outside the loop (that's how I made all
the instruction counts).
> movsd %xmm5, 7(%rdx,%rbx,2)
> jg .L2752
> movq %rdi, %r13
> .L2751:
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (66 preceding siblings ...)
2009-05-07 5:04 ` bonzini at gnu dot org
@ 2009-05-07 5:27 ` lucier at math dot purdue dot edu
2009-05-07 13:41 ` bonzini at gnu dot org
` (47 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-07 5:27 UTC (permalink / raw)
To: gcc-bugs
------- Comment #66 from lucier at math dot purdue dot edu 2009-05-07 05:27 -------
Adding -frename-registers gives a significant speedup (sometimes as fast as
4.1.2 on this shared machine, i.e., it sometimes hits 108 ms instead of
132-140 ms). The command line with -fforward-propagate -fno-move-loop-invariants
-frename-registers is
/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate
-fno-move-loop-invariants -frename-registers -DHAVE_CONFIG_H -D___PRIMAL
-D___LIBRARY -D___GAMBCDIR="\"/usr/local/Gambit-C/v4.1.2\""
-D___SYS_TYPE_CPU="\"x86_64\"" -D___SYS_TYPE_VENDOR="\"unknown\""
-D___SYS_TYPE_OS="\"linux-gnu\"" -c _num.c
and the loop is
.L2752:
movq %rcx, %r12
addq 8(%rax), %r12
leaq 4(%rcx), %rdi
movq %r12, -8(%rax)
leaq 4(%r12), %r8
addq 8(%rax), %r12
movq %r8, -16(%rax)
movq -8(%rax), %r8
movq -16(%rax), %rdx
movq %r12, -24(%rax)
leaq 4(%r12), %rbx
addq 8(%rax), %r12
movq -24(%rax), %r9
movq %rbx, -32(%rax)
movq 24(%rax), %rbx
movq -32(%rax), %r10
leaq 4(%r12), %r11
movq %r12, -40(%rax)
movq 40(%rax), %r12
movq -40(%rax), %r14
movq %r11, -48(%rax)
movsd 15(%rbx), %xmm1
movsd 7(%rbx), %xmm2
movsd 7(%r12,%r11,2), %xmm9
movapd %xmm1, %xmm3
movsd 7(%r12,%r14,2), %xmm11
leaq 7(%r12,%rcx,2), %r11
movapd %xmm2, %xmm10
leaq (%rdi,%rdi), %r14
mulsd %xmm11, %xmm3
movapd %xmm2, %xmm12
mulsd %xmm9, %xmm10
addq $8, %rcx
mulsd %xmm1, %xmm9
cmpq %rcx, %r13
mulsd %xmm2, %xmm11
movsd 7(%r12,%r10,2), %xmm5
movsd 7(%r12,%r9,2), %xmm7
addsd %xmm10, %xmm3
movsd 7(%r12,%r8,2), %xmm6
subsd %xmm9, %xmm11
mulsd %xmm7, %xmm2
movapd %xmm1, %xmm9
mulsd %xmm5, %xmm1
movapd %xmm6, %xmm13
movsd 7(%r12,%rdx,2), %xmm14
mulsd %xmm5, %xmm12
mulsd %xmm7, %xmm9
subsd %xmm11, %xmm13
movsd 31(%rbx), %xmm0
addsd %xmm6, %xmm11
movsd .LC5(%rip), %xmm6
subsd %xmm1, %xmm2
movsd (%r11), %xmm4
movapd %xmm14, %xmm10
xorpd %xmm0, %xmm6
addsd %xmm12, %xmm9
movsd 7(%r14,%r12), %xmm8
subsd %xmm3, %xmm10
movapd %xmm4, %xmm7
addsd %xmm14, %xmm3
movsd 23(%rbx), %xmm15
subsd %xmm2, %xmm7
movapd %xmm8, %xmm5
addsd %xmm4, %xmm2
movapd %xmm6, %xmm4
subsd %xmm9, %xmm5
movapd %xmm15, %xmm14
addsd %xmm8, %xmm9
mulsd %xmm10, %xmm4
movapd %xmm15, %xmm8
mulsd %xmm15, %xmm10
movapd %xmm0, %xmm12
mulsd %xmm11, %xmm15
mulsd %xmm3, %xmm0
movapd %xmm7, %xmm1
mulsd %xmm13, %xmm6
mulsd %xmm3, %xmm8
movapd %xmm9, %xmm3
mulsd %xmm11, %xmm12
subsd %xmm0, %xmm15
mulsd %xmm13, %xmm14
subsd %xmm10, %xmm6
movapd %xmm2, %xmm10
movapd %xmm5, %xmm0
addsd %xmm12, %xmm8
addsd %xmm15, %xmm10
subsd %xmm15, %xmm2
addsd %xmm14, %xmm4
addsd %xmm8, %xmm3
movsd %xmm10, (%r11)
movq 40(%rax), %r10
subsd %xmm8, %xmm9
addsd %xmm6, %xmm1
addsd %xmm4, %xmm0
movsd %xmm3, 7(%r14,%r10)
movq -8(%rax), %r9
movq 40(%rax), %rdx
subsd %xmm6, %xmm7
subsd %xmm4, %xmm5
movsd %xmm2, 7(%rdx,%r9,2)
movq -16(%rax), %r8
movq 40(%rax), %r12
movsd %xmm9, 7(%r12,%r8,2)
movq -24(%rax), %rbx
movq 40(%rax), %r11
movsd %xmm1, 7(%r11,%rbx,2)
movq -32(%rax), %r14
movq 40(%rax), %r10
movsd %xmm0, 7(%r10,%r14,2)
movq -40(%rax), %r9
movq 40(%rax), %rdx
movsd %xmm7, 7(%rdx,%r9,2)
movq -48(%rax), %r8
movq 40(%rax), %r12
movsd %xmm5, 7(%r12,%r8,2)
jg .L2752
Using -fforward-propagate -fno-move-loop-invariants -fweb instead of
-fforward-propagate -fno-move-loop-invariants -frename-registers, with the
compile line
/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -fforward-propagate
-fno-move-loop-invariants -fweb -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY
-D___GAMBCDIR="\"/usr/local/Gambit-C/v4.1.2\"" -D___SYS_TYPE_CPU="\"x86_64\""
-D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -c _num.c
the time is not so good (consistently 128ms) and the loop is
.L2752:
movq %rcx, %rdx
addq 8(%rax), %rdx
leaq 4(%rcx), %rdi
movq %rdx, -8(%rax)
leaq 4(%rdx), %rbx
addq 8(%rax), %rdx
movq %rbx, -16(%rax)
movq %rdx, -24(%rax)
leaq 4(%rdx), %rbx
addq 8(%rax), %rdx
movq %rbx, -32(%rax)
movq %rdx, -40(%rax)
leaq 4(%rdx), %rbx
movq 40(%rax), %rdx
movq %rbx, -48(%rax)
movsd 7(%rdx,%rbx,2), %xmm9
movq -40(%rax), %rbx
leaq 7(%rdx,%rcx,2), %r8
addq $8, %rcx
movsd (%r8), %xmm4
cmpq %rcx, %r13
movsd 7(%rdx,%rbx,2), %xmm11
movq -32(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm5
movq -24(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm7
movq -16(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm14
movq -8(%rax), %rbx
movsd 7(%rdx,%rbx,2), %xmm6
leaq (%rdi,%rdi), %rbx
movsd 7(%rbx,%rdx), %xmm8
movq 24(%rax), %rdx
movapd %xmm6, %xmm13
movsd 15(%rdx), %xmm1
movsd 7(%rdx), %xmm2
movapd %xmm1, %xmm10
movsd 31(%rdx), %xmm3
movapd %xmm2, %xmm12
mulsd %xmm11, %xmm10
mulsd %xmm9, %xmm12
mulsd %xmm2, %xmm11
mulsd %xmm1, %xmm9
movsd 23(%rdx), %xmm0
addsd %xmm12, %xmm10
movapd %xmm2, %xmm12
mulsd %xmm7, %xmm2
subsd %xmm9, %xmm11
movapd %xmm1, %xmm9
mulsd %xmm5, %xmm12
mulsd %xmm5, %xmm1
movapd %xmm8, %xmm5
mulsd %xmm7, %xmm9
movapd %xmm4, %xmm7
subsd %xmm11, %xmm13
addsd %xmm6, %xmm11
movsd .LC5(%rip), %xmm6
subsd %xmm1, %xmm2
movapd %xmm0, %xmm1
addsd %xmm12, %xmm9
movapd %xmm14, %xmm12
xorpd %xmm3, %xmm6
subsd %xmm10, %xmm12
mulsd %xmm13, %xmm1
subsd %xmm2, %xmm7
addsd %xmm4, %xmm2
movapd %xmm6, %xmm4
addsd %xmm14, %xmm10
mulsd %xmm13, %xmm6
mulsd %xmm12, %xmm4
subsd %xmm9, %xmm5
mulsd %xmm0, %xmm12
addsd %xmm8, %xmm9
movapd %xmm0, %xmm8
mulsd %xmm11, %xmm0
addsd %xmm1, %xmm4
movapd %xmm3, %xmm1
mulsd %xmm10, %xmm3
subsd %xmm12, %xmm6
mulsd %xmm11, %xmm1
mulsd %xmm10, %xmm8
subsd %xmm3, %xmm0
addsd %xmm1, %xmm8
movapd %xmm2, %xmm1
addsd %xmm0, %xmm1
subsd %xmm0, %xmm2
movapd %xmm7, %xmm0
subsd %xmm6, %xmm7
addsd %xmm6, %xmm0
movsd %xmm1, (%r8)
movapd %xmm9, %xmm1
movq 40(%rax), %rdx
subsd %xmm8, %xmm9
addsd %xmm8, %xmm1
movsd %xmm1, 7(%rbx,%rdx)
movq -8(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm2, 7(%rdx,%rbx,2)
movq -16(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm9, 7(%rdx,%rbx,2)
movq -24(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm0, 7(%rdx,%rbx,2)
movapd %xmm5, %xmm0
movq -32(%rax), %rbx
movq 40(%rax), %rdx
subsd %xmm4, %xmm5
addsd %xmm4, %xmm0
movsd %xmm0, 7(%rdx,%rbx,2)
movq -40(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm7, 7(%rdx,%rbx,2)
movq -48(%rax), %rbx
movq 40(%rax), %rdx
movsd %xmm5, 7(%rdx,%rbx,2)
jg .L2752
And I still count 117 instructions in the loop in comment 64 (whether that
matters, I don't know).
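For what it's worth, a count like this can be mechanized instead of done by eye. The following is only a minimal sketch, assuming the listing has one instruction per line, the loop starts right after a line of the form "LABEL:", and it ends at the first jump whose operand is that label. count_loop_insns is a hypothetical helper written for this note, not anything in GCC or binutils.

```c
#include <stddef.h>
#include <string.h>

/* Count instruction lines in a loop body taken from an assembly
   listing: every non-empty line after "LABEL:" up to and including
   the first jump back to LABEL.  Hypothetical helper, not part of
   GCC.  Returns -1 if the label or the back edge is not found.  */
int
count_loop_insns (const char *text, const char *label)
{
  size_t label_len = strlen (label);
  int in_loop = 0;
  int count = 0;
  const char *line = text;

  while (*line)
    {
      const char *end = strchr (line, '\n');
      size_t n = end ? (size_t) (end - line) : strlen (line);
      const char *next = end ? end + 1 : line + n;
      const char *p = line;

      /* Skip leading blanks.  */
      while (n > 0 && (*p == ' ' || *p == '\t'))
        {
          p++;
          n--;
        }

      if (!in_loop)
        {
          /* The loop starts after a line that is exactly "LABEL:".  */
          if (n == label_len + 1
              && strncmp (p, label, label_len) == 0
              && p[label_len] == ':')
            in_loop = 1;
        }
      else if (n > 0)
        {
          count++;
          /* A jump whose operand is LABEL closes the loop.  */
          if (p[0] == 'j'
              && n > label_len + 1
              && (p[n - label_len - 1] == ' '
                  || p[n - label_len - 1] == '\t')
              && strncmp (p + n - label_len, label, label_len) == 0)
            return count;
        }

      line = next;
    }
  return -1;
}
```

Note that this counts the closing jump itself as part of the loop and stops there, so anything after the jump (such as a trailing movq) is not included.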
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (67 preceding siblings ...)
2009-05-07 5:27 ` lucier at math dot purdue dot edu
@ 2009-05-07 13:41 ` bonzini at gnu dot org
2009-05-07 15:41 ` steven at gcc dot gnu dot org
` (46 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-05-07 13:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #67 from bonzini at gnu dot org 2009-05-07 13:40 -------
I'm thinking of enabling -frename-registers on x86. Since x86 does not enable
the first scheduling pass, the live ranges will be shorter and the register
allocator may reuse the same register over and over, leaving no freedom for
-fschedule-insns2.
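To illustrate the register-reuse point, here is a toy model (not GCC code; struct insn and count_false_deps are made up for this note). It counts pairs of instructions that have to stay in order only because the later one overwrites a register the earlier one read or wrote. These are the constraints that renaming can remove and that otherwise tie the scheduler's hands. It is a simplification under stated assumptions: it ignores intervening redefinitions and memory dependences.

```c
/* Toy model, not GCC code: one instruction writes DST and reads SRC1
   and SRC2.  Register numbers are arbitrary; -1 means "unused".  */
struct insn
{
  int dst;
  int src1;
  int src2;
};

/* Count pairs (i, j) with i < j where insn J overwrites a register
   that insn I wrote (output dependence) or read (anti dependence).
   Such pairs pin J after I only because the register was reused.
   Simplification: intervening redefinitions are ignored.  */
int
count_false_deps (const struct insn *code, int n)
{
  int count = 0;

  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      {
        int d = code[j].dst;

        if (d < 0)
          continue;
        if (d == code[i].dst || d == code[i].src1 || d == code[i].src2)
          count++;
      }
  return count;
}
```

For a sequence like "load r0; r1 = r0 * r2; load r0; r3 = r0 * r2" this reports two such pairs. Giving the second load a fresh register (and using it in the last multiply) reports none, so the second load is then free to move up.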
This would leave only the bug with RTL loop invariant motion.
Brad, you are the one who's regularly producing "insane" testcases, can you
measure the slowdown from -O1 to -O1 -frename-registers? It is a local pass,
so it should not be that much, but I'd rather check before (I'll check on a
bootstrap instead).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (68 preceding siblings ...)
2009-05-07 13:41 ` bonzini at gnu dot org
@ 2009-05-07 15:41 ` steven at gcc dot gnu dot org
2009-05-07 15:58 ` lucier at math dot purdue dot edu
` (45 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-05-07 15:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #68 from steven at gcc dot gnu dot org 2009-05-07 15:40 -------
Be careful with -frename-registers, it is quadratic in the size of a basic
block. For Bradley's test cases it will certainly give a slow-down.
I have tried a rewrite of -frename-registers, but I keep running into trouble
with the INDEX_REGS and BASE_REGS non-classes. Paolo, we could look at this
stuff together if you want my help.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (69 preceding siblings ...)
2009-05-07 15:41 ` steven at gcc dot gnu dot org
@ 2009-05-07 15:58 ` lucier at math dot purdue dot edu
2009-05-07 16:01 ` lucier at math dot purdue dot edu
` (44 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-07 15:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #69 from lucier at math dot purdue dot edu 2009-05-07 15:57 -------
Well, adding -frename-registers by itself to -O1 and not
-fforward-propagate and -fno-move-loop-invariants doesn't help (loop is given
below, along with complete compile options), the time is
140 ms cpu time (140 user, 0 system)
and adding -frename-registers and -fno-move-loop-invariants without
-fforward-propagate doesn't help (loop is again given below), it gets
140 ms cpu time (140 user, 0 system)
Adding all three gives a very consistent time this morning of
120 ms cpu time (120 user, 0 system)
which is the same as the 4.2.4 time without any of these options (this
morning).
But -fforward-propagate is not a viable option in general for this type of
code; here are some times for the testcase from PR 31957 with various options
on a 2.something GHz Xeon server:
pythagoras-45% time /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I.
-Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
-frename-registers -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY -c compiler.i
-ftime-report -fmem-report >& rename-report
252.987u 9.592s 4:23.20 99.7% 0+0k 0+0io 0pf+0w
pythagoras-46% time /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I.
-Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
-DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY -c compiler.i -ftime-report
-fmem-report > & no-rename-report
249.875u 10.544s 4:21.73 99.4% 0+0k 0+0io 0pf+0w
pythagoras-47% time /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I.
-Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
-frename-registers -fno-move-loop-invariants -DHAVE_CONFIG_H -D___PRIMAL
-D___LIBRARY -c compiler.i -ftime-report -fmem-report > &
rename-no-move-loop-invariants-report
246.663u 10.484s 4:18.30 99.5% 0+0k 0+0io 0pf+0w
pythagoras-48% time /pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I.
-Wall -W -Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
-frename-registers -fno-move-loop-invariants -fforward-propagate
-DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY -c compiler.i -ftime-report
-fmem-report > & rename-no-move-loop-invariants-forward-propagate-report
357.830u 28.417s 6:27.81 99.5% 0+0k 0+0io 11pf+0w
With -fforward-propagate the memory required went up to at least 21GB.
I'll attach the time reports for the various options, but the compiler
wasn't configured to provide detailed memory reports.
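For scale, the user CPU times above can be compared against the run with no extra options (249.875 s). A minimal sketch of the arithmetic; percent_increase is a hypothetical helper written for this note, and the inputs are just the "time" figures quoted above:

```c
/* Percentage by which OTHER exceeds BASE; negative if OTHER is smaller.
   Hypothetical helper for comparing the "time" outputs quoted above.  */
double
percent_increase (double base, double other)
{
  return (other - base) / base * 100.0;
}
```

percent_increase (249.875, 357.830) comes out at roughly 43, i.e. adding -fforward-propagate costs about 43% more user time on this testcase, while -frename-registers alone (252.987) is a little over 1%.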
Brad
Loop with -frename-registers
/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W
-Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
-frename-registers -DHAVE_CONFIG_H -D___PRIMAL -D___LIBRARY
-D___GAMBCDIR="\"/usr/local/Gambit-C/v4.1.2\"" -D___SYS_TYPE_CPU="\"x86_64\""
-D___SYS_TYPE_VENDOR="\"unknown\"" -D___SYS_TYPE_OS="\"linux-gnu\"" -c _num.c
movq %rdx, %r12
addq (%r11), %r12
leaq 4(%rdx), %r14
movq %r12, (%rsi)
addq $4, %r12
movq %r12, (%r10)
movq (%r11), %rcx
addq (%rsi), %rcx
movq %rcx, (%rbx)
addq $4, %rcx
movq %rcx, (%r9)
movq (%r11), %r13
addq (%rbx), %r13
movq %r13, (%r8)
addq $4, %r13
movq %r13, (%r15)
movq (%rax), %rcx
movq (%r8), %r12
addq $7, %rcx
movsd (%rcx,%r12,2), %xmm10
movq (%rbx), %r12
movsd (%rcx,%r13,2), %xmm13
movq (%r9), %r13
movsd (%rcx,%r12,2), %xmm6
movq (%rsi), %r12
movsd (%rcx,%r13,2), %xmm5
movq (%r10), %r13
movsd (%rcx,%r12,2), %xmm9
leaq (%r14,%r14), %r12
movsd (%rcx,%r13,2), %xmm11
leaq (%rcx,%rdx,2), %r13
movsd (%rcx,%r12), %xmm3
movq 24(%rdi), %rcx
movsd (%r13), %xmm4
addq $8, %rdx
movsd 15(%rcx), %xmm14
movsd 7(%rcx), %xmm15
movapd %xmm14, %xmm8
movapd %xmm14, %xmm7
movapd %xmm15, %xmm12
mulsd %xmm10, %xmm8
mulsd %xmm13, %xmm12
mulsd %xmm15, %xmm10
mulsd %xmm14, %xmm13
movsd 31(%rcx), %xmm2
addsd %xmm8, %xmm12
movapd %xmm15, %xmm8
mulsd %xmm6, %xmm7
mulsd %xmm5, %xmm14
subsd %xmm13, %xmm10
mulsd %xmm5, %xmm8
movapd %xmm2, %xmm13
mulsd %xmm6, %xmm15
movapd %xmm4, %xmm6
xorpd .LC5(%rip), %xmm13
movapd %xmm3, %xmm5
addsd %xmm7, %xmm8
movapd %xmm11, %xmm7
subsd %xmm14, %xmm15
movapd %xmm9, %xmm14
movsd 23(%rcx), %xmm0
subsd %xmm12, %xmm7
subsd %xmm10, %xmm14
movapd %xmm13, %xmm1
addsd %xmm11, %xmm12
movapd %xmm2, %xmm11
subsd %xmm15, %xmm6
addsd %xmm4, %xmm15
movapd %xmm0, %xmm4
mulsd %xmm7, %xmm1
addsd %xmm9, %xmm10
mulsd %xmm14, %xmm4
subsd %xmm8, %xmm5
mulsd %xmm0, %xmm7
addsd %xmm3, %xmm8
mulsd %xmm13, %xmm14
movapd %xmm15, %xmm9
mulsd %xmm10, %xmm11
mulsd %xmm0, %xmm10
addsd %xmm1, %xmm4
movapd %xmm8, %xmm3
movapd %xmm5, %xmm1
subsd %xmm7, %xmm14
movapd %xmm0, %xmm7
mulsd %xmm12, %xmm7
addsd %xmm4, %xmm1
mulsd %xmm2, %xmm12
movapd %xmm6, %xmm2
subsd %xmm14, %xmm6
addsd %xmm14, %xmm2
addsd %xmm11, %xmm7
subsd %xmm12, %xmm10
subsd %xmm4, %xmm5
addsd %xmm7, %xmm3
addsd %xmm10, %xmm9
subsd %xmm10, %xmm15
subsd %xmm7, %xmm8
movsd %xmm9, (%r13)
movq (%rax), %rcx
movsd %xmm3, 7(%r12,%rcx)
movq (%rsi), %r13
movq (%rax), %rcx
movsd %xmm15, 7(%rcx,%r13,2)
movq (%r10), %r12
movq (%rax), %r13
movsd %xmm8, 7(%r13,%r12,2)
movq (%rbx), %rcx
movq (%rax), %r13
movsd %xmm2, 7(%r13,%rcx,2)
movq (%r9), %r12
movq (%rax), %rcx
movsd %xmm1, 7(%rcx,%r12,2)
movq (%r8), %r13
movq (%rax), %rcx
movsd %xmm6, 7(%rcx,%r13,2)
movq (%r15), %r12
movq (%rax), %r13
movsd %xmm5, 7(%r13,%r12,2)
cmpq %rdx, -104(%rsp)
jg .L2941
Loop with -frename-registers -fno-move-loop-invariants
/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W
-Wno-unused -O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math
-fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp
-frename-registers -fno-move-loop-invariants -DHAVE_CONFIG_H -D___PRIMAL
-D___LIBRARY -D___GAMBCDIR="\"/usr/local/Gambit-C/v4.1.2\""
-D___SYS_TYPE_CPU="\"x86_64\"" -D___SYS_TYPE_VENDOR="\"unknown\""
-D___SYS_TYPE_OS="\"linux-gnu\"" -c _num.c
.L2755:
leaq 8(%rax), %rdx
movq %rcx, %r13
leaq -16(%rax), %r9
leaq -8(%rax), %r10
leaq -24(%rax), %r8
leaq -32(%rax), %rdi
addq (%rdx), %r13
leaq 4(%rcx), %r14
leaq 4(%r13), %rsi
movq %r13, (%r10)
movq %rsi, (%r9)
addq (%rdx), %r13
leaq -40(%rax), %rsi
leaq 4(%r13), %r11
movq %r13, (%r8)
movq %r11, (%rdi)
addq (%rdx), %r13
leaq -48(%rax), %r11
leaq 40(%rax), %rdx
movq %r13, (%rsi)
addq $4, %r13
movq %r13, (%r11)
movq (%rdx), %rbx
movq (%rsi), %r12
addq $7, %rbx
movsd (%rbx,%r12,2), %xmm11
movq (%r8), %r12
movsd (%rbx,%r13,2), %xmm9
movq (%rdi), %r13
movsd (%rbx,%r12,2), %xmm7
movq (%r10), %r12
movsd (%rbx,%r13,2), %xmm5
movq (%r9), %r13
movsd (%rbx,%r12,2), %xmm6
leaq (%r14,%r14), %r12
movsd (%rbx,%r13,2), %xmm14
leaq (%rbx,%rcx,2), %r13
movsd (%rbx,%r12), %xmm8
movq 24(%rax), %rbx
movapd %xmm6, %xmm13
addq $8, %rcx
movsd (%r13), %xmm4
cmpq %rcx, %r15
movsd 15(%rbx), %xmm1
movsd 7(%rbx), %xmm2
movapd %xmm1, %xmm3
movsd 31(%rbx), %xmm0
movapd %xmm2, %xmm10
mulsd %xmm11, %xmm3
movapd %xmm2, %xmm12
mulsd %xmm9, %xmm10
mulsd %xmm2, %xmm11
mulsd %xmm1, %xmm9
mulsd %xmm7, %xmm2
addsd %xmm10, %xmm3
mulsd %xmm5, %xmm12
movapd %xmm14, %xmm10
movsd 23(%rbx), %xmm15
subsd %xmm9, %xmm11
movapd %xmm1, %xmm9
mulsd %xmm5, %xmm1
movapd %xmm8, %xmm5
mulsd %xmm7, %xmm9
subsd %xmm3, %xmm10
movapd %xmm4, %xmm7
subsd %xmm11, %xmm13
addsd %xmm6, %xmm11
movsd .LC5(%rip), %xmm6
subsd %xmm1, %xmm2
xorpd %xmm0, %xmm6
addsd %xmm14, %xmm3
addsd %xmm12, %xmm9
movapd %xmm15, %xmm14
movapd %xmm0, %xmm12
subsd %xmm2, %xmm7
mulsd %xmm13, %xmm14
addsd %xmm4, %xmm2
movapd %xmm6, %xmm4
subsd %xmm9, %xmm5
mulsd %xmm3, %xmm0
addsd %xmm8, %xmm9
mulsd %xmm10, %xmm4
movapd %xmm15, %xmm8
mulsd %xmm15, %xmm10
mulsd %xmm11, %xmm15
movapd %xmm7, %xmm1
mulsd %xmm13, %xmm6
mulsd %xmm3, %xmm8
movapd %xmm9, %xmm3
mulsd %xmm11, %xmm12
addsd %xmm14, %xmm4
subsd %xmm0, %xmm15
movapd %xmm5, %xmm0
subsd %xmm10, %xmm6
movapd %xmm2, %xmm10
addsd %xmm12, %xmm8
addsd %xmm15, %xmm10
subsd %xmm15, %xmm2
addsd %xmm6, %xmm1
addsd %xmm8, %xmm3
movsd %xmm10, (%r13)
movq (%rdx), %rbx
subsd %xmm8, %xmm9
addsd %xmm4, %xmm0
subsd %xmm6, %xmm7
movsd %xmm3, 7(%r12,%rbx)
movq (%r10), %r10
movq (%rdx), %r13
subsd %xmm4, %xmm5
movsd %xmm2, 7(%r13,%r10,2)
movq (%r9), %rbx
movq (%rdx), %r12
movsd %xmm9, 7(%r12,%rbx,2)
movq (%r8), %r13
movq (%rdx), %r10
movsd %xmm1, 7(%r10,%r13,2)
movq (%rdi), %r9
movq (%rdx), %rbx
movsd %xmm0, 7(%rbx,%r9,2)
movq (%rsi), %rsi
movq (%rdx), %r8
movsd %xmm7, 7(%r8,%rsi,2)
movq (%r11), %rdi
movq (%rdx), %r12
movsd %xmm5, 7(%r12,%rdi,2)
jg .L2755
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (70 preceding siblings ...)
2009-05-07 15:58 ` lucier at math dot purdue dot edu
@ 2009-05-07 16:01 ` lucier at math dot purdue dot edu
2009-05-07 16:03 ` lucier at math dot purdue dot edu
` (43 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-07 16:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #70 from lucier at math dot purdue dot edu 2009-05-07 16:00 -------
Created an attachment (id=17819)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17819&action=view)
time report related to comment 69, time for PR 31957 with no options
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (72 preceding siblings ...)
2009-05-07 16:03 ` lucier at math dot purdue dot edu
@ 2009-05-07 16:03 ` lucier at math dot purdue dot edu
2009-05-07 16:04 ` lucier at math dot purdue dot edu
` (41 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-07 16:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #72 from lucier at math dot purdue dot edu 2009-05-07 16:03 -------
Created an attachment (id=17821)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17821&action=view)
time for 31957, with rename-registers no-move-loop-invariants
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (71 preceding siblings ...)
2009-05-07 16:01 ` lucier at math dot purdue dot edu
@ 2009-05-07 16:03 ` lucier at math dot purdue dot edu
2009-05-07 16:03 ` lucier at math dot purdue dot edu
` (42 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-07 16:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #71 from lucier at math dot purdue dot edu 2009-05-07 16:02 -------
Created an attachment (id=17820)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17820&action=view)
time for 31957, with rename-registers
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (73 preceding siblings ...)
2009-05-07 16:03 ` lucier at math dot purdue dot edu
@ 2009-05-07 16:04 ` lucier at math dot purdue dot edu
2009-05-07 16:21 ` bonzini at gnu dot org
` (40 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: lucier at math dot purdue dot edu @ 2009-05-07 16:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #73 from lucier at math dot purdue dot edu 2009-05-07 16:04 -------
Created an attachment (id=17822)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17822&action=view)
time for 31957, with rename-registers no-move-loop-invariants forward-propagate
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928
^ permalink raw reply [flat|nested] 117+ messages in thread
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-05-07 16:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #74 from bonzini at gnu dot org 2009-05-07 16:21 -------
Ok. One step at a time. :-) To recap, here is the situation:
- the CSE optimization you mention was *not* removed, it was moved to fwprop,
so it does not run at -O1.
- once this was done, the way to go is to tune new optimizations, not to
reintroduce old ones
- for example, fwprop in turn triggered a bad choice in loop invariant motion,
for which a patch has been posted. This patch will remove the need for
-fno-move-loop-invariants on this testcase (this is a deficiency in LIM that is
not specific to machine-generated code, OTOH the presence of many fp[N]
accesses helps triggering it).
- that scheduling is necessary now and not in 4.2.x, probably is just a matter
of luck
- why renaming registers is necessary now and not in 4.2.x is still a mystery;
but, there is an explanation as to why it helps (it prolongs live ranges,
something that on non-x86 archs is done by the pre-regalloc scheduling)
- at least we have a set of options providing good performance on this
testcase, and guidance towards better tuning of the various problematic
optimizations
To conclude, nobody is underestimating the significance of this PR; it's just a
matter of priorities. Near the end of the release cycle, you tend to look at
PRs with small testcases to minimize the time spent understanding the code;
near the beginning, you hope that new features magically fix the PRs and
concentrate on wrong-code bugs and so on. Complex P2s such as this one
unfortunately tend to stay in a limbo.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-05-07 16:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #75 from lucier at math dot purdue dot edu 2009-05-07 16:31 -------
Subject: Re: [4.3/4.4/4.5 Regression] 30% performance slowdown in
floating-point code caused by r118475
On May 7, 2009, at 12:21 PM, bonzini at gnu dot org wrote:
> ------- Comment #74 from bonzini at gnu dot org 2009-05-07 16:21 -------
> Ok. One step at a time. :-) To recap, here is the situation:
>
> - that scheduling is necessary now and not in 4.2.x, probably is just a matter
> of luck
If you mean -fschedule-insns2, it has always been part of the options
list.
> - at least we have a set of options providing good performance on this
> testcase, and guidance towards better tuning of the various problematic
> optimizations
OK, but -fforward-propagate is not viable in general for these
machine-generated codes.
>
Brad
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-05-07 16:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #76 from bonzini at gnu dot org 2009-05-07 16:37 -------
It should be possible to modify fwprop to avoid excessive memory usage (doing
its own dataflow, basically, instead of using UD chains)
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: steven at gcc dot gnu dot org @ 2009-05-07 17:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #77 from steven at gcc dot gnu dot org 2009-05-07 17:50 -------
Re. comment #75: Just the fact that an option is enabled in both releases
doesn't mean the pass behind it is doing the same thing in both releases. What
the scheduler does depends heavily on the code you feed it. Sometimes it is
pure (good or bad) luck that changes the behavior of a pass in the compiler.
The interactions between all the pieces are just very complicated (which is
why, IMHO, retargetable-compiler engineering is so difficult: controlling the
pipeline is not doable).
Re. comment #76:
Sad as it may be, I think this is the best short-term solution.
Alternatively we could re-work fwprop to work on regions and use the
partial-CFG dataflow stuff, similar to what the RTL loop optimizers (like
loop-invariant) do. To be honest, I'd much prefer the latter, but the
DIY-fwprop thing is probably easier in the short term.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gcc dot gnu dot org @ 2009-05-08 6:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #78 from bonzini at gnu dot org 2009-05-08 06:51 -------
Subject: Bug 33928
Author: bonzini
Date: Fri May 8 06:51:12 2009
New Revision: 147270
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147270
Log:
2009-05-08 Paolo Bonzini <bonzini@gnu.org>
PR rtl-optimization/33928
* loop-invariant.c (struct use): Add addr_use_p.
(struct def): Add n_addr_uses.
(struct invariant): Add cheap_address.
(create_new_invariant): Set cheap_address.
(record_use): Accept df_ref. Set addr_use_p and update n_addr_uses.
(record_uses): Pass df_ref to record_use.
(get_inv_cost): Do not add inv->cost to comp_cost for cheap addresses used
only as such.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/loop-invariant.c
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-05-08 7:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #79 from bonzini at gnu dot org 2009-05-08 07:18 -------
I'm cobbling up the DIY dataflow patch and it is anything but ugly, actually.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gcc dot gnu dot org @ 2009-05-08 7:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #80 from bonzini at gnu dot org 2009-05-08 07:51 -------
Subject: Bug 33928
Author: bonzini
Date: Fri May 8 07:51:46 2009
New Revision: 147274
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147274
Log:
2009-05-08 Paolo Bonzini <bonzini@gnu.org>
PR rtl-optimization/33928
* loop-invariant.c (record_use): Fix && vs. || mishap.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/loop-invariant.c
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-05-08 7:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #81 from bonzini at gnu dot org 2009-05-08 07:55 -------
Created an attachment (id=17825)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17825&action=view)
speed up fwprop and enable it at -O1
Here is a patch I'm bootstrapping to remove fwprop's usage of UD chains. It
does not affect the assembly output at all; it just changes the data structure
that is used.
compiler.i is probably too big for me, but I tried slatex.i and fwprop was ~2%
of compilation time with this patch.
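A rough way to picture the data-structure change, as a Python sketch rather than the patch itself: instead of keeping a full chain of definitions for every use, keep one slot per use that is filled only when exactly one definition reaches it. The function name, the tuple layout, and the register and insn names below are all hypothetical, and the sketch assumes every definition fully overwrites its register.

```python
def single_def_links(insns, rd_in):
    """Map each (insn, reg) use to its unique reaching definition, or None.

    insns: list of (insn_id, used_regs, defined_regs) for one basic block.
    rd_in: reg -> set of definition ids reaching the top of the block.
    All names here are illustrative; this is not GCC's fwprop.c code.
    """
    # Copy so the caller's sets are not modified while we walk the block.
    reaching = {reg: set(defs) for reg, defs in rd_in.items()}
    links = {}
    for insn_id, used_regs, defined_regs in insns:
        for reg in used_regs:
            candidates = reaching.get(reg, set())
            # Only a use with exactly one reaching definition gets a link.
            links[(insn_id, reg)] = (
                next(iter(candidates)) if len(candidates) == 1 else None
            )
        for reg in defined_regs:
            # Assumption: a definition kills every earlier one of the register.
            reaching[reg] = {insn_id}
    return links


# Hypothetical block: r1 has one definition on entry, r2 has two.
rd_in = {"r1": {"d1"}, "r2": {"d2", "d3"}}
block = [
    ("i1", ["r1", "r2"], ["r3"]),
    ("i2", ["r3", "r2"], ["r2"]),
    ("i3", ["r2"], []),
]
links = single_def_links(block, rd_in)
```

A use whose slot is None is simply skipped by a forward-propagation pass in this model, so nothing beyond the single link ever needs to be stored.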
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-05-08 9:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #82 from bonzini at gnu dot org 2009-05-08 09:41 -------
Hm, looking at the time reports the patch will save about 30-40% of the fwprop
execution time, and should fix the memory hog problem, but will still leave in
the 70s needed to compute reaching definitions. I guess it's a step forward
for -O2 but borderline for -O1.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gcc dot gnu dot org @ 2009-05-08 12:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #83 from bonzini at gnu dot org 2009-05-08 12:22 -------
Subject: Bug 33928
Author: bonzini
Date: Fri May 8 12:22:30 2009
New Revision: 147282
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147282
Log:
2009-05-08 Paolo Bonzini <bonzini@gnu.org>
PR rtl-optimization/33928
PR 26854
* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
process_uses, build_single_def_use_links): New.
(update_df): Update use_def_ref.
(forward_propagate_into): Use get_def_for_use instead of use-def
chains.
(fwprop_init): Call build_single_def_use_links and let it initialize
dataflow.
(fwprop_done): Free use_def_ref.
(fwprop_addr): Eliminate duplicate call to df_set_flags.
* df-problems.c (df_rd_simulate_artificial_defs_at_top,
df_rd_simulate_one_insn): New.
(df_rd_bb_local_compute_process_def): Update head comment.
(df_chain_create_bb): Use the new RD simulation functions.
* df.h (df_rd_simulate_artificial_defs_at_top,
df_rd_simulate_one_insn): New.
* opts.c (decode_options): Enable fwprop at -O1.
* doc/invoke.texi (-fforward-propagate): Document this.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/df-problems.c
trunk/gcc/df.h
trunk/gcc/doc/invoke.texi
trunk/gcc/fwprop.c
trunk/gcc/opts.c
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-05-15 10:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #84 from bonzini at gnu dot org 2009-05-15 10:35 -------
Ok, I am working on a patch to add a multiple-definitions DF problem and use
that together with a domwalk to find the single definitions (instead of
reaching-definitions, which is the remaining slow part). The new problem has a
bitvector sized by the number of registers rather than the number of defs (that
is, it is sized like the bitvectors for liveness), which means it will be fast.
It is defined as follows:
MDkill (B) = regs that have a def in B
MDinit (B) = (union of MDkill (P) for every P : B \in DomFrontier(P)) \cap
LRin(B)
MDin (B) = MDinit (B) \cup (union of MDout (P) for every predecessor P of B)
MDout (B) = MDin (B) - MDkill (B)
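The equations above can be tried out on a toy CFG with a short Python sketch. This is only an illustration under stated assumptions: the dominance frontiers and live-in sets are supplied by hand rather than computed, the block and register names are made up, and the solver is a plain round-robin iteration, not the DF framework.

```python
def solve_md(preds, kill, dom_frontier, live_in):
    """Iterate the MD equations to a fixed point.

    preds:        block -> list of predecessor blocks
    kill:         block -> set of regs that have a def in the block (MDkill)
    dom_frontier: block -> set of blocks in its dominance frontier (assumed given)
    live_in:      block -> set of regs live on entry (LRin, assumed given)
    """
    blocks = list(preds)
    # MDinit(B) = (union of MDkill(P) for P with B in DomFrontier(P)) & LRin(B)
    init = {}
    for b in blocks:
        merged = set()
        for p in blocks:
            if b in dom_frontier[p]:
                merged |= kill[p]
        init[b] = merged & live_in[b]

    md_in = {b: set() for b in blocks}
    md_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # MDin(B) = MDinit(B) | union of MDout(P) over predecessors P
            new_in = set(init[b])
            for p in preds[b]:
                new_in |= md_out[p]
            # MDout(B) = MDin(B) - MDkill(B)
            new_out = new_in - kill[b]
            if new_in != md_in[b] or new_out != md_out[b]:
                md_in[b], md_out[b] = new_in, new_out
                changed = True
    return md_in, md_out


# Hypothetical triangle: BB1 -> BB2 -> BB3 and BB1 -> BB3.
preds = {"BB1": [], "BB2": ["BB1"], "BB3": ["BB1", "BB2"]}
kill = {"BB1": {"r10", "r11"}, "BB2": {"r10", "r12"}, "BB3": set()}
dom_frontier = {"BB1": set(), "BB2": {"BB3"}, "BB3": set()}
live_in = {"BB1": set(), "BB2": set(), "BB3": {"r10", "r11"}}
md_in, md_out = solve_md(preds, kill, dom_frontier, live_in)
```

In this toy case only r10 ends up in MDin (BB3): it is defined in both BB1 and BB2 and is live into BB3, while r11 has a single definition and r12 is not live there.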
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-05-16 0:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #85 from lucier at math dot purdue dot edu 2009-05-16 00:20 -------
Created an attachment (id=17878)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17878&action=view)
Large test file for testing time and memory usage
This is the file compiler.i used in the previous tests.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-05-16 0:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #86 from lucier at math dot purdue dot edu 2009-05-16 00:29 -------
Created an attachment (id=17879)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17879&action=view)
Time and memory report for compiler.i
This is the time and memory report after the hack from
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39301#c8
to make the statistic fields HOST_WIDEST_INTs.
Some interesting lines:
fwprop.c:178 (build_single_def_use_links)        8  8438189160      82240          0 1027496
df-problems.c:311 (df_rd_alloc)             155420  8433928200 8433870880 8433870880       0
df-problems.c:593 (df_rd_transfer_functio   909666 40718919320 6755812320 6755736840 2025096
Total                                     13171390 61130398320
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-05-16 0:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #87 from lucier at math dot purdue dot edu 2009-05-16 00:33 -------
The compiler options for the previous report:
/pkgs/gcc-mainline/bin/gcc -save-temps -I../include -I. -Wall -W -Wno-unused
-O1 -fno-math-errno -fschedule-insns2 -fno-trapping-math -fno-strict-aliasing
-fwrapv -fomit-frame-pointer -fPIC -fno-common -mieee-fp -frename-registers
-fno-move-loop-invariants -fforward-propagate -DHAVE_CONFIG_H -D___PRIMAL
-D___LIBRARY -c compiler.i -ftime-report -fmem-report > &
rename-no-move-loop-invariants-forward-propagate-report-new
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-08 8:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #88 from bonzini at gnu dot org 2009-06-08 08:40 -------
Created an attachment (id=17963)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17963&action=view)
patch I'm testing
Here is a patch I'm testing that completes the rewrite of fwprop's dataflow.
This should make it much faster and less memory hungry. It should also keep
the generated code fast (with -frename-registers of course); if not, it's a bug
in the patch.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-08 8:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #89 from bonzini at gnu dot org 2009-06-08 08:59 -------
Created an attachment (id=17964)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17964&action=view)
correct version
oops, the previous one didn't work at -O1 even though it bootstrapped :-)
--
bonzini at gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #17963|0 |1
is obsolete| |
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-08 16:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #90 from bonzini at gnu dot org 2009-06-08 16:35 -------
Yo, with the patch the time to compile compiler.i with the given options is
331s on my machine (with a checking compiler). Fwprop takes only 1% (including
computation of the new dataflow problem). I'd estimate around 250s with your
nonchecking build. I'll split it and post it tomorrow.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-06-08 18:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #91 from lucier at math dot purdue dot edu 2009-06-08 18:19 -------
Created an attachment (id=17968)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17968&action=view)
time and memory report for compiler.i after Paolo's patch
The patch cut the total bitmaps used compiling compiler.i from > 60GB to 3GB;
maximum memory (just from top) was 1631MB.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-12 14:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #92 from bonzini at gnu dot org 2009-06-12 14:50 -------
In the meanwhile something caused "tree incremental SSA" to jump up from 10s to
26s. Sob.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: rguenth at gcc dot gnu dot org @ 2009-06-13 14:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #93 from rguenth at gcc dot gnu dot org 2009-06-13 14:18 -------
I would say that was the new SRA.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mjambor at suse dot cz
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: jamborm at gcc dot gnu dot org @ 2009-06-14 4:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #94 from jamborm at gcc dot gnu dot org 2009-06-14 04:43 -------
(In reply to comment #92)
> In the meanwhile something caused "tree incremental SSA" to jump up from 10s to
> 26s. Sob.
>
(In reply to comment #93)
> I would say that was the new SRA.
>
OK, I'll try to investigate. Which of the various attachments to this
bug is the one to look at?
Martin
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-06-14 14:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #95 from lucier at math dot purdue dot edu 2009-06-14 14:59 -------
The test case is compiler.i.gz
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-06-14 15:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #96 from lucier at math dot purdue dot edu 2009-06-14 15:02 -------
Sorry, the gcc options are in comment 87 (the -fforward-propagate is now
redundant), and without Paolo's recently proposed patch it requires about 9GB
of memory to compile.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-15 15:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #97 from bonzini at gnu dot org 2009-06-15 15:14 -------
Brad, could you try to time compiler.i with and without -ftime-report to see
how much of the "tree stmt walking" timevar is just accounting overhead?
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-06-15 16:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #98 from lucier at math dot purdue dot edu 2009-06-15 16:11 -------
I don't quite understand how you would like me to configure and run the test.
First, I've applied your patches to speed up computing DF to my tree; do you
want them included in the test, or should I use a pristine mainline?
Second, when configuring mainline, should I include, or not include
1. --enable-gather-detailed-mem-stats
2. --enable-checking=release
After that, I think you just want to run two compiles with and without
-ftime-report, is that right? (Nothing about -fmem-report.)
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: paolo dot bonzini at gmail dot com @ 2009-06-15 16:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #99 from paolo dot bonzini at gmail dot com 2009-06-15 16:20 -------
Subject: Re: [4.3/4.4/4.5 Regression] 30% performance
slowdown in floating-point code caused by r118475
> First, I've applied your patches to speed up computing DF to my tree; do you
> want them included in the test, or should I use a pristine mainline?
It doesn't matter, but yes, use them.
> Second, when configuring mainline, should I include, or not include
>
> 1. --enable-gather-detailed-mem-stats
> 2. --enable-checking=release
Again it shouldn't matter, but use only --enable-checking=release.
> After that, I think you just want to run two compiles with and without
> -ftime-report, is that right? (Nothing about -fmem-report.)
Yes, and the output of -ftime-report is not needed. Just the "time
./cc1 ..." output for the two. Thanks!
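For anyone repeating the comparison, a small Python helper along these lines could do the two timed runs. It is a sketch under assumptions: the function names are made up, it measures wall-clock time rather than the user/system split that the shell's time reports, and the compile command itself is left as a parameter because the full cc1 command line is not spelled out here.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Return wall-clock seconds taken by one run of cmd (a list of strings)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def accounting_overhead(seconds_with_report, seconds_without_report):
    """Extra seconds attributable to the -ftime-report bookkeeping."""
    return seconds_with_report - seconds_without_report


def compare(base_cmd):
    """Time base_cmd as given and again with -ftime-report appended.

    base_cmd is whatever compile command is being measured, without
    -ftime-report; it is supplied by the caller, not guessed here.
    """
    plain = time_command(base_cmd)
    reported = time_command(base_cmd + ["-ftime-report"])
    return plain, reported, accounting_overhead(reported, plain)
```

A single pair of runs is noisy, so repeating each measurement a few times and comparing the minimums is probably a safer reading of the overhead.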
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
2007-10-28 1:46 [Bug regression/33928] New: 33% performance slowdown from 4.2.2 in floating-point code lucier at math dot purdue dot edu
` (100 preceding siblings ...)
2009-06-15 16:21 ` paolo dot bonzini at gmail dot com
@ 2009-06-15 16:22 ` bonzini at gnu dot org
2009-06-15 16:26 ` [Bug rtl-optimization/33928] [4.3/4.4 " bonzini at gnu dot org
` (13 subsequent siblings)
115 siblings, 0 replies; 117+ messages in thread
From: bonzini at gnu dot org @ 2009-06-15 16:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #100 from bonzini at gnu dot org 2009-06-15 16:22 -------
As a reminder for after the fwprop patches are committed: the problem in
CFG cleanup is that the iterative fixing of dominators in
remove_edge_and_dominated_blocks is very expensive. We should probably make
sure that dominator information is not kept around in some key cfgcleanup passes.
* [Bug rtl-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-15 16:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #101 from bonzini at gnu dot org 2009-06-15 16:26 -------
Time for cleanup. This bug is fixed on mainline, and likely WONTFIX on 4.3/4.4
(though it could in principle be fixed by backporting the fwprop patches to
4.4). I'll add some pointers to PR26854 for the attachments related to
compile-time problems.
--
bonzini at gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.5.0
            Summary|[4.3/4.4/4.5 Regression] 30%|[4.3/4.4 Regression] 30%
                   |performance slowdown in     |performance slowdown in
                   |floating-point code caused  |floating-point code caused
                   |by r118475                  |by r118475
            Version|4.3.0                       |4.5.0
* [Bug rtl-optimization/33928] [4.3/4.4 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-06-15 19:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #102 from lucier at math dot purdue dot edu 2009-06-15 19:57 -------
Subject: Re: [4.3/4.4/4.5 Regression] 30%
performance slowdown in floating-point code caused by r118475
On Mon, 2009-06-15 at 16:20 +0000, paolo dot bonzini at gmail dot com
wrote:
> Yes, and the output of -ftime-report is not needed. Just the "time
> ./cc1 ..." output for the two. Thanks!
The two commands:
time /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -mieee-fp -c compiler.i
261.424u 1.184s 4:22.76 99.9% 0+0k 0+28456io 0pf+0w
time /pkgs/gcc-mainline/bin/gcc -O1 -fno-math-errno -fschedule-insns2
-fno-trapping-math -fno-strict-aliasing -fwrapv -fomit-frame-pointer -fPIC
-fno-common -mieee-fp -c compiler.i -ftime-report
263.424u 4.900s 4:28.68 99.8% 0+0k 0+28480io 0pf+0w
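To compare the two runs, the tcsh `time` lines above can be split into user,
system, and elapsed seconds. The following is a minimal Python sketch and not
part of the original report. The helper name `parse_tcsh_time` is hypothetical,
and it assumes the default tcsh time format (user time ending in `u`, system
time ending in `s`, elapsed time as `m:ss.ss`).

```python
def parse_tcsh_time(line):
    """Return (user, system, elapsed) in seconds from a tcsh `time` line."""
    user_field, sys_field, elapsed_field = line.split()[:3]
    user = float(user_field.rstrip("u"))
    system = float(sys_field.rstrip("s"))
    # Elapsed time is printed as [h:]m:ss.ss; fold the fields into seconds.
    elapsed = 0.0
    for part in elapsed_field.split(":"):
        elapsed = elapsed * 60 + float(part)
    return user, system, elapsed


plain = parse_tcsh_time("261.424u 1.184s 4:22.76 99.9% 0+0k 0+28456io 0pf+0w")
report = parse_tcsh_time("263.424u 4.900s 4:28.68 99.8% 0+0k 0+28480io 0pf+0w")

extra_user = report[0] - plain[0]
extra_elapsed = report[2] - plain[2]
print(f"extra user time with -ftime-report: {extra_user:.3f} s")
print(f"extra elapsed time with -ftime-report: {extra_elapsed:.2f} s")
```

On these numbers the -ftime-report run used about 2 s more user time (under 1%)
and about 6 s more elapsed time (about 2%).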
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-06-15 20:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #103 from lucier at math dot purdue dot edu 2009-06-15 20:21 -------
Regarding comment #101 ...
With
heine:~/programs/gcc/objdirs/gsc-fft-tests/gambc-v4_1_2>
/pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --disable-multilib --enable-checking=release
Thread model: posix
gcc version 4.5.0 20090608 (experimental) [trunk revision 148276] (GCC)
(and including Paolo's patch to speed up DF), the routine in direct.c takes
168 ms cpu time (168 user, 0 system)
As reported here
http://www.math.purdue.edu/~lucier/bugzilla/9/
with gcc-4.2.4, this routine takes 156 ms on the same machine.
Comment #9 gives the code that 4.2.4 generates at the start of the main loop;
the start of the main loop with the version of 4.5.0 I gave above is:
.L2938:
        movq %rcx, %rdx
        addq 8(%rax), %rdx
        leaq 4(%rcx), %rbx
        movq %rdx, -8(%rax)
        leaq 4(%rdx), %rdi
        addq 8(%rax), %rdx
        movq %rdi, -16(%rax)
        movq %rdx, -24(%rax)
        leaq 4(%rdx), %rdi
        addq 8(%rax), %rdx
        movq %rdi, -32(%rax)
        movq %rdx, -40(%rax)
        leaq 4(%rdx), %rdi
        movq 40(%rax), %rdx
        movq %rdi, -48(%rax)
        movsd 7(%rdx,%rdi,2), %xmm7
        movq -40(%rax), %rdi
        leaq 7(%rdx,%rcx,2), %r8
        addq $8, %rcx
        movsd (%r8), %xmm4
        cmpq %rcx, %r13
        movsd 7(%rdx,%rdi,2), %xmm10
        movq -32(%rax), %rdi
        movsd 7(%rdx,%rdi,2), %xmm5
        movq -24(%rax), %rdi
        movsd 7(%rdx,%rdi,2), %xmm6
        movq -16(%rax), %rdi
        movsd 7(%rdx,%rdi,2), %xmm13
        movq -8(%rax), %rdi
        movsd 7(%rdx,%rdi,2), %xmm11
        leaq (%rbx,%rbx), %rdi
        movsd 7(%rdi,%rdx), %xmm9
        movq 24(%rax), %rdx
        movapd %xmm11, %xmm14
        movsd 15(%rdx), %xmm1
        movsd 7(%rdx), %xmm2
        movapd %xmm1, %xmm8
        movsd 31(%rdx), %xmm3
        movapd %xmm2, %xmm12
        mulsd %xmm10, %xmm8
        mulsd %xmm7, %xmm12
        mulsd %xmm2, %xmm10
        mulsd %xmm1, %xmm7
        movsd 23(%rdx), %xmm0
So, to my mind, this is still a 4.5 regression: there is still a slowdown,
and the code is still much less optimized by 4.5.0 than by 4.2.4. 168/156 ~
1.08, so if you want to change the Summary of this bug to "8% regression", or
something similar, that's fine, but I've changed this PR back to being a 4.5
regression.
I was not really thrilled when Richard marked PR 39157 as a duplicate of this
PR. To my mind, there are three more or less independent things---run time of
Gambit-generated code, compile time of the code, and the space required to
compile the code. This PR is about run time; PR 39157 was about space needed
by the compiler; PR 26854 is about compile time. They seem to have all been
mushed together.
--
lucier at math dot purdue dot edu changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.5.0                       |
            Summary|[4.3/4.4 Regression] 30%    |[4.3/4.4/4.5 Regression] 30%
                   |performance slowdown in     |performance slowdown in
                   |floating-point code caused  |floating-point code caused
                   |by r118475                  |by r118475
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-16 6:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #104 from bonzini at gnu dot org 2009-06-16 06:47 -------
I understood that with -frename-registers the regression is fixed. As I said,
without a pre-regalloc scheduling pass and without register renaming, the
scheduling quality you get is more or less random.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bonzini at gnu dot org @ 2009-06-16 7:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #105 from bonzini at gnu dot org 2009-06-16 07:01 -------
Marking PR39157 as a duplicate of PR26854 is not exact. Only the fwprop part is
a duplicate, since there the large compile times came from building large data
structures; the CFG Cleanup part is not exactly a duplicate. I don't think it
matters, though, because we already have a patch for the fwprop issue.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-06-16 7:25 UTC (permalink / raw)
To: gcc-bugs
------- Comment #106 from lucier at math dot purdue dot edu 2009-06-16 07:24 -------
This machine has 4ms ticks, so we're getting down to a few ticks difference
with a benchmark of this size. It's 156ms with 4.2.4, 168ms with 4.5.0, and
164 ms when -frename-registers is added to the command line.
It's not just scheduling, there are more memory accesses with 4.5.0.
With a problem roughly 10 times as large, the times are
4.2.4: 2912ms
4.5.0: 3204ms
4.5.0: 3120ms (adding -frename-registers)
So there's still a 7% difference from 4.2.4 even with -frename-registers.
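A minimal Python sketch of the arithmetic behind these figures, not part of the
original measurements. It assumes the 4 ms tick stated above is the timer
resolution; the names `TICK_MS` and `slowdown_percent` are illustrative.

```python
TICK_MS = 4  # timer resolution stated above


def slowdown_percent(baseline_ms, measured_ms):
    """Percentage by which measured_ms exceeds baseline_ms."""
    return (measured_ms - baseline_ms) / baseline_ms * 100


# Small benchmark: differences expressed in timer ticks.
small = {"4.2.4": 156, "4.5.0": 168, "4.5.0 -frename-registers": 164}
for name, ms in small.items():
    ticks = (ms - small["4.2.4"]) // TICK_MS
    print(f"{name}: {ms} ms, {ticks} ticks slower than 4.2.4")

# Problem roughly 10 times as large: the tick granularity matters much less.
large = {"4.2.4": 2912, "4.5.0": 3204, "4.5.0 -frename-registers": 3120}
for name, ms in large.items():
    print(f"{name}: {slowdown_percent(large['4.2.4'], ms):.1f}% slower than 4.2.4")
```

The small benchmark differs by only 3 ticks (2 with -frename-registers), so the
larger run gives the more reliable estimate: about 10% slower without and about
7% slower with -frename-registers.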
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: rguenth at gcc dot gnu dot org @ 2009-08-04 12:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #107 from rguenth at gcc dot gnu dot org 2009-08-04 12:28 -------
GCC 4.3.4 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.4                       |4.3.5
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-08-27 1:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #108 from lucier at math dot purdue dot edu 2009-08-27 01:18 -------
direct.c contains a direct FFT; I've compiled the direct and inverse FFT and
ran them on arrays with 2^23 double-precision complex elements, using this compiler:
heine:~/programs/gcc/objdirs/bench-mainline-on-fft> /pkgs/gcc-mainline/bin/gcc
-v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release
--prefix=/pkgs/gcc-mainline --enable-languages=c,c++
-enable-stage1-languages=c,c++
Thread model: posix
gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC)
The compile options were
/pkgs/gcc-mainline/bin/gcc -save-temps -c -Wno-unused -O1 -fno-math-errno
-fschedule-insns2 -fno-trapping-math -fno-strict-aliasing -fwrapv
-fomit-frame-pointer -fPIC -fno-common -mieee-fp -rdynamic -shared
-fschedule-insns
and the same without -fschedule-insns.
The runtime for direct+inverse FFT with instruction scheduling was 1.264
seconds and the time for direct+inverse FFT without -fschedule-insns was 1.444
seconds, which is a 14% speedup for that one compiler option. This is on a
2.33GHz Core 2 quad machine.
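The 14% figure corresponds to the ratio of the two run times. A minimal Python
sketch, not part of the original report (the variable names are illustrative),
shows the two common ways of expressing the gain:

```python
with_sched = 1.264     # seconds, direct+inverse FFT with -fschedule-insns
without_sched = 1.444  # seconds, same code without -fschedule-insns

# Speedup as a ratio of run times: how much faster the scheduled code is.
speedup_percent = (without_sched / with_sched - 1) * 100

# Reduction in run time relative to the unscheduled code.
time_saved_percent = (without_sched - with_sched) / without_sched * 100

print(f"speedup: {speedup_percent:.1f}%")
print(f"run time reduced by: {time_saved_percent:.1f}%")
```

Expressed as a reduction in run time rather than as a speedup ratio, the same
measurements give a slightly smaller percentage.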
I'll attach the inner loops of direct.c with and without -fschedule-insns.
I haven't been able to compile the complete Gambit runtime with
-fschedule-insns on either x86-64 or ppc64; I've filed PR41164 and PR41176 for
those two different failures.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-08-27 1:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #109 from lucier at math dot purdue dot edu 2009-08-27 01:22 -------
Created an attachment (id=18432)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18432&action=view)
inner loop of direct.c with -fschedule-insns
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-08-27 1:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #110 from lucier at math dot purdue dot edu 2009-08-27 01:22 -------
Created an attachment (id=18433)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18433&action=view)
inner loop of direct.c without -fschedule-insns
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: lucier at math dot purdue dot edu @ 2009-08-27 17:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #111 from lucier at math dot purdue dot edu 2009-08-27 17:02 -------
I can compile gambit 4.1.2 with -fschedule-insns except for the function noted
in PR41164.
On
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
with
gcc version 4.5.0 20090803 (experimental) [trunk revision 150373] (GCC)
the times with -fschedule-insns are
(time (direct-fft-recursive-4 a table))
144 ms cpu time (144 user, 0 system)
(time (inverse-fft-recursive-4 a table))
136 ms cpu time (136 user, 0 system)
and the times without -fschedule-insns are
(time (direct-fft-recursive-4 a table))
168 ms cpu time (168 user, 0 system)
(time (inverse-fft-recursive-4 a table))
172 ms cpu time (172 user, 0 system)
That's a pretty big improvement.
* [Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bergner at gcc dot gnu dot org @ 2009-10-03 1:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #112 from bergner at gcc dot gnu dot org 2009-10-03 01:39 -------
Subject: Bug 33928
Author: bergner
Date: Sat Oct 3 01:39:14 2009
New Revision: 152430
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152430
Log:
Backport from mainline.
2009-08-30 Alan Modra <amodra@bigpond.net.au>
PR target/41081
* fwprop.c (get_reg_use_in): Delete.
(free_load_extend): New function.
(forward_propagate_subreg): Use it.
2009-08-23 Alan Modra <amodra@bigpond.net.au>
PR target/41081
* fwprop.c (try_fwprop_subst): Allow multiple sets.
(get_reg_use_in): New function.
(forward_propagate_subreg): Propagate through subreg of zero_extend
or sign_extend.
2009-05-08 Paolo Bonzini <bonzini@gnu.org>
PR rtl-optimization/33928
PR 26854
* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
process_uses, build_single_def_use_links): New.
(update_df): Update use_def_ref.
(forward_propagate_into): Use get_def_for_use instead of use-def
chains.
(fwprop_init): Call build_single_def_use_links and let it initialize
dataflow.
(fwprop_done): Free use_def_ref.
(fwprop_addr): Eliminate duplicate call to df_set_flags.
* df-problems.c (df_rd_simulate_artificial_defs_at_top,
df_rd_simulate_one_insn): New.
(df_rd_bb_local_compute_process_def): Update head comment.
(df_chain_create_bb): Use the new RD simulation functions.
* df.h (df_rd_simulate_artificial_defs_at_top,
df_rd_simulate_one_insn): New.
* opts.c (decode_options): Enable fwprop at -O1.
* doc/invoke.texi (-fforward-propagate): Document this.
Modified:
branches/ibm/gcc-4_3-branch/gcc/ChangeLog.ibm
branches/ibm/gcc-4_3-branch/gcc/REVISION
branches/ibm/gcc-4_3-branch/gcc/df-problems.c
branches/ibm/gcc-4_3-branch/gcc/df.h
branches/ibm/gcc-4_3-branch/gcc/doc/invoke.texi
branches/ibm/gcc-4_3-branch/gcc/fwprop.c
branches/ibm/gcc-4_3-branch/gcc/opts.c
* [Bug rtl-optimization/33928] [4.3/4.4/4.5/4.6 Regression] 30% performance slowdown in floating-point code caused by r118475
From: bergner at gcc dot gnu dot org @ 2010-04-29 14:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #113 from bergner at gcc dot gnu dot org 2010-04-29 14:34 -------
Subject: Bug 33928
Author: bergner
Date: Thu Apr 29 14:34:35 2010
New Revision: 158902
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158902
Log:
Backport from mainline.
2009-08-30 Alan Modra <amodra@bigpond.net.au>
PR target/41081
* fwprop.c (get_reg_use_in): Delete.
(free_load_extend): New function.
(forward_propagate_subreg): Use it.
2009-08-23 Alan Modra <amodra@bigpond.net.au>
PR target/41081
* fwprop.c (try_fwprop_subst): Allow multiple sets.
(get_reg_use_in): New function.
(forward_propagate_subreg): Propagate through subreg of zero_extend
or sign_extend.
2009-05-08 Paolo Bonzini <bonzini@gnu.org>
PR rtl-optimization/33928
PR 26854
* fwprop.c (use_def_ref, get_def_for_use, bitmap_only_bit_bitween,
process_uses, build_single_def_use_links): New.
(update_df): Update use_def_ref.
(forward_propagate_into): Use get_def_for_use instead of use-def
chains.
(fwprop_init): Call build_single_def_use_links and let it initialize
dataflow.
(fwprop_done): Free use_def_ref.
(fwprop_addr): Eliminate duplicate call to df_set_flags.
* df-problems.c (df_rd_simulate_artificial_defs_at_top,
df_rd_simulate_one_insn): New.
(df_rd_bb_local_compute_process_def): Update head comment.
(df_chain_create_bb): Use the new RD simulation functions.
* df.h (df_rd_simulate_artificial_defs_at_top,
df_rd_simulate_one_insn): New.
* opts.c (decode_options): Enable fwprop at -O1.
* doc/invoke.texi (-fforward-propagate): Document this.
Modified:
branches/ibm/gcc-4_4-branch/gcc/ChangeLog.ibm
branches/ibm/gcc-4_4-branch/gcc/df-problems.c
branches/ibm/gcc-4_4-branch/gcc/df.h
branches/ibm/gcc-4_4-branch/gcc/doc/invoke.texi
branches/ibm/gcc-4_4-branch/gcc/fwprop.c
branches/ibm/gcc-4_4-branch/gcc/opts.c
* [Bug rtl-optimization/33928] [4.3/4.4/4.5/4.6 Regression] 30% performance slowdown in floating-point code caused by r118475
From: rguenth at gcc dot gnu dot org @ 2010-05-22 18:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #114 from rguenth at gcc dot gnu dot org 2010-05-22 18:11 -------
GCC 4.3.5 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.5                       |4.3.6