public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/42376]  New: [4.5] Performance regression of generated code
@ 2009-12-15  8:41 martin at mpa-garching dot mpg dot de
  2009-12-15  8:42 ` [Bug tree-optimization/42376] " martin at mpa-garching dot mpg dot de
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: martin at mpa-garching dot mpg dot de @ 2009-12-15  8:41 UTC (permalink / raw)
  To: gcc-bugs

I have noticed a large performance decrease in one of my numerical codes
when switching from gcc 4.4 to gcc 4.5. A small test case is attached.
When this test case is compiled with "gcc -O3 perf.c -lm -std=c99"
and the resulting binary is executed, the CPU time is about 1.1s with the
head of the 4.4 branch and 2.1s with the head of the trunk.

This is on a Pentium D CPU. I have verified that both binaries produce
identical results.
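
For readers without access to the attachment, the kind of CPU-time
measurement described above can be approximated as follows. This is a
minimal sketch and not the attached perf.c: busy_work() is a hypothetical
stand-in for the numerical kernel, and cpu_seconds() is an illustrative
helper built on the standard clock() function.

```c
#include <time.h>

/* Written to by busy_work() so the compiler cannot discard the loop. */
static volatile double sink;

/* Hypothetical stand-in for the numerical kernel in the attached perf.c. */
void busy_work(void)
{
    double acc = 0.0;
    for (int i = 1; i <= 1000000; ++i)
        acc += 1.0 / (double)i;
    sink = acc;
}

/* Returns the CPU time, in seconds, consumed by one call to fn(). */
double cpu_seconds(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Printing cpu_seconds(busy_work) from main() gives a figure that can be
compared between two compilers, provided both binaries are built with the
same options.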

Verbose output of gcc-4.4:

~/tmp/wigner3j>gcc -O3 perf.c -lm -std=c99 -save_temps -v
Using built-in specs.
gcc: unrecognized option '-save_temps'
Target: i686-pc-linux-gnu
Configured with: /scratch/martin/gcc44/configure
--prefix=/scratch/martin/ugcc44 --enable-languages=c++,fortran
--enable-target=all --disable-bootstrap --enable-checking=release
Thread model: posix
gcc version 4.4.3 20091130 (prerelease) [gcc-4_4-branch revision 154765] (GCC) 
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save_temps' '-v' '-mtune=generic'
 /scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/cc1 -quiet -v
perf.c -quiet -dumpbase perf.c -mtune=generic -auxbase perf -O3 -std=c99
-version -o /tmp/cc3D10Yi.s
ignoring nonexistent directory
"/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /scratch/martin/ugcc44/include
 /scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/include
 /scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/include-fixed
 /usr/include
End of search list.
GNU C (GCC) version 4.4.3 20091130 (prerelease) [gcc-4_4-branch revision 154765]
(i686-pc-linux-gnu)
        compiled by GNU C version 4.2.3, GMP version 4.2.4, MPFR version 2.3.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 0428a618e74de3f947d92ab031f86f8a
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save_temps' '-v' '-mtune=generic'
 as -V -Qy -o /tmp/cc6AnZqy.o /tmp/cc3D10Yi.s
GNU assembler version 2.18 (i686-pc-linux-gnu) using BFD version (GNU Binutils) 
2.18
COMPILER_PATH=/scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/:/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/:/usr/libexec/gcc/i686-pc-linux-gnu/:/usr/lib/gcc/i686-pc-linux-gnu/
LIBRARY_PATH=/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/:/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save_temps' '-v' '-mtune=generic'
 /scratch/martin/ugcc44/libexec/gcc/i686-pc-linux-gnu/4.4.3/collect2
--eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o
/usr/lib/crti.o
/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/crtbegin.o
-L/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3
-L/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/../../.. /tmp/cc6AnZqy.o
-lm -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s
--no-as-needed
/scratch/martin/ugcc44/lib/gcc/i686-pc-linux-gnu/4.4.3/crtend.o
/usr/lib/crtn.o

Verbose output of gcc-4.5:
~/tmp/wigner3j>gcc -O3 perf.c -lm -std=c99 -save-temps -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: /scratch/martin/gcc/configure --enable-gold
--prefix=/afs/mpa/data/martin/ugcc --with-mpfr=/afs/mpa/data/martin/numlibs
--with-gmp=/afs/mpa/data/martin/numlibs --with-mpc=/afs/mpa/data/martin/numlibs
--enable-languages=c++,fortran --enable-target=all --enable-checking=release
Thread model: posix
gcc version 4.5.0 20091214 (experimental) [trunk revision 155208] (GCC) 
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 /afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/cc1 -E -quiet -v
perf.c -mtune=generic -std=c99 -O3 -fpch-preprocess -o perf.i
ignoring nonexistent directory
"/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /afs/mpa/data/martin/ugcc/include
 /afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/include
 /afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 /afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/cc1
-fpreprocessed perf.i -quiet -dumpbase perf.c -mtune=generic -auxbase perf -O3
-std=c99 -version -o perf.s
GNU C (GCC) version 4.5.0 20091214 (experimental) [trunk revision 155208]
(i686-pc-linux-gnu)
        compiled by GNU C version 4.5.0 20091214 (experimental) [trunk revision
155208], GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (GCC) version 4.5.0 20091214 (experimental) [trunk revision 155208]
(i686-pc-linux-gnu)
        compiled by GNU C version 4.5.0 20091214 (experimental) [trunk revision
155208], GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 9df7fe822ccb89478c9ff357db9be45e
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 as -V -Qy --32 -o perf.o perf.s
GNU assembler version 2.18 (i686-pc-linux-gnu) using BFD version (GNU Binutils)
2.18
COMPILER_PATH=/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/:/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/
LIBRARY_PATH=/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/:/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O3' '-std=c99' '-save-temps' '-v' '-mtune=generic'
 /afs/mpa/data/martin/ugcc/libexec/gcc/i686-pc-linux-gnu/4.5.0/collect2
--eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o
/usr/lib/crti.o
/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/crtbegin.o
-L/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0
-L/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/../../.. perf.o -lm
-lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s
--no-as-needed
/afs/mpa/data/martin/ugcc/lib/gcc/i686-pc-linux-gnu/4.5.0/crtend.o
/usr/lib/crtn.o

I attach the test case and the two generated assembler files.


-- 
           Summary: [4.5] Performance regression of generated code
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: martin at mpa-garching dot mpg dot de
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42376



* [Bug tree-optimization/42376] [4.5] Performance regression of generated code
From: martin at mpa-garching dot mpg dot de @ 2009-12-15  8:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from martin at mpa-garching dot mpg dot de  2009-12-15 08:41 -------
Created an attachment (id=19305)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19305&action=view)
test case



* [Bug tree-optimization/42376] [4.5] Performance regression of generated code
From: martin at mpa-garching dot mpg dot de @ 2009-12-15  8:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from martin at mpa-garching dot mpg dot de  2009-12-15 08:42 -------
Created an attachment (id=19306)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19306&action=view)
assembler generated by gcc 4.5



* [Bug tree-optimization/42376] [4.5] Performance regression of generated code
From: martin at mpa-garching dot mpg dot de @ 2009-12-15  8:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from martin at mpa-garching dot mpg dot de  2009-12-15 08:43 -------
Created an attachment (id=19307)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19307&action=view)
assembler generated by gcc 4.4



* [Bug tree-optimization/42376] [4.5 Regression] Performance regression of generated code
From: rguenth at gcc dot gnu dot org @ 2009-12-15 13:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2009-12-15 13:15 -------
This is because (quoting http://gcc.gnu.org/gcc-4.5/changes.html):

"GCC now supports handling floating-point excess precision arising from use of
the x87 floating-point unit in a way that conforms to ISO C99. This is enabled
with -fexcess-precision=standard and with standards conformance options such as
-std=c99, and may be disabled using -fexcess-precision=fast."
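
To make the quoted change concrete, here is a small sketch. It is not part
of the bug report, and the function names are made up for illustration.
FLT_EVAL_METHOD is the C99 macro from <float.h> that reports how
intermediate floating-point results are evaluated; on targets that use only
the x87 unit it is typically 2. Under the C99 rules, casts and assignments
round a value to its declared type.

```c
#include <float.h>

/* Returns the evaluation method the compiler reports for this build. */
int current_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Translates a FLT_EVAL_METHOD value into the meaning given by C99. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:
        return "operations are evaluated in the precision of their type";
    case 1:
        return "float and double operations are evaluated as double";
    case 2:
        return "all operations are evaluated as long double";
    default:
        return "indeterminable or implementation-defined";
    }
}

/* Under the C99 rules both the assignment and the cast round the product
   to double, so the comparison below holds for finite inputs. Without the
   cast, the right-hand a * b may keep extra precision when FLT_EVAL_METHOD
   is 2, and the comparison could then fail. */
int product_equals_stored(double a, double b)
{
    double stored = a * b;             /* assignment rounds to double */
    return stored == (double)(a * b);  /* cast rounds to double as well */
}
```

Which value current_eval_method() returns depends on the target and on the
floating-point options in effect, so it is best checked on the machine in
question rather than assumed.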

GCC with -std=c99 makes sure to properly handle the i387 FPU excess precision.
With -fexcess-precision=fast the code is as fast (and as non-conforming) as
with GCC 4.4.  Using -std=gnu99 is also an option, since it does not enable
-fexcess-precision=standard.
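
A sketch of the kind of code that is affected, with the two workarounds
shown as comments. The kernel dot() is hypothetical and only stands in for
the attached perf.c; the command lines reuse the options from the original
report. A plausible explanation for the slowdown is that, in standard mode
on x87, each assignment to sum has to round the wider intermediate result
to double, which presumably costs extra stores and loads.

```c
/* Reported slow with GCC 4.5:
 *     gcc -O3 perf.c -lm -std=c99
 * Workaround 1, keep -std=c99 but give up conforming excess precision:
 *     gcc -O3 perf.c -lm -std=c99 -fexcess-precision=fast
 * Workaround 2, use the GNU dialect:
 *     gcc -O3 perf.c -lm -std=gnu99
 */

/* Hypothetical kernel: accumulates the dot product of two arrays.
   Every "sum += ..." is an assignment to a double, so under the C99
   excess-precision rules the result must be rounded to double each time. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

With -fexcess-precision=fast the compiler may keep sum in a wider register
across iterations, so results can differ in the last bits between the two
modes even though both are valid for their respective settings.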


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsm28 at gcc dot gnu dot org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WONTFIX
            Summary|[4.5] Performance regression|[4.5 Regression] Performance
                   |of generated code           |regression of generated code
   Target Milestone|---                         |4.5.0



* [Bug tree-optimization/42376] [4.5 Regression] Performance regression of generated code
From: martin at mpa-garching dot mpg dot de @ 2009-12-17 14:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from martin at mpa-garching dot mpg dot de  2009-12-17 14:12 -------
> GCC with -std=c99 makes sure to properly handle the i387 FPU excess precision.
> With -fexcess-precision=fast the code is as fast (and non-conforming) like
> with GCC 4.4.  Using -std=gnu99 is also an option.

Thanks a lot for pointing this out! I was aware of the floating-point change
but simply had not realized it would be switched on by -std=c99.
I imagine that this might catch many people by surprise once 4.5.0 is released,
and it might be politically advisable to mention it (and the "fix") in a place
where users can't miss it.
Is there a plan to mention this (in a prominent place) in the release notes?
Or in the FAQ or the "non-bugs" section of bugs.html? I can prepare a
documentation patch if this is desirable.



* [Bug tree-optimization/42376] [4.5 Regression] Performance regression of generated code
From: rguenth at gcc dot gnu dot org @ 2009-12-17 16:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2009-12-17 16:22 -------
Documentation improvement is always welcome, especially if you looked for it
but missed the critical piece.



* [Bug tree-optimization/42376] [4.5 Regression] Performance regression of generated code
From: martin at mpa-garching dot mpg dot de @ 2010-01-07 15:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from martin at mpa-garching dot mpg dot de  2010-01-07 15:16 -------
Created an attachment (id=19499)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19499&action=view)
Proposed wwwdocs patch to explain the apparent performance regression

Here is a proposed patch to gcc-4.5/changes.html, which mentions the apparent
performance regression (and describes how to avoid it) in the "Caveats"
section.

The FSF should have my copyright assignment; in any case I think this patch is
trivial enough.



* [Bug tree-optimization/42376] [4.5 Regression] Performance regression of generated code
From: rguenth at gcc dot gnu dot org @ 2010-01-07 15:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2010-01-07 15:27 -------
Can you please post the patch to gcc-patches@gcc.gnu.org instead?  Thanks.

