public inbox for gcc-bugs@sourceware.org
* [Bug lto/44334]  New: [4.6 Regression] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: dominiq at lps dot ens dot fr @ 2010-05-30 17:17 UTC
  To: gcc-bugs

After revision 159852

Author: pault
Date:   Wed May 26 05:11:04 2010 UTC (4 days, 12 hours ago)
Changed paths:  4
Log Message:    
2010-05-26  Paul Thomas  <pault@gcc.gnu.org>

        PR fortran/40011
        * resolve.c (resolve_global_procedure): Resolve the gsymbol's
        namespace before trying to reorder the gsymbols.

2010-05-26  Paul Thomas  <pault@gcc.gnu.org>

        PR fortran/40011
        * gfortran.dg/whole_file_19.f90 : New test.

the executable of the Polyhedron test rnflow.f90 is ~27% slower when compiled
with -fwhole-program -flto:

[macbook] lin/test% gfcpf -v
Using built-in specs.
COLLECT_GCC=gfcpf
COLLECT_LTO_WRAPPER=/opt/gcc/gcc4.6pf/libexec/gcc/x86_64-apple-darwin10/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10
Configured with: ../p_work/configure --prefix=/opt/gcc/gcc4.6pf
--mandir=/opt/gcc/gcc4.6pf/share/man --infodir=/opt/gcc/gcc4.6pf/share/info
--build=x86_64-apple-darwin10 --host=x86_64-apple-darwin10
--target=x86_64-apple-darwin10 --enable-languages=c,fortran
--with-gmp=/opt/sw64 --with-libiconv-prefix=/opt/sw64 --with-system-zlib
--x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib
--with-cloog=/opt/sw64 --with-ppl=/opt/sw64 --with-mpc=/opt/sw64 --enable-lto
Thread model: posix
gcc version 4.6.0 20100526 (experimental) [trunk revision 159851] (GCC) 
[macbook] lin/test% gfcpf -O3 -ffast-math -funroll-loops -fomit-frame-pointer
rnflow.f90 
[macbook] lin/test% time a.out > /dev/null
25.826u 0.686s 0:26.52 99.9%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcpf -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-file -flto rnflow.f90
[macbook] lin/test% time a.out > /dev/null
25.506u 0.674s 0:26.19 99.9%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcpf -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-program -flto rnflow.f90
[macbook] lin/test% time a.out > /dev/null
25.772u 0.678s 0:26.46 99.9%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcp -v
Using built-in specs.
COLLECT_GCC=gfcp
COLLECT_LTO_WRAPPER=/opt/gcc/gcc4.6p/libexec/gcc/x86_64-apple-darwin10/4.6.0/lto-wrapper
Target: x86_64-apple-darwin10
Configured with: ../p_work/configure --prefix=/opt/gcc/gcc4.6p
--mandir=/opt/gcc/gcc4.6p/share/man --infodir=/opt/gcc/gcc4.6p/share/info
--build=x86_64-apple-darwin10 --host=x86_64-apple-darwin10
--target=x86_64-apple-darwin10 --enable-languages=c,fortran
--with-gmp=/opt/sw64 --with-libiconv-prefix=/opt/sw64 --with-system-zlib
--x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib
--with-cloog=/opt/sw64 --with-ppl=/opt/sw64 --with-mpc=/opt/sw64 --enable-lto
Thread model: posix
gcc version 4.6.0 20100526 (experimental) [trunk revision 159852] (GCC) 
[macbook] lin/test% gfcp -O3 -ffast-math -funroll-loops -fomit-frame-pointer
rnflow.f90
[macbook] lin/test% time a.out > /dev/null
25.841u 0.696s 0:26.54 99.9%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcp -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-file -flto rnflow.f90
[macbook] lin/test% time a.out > /dev/null
25.540u 0.677s 0:26.22 99.9%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcp -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-program -flto rnflow.f90
[macbook] lin/test% time a.out > /dev/null
32.627u 0.685s 0:33.31 99.9%    0+0k 0+0io 0pf+0w             <---  ~27% slower

As noticed previously, the executable of fatigue.f90 is ~30% faster when
compiled with -fwhole-program:

[macbook] lin/test% gfcp -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-file -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
9.031u 0.006s 0:09.04 99.8%     0+0k 0+1io 0pf+0w
[macbook] lin/test% gfcp -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-program fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.448u 0.004s 0:06.47 99.5%     0+0k 0+1io 0pf+0w
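The percentages quoted above can be checked from the user CPU times (the first field of the csh `time` output). The sketch below is an illustrative Python helper, not part of the original report; the function names are hypothetical, and the timing lines are copied verbatim from the runs above. Using user time rather than wall-clock time is a choice made here; the wall-clock figures give similar ratios.

```python
def user_seconds(time_line: str) -> float:
    """Return the user CPU time from a csh/tcsh `time` line,
    e.g. '32.627u 0.685s 0:33.31 99.9% ...' -> 32.627."""
    first_field = time_line.split()[0]
    if not first_field.endswith("u"):
        raise ValueError(f"unexpected time format: {time_line!r}")
    return float(first_field[:-1])


def relative_change(baseline: float, candidate: float) -> float:
    """(candidate - baseline) / baseline; positive means slower."""
    return (candidate - baseline) / baseline


# rnflow.f90, -fwhole-program -flto: compiler at r159851 vs r159852
rnflow_old = user_seconds("25.772u 0.678s 0:26.46 99.9%    0+0k 0+0io 0pf+0w")
rnflow_new = user_seconds("32.627u 0.685s 0:33.31 99.9%    0+0k 0+0io 0pf+0w")

# fatigue.f90 at r159852: -fwhole-file -flto vs -fwhole-program
fatigue_wf = user_seconds("9.031u 0.006s 0:09.04 99.8%     0+0k 0+1io 0pf+0w")
fatigue_wp = user_seconds("6.448u 0.004s 0:06.47 99.5%     0+0k 0+1io 0pf+0w")

print(f"rnflow:  {relative_change(rnflow_old, rnflow_new):+.1%}")  # rnflow:  +26.6%
print(f"fatigue: {relative_change(fatigue_wf, fatigue_wp):+.1%}")  # fatigue: -28.6%
```

With these figures the rnflow.f90 slowdown comes out at about +26.6% (the "~27%" above), and the fatigue.f90 run time drops by about 28.6% (the "~30% faster").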


-- 
           Summary: [4.6 Regression] rnflow.f90 ~27% slower with
                    -fwhole-program -flto after revision 159852
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dominiq at lps dot ens dot fr
 GCC build triplet: x86_64-apple-darwin10
  GCC host triplet: x86_64-apple-darwin10
GCC target triplet: x86_64-apple-darwin10


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44334



* [Bug lto/44334] [4.6 Regression] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: dominiq at lps dot ens dot fr @ 2010-05-30 18:06 UTC
  To: gcc-bugs



------- Comment #1 from dominiq at lps dot ens dot fr  2010-05-30 18:06 -------
I'll attach the assembly generated with -O3 -ffast-math -funroll-loops
-fomit-frame-pointer -flto for revisions 159851 and 159852. It is the same
with and without -fwhole-program (probably obvious). However, when these .s
files are assembled and linked with

gfcp -O3 -ffast-math -funroll-loops -fomit-frame-pointer -fwhole-program -flto
rnflow_wp5*.s

the timing depends on the revision used to generate the assembly, not on the
revision of the compiler that assembles and links it.


-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de, jh
                   |                            |at suse dot cz, pault at gcc
                   |                            |dot gnu dot org



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: rguenth at gcc dot gnu dot org @ 2010-05-30 18:09 UTC
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2010-05-30 18:09 -------
Insufficient analysis. This sounds more like a dup of the bug where the
profile estimate is messed up by inlining.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|lto                         |fortran
            Summary|[4.6 Regression] rnflow.f90 |rnflow.f90 ~27% slower with
                   |~27% slower with            |-fwhole-program -flto after
                   |-fwhole-program -flto after |revision 159852
                   |revision 159852             |



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: dominiq at lps dot ens dot fr @ 2010-05-30 18:11 UTC
  To: gcc-bugs



------- Comment #3 from dominiq at lps dot ens dot fr  2010-05-30 18:10 -------
Created an attachment (id=20780)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20780&action=view)
Assembly generated with  -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-flto and revision 159851


-- 



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: dominiq at lps dot ens dot fr @ 2010-05-30 18:12 UTC
  To: gcc-bugs



------- Comment #4 from dominiq at lps dot ens dot fr  2010-05-30 18:12 -------
Created an attachment (id=20781)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20781&action=view)
Assembly generated with  -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-flto and revision 159852


-- 



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: dominiq at lps dot ens dot fr @ 2010-05-30 18:31 UTC
  To: gcc-bugs



------- Comment #5 from dominiq at lps dot ens dot fr  2010-05-30 18:30 -------
Output of gprof on Darwin:

Revision 159851:

                                  called/total       parents 
index  %time    self descendents  called+self    name           index
                                  called/total       children

                                  520605             _dgetf2_ [81]
                0.00        0.00      64/1041192     ___timctr_MOD_gettim [1429]
                0.00        0.00    6548/1041192     _dswap_ [4112]
                0.00        0.00 1034580/1041192     _xerbla_ [83]
[81]     0.0    0.00        0.00 1041192+520605 _dgetf2_ [81]
                0.00        0.00   64137/110864      _dgetrf_ [82]
                                  520605             _dgetf2_ [81]

-----------------------------------------------

                                   13315             _dgetrf_ [82]
                0.00        0.00       8/110864      ___timctr_MOD_gettim [1429]
                0.00        0.00    6548/110864      _dswap_ [4112]
                0.00        0.00    6685/110864      __dyld_func_lookup [1665]
                0.00        0.00   33486/110864      _xerbla_ [83]
                0.00        0.00   64137/110864      _dgetf2_ [81]
[82]     0.0    0.00        0.00  110864+13315  _dgetrf_ [82]
                0.00        0.00       1/1           _main [85]
                                   13315             _dgetrf_ [82]

-----------------------------------------------

                0.00        0.00   10872/10872       _dswap_ [4112]
[83]     0.0    0.00        0.00   10872         _xerbla_ [83]
                0.00        0.00 1034580/1041192     _dgetf2_ [81]
                0.00        0.00   33486/110864      _dgetrf_ [82]

-----------------------------------------------

                0.00        0.00       1/1           _main [85]
[84]     0.0    0.00        0.00       1         __start [84]

-----------------------------------------------

                0.00        0.00       1/1           _dgetrf_ [82]
[85]     0.0    0.00        0.00       1         _main [85]
                0.00        0.00       1/1           __start [84]

-----------------------------------------------

...

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  0.0       0.00     0.00  1561733     0.00     0.00  _dgetf2_ [81]
  0.0       0.00     0.00   110927     0.00     0.00  _dgetrf_ [82]
  0.0       0.00     0.00    10872     0.00     0.00  _xerbla_ [83]
  0.0       0.00     0.00        1     0.00     0.00  __start [84]
  0.0       0.00     0.00        1     0.00     0.00  _main [85]

================================================================================

Revision 159852:

                                  called/total       parents 
index  %time    self descendents  called+self    name           index
                                  called/total       children

                0.00        0.00    6548/1561733     _dswap_ [4112]
                0.00        0.00 1555185/1561733     _xerbla_ [83]
[81]     0.0    0.00        0.00 1561733         _dgetf2_ [81]
                0.00        0.00   64136/110927      _dgetrf_ [82]

-----------------------------------------------

                                   13315             _dgetrf_ [82]
                0.00        0.00      72/110927      ___timctr_MOD_gettim [1429]
                0.00        0.00    6548/110927      _dswap_ [4112]
                0.00        0.00    6685/110927      __dyld_func_lookup [1665]
                0.00        0.00   33486/110927      _xerbla_ [83]
                0.00        0.00   64136/110927      _dgetf2_ [81]
[82]     0.0    0.00        0.00  110927+13315  _dgetrf_ [82]
                0.00        0.00       1/1           _main [85]
                                   13315             _dgetrf_ [82]

-----------------------------------------------

                0.00        0.00   10872/10872       _dswap_ [4112]
[83]     0.0    0.00        0.00   10872         _xerbla_ [83]
                0.00        0.00 1555185/1561733     _dgetf2_ [81]
                0.00        0.00   33486/110927      _dgetrf_ [82]

-----------------------------------------------

                0.00        0.00       1/1           _main [85]
[84]     0.0    0.00        0.00       1         __start [84]

-----------------------------------------------

                0.00        0.00       1/1           _dgetrf_ [82]
[85]     0.0    0.00        0.00       1         _main [85]
                0.00        0.00       1/1           __start [84]

-----------------------------------------------

...

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  0.0       0.00     0.00  5572994     0.00     0.00  _xerbla_ [154]
  0.0       0.00     0.00    20556     0.00     0.00  _dswap_ [155]
  0.0       0.00     0.00    20000     0.00     0.00  ___timctr_MOD_gettim [156]
  0.0       0.00     0.00        3     0.00     0.00  __dyld_func_lookup [157]
  0.0       0.00     0.00        2     0.00     0.00  __start [158]


-- 



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: rguenth at gcc dot gnu dot org @ 2010-05-30 18:49 UTC
  To: gcc-bugs



------- Comment #6 from rguenth at gcc dot gnu dot org  2010-05-30 18:48 -------
 0.0       0.00     0.00  5572994     0.00     0.00  _xerbla_ [154]

Eh? That's the BLAS error handler. Something is fishy with your setup.


-- 



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: dominiq at lps dot ens dot fr @ 2010-05-30 18:55 UTC
  To: gcc-bugs



------- Comment #7 from dominiq at lps dot ens dot fr  2010-05-30 18:55 -------
> Insufficient analysis. This sounds more like a dup of the bug where the
> profile estimate is messed up by inlining.

Do you mean a dup of PR40106? Or are there others I am not aware of?

> Eh? That's the BLAS error handler. Something is fishy with your setup.

Which setup?


-- 



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: dominiq at lps dot ens dot fr @ 2010-06-05  9:52 UTC
  To: gcc-bugs



------- Comment #8 from dominiq at lps dot ens dot fr  2010-06-05 09:52 -------
At revision 160309, I get

[macbook] lin/test% gfc -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-program -flto rnflow.f90 --param hot-bb-frequency-fraction=1000
[macbook] lin/test% time a.out > /dev/null
32.601u 0.716s 0:33.35 99.8%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -O3 -ffast-math -funroll-loops -fomit-frame-pointer
-fwhole-program -flto rnflow.f90 --param hot-bb-frequency-fraction=2000
[macbook] lin/test% time a.out > /dev/null
25.760u 0.708s 0:26.47 99.9%    0+0k 0+0io 0pf+0w
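GCC's documentation describes --param hot-bb-frequency-fraction as the fraction of the entry block's execution frequency that a basic block must reach to be considered hot. Raising it from 1000 to 2000 therefore halves that threshold and lets more blocks be treated as hot (blocks that are not hot may be optimized for size rather than speed). The Python sketch below is a deliberately simplified model of that documented rule only; the real check in GCC also depends on other factors such as profile feedback, and the block names and frequencies are hypothetical, not taken from rnflow.f90.

```python
def is_hot(block_freq: float, entry_freq: float, fraction: int) -> bool:
    # Simplified model: a block is "hot" if its estimated frequency is at
    # least entry_freq / fraction (the documented meaning of
    # hot-bb-frequency-fraction). Everything else GCC considers is ignored.
    return block_freq >= entry_freq / fraction


# Hypothetical estimated frequencies, not measured from rnflow.f90.
ENTRY_FREQ = 10_000.0
BLOCKS = {
    "loop_body": 9_000.0,
    "rare_path": 7.0,
    "very_rare_path": 3.0,
}

for fraction in (1000, 2000):
    threshold = ENTRY_FREQ / fraction
    hot = [name for name, freq in BLOCKS.items() if is_hot(freq, ENTRY_FREQ, fraction)]
    print(f"fraction={fraction} threshold={threshold:g} hot={hot}")
# fraction=1000 threshold=10 hot=['loop_body']
# fraction=2000 threshold=5 hot=['loop_body', 'rare_path']
```

This only illustrates why a larger fraction can change which code is optimized for speed; it does not by itself show that this is what happens in rnflow.f90.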


-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu dot
                   |                            |org



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: burnus at gcc dot gnu dot org @ 2010-09-08 21:00 UTC
  To: gcc-bugs



------- Comment #9 from burnus at gcc dot gnu dot org  2010-09-08 21:00 -------
For what it is worth, on an AMD Athlon 64 X2 4800+ / x86-64-linux, I get the
following for gfortran -O3 -ffast-math -march=native, both with and without -flto:
 0m45.132s -- (options as above)
 0m52.731s -- additionally -fwhole-program

That's a +16% increase in run-time with -fwhole-program.


-- 



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: hubicka at gcc dot gnu dot org @ 2010-09-08 21:04 UTC
  To: gcc-bugs



------- Comment #10 from hubicka at gcc dot gnu dot org  2010-09-08 21:04 -------
So hot-bb-frequency-fraction solves the whole regression?


-- 



* [Bug fortran/44334] rnflow.f90 ~27% slower with -fwhole-program -flto after revision 159852
From: burnus at gcc dot gnu dot org @ 2010-09-09  9:01 UTC
  To: gcc-bugs



------- Comment #11 from burnus at gcc dot gnu dot org  2010-09-09 09:00 -------
[Moving comment from IRC #gcc to Bugzilla]

(In reply to comment #9)
> For what it is worth, on AMD Athlon 64 X2 4800+ / x86-64-linux, [...]
> That's a +16% increase in run-time with -fwhole-program.

(In reply to comment #10)
> So hot-bb-frequency-fraction solves the whole regression?

For me (cf. system above), --param hot-bb-frequency-fraction=2000 reduces the
slowdown due to -fwhole-program from 16% to 3%. (The LTO version, with and
without -fwhole-file, is about 2% slower than the corresponding -fno-lto
version.)


-- 

