public inbox for gcc-bugs@sourceware.org
* [Bug fortran/46900] New: 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:31 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

           Summary: 50% slowdown when linking with LTO in a single step
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: burnus@gcc.gnu.org


Cf. PR 44334 for another LTO slowdown. Cf.
http://gcc.gnu.org/ml/fortran/2010-12/msg00067.html


I had expected LTO linking in one step and in two steps to give identical
results, but apparently it does not:

$ gfortran -fexternal-blas -flto -Ofast -march=native \
           test.f90 dgemm.f lsame.f xerbla.f
$ ./a.out 
 Time, MATMUL:    1.4680910       53.480084765505403     
 dgemm:    1.4720919       56.452265589399069


But if one compiles without LTO, or first compiles and then links with LTO,
the program is 47% faster:

$ gfortran -fexternal-blas -Ofast -march=native \
           test.f90 dgemm.f lsame.f xerbla.f
$ ./a.out
 Time, MATMUL:    1.0080630       53.480084765505403     
 dgemm:    1.0200630       56.452265589399069     

$ gfortran -c -fexternal-blas -flto -Ofast -march=native \
              test.f90 dgemm.f lsame.f xerbla.f 
$ gfortran -flto -Ofast -march=native test.o dgemm.o lsame.o xerbla.o
$ ./a.out 
 Time, MATMUL:    1.0080630       53.480084765505403     
 dgemm:    1.0080630       56.452265589399069
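For reference, the ratio of the LTO timings above (single-step build vs.
two-step build, same flags) works out as follows; only the numbers printed
above are used:

$$
\frac{1.4720919\ \mathrm{s}}{1.0080630\ \mathrm{s}} \approx 1.460 \ (\text{dgemm}),
\qquad
\frac{1.4680910\ \mathrm{s}}{1.0080630\ \mathrm{s}} \approx 1.456 \ (\text{MATMUL})
$$

i.e. the single-step binary needs roughly 46% more time; equivalently, the
two-step build saves about $1 - 1/1.46 \approx 31\%$ of the run time.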



* [Bug middle-end/46900] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:33 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|fortran                     |middle-end

--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:33:09 UTC ---
(If one uses the libgfortran MATMUL, i.e. without -fexternal-blas, the call
takes 1.7001060 s.)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/46900] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:34 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:34:48 UTC ---
Created attachment 22722
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22722
Test case (tar.bz2). The .f files are from BLAS (taken from LAPACK 3.3)


* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:39 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.6.0
            Summary|50% slowdown when linking   |[4.6 Regression] 50%
                   |with LTO in a single step   |slowdown when linking with
                   |                            |LTO in a single step

--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:39:17 UTC ---
The single-step linking problem seems to be a regression. With GCC 4.5, the
direct dgemm call gets the same (fast) performance when linking in a single step:

$ gfortran-4.5 -fexternal-blas -flto -O3 -ffast-math -march=native \
               test.f90 dgemm.f lsame.f xerbla.f 
$ ./a.out 
 Time, MATMUL:    1.4160880       53.480084765505403     
 dgemm:    1.0840679       56.452265589399069     

(I don't understand why the MATMUL timing differs so much: it should call the
same BLAS function [via the same GCC 4.6 libgfortran.so wrapper], and LTO should
not affect it.)


* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:50 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:50:09 UTC ---
(In reply to comment #3)
> (I don't understand why the MATMUL part differs that much - it should call the
> same BLAS function [via the same GCC 4.6 libgfortran.so wrapper] and LTO should
> not affect it.)

Apparently LTO is crucial for 4.5: without LTO, dgemm gets slower, but the
libgfortran version gets faster:

$ gfortran-4.5 -fexternal-blas -O3 -ffast-math -march=native \
               test.f90 dgemm.f lsame.f xerbla.f && ./a.out
 Time, MATMUL:    1.3200819       53.480084765505403     
 dgemm:    1.3120821       56.452265589399069

$ gfortran-4.5 -c -flto -fexternal-blas -O3 -ffast-math -march=native \
               test.f90 dgemm.f lsame.f xerbla.f
$ gfortran-4.5 -flto -O3 -ffast-math -march=native \
               test.o dgemm.o lsame.o xerbla.o
$ ./a.out
 Time, MATMUL:    1.3080810       53.480084765505403     
 dgemm:    1.0800680       56.452265589399069     

Thus, with GCC 4.5, LTO improves the performance of the direct dgemm call, and
it does not matter whether one compiles and links in a single step or in two
steps. However, even with GCC 4.5 the single-step build pessimizes the
performance of the libgfortran MATMUL (which is a wrapper for dgemm).


* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: rguenth at gcc dot gnu.org @ 2010-12-16 14:40 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-16 14:40:21 UTC ---
Can you please check how the driver invokes lto1 for the separate and the
combined LTO compiles?  You probably want to use -v -save-temps and inspect
the response files that are passed to the lto1 -fwpa command.  There really
shouldn't be any difference.
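A minimal POSIX sh sketch of that comparison, assuming the driver's -v output
(printed on stderr) has been saved to one log file per build. The helper name
lto1_opts is hypothetical, not part of GCC; it prints the options of every lto1
command line in the log, one per line, so the two builds can be compared with
diff. Options hidden in @response files are not expanded and would still have
to be inspected by hand.

```shell
# Hypothetical helper: print the options of every lto1 invocation found in a
# saved "gfortran -v" log (one file argument, or stdin), one option per line.
lto1_opts() {
  # keep lto1 command lines, strip the (possibly indented) path to lto1,
  # then split the remaining options onto separate lines, dropping blanks
  grep '/lto1 ' "$@" | sed 's|^ *[^ ]*/lto1 ||' | tr ' ' '\n' | sed '/^$/d'
}

# Example (assumes the test case from attachment 22722 in the current dir):
#   gfortran -fexternal-blas -flto -Ofast -march=native -v -save-temps \
#            test.f90 dgemm.f lsame.f xerbla.f 2> single.log
#   gfortran -c -fexternal-blas -flto -Ofast -march=native \
#            test.f90 dgemm.f lsame.f xerbla.f
#   gfortran -flto -Ofast -march=native -v -save-temps \
#            test.o dgemm.o lsame.o xerbla.o 2> two.log
#   lto1_opts single.log > single.opts
#   lto1_opts two.log > two.opts
#   diff single.opts two.opts
```

Splitting the options onto separate lines makes diff report exactly which
options (such as the -march/-mtune pair) differ between the two builds.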


* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-16 15:24 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-16 15:23:40 UTC ---
I think the files passed to lto1 are the same, but the command-line options
differ:


a) Two-step compile with "-fexternal-blas -flto -Ofast -march=native"

x86_64-unknown-linux-gnu/4.6.0/lto1 -march=k8-sse3 -msahf \
    --param l1-cache-size=64 --param l1-cache-line-size=64 \
    --param l2-cache-size=1024 -mtune=k8


b) Single-step compile:

x86_64-unknown-linux-gnu/4.6.0/lto1 -quiet -dumpbase ccmRjX1L.ltrans0.o \
    -mtune=generic -march=x86-64


Thus, the "-march=native" somehow gets lost in the single-step compile.


* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: d.g.gorbachev at gmail dot com @ 2010-12-17 12:14 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

--- Comment #7 from Dmitry Gorbachev <d.g.gorbachev at gmail dot com> 2010-12-17 12:13:54 UTC ---
> Thus, the "-march=native" somehow gets
> lost in the single-step compile.

PR42445?


* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: rguenth at gcc dot gnu.org @ 2011-01-19 17:10 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2011.01.19 16:48:38
     Ever Confirmed|0                           |1

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-01-19 16:48:38 UTC ---
PR42445 has been fixed - does this bug still reproduce?


* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2011-01-19 22:11 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED

--- Comment #9 from Tobias Burnus <burnus at gcc dot gnu.org> 2011-01-19 22:03:58 UTC ---
Seems to be FIXED.
