* [Bug middle-end/46900] 50% slowdown when linking with LTO in a single step
2010-12-12 10:31 [Bug fortran/46900] New: 50% slowdown when linking with LTO in a single step burnus at gcc dot gnu.org
@ 2010-12-12 10:33 ` burnus at gcc dot gnu.org
2010-12-12 10:34 ` burnus at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: burnus at gcc dot gnu.org @ 2010-12-12 10:33 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
Tobias Burnus <burnus at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|fortran                     |middle-end
--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:33:09 UTC ---
(With the libgfortran MATMUL, i.e. without -fexternal-blas, the call
takes 1.7001060 s.)
* [Bug middle-end/46900] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:34 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:34:48 UTC ---
Created attachment 22722
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22722
Test case (tar.bz2). The .f files are from BLAS (taken from LAPACK 3.3)
* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:39 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
Tobias Burnus <burnus at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.6.0
            Summary|50% slowdown when linking   |[4.6 Regression] 50%
                   |with LTO in a single step   |slowdown when linking with
                   |                            |LTO in a single step
--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:39:17 UTC ---
The single-step linking problem seems to be a regression. With GCC 4.5, the
direct dgemm call shows the same performance when I compile and link in a
single step:
$ gfortran-4.5 -fexternal-blas -flto -O3 -ffast-math -march=native \
test.f90 dgemm.f lsame.f xerbla.f
$ ./a.out
Time, MATMUL: 1.4160880 53.480084765505403
dgemm: 1.0840679 56.452265589399069
(I don't understand why the MATMUL part differs that much - it should call the
same BLAS function [via the same GCC 4.6 libgfortran.so wrapper] and LTO should
not affect it.)
* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-12 10:50 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:50:09 UTC ---
(In reply to comment #3)
> (I don't understand why the MATMUL part differs that much - it should call the
> same BLAS function [via the same GCC 4.6 libgfortran.so wrapper] and LTO should
> not affect it.)
Seemingly, LTO is crucial for 4.5 - without LTO dgemm gets slower but the
libgfortran version gets faster:
$ gfortran-4.5 -fexternal-blas -O3 -ffast-math -march=native \
test.f90 dgemm.f lsame.f xerbla.f && ./a.out
Time, MATMUL: 1.3200819 53.480084765505403
dgemm: 1.3120821 56.452265589399069
$ gfortran-4.5 -c -flto -fexternal-blas -O3 -ffast-math -march=native \
test.f90 dgemm.f lsame.f xerbla.f
$ gfortran-4.5 -flto -O3 -ffast-math -march=native \
test.o dgemm.o lsame.o xerbla.o
$ ./a.out
Time, MATMUL: 1.3080810 53.480084765505403
dgemm: 1.0800680 56.452265589399069
Here, for GCC 4.5, LTO improves the performance of the direct dgemm call, and
it does not matter whether compilation and linking are done in a single step
or in two steps.
However, with GCC 4.5 as well, the single-step build pessimizes the performance
of the libgfortran MATMUL (which is a wrapper for dgemm).
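To put numbers on these observations, the relative changes can be computed from the GCC 4.5 timings quoted in comments #3 and #4. This is only a minimal sketch: the variable names and the relative_change helper are illustrative and not part of the attached test case.

```python
# Timings in seconds, copied from the GCC 4.5 runs quoted in comments #3 and #4.
dgemm_no_lto = 1.3120821            # direct dgemm call, no LTO
dgemm_lto_two_step = 1.0800680      # direct dgemm call, LTO, two-step build
matmul_lto_single_step = 1.4160880  # libgfortran MATMUL, LTO, single-step build
matmul_lto_two_step = 1.3080810     # libgfortran MATMUL, LTO, two-step build


def relative_change(new, old):
    """Return (new - old) / old; a negative value means 'new' is faster."""
    return (new - old) / old


dgemm_gain = relative_change(dgemm_lto_two_step, dgemm_no_lto)
matmul_loss = relative_change(matmul_lto_single_step, matmul_lto_two_step)

print(f"dgemm, LTO vs. no LTO:            {dgemm_gain:+.1%}")
print(f"MATMUL, single-step vs. two-step: {matmul_loss:+.1%}")
```

With these figures, LTO cuts the dgemm time by roughly 18%, while the single-step build costs the MATMUL path roughly 8%.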
* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: rguenth at gcc dot gnu.org @ 2010-12-16 14:40 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-16 14:40:21 UTC ---
Can you please check how the driver calls lto1 for the separate and the
combined LTO compiles, and compare the two? You probably want to use
-v -save-temps and inspect the response files that are passed to
the lto1 -fwpa command. There really shouldn't be any difference.
* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2010-12-16 15:24 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
--- Comment #6 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-16 15:23:40 UTC ---
I think that the files passed to lto1 are the same - but I get different
command-line options:
a) Two-step compile with "-fexternal-blas -flto -Ofast -march=native":
   x86_64-unknown-linux-gnu/4.6.0/lto1 -march=k8-sse3 -msahf --param
     l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024
     -mtune=k8
b) Single-step compile:
   x86_64-unknown-linux-gnu/4.6.0/lto1 -quiet -dumpbase ccmRjX1L.ltrans0.o
     -mtune=generic -march=x86-64
Thus, the "-march=native" somehow gets lost in the single-step compile.
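The difference between the two lto1 invocations can be made explicit by comparing their target options. The following is a rough sketch, not a GCC tool: the two command strings are the excerpts quoted above, the helper names parse_options and target_options are hypothetical, and pairing "--param" and "-dumpbase" with the argument that follows them is a simplifying assumption.

```python
import shlex

# lto1 command-line excerpts as quoted in this comment.
TWO_STEP = (
    "x86_64-unknown-linux-gnu/4.6.0/lto1 -march=k8-sse3 -msahf "
    "--param l1-cache-size=64 --param l1-cache-line-size=64 "
    "--param l2-cache-size=1024 -mtune=k8"
)
SINGLE_STEP = (
    "x86_64-unknown-linux-gnu/4.6.0/lto1 -quiet -dumpbase ccmRjX1L.ltrans0.o "
    "-mtune=generic -march=x86-64"
)


def parse_options(cmdline):
    """Split a command line into a set of options, dropping the program name."""
    tokens = shlex.split(cmdline)[1:]
    options = set()
    i = 0
    while i < len(tokens):
        # Assumption: these two options take the next token as their value.
        if tokens[i] in ("--param", "-dumpbase") and i + 1 < len(tokens):
            options.add(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            options.add(tokens[i])
            i += 1
    return options


def target_options(options):
    """Keep only the -march= and -mtune= options, sorted for stable output."""
    return sorted(o for o in options if o.startswith(("-march=", "-mtune=")))


two_step_target = target_options(parse_options(TWO_STEP))
single_step_target = target_options(parse_options(SINGLE_STEP))

print("two-step:   ", two_step_target)
print("single-step:", single_step_target)
```

The two-step build carries the host-specific -march=k8-sse3 and -mtune=k8 through to lto1, whereas the single-step build falls back to -march=x86-64 and -mtune=generic.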
* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: d.g.gorbachev at gmail dot com @ 2010-12-17 12:14 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
--- Comment #7 from Dmitry Gorbachev <d.g.gorbachev at gmail dot com> 2010-12-17 12:13:54 UTC ---
> Thus, the "-march=native" somehow gets
> lost in the single-step compile.
PR42445 ?
* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: rguenth at gcc dot gnu.org @ 2011-01-19 17:10 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2011.01.19 16:48:38
     Ever Confirmed|0                           |1
--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-01-19 16:48:38 UTC ---
PR42445 has been fixed - does this bug still reproduce?
* [Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
From: burnus at gcc dot gnu.org @ 2011-01-19 22:11 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
Tobias Burnus <burnus at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED
--- Comment #9 from Tobias Burnus <burnus at gcc dot gnu.org> 2011-01-19 22:03:58 UTC ---
Seems to be FIXED.