public inbox for gcc-bugs@sourceware.org
* [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program
@ 2010-09-27 13:13 burnus at gcc dot gnu.org
2010-09-27 15:48 ` [Bug lto/45810] " Joost.VandeVondele at pci dot uzh.ch
` (26 more replies)
0 siblings, 27 replies; 28+ messages in thread
From: burnus at gcc dot gnu.org @ 2010-09-27 13:13 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Summary: 40% slowdown when using LTO for a single-file program
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: burnus@gcc.gnu.org
That's on an Intel Core(TM)2 Duo CPU E8400 @ 3.00GHz and using CentOS Linux 5.5
(x86-64) with glibc-2.5-49.el5_5.2, binutils-2.17.50.0.6-14.el5 and
gcc version 4.6.0 20100921 (experimental) [trunk revision 164472] (GCC)
The performance of the Polyhedron test case fatigue drops by 40% if one
enables LTO (adding -flto to -fwhole-program):
gfortran -march=native -ffast-math -funroll-loops -fwhole-program
-fno-protect-parens -O3
real 0m5.115s / user 0m5.071s / sys 0m0.015s
gfortran -march=native -ffast-math -funroll-loops -flto -fwhole-program
-fno-protect-parens -O3
real 0m7.225s / user 0m7.129s / sys 0m0.017s
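For reference, the 40% figure follows directly from the real-time numbers of the two runs above; a quick sketch (timings copied from the report):

```python
# Real-time figures copied from the two runs above.
t_whole_program = 5.115  # -fwhole-program without -flto
t_lto = 7.225            # -fwhole-program with -flto

# Relative slowdown of the LTO build, in percent.
slowdown_pct = (t_lto / t_whole_program - 1.0) * 100.0
print(round(slowdown_pct, 1))  # prints 41.3
```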
For the other test cases, the results are mostly similar w/ and w/o LTO, though
the non-LTO version tends to be slightly slower (but other programs were
running at the time, thus the results are not 100% comparable with my previous
ones at
https://users.physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/iff/ )
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
@ 2010-09-27 15:48 ` Joost.VandeVondele at pci dot uzh.ch
2010-09-27 15:54 ` rguenth at gcc dot gnu.org
` (25 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2010-09-27 15:48 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |Joost.VandeVondele at pci
| |dot uzh.ch
--- Comment #1 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2010-09-27 10:39:05 UTC ---
I have observed a similar 40% slowdown in CP2K as a result of LTO. I haven't
investigated yet.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
2010-09-27 15:48 ` [Bug lto/45810] " Joost.VandeVondele at pci dot uzh.ch
@ 2010-09-27 15:54 ` rguenth at gcc dot gnu.org
2010-09-28 15:35 ` burnus at gcc dot gnu.org
` (24 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-09-27 15:54 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-09-27 10:48:33 UTC ---
For single-file programs -fwhole-program and -flto should be basically
equivalent if the Frontend provides correctly merged decls. I suppose
it does not and thus we do less inlining with -fwhole-program compared
to -flto.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
2010-09-27 15:48 ` [Bug lto/45810] " Joost.VandeVondele at pci dot uzh.ch
2010-09-27 15:54 ` rguenth at gcc dot gnu.org
@ 2010-09-28 15:35 ` burnus at gcc dot gnu.org
2010-09-28 16:24 ` rguenth at gcc dot gnu.org
` (23 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: burnus at gcc dot gnu.org @ 2010-09-28 15:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Tobias Burnus <burnus at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hubicka at gcc dot gnu.org
--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-09-28 12:23:06 UTC ---
(In reply to comment #2)
> For single-file programs -fwhole-program and -flto should be basically
> equivalent if the Frontend provides correctly merged decls. I suppose
> it does not and thus we do less inlining with -fwhole-program compared
> to -flto.
It might well be the reason that one does less inlining without LTO - but
that's then not only a FE bug (not correctly merged decls) but also a ME/target
bug as the LTO program is _slower_.
Cf. also PR 44334, which is about a -fwhole-program slowdown (w/ and w/o
-flto). For the latter program, it helped to use "--param
hot-bb-frequency-fraction=2000". However, for this PR, the option does not seem
to help.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (2 preceding siblings ...)
2010-09-28 15:35 ` burnus at gcc dot gnu.org
@ 2010-09-28 16:24 ` rguenth at gcc dot gnu.org
2010-09-28 16:25 ` Joost.VandeVondele at pci dot uzh.ch
` (22 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-09-28 16:24 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-09-28 13:38:58 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > For single-file programs -fwhole-program and -flto should be basically
> > equivalent if the Frontend provides correctly merged decls. I suppose
> > it does not and thus we do less inlining with -fwhole-program compared
> > to -flto.
>
> It might well be the reason that one does less inlining without LTO - but
more inlining with LTO. You read my stmt wrong.
> that's then not only a FE bug (not correctly merged decls) but also a ME/target
> bug as the LTO program is _slower_.
Sure. As with all performance-related bugs this needs analysis, and it is
unlikely an "LTO" problem - LTO does not (not-)optimize; optimization
passes do.
>
> Cf. also PR 44334, which is about a -fwhole-program slowdown (w/ and w/o
> -flto). For the latter program, it helped to use "--param
> hot-bb-frequency-fraction=2000". However, for this PR, the option does not seem
> to help.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (3 preceding siblings ...)
2010-09-28 16:24 ` rguenth at gcc dot gnu.org
@ 2010-09-28 16:25 ` Joost.VandeVondele at pci dot uzh.ch
2010-09-28 16:50 ` rguenth at gcc dot gnu.org
` (21 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2010-09-28 16:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #5 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2010-09-28 13:58:18 UTC ---
(In reply to comment #4)
> Sure. As with all performance-related bugs this needs analysis, and it is
> unlikely an "LTO" problem - LTO does not (not-)optimize; optimization
> passes do.
I'm wondering if there is any description of how to do this. For example, how
do I get the assembly of a function and the -fdump-tree-all files from a
gold-based link that goes as:
rm -f test.s test2.s test.o test2.o ;
gfortran -c -flto test.f90 ;
gfortran -c -flto test2.f90 ;
gfortran -O3 -march=native -fuse-linker-plugin -fwhopr=2 test.o test2.o
Just using -S or -fdump-tree-all doesn't work.
Is 'objdump -d' the only tool?
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (5 preceding siblings ...)
2010-09-28 16:50 ` rguenth at gcc dot gnu.org
@ 2010-09-28 16:50 ` Joost.VandeVondele at pci dot uzh.ch
2010-09-28 16:55 ` burnus at gcc dot gnu.org
` (19 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2010-09-28 16:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #7 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2010-09-28 14:19:38 UTC ---
(In reply to comment #6)
> No, -fdump-tree-all works
Great... I forgot to look in /tmp; -save-temps also works fine.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (4 preceding siblings ...)
2010-09-28 16:25 ` Joost.VandeVondele at pci dot uzh.ch
@ 2010-09-28 16:50 ` rguenth at gcc dot gnu.org
2010-09-28 16:50 ` Joost.VandeVondele at pci dot uzh.ch
` (20 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-09-28 16:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-09-28 14:07:54 UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> > Sure. As with all performance-related bugs this needs analysis, and it is
> > unlikely an "LTO" problem - LTO does not (not-)optimize; optimization
> > passes do.
>
> I'm wondering if there is any description on how to do this. For example, how
> do I get the assembly of a function and the -fdump-tree-all files from a gold
> based linking that goes as:
>
> rm -f test.s test2.s test.o test2.o ;
> gfortran -c -flto test.f90 ;
> gfortran -c -flto test2.f90 ;
> gfortran -O3 -march=native -fuse-linker-plugin -fwhopr=2 test.o test2.o
>
> just using -S or -fdump-tree-all doesn't work.
>
> Is 'objdump -d' the only tool ?
No, -fdump-tree-all works; it just uses possibly unintuitive base names.
Append -v to see them. For -fwhopr it should be the output file
specified with -o (which you left out, causing us to use not a.out but
a temporary file in /tmp); with -o t I get
t.ltrans[01].147t.optimized, etc. With -flto it's just t.147t.optimized.
To retain the assembler output you have to use -save-temps, which retains
t.ltrans[01].s; with -flto it retains t1.s (using the base name of the first
object file).
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (6 preceding siblings ...)
2010-09-28 16:50 ` Joost.VandeVondele at pci dot uzh.ch
@ 2010-09-28 16:55 ` burnus at gcc dot gnu.org
2010-09-30 3:27 ` dominiq at lps dot ens.fr
` (18 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: burnus at gcc dot gnu.org @ 2010-09-28 16:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #8 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-09-28 14:57:34 UTC ---
Using -fno-inline-functions, the program recovers the speed of the no-LTO
version.
Notes from #gcc:
(dominiq) For fatigue the key for speed-up is inlining of
generalized_hookes_law and you need -finline-limit=400
(richi) "Considering inline candidate generalized_hookes_law." / "Inlining
failed: --param max-inline-insns-auto limit reached"
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (7 preceding siblings ...)
2010-09-28 16:55 ` burnus at gcc dot gnu.org
@ 2010-09-30 3:27 ` dominiq at lps dot ens.fr
2010-09-30 19:54 ` dominiq at lps dot ens.fr
` (17 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2010-09-30 3:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2010-09-29 20:27:36 UTC ---
(In reply to comment #8)
> Using -fno-inline-functions, the program recovers the speed of the no-LTO
> version.
This is weird!-( I have done the following profiling and it shows that -flto
prevents the inlining of __perdida_m_MOD_perdida, while -fno-inline-functions
restores it. This contradicts what the manual says:
-finline-functions
Integrate all simple functions into their callers. The compiler heuristically
decides which functions are simple enough to be worth integrating in this way.
Note also that in order to inline __perdida_m_MOD_generalized_hookes_law one
needs -finline-limit=600 (actually some number between 300 and 400).
[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -g fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.547u 0.024s 0:06.57 99.8% 0+0k 0+2io 0pf+0w
+ 70.8%, MAIN__, a.out
| + 10.1%, free, libSystem.B.dylib
| | 7.9%, szone_size, libSystem.B.dylib
| + 8.0%, malloc, libSystem.B.dylib
| | + 6.4%, malloc_zone_malloc, libSystem.B.dylib
| | | 4.4%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.4%, szone_malloc, libSystem.B.dylib
| | 0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| | 0.1%, szone_malloc_should_clear, libSystem.B.dylib
| 4.1%, szone_free_definite_size, libSystem.B.dylib
| 2.4%, cosisin, libSystem.B.dylib
| + 0.7%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
27.2%, __perdida_m_MOD_generalized_hookes_law, a.out
0.5%, dyld_stub_malloc, a.out
0.4%, free, libSystem.B.dylib
0.4%, dyld_stub_free, a.out
0.4%, szone_free_definite_size, libSystem.B.dylib
0.2%, malloc, libSystem.B.dylib
0.1%, dyld_stub_cexp, a.out
0.0%, cexp, libSystem.B.dylib
[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
9.013u 0.027s 0:09.04 99.8% 0+0k 0+2io 0pf+0w
+ 64.8%, __perdida_m_MOD_perdida, a.out
<-------
| + 6.8%, free, libSystem.B.dylib
| | 4.9%, szone_size, libSystem.B.dylib
| + 5.2%, malloc, libSystem.B.dylib
| | + 4.1%, malloc_zone_malloc, libSystem.B.dylib
| | | 2.5%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.5%, szone_malloc, libSystem.B.dylib
| | 0.3%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| 3.1%, szone_free_definite_size, libSystem.B.dylib
19.3%, __perdida_m_MOD_generalized_hookes_law, a.out
+ 14.6%, MAIN__.2130, a.out
| 1.8%, cosisin, libSystem.B.dylib
| + 0.4%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
| | 0.0%, cosisin, libSystem.B.dylib
0.3%, szone_free_definite_size, libSystem.B.dylib
0.3%, dyld_stub_malloc, a.out
0.3%, dyld_stub_free, a.out
0.2%, free, libSystem.B.dylib
0.2%, malloc, libSystem.B.dylib
0.0%, cexp, libSystem.B.dylib
0.0%, data_transfer_init, libgfortran.3.dylib
[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto
-fno-inline-functions fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.575u 0.021s 0:06.61 99.6% 0+0k 0+2io 0pf+0w
+ 71.0%, MAIN__.2130, a.out
| + 8.9%, free, libSystem.B.dylib
| | 6.6%, szone_size, libSystem.B.dylib
| + 8.1%, malloc, libSystem.B.dylib
| | + 6.4%, malloc_zone_malloc, libSystem.B.dylib
| | | 4.5%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.6%, szone_malloc, libSystem.B.dylib
| | 0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| | 0.2%, szone_malloc_should_clear, libSystem.B.dylib
| 4.4%, szone_free_definite_size, libSystem.B.dylib
| 1.9%, cosisin, libSystem.B.dylib
| + 1.0%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.1%, cosisin, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
27.3%, __perdida_m_MOD_generalized_hookes_law, a.out
0.4%, free, libSystem.B.dylib
0.3%, dyld_stub_malloc, a.out
0.3%, dyld_stub_free, a.out
0.3%, szone_free_definite_size, libSystem.B.dylib
0.2%, malloc, libSystem.B.dylib
0.1%, dyld_stub_cexp, a.out
0.0%, cexp, libSystem.B.dylib
[macbook] lin/test% gfc -Ofast -funroll-loops -fwhole-program -flto
-finline-limit=600 fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.768u 0.018s 0:04.79 99.5% 0+0k 0+1io 0pf+0w
+ 97.5%, MAIN__.2133, a.out
| + 15.4%, free, libSystem.B.dylib
| | 10.6%, szone_size, libSystem.B.dylib
| + 11.4%, malloc, libSystem.B.dylib
| | + 9.6%, malloc_zone_malloc, libSystem.B.dylib
| | | 4.9%, szone_malloc_should_clear, libSystem.B.dylib
| | | 0.9%, szone_malloc, libSystem.B.dylib
| | 0.4%, dyld_stub_malloc_zone_malloc, libSystem.B.dylib
| 6.4%, szone_free_definite_size, libSystem.B.dylib
| 2.7%, cosisin, libSystem.B.dylib
| + 0.8%, cexp, libSystem.B.dylib
| | 0.1%, exp$fenv_access_off, libSystem.B.dylib
| | 0.1%, cosisin, libSystem.B.dylib
| | 0.0%, dyld_stub_exp, libSystem.B.dylib
0.5%, szone_free_definite_size, libSystem.B.dylib
0.5%, dyld_stub_malloc, a.out
0.5%, dyld_stub_free, a.out
0.4%, free, libSystem.B.dylib
0.4%, malloc, libSystem.B.dylib
0.1%, dyld_stub_cexp, a.out
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (8 preceding siblings ...)
2010-09-30 3:27 ` dominiq at lps dot ens.fr
@ 2010-09-30 19:54 ` dominiq at lps dot ens.fr
2011-01-08 20:41 ` hubicka at gcc dot gnu.org
` (16 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2010-09-30 19:54 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #10 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2010-09-30 17:28:19 UTC ---
(In reply to comment #8)
> Using -fno-inline-functions, the program recovers the speed of the no-LTO
> version.
This does not work on powerpc-apple-darwin9:
[karma] lin/test% gfc -Ofast -funroll-loops -fwhole-program -g fatigue.f90
[karma] lin/test% time a.out > /dev/null
15.942u 0.052s 0:16.54 96.6% 0+0k 2+1io 40pf+0w
[karma] lin/test% gfc -Ofast -funroll-loops -fwhole-program -g -flto
fatigue.f90
[karma] lin/test% time a.out > /dev/null
20.330u 0.063s 0:21.06 96.8% 0+0k 0+2io 0pf+0w
[karma] lin/test% gfc -Ofast -funroll-loops -fwhole-program -g -flto
-fno-inline-functions fatigue.f90
[karma] lin/test% time a.out > /dev/null
20.678u 0.063s 0:21.33 97.1% 0+0k 0+2io 0pf+0w
[karma] lin/test% gfc -Ofast -funroll-loops -fwhole-program -g -flto
-finline-limit=600 fatigue.f90
[karma] lin/test% time a.out > /dev/null
10.903u 0.036s 0:11.30 96.7% 0+0k 0+2io 0pf+0w
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (9 preceding siblings ...)
2010-09-30 19:54 ` dominiq at lps dot ens.fr
@ 2011-01-08 20:41 ` hubicka at gcc dot gnu.org
2011-01-23 16:36 ` hubicka at gcc dot gnu.org
` (15 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-08 20:41 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #11 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-08 20:08:26 UTC ---
Does --param hot-bb-frequency-fraction=100000 work here?
> This is weird!-( I have done the following profiling and it shows that -flto
> prevents the inlining of __perdida_m_MOD_perdida, while -fno-inline-functions
> restores it. This contradicts what the manual says:
> -finline-functions
> Integrate all simple functions into their callers. The compiler heuristically
> decides which functions are simple enough to be worth integrating in this way.
Disabling autoinlining of small functions can allow other inlining (inlining
of functions called once, or inlining for size), so this is not completely
unexpected.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (10 preceding siblings ...)
2011-01-08 20:41 ` hubicka at gcc dot gnu.org
@ 2011-01-23 16:36 ` hubicka at gcc dot gnu.org
2011-01-23 18:08 ` hubicka at gcc dot gnu.org
` (14 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-23 16:36 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2011.01.23 15:59:30
Ever Confirmed|0 |1
--- Comment #12 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-23 15:59:30 UTC ---
Reproduces for me.
Perdida is a function called once. What happens with default settings is that
perdida is not considered an inline candidate for small-function inlining (it
is estimated at over 700 instructions, so it is huge);
later we try to inline it as a function called once, but hit the large function
growth limit. Compiling with --param large-function-growth=1000000 solves the
problem, but it does not make the testcase faster.
So the problem is elsewhere.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (11 preceding siblings ...)
2011-01-23 16:36 ` hubicka at gcc dot gnu.org
@ 2011-01-23 18:08 ` hubicka at gcc dot gnu.org
2011-01-23 19:38 ` dominiq at lps dot ens.fr
` (13 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-23 18:08 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-23 16:45:23 UTC ---
OK, the slowdown goes away when both hookes_law and perdida are inlined.
The first needs -finline-limit=380; the second needs
large-function-growth=10000000 (or a large increase of the inline limit to make
perdida be considered a small function and inlined before iztaccihuatl grows
that much).
Without large-function-growth we fail at:
Considering perdida size 1056.
Called once from iztaccihuatl 6151 insns.
Not inlining: --param large-function-growth limit reached.
This is because inlining of functions called once first processes read_input:
Considering read_input size 3099.
Called once from iztaccihuatl 3128 insns.
Inlined into iztaccihuatl which now has 6151 size for a net change of -76
size.
that makes it too large.
large-function-insns is 2700, large-function-growth is 100%, so iztaccihuatl
can't grow past 3128*2 insns.
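The arithmetic behind that limit can be restated in a short sketch (sizes copied from the inliner dumps quoted in this comment; the cap formula follows the description above, 100% growth on top of the 3128-insn body):

```python
# Sizes from the inliner dumps quoted above.
perdida_size = 1056       # perdida, the function called once
iztaccihuatl_size = 3128  # caller size before IPA inlining
after_read_input = 6151   # caller size after read_input (3099) was inlined first

# large-function-growth is 100%, so the caller may not grow past 2 * 3128 insns.
cap = 2 * iztaccihuatl_size
print(after_read_input + perdida_size <= cap)  # prints False: perdida no longer fits
```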
We might increase large-function-growth (I will give it a try on our
benchmarks) or we might convince the inliner to inline perdida first rather
than read_input, because perdida is smaller...
Honza
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (12 preceding siblings ...)
2011-01-23 18:08 ` hubicka at gcc dot gnu.org
@ 2011-01-23 19:38 ` dominiq at lps dot ens.fr
2011-01-23 20:00 ` hubicka at gcc dot gnu.org
` (12 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-23 19:38 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #14 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-23 17:04:07 UTC ---
After removing the comments, generalized_hookes_law reads
function generalized_hookes_law (strain_tensor, lambda, mu) result (stress_tensor)
!
real (kind = LONGreal), dimension(:,:), intent(in) :: strain_tensor
real (kind = LONGreal), intent(in) :: lambda, mu
real (kind = LONGreal), dimension(3,3) :: stress_tensor
real (kind = LONGreal), dimension(6) :: generalized_strain_vector, generalized_stress_vector
real (kind = LONGreal), dimension(6,6) :: generalized_constitutive_tensor
integer :: i
!
generalized_constitutive_tensor(:,:) = 0.0_LONGreal
generalized_constitutive_tensor(1,1) = lambda + 2.0_LONGreal * mu
generalized_constitutive_tensor(1,2) = lambda
generalized_constitutive_tensor(1,3) = lambda
generalized_constitutive_tensor(2,1) = lambda
generalized_constitutive_tensor(2,2) = lambda + 2.0_LONGreal * mu
generalized_constitutive_tensor(2,3) = lambda
generalized_constitutive_tensor(3,1) = lambda
generalized_constitutive_tensor(3,2) = lambda
generalized_constitutive_tensor(3,3) = lambda + 2.0_LONGreal * mu
generalized_constitutive_tensor(4,4) = mu
generalized_constitutive_tensor(5,5) = mu
generalized_constitutive_tensor(6,6) = mu
!
generalized_strain_vector(1) = strain_tensor(1,1)
generalized_strain_vector(2) = strain_tensor(2,2)
generalized_strain_vector(3) = strain_tensor(3,3)
generalized_strain_vector(4) = strain_tensor(2,3)
generalized_strain_vector(5) = strain_tensor(1,3)
generalized_strain_vector(6) = strain_tensor(1,2)
!
do i = 1, 6
generalized_stress_vector(i) = dot_product(generalized_constitutive_tensor(i,:), generalized_strain_vector(:))
end do
!
stress_tensor(1,1) = generalized_stress_vector(1)
stress_tensor(2,2) = generalized_stress_vector(2)
stress_tensor(3,3) = generalized_stress_vector(3)
stress_tensor(2,3) = generalized_stress_vector(4)
stress_tensor(1,3) = generalized_stress_vector(5)
stress_tensor(1,2) = generalized_stress_vector(6)
stress_tensor(3,2) = stress_tensor(2,3)
stress_tensor(3,1) = stress_tensor(1,3)
stress_tensor(2,1) = stress_tensor(1,2)
!
end function generalized_hookes_law
Note that 24 of the 36 elements of generalized_constitutive_tensor are
zero. Using that, the function can be replaced with
function generalized_hookes_law (strain_tensor, lambda, mu) result (stress_tensor)
!
real (kind = LONGreal), dimension(:,:), intent(in) :: strain_tensor
real (kind = LONGreal), intent(in) :: lambda, mu
real (kind = LONGreal), dimension(3,3) :: stress_tensor
real (kind = LONGreal) :: tmp
!
stress_tensor(:,:) = mu * strain_tensor(:,:)
tmp = lambda * (strain_tensor(1,1) + strain_tensor(2,2) + strain_tensor(3,3))
stress_tensor(1,1) = tmp + 2.0_LONGreal * stress_tensor(1,1)
stress_tensor(2,2) = tmp + 2.0_LONGreal * stress_tensor(2,2)
stress_tensor(3,3) = tmp + 2.0_LONGreal * stress_tensor(3,3)
!
end function generalized_hookes_law
end module perdida_m
which is inlined at -finline-limit=320.
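As a sanity check (not part of the bug report), the simplified version can be verified numerically against the original 6x6 formulation; the Python sketch below mirrors both Fortran variants, with illustrative values for lambda, mu, and a symmetric strain tensor:

```python
def hookes_law_full(strain, lam, mu):
    # Original variant: build the 6x6 generalized constitutive tensor
    # (24 of its 36 entries are zero) and apply it to the strain vector.
    C = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            C[i][j] = lam + 2.0 * mu if i == j else lam
    for i in range(3, 6):
        C[i][i] = mu
    # Generalized strain vector: components (11, 22, 33, 23, 13, 12).
    e = [strain[0][0], strain[1][1], strain[2][2],
         strain[1][2], strain[0][2], strain[0][1]]
    s = [sum(C[i][j] * e[j] for j in range(6)) for i in range(6)]
    stress = [[0.0] * 3 for _ in range(3)]
    stress[0][0], stress[1][1], stress[2][2] = s[0], s[1], s[2]
    stress[1][2] = stress[2][1] = s[3]
    stress[0][2] = stress[2][0] = s[4]
    stress[0][1] = stress[1][0] = s[5]
    return stress

def hookes_law_simplified(strain, lam, mu):
    # Simplified variant: mu * strain everywhere, then fix up the
    # diagonal with lambda * trace(strain).
    stress = [[mu * strain[i][j] for j in range(3)] for i in range(3)]
    tmp = lam * (strain[0][0] + strain[1][1] + strain[2][2])
    for i in range(3):
        stress[i][i] = tmp + 2.0 * stress[i][i]
    return stress

strain = [[1.0, 0.5, 0.2],
          [0.5, 2.0, 0.3],
          [0.2, 0.3, 3.0]]
full = hookes_law_full(strain, 1.7, 0.9)
simp = hookes_law_simplified(strain, 1.7, 0.9)
print(all(abs(full[i][j] - simp[i][j]) < 1e-12
          for i in range(3) for j in range(3)))  # prints True
```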
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (13 preceding siblings ...)
2011-01-23 19:38 ` dominiq at lps dot ens.fr
@ 2011-01-23 20:00 ` hubicka at gcc dot gnu.org
2011-01-23 21:02 ` hubicka at gcc dot gnu.org
` (11 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-23 20:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-23 17:56:31 UTC ---
Enabling early FRE
Index: passes.c
===================================================================
--- passes.c (revision 169136)
+++ passes.c (working copy)
@@ -760,6 +760,7 @@
NEXT_PASS (pass_remove_cgraph_callee_edges);
NEXT_PASS (pass_rename_ssa_copies);
NEXT_PASS (pass_ccp);
+ NEXT_PASS (pass_fre);
NEXT_PASS (pass_forwprop);
/* pass_build_ealias is a dummy pass that ensures that we
execute TODO_rebuild_alias at this point. Re-building
@@ -782,7 +783,7 @@
reduces the perdida size estimate to 694 (so by about 30%) and hookes_law to
141 (by 11%). Still not enough to make inlining happen.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (14 preceding siblings ...)
2011-01-23 20:00 ` hubicka at gcc dot gnu.org
@ 2011-01-23 21:02 ` hubicka at gcc dot gnu.org
2011-01-23 21:12 ` dominiq at lps dot ens.fr
` (10 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-23 21:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-23 17:57:58 UTC ---
Also, w/o inlining hookes_law but with inlining perdida (by using the
large-function-growth parameter only and the patch above), I get a 30% speedup,
not 50% as with inlining both; it seems that w/o early FRE we miss some
optimization that is independent of inlining.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (15 preceding siblings ...)
2011-01-23 21:02 ` hubicka at gcc dot gnu.org
@ 2011-01-23 21:12 ` dominiq at lps dot ens.fr
2011-01-23 22:12 ` hubicka at gcc dot gnu.org
` (9 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-23 21:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #17 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-23 19:38:30 UTC ---
With the patch in comment #15 and -finline-limit=300, I get
================================================================================
Date & Time : 23 Jan 2011 20:18:02
Test Name : pbharness
Compile Command : gfcp %n.f90 -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer -finline-limit=300 -fwhole-program -flto -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct
linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 300.0
Target Error % : 0.200
Minimum Repeats : 2
Maximum Repeats : 5
Benchmark Compile Executable Ave Run Number Estim
Name (secs) (bytes) (secs) Repeats Err %
--------- ------- ---------- ------- ------- ------
ac 3.55 54576 8.12 2 0.0062
aermod 103.51 1595448 18.87 2 0.0079
air 8.87 90048 6.89 2 0.0798
capacita 5.84 89056 40.27 2 0.0199
channel 1.62 34448 2.98 2 0.0168
doduc 14.30 203936 27.79 2 0.0162
fatigue 4.89 89264 4.74 2 0.0106
gas_dyn 11.72 148176 4.64 5 0.0535
induct 10.87 205976 14.00 2 0.0036
linpk 1.58 21536 21.71 2 0.0415
mdbx 5.60 84752 12.56 2 0.1871
nf 7.24 83712 29.23 5 0.0744
protein 11.81 163760 35.10 2 0.0342
rnflow 14.86 171392 26.91 2 0.0223
test_fpu 11.35 145848 11.03 2 0.0952
tfft 1.10 22072 3.30 2 0.1817
Geometric Mean Execution Time = 12.36 seconds
to be compared with the lowest geometric mean I have gotten so far (most of the
difference is due to nf, which depends a lot on the mood of my laptop)
================================================================================
Date & Time : 22 Dec 2010 10:33:08
Test Name : pbharness
Compile Command : gfc %n.f90 -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer -finline-limit=600 --param hot-bb-frequency-fraction=2000
-fwhole-program -flto -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct
linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 300.0
Target Error % : 0.200
Minimum Repeats : 2
Maximum Repeats : 5
Benchmark Compile Executable Ave Run Number Estim
Name (secs) (bytes) (secs) Repeats Err %
--------- ------- ---------- ------- ------- ------
ac 11.55 58672 8.11 2 0.0123
aermod 164.78 1522240 19.11 2 0.1151
air 20.73 85984 6.87 5 0.1914
capacita 14.66 105472 40.22 2 0.0584
channel 3.22 34448 2.92 4 0.1714
doduc 24.70 212360 27.81 2 0.1025
fatigue 9.81 85144 4.70 3 0.1862
gas_dyn 24.13 144240 4.66 5 0.4507
induct 22.50 214136 13.69 2 0.1096
linpk 2.56 21536 21.68 2 0.0231
mdbx 8.93 84744 12.52 2 0.0080
nf 22.61 104136 27.63 2 0.0778
protein 26.19 155768 35.51 2 0.0127
rnflow 30.99 163200 26.15 2 0.0248
test_fpu 18.79 145848 10.98 2 0.0182
tfft 1.92 22072 3.29 2 0.0304
Geometric Mean Execution Time = 12.27 seconds
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (16 preceding siblings ...)
2011-01-23 21:12 ` dominiq at lps dot ens.fr
@ 2011-01-23 22:12 ` hubicka at gcc dot gnu.org
2011-01-23 22:39 ` hubicka at gcc dot gnu.org
` (8 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-23 22:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2011-01-23 15:59:30 |
CC| |rguenther at suse dot de
--- Comment #18 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-23 20:00:23 UTC ---
We produce very lousy code for the out-of-line copy of
__perdida_m_MOD_generalized_hookes_law. This seems to be the reason why we
inline it.
The code is a bit better with early FRE, but in
vect_pgeneralized_constitutive_tensor (optimized dump) we still get:
generalized_constitutive_tensor = {};
D.4502_45 = *lambda_44(D);
D.4503_47 = *mu_46(D);
D.4504_48 = D.4503_47 * 2.0e+0;
D.4505_49 = D.4504_48 + D.4502_45;
generalized_constitutive_tensor[0] = D.4505_49;
generalized_constitutive_tensor[6] = D.4502_45;
generalized_constitutive_tensor[12] = D.4502_45;
generalized_constitutive_tensor[1] = D.4502_45;
generalized_constitutive_tensor[7] = D.4505_49;
generalized_constitutive_tensor[13] = D.4502_45;
generalized_constitutive_tensor[2] = D.4502_45;
generalized_constitutive_tensor[8] = D.4502_45;
generalized_constitutive_tensor[14] = D.4505_49;
generalized_constitutive_tensor[21] = D.4503_47;
generalized_constitutive_tensor[28] = D.4503_47;
generalized_constitutive_tensor[35] = D.4503_47;
i.e., we initialize the array with mostly zeros and then use it in a vectorized loop:
vect_cst_.855_301 = {D.4508_69, D.4508_69};
vect_cst_.862_295 = {D.4511_73, D.4511_73};
vect_cst_.870_288 = {D.4514_77, D.4514_77};
vect_cst_.878_323 = {D.4519_82, D.4519_82};
vect_cst_.886_330 = {D.4522_86, D.4522_86};
vect_cst_.894_337 = {D.4526_90, D.4526_90};
vect_var_.853_205 = MEM[(real(kind=8)[36]
*)&generalized_constitutive_tensor];
vect_var_.854_210 = vect_var_.853_205 * vect_cst_.855_301;
vect_var_.860_211 = MEM[(real(kind=8)[36] *)&generalized_constitutive_tensor
+ 48B];
vect_var_.861_214 = vect_var_.860_211 * vect_cst_.862_295;
vect_var_.863_215 = vect_var_.861_214 + vect_var_.854_210;
vect_var_.868_220 = MEM[(real(kind=8)[36] *)&generalized_constitutive_tensor
+ 96B];
vect_var_.869_221 = vect_var_.868_220 * vect_cst_.870_288;
vect_var_.871_224 = vect_var_.863_215 + vect_var_.869_221;
vect_var_.876_225 = MEM[(real(kind=8)[36] *)&generalized_constitutive_tensor
+ 144B];
We would do better to unroll this and optimize away the zero terms.
Without -ftree-vectorize, however, we still don't do this transform. We end up with:
generalized_constitutive_tensor = {};
D.4502_45 = *lambda_44(D);
D.4503_47 = *mu_46(D);
D.4504_48 = D.4503_47 * 2.0e+0;
D.4505_49 = D.4504_48 + D.4502_45;
generalized_constitutive_tensor[1] = D.4502_45;
generalized_constitutive_tensor[7] = D.4505_49;
generalized_constitutive_tensor[13] = D.4502_45;
generalized_constitutive_tensor[2] = D.4502_45;
generalized_constitutive_tensor[8] = D.4502_45;
generalized_constitutive_tensor[14] = D.4505_49;
generalized_constitutive_tensor[21] = D.4503_47;
generalized_constitutive_tensor[28] = D.4503_47;
generalized_constitutive_tensor[35] = D.4503_47;
....
pretmp.827_334 = generalized_constitutive_tensor[1];
pretmp.830_336 = generalized_constitutive_tensor[7];
pretmp.832_338 = generalized_constitutive_tensor[13];
pretmp.834_340 = generalized_constitutive_tensor[19];
pretmp.836_342 = generalized_constitutive_tensor[25];
pretmp.838_344 = generalized_constitutive_tensor[31];
so copy propagation and SRA are missing. Moreover, we can't figure out that
generalized_constitutive_tensor[31] is 0.
So it is quite a good testcase for optimization queue ordering.
Honza
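A minimal C analogue of the missed transform (a hypothetical reduction of the Fortran pattern above, not the actual benchmark code): after full unrolling and constant propagation, every load from the mostly-zero table becomes a known value, and the multiplications by zero can be deleted outright.

```c
#include <assert.h>

/* Hypothetical reduction of the generalized_hookes_law pattern: a mostly-zero
   6x6 table is built from two scalars and then contracted with a vector.  An
   optimizer that unrolls the loop, propagates the stored constants into the
   loads (CCP/FRE), and removes multiplications by zero (DCE) can reduce the
   whole routine to a handful of arithmetic operations. */
static double contract_first_row(double lambda, double mu, const double e[6])
{
    double c[36] = {0};           /* c = {}; most entries stay zero */
    double d = lambda + 2.0 * mu; /* the D.4505 temporary in the dump */

    c[0]  = d;      c[6]  = lambda; c[12] = lambda;
    c[1]  = lambda; c[7]  = d;      c[13] = lambda;
    c[2]  = lambda; c[8]  = lambda; c[14] = d;
    c[21] = mu;     c[28] = mu;     c[35] = mu;

    /* Reads c[0..5]; c[3], c[4] and c[5] are provably zero. */
    double s = 0.0;
    for (int i = 0; i < 6; i++)
        s += c[i] * e[i];
    return s;
}
```

Folding the zero terms by hand like this is exactly what the unroll/CCP/FRE/DCE sequence discussed in the thread is meant to achieve automatically.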
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (17 preceding siblings ...)
2011-01-23 22:12 ` hubicka at gcc dot gnu.org
@ 2011-01-23 22:39 ` hubicka at gcc dot gnu.org
2011-01-24 2:04 ` dominiq at lps dot ens.fr
` (7 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: hubicka at gcc dot gnu.org @ 2011-01-23 22:39 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2011-01-23 15:59:30
--- Comment #19 from Jan Hubicka <hubicka at gcc dot gnu.org> 2011-01-23 21:05:51 UTC ---
This adds enough passes that we generate sane code for hookes_law
(and we do that before inlining).
Index: passes.c
===================================================================
--- passes.c (revision 169136)
+++ passes.c (working copy)
@@ -775,6 +775,14 @@
NEXT_PASS (pass_convert_switch);
NEXT_PASS (pass_cleanup_eh);
NEXT_PASS (pass_profile);
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_complete_unroll);
+ NEXT_PASS (pass_tree_loop_done);
+ NEXT_PASS (pass_ccp);
+ NEXT_PASS (pass_fre);
+ NEXT_PASS (pass_dse);
+ NEXT_PASS (pass_fre);
+ NEXT_PASS (pass_cd_dce);
NEXT_PASS (pass_local_pure_const);
/* Split functions creates parts that are not run through
early optimizations again. It is thus good idea to do this
@@ -782,7 +790,7 @@
We need to unroll the loop, do CCP to get constant array indexes, and FRE to
propagate through memory accesses. For some reason FRE is needed twice, or the
loads from the temporary array are not copy propagated.
I haven't tested whether DSE is really needed or whether cd_dce gets rid of the
dead store into the array. Still, a lot of copy-propagation opportunity is left.
This makes the hookes_law estimate 91 instructions, so -finline-limit=183
should be enough.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (18 preceding siblings ...)
2011-01-23 22:39 ` hubicka at gcc dot gnu.org
@ 2011-01-24 2:04 ` dominiq at lps dot ens.fr
2011-01-24 9:43 ` dominiq at lps dot ens.fr
` (6 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-24 2:04 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #20 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-23 23:20:34 UTC ---
> This makes hookes_law estimate to be 91 instructions, so -finline-limit=183
> should be enough.
With the patch in comment #19, I instead find a threshold of -finline-limit=256.
On top of that, as shown by the timings below, the patch increases the threshold
for ac.f90 and breaks the vectorization for induct.f90.
Would the patch in comment #15 and an increase of the default value of
-finline-limit to 300 be acceptable at this stage (with the usual bells and
whistles: SPEC, ...)?
================================================================================
Date & Time : 23 Jan 2011 23:18:23
Test Name : pbharness
Compile Command : gfcp %n.f90 -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer -finline-limit=300 -fwhole-program -flto -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct
linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 300.0
Target Error % : 0.200
Minimum Repeats : 2
Maximum Repeats : 5
Benchmark Compile Executable Ave Run Number Estim
Name (secs) (bytes) (secs) Repeats Err %
--------- ------- ---------- ------- ------- ------
ac 3.15 50536 9.58 2 0.0156
aermod 104.98 1652280 18.79 2 0.1011
air 8.83 90048 6.99 5 0.7334
capacita 5.95 89056 40.21 2 0.0174
channel 1.65 34448 2.99 2 0.0502
doduc 14.59 208056 27.91 2 0.0036
fatigue 4.80 89264 4.72 2 0.0212
gas_dyn 11.65 148176 4.66 5 0.4391
induct 11.20 205976 22.34 2 0.0672
linpk 1.59 21536 21.70 2 0.0299
mdbx 5.78 84760 12.58 2 0.0119
nf 7.60 83712 29.53 5 0.3854
protein 11.69 163760 35.18 2 0.1109
rnflow 15.23 167296 26.97 2 0.0890
test_fpu 11.33 145848 11.06 5 0.3715
tfft 1.13 22072 3.30 2 0.0607
Geometric Mean Execution Time = 12.89 seconds
================================================================================
Date & Time : 23 Jan 2011 23:54:28
Test Name : pbharness
Compile Command : gfcp %n.f90 -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer -finline-limit=600 -fwhole-program -flto -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct
linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 300.0
Target Error % : 0.200
Minimum Repeats : 2
Maximum Repeats : 5
Benchmark Compile Executable Ave Run Number Estim
Name (secs) (bytes) (secs) Repeats Err %
--------- ------- ---------- ------- ------- ------
ac 3.59 54576 8.10 2 0.0062
aermod 103.73 1558344 18.91 2 0.0238
air 10.47 89992 6.77 5 0.1563
capacita 7.47 101344 40.08 2 0.0137
channel 1.65 34448 2.97 5 0.5872
doduc 15.82 216376 27.61 2 0.0000
fatigue 5.10 89264 4.73 2 0.0000
gas_dyn 12.09 152264 4.69 5 0.6428
induct 11.10 205976 22.33 2 0.0403
linpk 1.59 21536 21.72 2 0.0368
mdbx 5.85 84760 12.58 2 0.0517
nf 11.34 108280 28.98 2 0.1087
protein 11.65 163760 35.18 3 0.1422
rnflow 17.39 183696 26.71 2 0.0243
test_fpu 11.49 145816 11.02 2 0.1226
tfft 1.43 22072 3.29 2 0.0911
Geometric Mean Execution Time = 12.70 seconds
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (19 preceding siblings ...)
2011-01-24 2:04 ` dominiq at lps dot ens.fr
@ 2011-01-24 9:43 ` dominiq at lps dot ens.fr
2011-01-24 14:37 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-24 9:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #21 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-24 09:29:00 UTC ---
I have regtested my working tree (with other patches) with the patch in comment
#15 and got 180 new failures (likely 90 each for -m32 and -m64, but I have not
checked that carefully).
Among them, 124 are of the kind "scan-tree-dump-times fre *: dump file does not
exist" and seem to be due to the extra pass producing fre1 and fre2 dumps. I can
adjust the tests to scan, say, fre2 and see what happens.
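For example, a directive that scanned the single FRE dump would need its dump
suffix updated once two FRE passes run (hypothetical directive shown for
illustration; the actual regexes differ per test):

```c
/* Before: fails because there is no plain "fre" dump anymore. */
/* { dg-final { scan-tree-dump-times "Replaced" 1 "fre" } } */

/* After: scan the dump of the second FRE pass instead. */
/* { dg-final { scan-tree-dump-times "Replaced" 1 "fre2" } } */
```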
Then I see
FAIL: gcc.dg/ipa/ipa-pta-14.c scan-ipa-dump pta "foo.result = { NULL a[^ ]* a[^
]* c[^ ]* }"
FAIL: gcc.dg/matrix/matrix-1.c scan-ipa-dump-times matrix-reorg "Flattened 3
dimensions" 1
FAIL: gcc.dg/matrix/matrix-2.c scan-ipa-dump-times matrix-reorg "Flattened 2
dimensions" 1
FAIL: gcc.dg/matrix/matrix-3.c scan-ipa-dump-times matrix-reorg "Flattened 2
dimensions" 1
FAIL: gcc.dg/matrix/matrix-6.c scan-ipa-dump-times matrix-reorg "Flattened 2
dimensions" 1
FAIL: gcc.dg/matrix/transpose-1.c scan-ipa-dump-times matrix-reorg "Flattened 3
dimensions" 1
FAIL: gcc.dg/matrix/transpose-1.c scan-ipa-dump-times matrix-reorg "Transposed"
3
FAIL: gcc.dg/matrix/transpose-2.c scan-ipa-dump-times matrix-reorg "Flattened 3
dimensions" 1
FAIL: gcc.dg/matrix/transpose-3.c scan-ipa-dump-times matrix-reorg "Flattened 2
dimensions" 1
FAIL: gcc.dg/matrix/transpose-3.c scan-ipa-dump-times matrix-reorg "Transposed"
2
FAIL: gcc.dg/matrix/transpose-4.c scan-ipa-dump-times matrix-reorg "Flattened 3
dimensions" 1
FAIL: gcc.dg/matrix/transpose-4.c scan-ipa-dump-times matrix-reorg "Transposed"
2
FAIL: gcc.dg/matrix/transpose-5.c scan-ipa-dump-times matrix-reorg "Flattened 3
dimensions" 1
FAIL: gcc.dg/matrix/transpose-6.c scan-ipa-dump-times matrix-reorg "Flattened 3
dimensions" 1
FAIL: gcc.dg/torture/pta-structcopy-1.c -O2 scan-tree-dump alias "points-to
vars: { i }"
FAIL: gcc.dg/torture/pta-structcopy-1.c -O3 -fomit-frame-pointer
scan-tree-dump alias "points-to vars: { i }"
FAIL: gcc.dg/torture/pta-structcopy-1.c -O3 -g scan-tree-dump alias
"points-to vars: { i }"
FAIL: gcc.dg/torture/pta-structcopy-1.c -Os scan-tree-dump alias "points-to
vars: { i }"
FAIL: gcc.dg/torture/pta-structcopy-1.c -O2 -flto -flto-partition=none
scan-tree-dump alias "points-to vars: { i }"
FAIL: gcc.dg/torture/pta-structcopy-1.c -O2 -flto scan-tree-dump alias
"points-to vars: { i }"
FAIL: gcc.dg/tree-ssa/pta-ptrarith-1.c scan-tree-dump ealias "q_., points-to
vars: { k }"
FAIL: gcc.dg/tree-ssa/sra-9.c scan-tree-dump-times optimized "= s.b" 0
FAIL: gcc.dg/tree-ssa/ssa-dce-4.c scan-tree-dump-times cddce1 "a\[[^
FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg "f6: va_list escapes 0,
needs to save (3|12|24) GPR units"
FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg "f11: va_list escapes 0,
needs to save (3|12|24) GPR units"
FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg "f12: va_list escapes 0,
needs to save [1-9][0-9]* GPR units"
FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg "f13: va_list escapes 0,
needs to save [1-9][0-9]* GPR units"
FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg "f14: va_list escapes 0,
needs to save [1-9][0-9]* GPR units"
FAIL: g++.dg/ipa/iinline-1.C scan-ipa-dump inline "String::funcOne[^\n]*inline
copy in int main"
FAIL: g++.dg/ipa/iinline-2.C scan-ipa-dump inline "String::funcOne[^\n]*inline
copy in int main"
So far I have only looked at gcc.dg/ipa/ipa-pta-14.c, for which grepping
foo.result yields
p_1 = foo.result
foo.result = foo.arg1
Equivalence classes for Direct node node id 15:foo.result are pointer: 8,
location:0
Unifying foo.result to foo.arg0
foo.result = { a.0+32 } same as foo.arg0
instead of
p_1 = foo.result
foo.result = D.2736_3
Equivalence classes for Direct node node id 15:foo.result are pointer: 13,
location:0
Unifying foo.result to p_1
foo.result = { NULL a.0+32 a.64+64 c.0+32 } same as p_1
Is it a missed optimization or wrong-code?
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (20 preceding siblings ...)
2011-01-24 9:43 ` dominiq at lps dot ens.fr
@ 2011-01-24 14:37 ` rguenth at gcc dot gnu.org
2011-01-24 18:09 ` howarth at nitro dot med.uc.edu
` (4 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-01-24 14:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #22 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-01-24 14:07:14 UTC ---
(In reply to comment #15)
> Enabling early FRE
> Index: passes.c
> ===================================================================
> --- passes.c (revision 169136)
> +++ passes.c (working copy)
> @@ -760,6 +760,7 @@
> NEXT_PASS (pass_remove_cgraph_callee_edges);
> NEXT_PASS (pass_rename_ssa_copies);
> NEXT_PASS (pass_ccp);
> + NEXT_PASS (pass_fre);
> NEXT_PASS (pass_forwprop);
> /* pass_build_ealias is a dummy pass that ensures that we
> execute TODO_rebuild_alias at this point. Re-building
> @@ -782,7 +783,7 @@
>
> reduces the perdida size estimate to 694 (so by about 30%) and hookes_law to 141 (by
> 11%). Not enough to make inlining happen, still.
That FRE pass should be after pass_sra_early (certainly after
pass_build_ealias).
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (21 preceding siblings ...)
2011-01-24 14:37 ` rguenth at gcc dot gnu.org
@ 2011-01-24 18:09 ` howarth at nitro dot med.uc.edu
2011-01-24 18:38 ` dominiq at lps dot ens.fr
` (3 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: howarth at nitro dot med.uc.edu @ 2011-01-24 18:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Jack Howarth <howarth at nitro dot med.uc.edu> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |howarth at nitro dot
| |med.uc.edu
--- Comment #23 from Jack Howarth <howarth at nitro dot med.uc.edu> 2011-01-24 17:58:00 UTC ---
(In reply to comment #22)
> That FRE pass should be after pass_sra_early (certainly after
> pass_build_ealias).
Index: gcc/passes.c
===================================================================
--- gcc/passes.c (revision 169145)
+++ gcc/passes.c (working copy)
@@ -767,6 +767,7 @@ init_optimization_passes (void)
locals into SSA form if possible. */
NEXT_PASS (pass_build_ealias);
NEXT_PASS (pass_sra_early);
+ NEXT_PASS (pass_fre);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_cd_dce);
gives Elapsed CPU time = 8.43600E+00 for
gfortran -O3 -ffast-math -funroll-loops -flto -fwhole-program fatigue.f90 -o
fatigue
and Elapsed CPU time = 4.16600E+00 for
gfortran -O3 -ffast-math -funroll-loops -finline-limit=250 --param
large-function-growth=250 -flto -fwhole-program fatigue.f90 -o fatigue
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (22 preceding siblings ...)
2011-01-24 18:09 ` howarth at nitro dot med.uc.edu
@ 2011-01-24 18:38 ` dominiq at lps dot ens.fr
2011-02-16 18:44 ` dominiq at lps dot ens.fr
` (2 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-24 18:38 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #24 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-24 18:16:47 UTC ---
(In reply to comment #22)
> That FRE pass should be after pass_sra_early (certainly after
> pass_build_ealias).
Moving pass_fre after pass_sra_early does not fix the failures in the test
suite reported in comment #21.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (23 preceding siblings ...)
2011-01-24 18:38 ` dominiq at lps dot ens.fr
@ 2011-02-16 18:44 ` dominiq at lps dot ens.fr
2011-09-22 15:53 ` dominiq at lps dot ens.fr
2011-09-26 10:37 ` rguenth at gcc dot gnu.org
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-02-16 18:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #25 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-02-16 18:38:19 UTC ---
AFAICT the patch in http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00973.html
seems to fix most of the fatigue.f90 problems:
At revision 170178 without the patch, I get
[macbook] lin/test% gfcp -Ofast fatigue.f90
[macbook] lin/test% time a.out > /dev/null
8.903u 0.005s 0:08.91 99.8% 0+0k 0+2io 0pf+0w
[macbook] lin/test% gfcp -Ofast -fwhole-program fatigue.f90
[macbook] lin/test% time a.out > /dev/null
6.392u 0.002s 0:06.39 100.0% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcp -Ofast -finline-limit=322 -fwhole-program fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.653u 0.002s 0:04.65 100.0% 0+0k 0+1io 0pf+0w
[macbook] lin/test% gfcp -Ofast -finline-limit=322 -fwhole-program -flto
fatigue.f90
[macbook] lin/test% time a.out > /dev/null
8.212u 0.004s 0:08.22 99.8% 0+0k 0+2io 0pf+0w
[macbook] lin/test% gfcp -Ofast -finline-limit=322 --param
large-function-growth=132 -fwhole-program -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.526u 0.004s 0:04.53 99.7% 0+0k 0+1io 0pf+0w
At revision 170212 with the patch, I get
[macbook] lin/test% gfc -Ofast fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.628u 0.002s 0:04.63 99.7% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -Ofast -fwhole-program fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.654u 0.002s 0:04.65 100.0% 0+0k 0+1io 0pf+0w
[macbook] lin/test% gfc -Ofast -finline-limit=322 -fwhole-program fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.657u 0.002s 0:04.66 99.7% 0+0k 0+1io 0pf+0w
[macbook] lin/test% gfc -Ofast -finline-limit=322 -fwhole-program -flto
fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.715u 0.003s 0:04.72 99.7% 0+0k 0+1io 0pf+0w
[macbook] lin/test% gfc -Ofast -finline-limit=322 --param
large-function-growth=132 -fwhole-program -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.713u 0.003s 0:04.71 100.0% 0+0k 0+1io 0pf+0w
[macbook] lin/test% gfc -Ofast -finline-limit=322 --param
large-function-growth=137 -fwhole-program -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.524u 0.003s 0:04.52 100.0% 0+0k 0+1io 0pf+0w
[macbook] lin/test% gfc -Ofast --param large-function-growth=137
-fwhole-program -flto fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.564u 0.003s 0:04.57 99.7% 0+0k 0+1io 0pf+0w
[macbook] lin/test% gfc -Ofast --param large-function-growth=137
-fwhole-program fatigue.f90
[macbook] lin/test% time a.out > /dev/null
4.479u 0.003s 0:04.48 99.7% 0+0k 0+2io 0pf+0w
A quick check of the other tests does not show any obvious slowdown with the
patch.
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (24 preceding siblings ...)
2011-02-16 18:44 ` dominiq at lps dot ens.fr
@ 2011-09-22 15:53 ` dominiq at lps dot ens.fr
2011-09-26 10:37 ` rguenth at gcc dot gnu.org
26 siblings, 0 replies; 28+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-09-22 15:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
--- Comment #26 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-09-22 15:25:48 UTC ---
AFAICT this PR has been fixed for some time. Here are the results I get on
x86_64-apple-darwin10 (Core2Duo 2.53GHz, 3MB cache, 4GB RAM) at revision
179079:
Compile options : -fprotect-parens -Ofast -funroll-loops -fwhole-program
without -flto with -flto
Benchmark Compile Executable Ave Run Compile Executable Ave Run
Name (secs) (bytes) (secs) (secs) (bytes) (secs)
--------- ------- ---------- ------- ------- ---------- -------
ac 3.28 54936 8.81 6.64 54968 8.81
aermod 75.46 1184280 18.65 131.50 1212648 18.20
air 11.24 106336 7.26 22.38 106904 7.39
capacita 3.87 77152 41.29 7.36 77200 41.31
channel 1.25 34744 3.03 2.39 34864 3.03
doduc 12.40 200016 28.02 22.47 200496 27.69
fatigue 4.06 77400 4.83 8.17 77488 4.84
gas_dyn 9.32 119256 4.92 16.64 119816 4.92
induct 7.37 148840 13.83 14.76 153224 13.84
linpk 0.70 26024 21.64 1.93 26064 21.64
mdbx 3.77 80864 12.46 7.21 81040 12.46
nf 4.08 71848 19.34 8.07 71896 19.35
protein 15.17 131304 35.30 26.05 127224 35.48
rnflow 12.58 130888 28.25 23.76 131000 26.92
test_fpu 4.78 92968 10.63 13.35 93024 10.64
tfft 0.74 22352 3.28 1.98 22432 3.28
Geometric Mean Execution Time = 12.23 secs 12.18 secs
Compile options : -fprotect-parens -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer --param max-inline-insns-auto=200 -fwhole-program
without -flto with -flto
Benchmark Compile Executable Ave Run Compile Executable Ave Run
Name (secs) (bytes) (secs) (secs) (bytes) (secs)
--------- ------- ---------- ------- ------- ---------- -------
ac 4.05 54904 8.11 8.18 54920 8.11
aermod 101.55 1494688 18.17 169.63 1527120 18.12
air 14.46 114328 7.05 30.35 114912 7.04
capacita 5.39 97552 40.24 10.80 97584 40.21
channel 1.68 38792 2.91 3.17 38888 2.91
doduc 12.98 208112 27.47 25.77 208584 27.52
fatigue 4.84 81440 2.95 10.27 81504 2.93
gas_dyn 13.55 143776 4.86 25.03 144392 4.86
induct 12.95 189872 13.78 24.32 190176 13.96
linpk 0.73 21856 21.69 2.44 21888 21.69
mdbx 4.32 84928 12.45 9.39 85104 12.54
nf 7.41 92248 18.93 17.82 92272 18.91
protein 17.26 160040 35.51 31.08 155984 35.47
rnflow 15.16 138880 28.27 27.28 139040 26.85
test_fpu 5.05 92872 10.65 14.65 92928 10.65
tfft 0.75 22352 3.28 1.72 22432 3.28
Geometric Mean Execution Time = 11.67 secs 11.64 secs
The option -flto improves the run time for rnflow.f90 by ~5% without slowdown
for the other tests. Could these results be checked on other platforms and this
PR closed if they agree with mine?
* [Bug lto/45810] 40% slowdown when using LTO for a single-file program
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
` (25 preceding siblings ...)
2011-09-22 15:53 ` dominiq at lps dot ens.fr
@ 2011-09-26 10:37 ` rguenth at gcc dot gnu.org
26 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-09-26 10:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45810
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
--- Comment #27 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-09-26 10:16:20 UTC ---
Yes, I think I analyzed the reason for this at some point (IPA profile) and
fixed it.
end of thread, other threads:[~2011-09-26 10:17 UTC | newest]
Thread overview: 28+ messages
2010-09-27 13:13 [Bug lto/45810] New: 40% slowdown when using LTO for a single-file program burnus at gcc dot gnu.org
2010-09-27 15:48 ` [Bug lto/45810] " Joost.VandeVondele at pci dot uzh.ch
2010-09-27 15:54 ` rguenth at gcc dot gnu.org
2010-09-28 15:35 ` burnus at gcc dot gnu.org
2010-09-28 16:24 ` rguenth at gcc dot gnu.org
2010-09-28 16:25 ` Joost.VandeVondele at pci dot uzh.ch
2010-09-28 16:50 ` rguenth at gcc dot gnu.org
2010-09-28 16:50 ` Joost.VandeVondele at pci dot uzh.ch
2010-09-28 16:55 ` burnus at gcc dot gnu.org
2010-09-30 3:27 ` dominiq at lps dot ens.fr
2010-09-30 19:54 ` dominiq at lps dot ens.fr
2011-01-08 20:41 ` hubicka at gcc dot gnu.org
2011-01-23 16:36 ` hubicka at gcc dot gnu.org
2011-01-23 18:08 ` hubicka at gcc dot gnu.org
2011-01-23 19:38 ` dominiq at lps dot ens.fr
2011-01-23 20:00 ` hubicka at gcc dot gnu.org
2011-01-23 21:02 ` hubicka at gcc dot gnu.org
2011-01-23 21:12 ` dominiq at lps dot ens.fr
2011-01-23 22:12 ` hubicka at gcc dot gnu.org
2011-01-23 22:39 ` hubicka at gcc dot gnu.org
2011-01-24 2:04 ` dominiq at lps dot ens.fr
2011-01-24 9:43 ` dominiq at lps dot ens.fr
2011-01-24 14:37 ` rguenth at gcc dot gnu.org
2011-01-24 18:09 ` howarth at nitro dot med.uc.edu
2011-01-24 18:38 ` dominiq at lps dot ens.fr
2011-02-16 18:44 ` dominiq at lps dot ens.fr
2011-09-22 15:53 ` dominiq at lps dot ens.fr
2011-09-26 10:37 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).