is LTO aimed for large programs?

public inbox for gcc@gcc.gnu.org

* is LTO aimed for large programs?
@ 2009-11-08 23:03 Basile STARYNKEVITCH
  2009-11-08 23:31 ` Robert Dewar
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Basile STARYNKEVITCH @ 2009-11-08 23:03 UTC (permalink / raw)
  To: GCC Mailing List

Hello All,

Is gcc-trunk -flto -O2 aimed at medium-sized programs (something like bash), or at bigger ones (something like the
Linux kernel, the Xorg server, the Qt or GTK graphical toolkit libraries, or bootstrapping GCC itself)? Currently it
seems that the stage3 compiler is not compiled with -flto; I suppose that would require a stage4 or perhaps even a
stage5.

I know my question is really naive, because what "large" means depends a lot on context.

I sometimes try gcc-trunk -flto when recompiling new software. The biggest programs I have built successfully with it
so far are caia or malice by J.Pitrat (440 KLOC of source, 10 MB binary) and ocamlrun (20? KLOC of source, 212 KB
binary), but I have never yet used it on very big software (like the Linux kernel, or GCC itself).

Perhaps the question is when not to use -flto and use -fwhopr instead?

Maybe we could add a hint in the *.texi documentation like:
    avoid using -flto on a program or library whose source size + binary size is bigger than 30% of the available RAM
[of course I don't know if the formula is good; we could try finding a better one]

I have no idea if in practice the compilation time penalty of -flto -O2 is quadratic in the size of the generated binary.

Regards.

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
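
The hint proposed in the message above can be written down as a small check. This is only a sketch of that suggested rule of thumb, not anything GCC implements or documents: the function name `exceeds_lto_hint`, the 4 GiB RAM figure, and the 40 bytes per source line used to turn 440 KLOC into a byte count are all assumptions made for illustration. Only the 30% threshold and the 10 MB binary size come from the message.

```python
def exceeds_lto_hint(source_bytes, binary_bytes, ram_bytes, threshold=0.30):
    """Return True if the proposed rule of thumb would advise against -flto.

    The rule (as proposed in the message, untested): avoid -flto when
    source size + binary size is bigger than 30% of the available RAM.
    """
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return (source_bytes + binary_bytes) > threshold * ram_bytes


# Hypothetical inputs: 440 KLOC at an assumed 40 bytes per line,
# a 10 MB binary, and an assumed 4 GiB of available RAM.
source_bytes = 440_000 * 40
binary_bytes = 10_000_000
ram_bytes = 4 * 2**30

if exceeds_lto_hint(source_bytes, binary_bytes, ram_bytes):
    print("hint: the rule of thumb advises against plain -flto")
else:
    print("hint: the rule of thumb allows -flto")
```

As the author notes, the formula itself is a guess, so the threshold is left as a parameter rather than fixed.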


* Re: is LTO aimed for large programs?
  2009-11-08 23:03 is LTO aimed for large programs? Basile STARYNKEVITCH
@ 2009-11-08 23:31 ` Robert Dewar
  2009-11-09  6:43   ` Basile STARYNKEVITCH
  2009-11-09  9:44 ` Tobias Burnus
  2009-11-09 12:35 ` Diego Novillo
  2 siblings, 1 reply; 11+ messages in thread
From: Robert Dewar @ 2009-11-08 23:31 UTC (permalink / raw)
  To: Basile STARYNKEVITCH; +Cc: GCC Mailing List

Basile STARYNKEVITCH wrote:

> I sometimes try using gcc-trunk -flto when recompiling new stuff. The biggest software I tried so far with success is 
> caia or malice by J.Pitrat (440KLOC of source, 10Mb binary) or ocamlrun (20?KLOC source, 212Kb binary) but I never used 
> it yet on very big software (like the linux kernel, or GCC itself).

Compared to some of the application systems we deal with, gcc is large,
but not very large. We have several Ada users with millions of lines of
code in a single program.
> 
> Perhaps the question is when not to use -flto and use -fwhopr instead?
> 
> Maybe we might add a hint in the *.texi documentation like;
>     avoid using -flto on a program or library whose source size + binary size is bigger than 30% of the RAM available?
> [of course I don't know if the formula is good; we could try finding a better one]
> 
> I have no idea if in practice the compilation time penalty of -flto -O2 is quadratic in the size of the generated binary.
> 
> Regards.
> 



* Re: is LTO aimed for large programs?
  2009-11-08 23:31 ` Robert Dewar
@ 2009-11-09  6:43   ` Basile STARYNKEVITCH
  2009-11-09  7:13     ` Eric Botcazou
  0 siblings, 1 reply; 11+ messages in thread
From: Basile STARYNKEVITCH @ 2009-11-09  6:43 UTC (permalink / raw)
  To: Robert Dewar; +Cc: GCC Mailing List

Robert Dewar wrote:
> 
> Compared to some of the application systems we deal with gcc is large, 
> but not very large. We have several Ada users with millions of lines of
> code in a single program.

Do you "sell" the -flto option to your customers?

Do you suggest that your big customers recompile their 10 MLOC Ada code with -flto?

Did they (or you) already try doing that?

Or should they use -fwhopr?

Or perhaps they prefer a bit faster compilation time, only using -O1?

Regards.

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


* Re: is LTO aimed for large programs?
  2009-11-09  6:43   ` Basile STARYNKEVITCH
@ 2009-11-09  7:13     ` Eric Botcazou
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Botcazou @ 2009-11-09  7:13 UTC (permalink / raw)
  To: Basile STARYNKEVITCH; +Cc: gcc, Robert Dewar

> Do you suggest your big customers to recompile their 10MLOC Ada code with
> -flto?

-flto doesn't work for Ada yet.

-- 
Eric Botcazou


* Re: is LTO aimed for large programs?
  2009-11-08 23:03 is LTO aimed for large programs? Basile STARYNKEVITCH
  2009-11-08 23:31 ` Robert Dewar
@ 2009-11-09  9:44 ` Tobias Burnus
  2009-11-09  9:57   ` Richard Guenther
  2009-11-12 16:46   ` Jan Hubicka
  2009-11-09 12:35 ` Diego Novillo
  2 siblings, 2 replies; 11+ messages in thread
From: Tobias Burnus @ 2009-11-09  9:44 UTC (permalink / raw)
  To: Basile STARYNKEVITCH; +Cc: GCC Mailing List

On 11/09/2009 12:03 AM, Basile STARYNKEVITCH wrote:
> is gcc-trunk -flto -O2 aimed for medium sized programs (something like
> bash), or for bigger ones (something like the linux kernel, the Xorg
> server, the Qt or GTK graphical toolkit libraries, or bootstrapping GCC
> itself.

My understanding is that LTO aims at both, but that one needs to use
-fwhopr for really large systems, as otherwise the memory usage, for
example, may exceed the available memory. I don't know whether one can
really estimate how much memory compilation needs. It is surely not a
simple function of the number of code lines.

I tried -flto successfully for our 100 kLoC Fortran code, and there lto1
needs <1/2 GB of RAM (370 MB if I recall correctly). (Thanks especially
to Richard; initially more than 4 GB was needed and lto1 therefore
crashed.) Toon also used LTO [1] for their HIRLAM weather
forecasting program, which according to [2] has 1.2 MLoC in Fortran and
O(10 kLoC) in C.

If I recall correctly, bootstrapping GCC also works in principle, except
for problems when comparing stage2 with stage3.

> Perhaps the question is when not to use -flto and use -fwhopr instead?

My rule of thumb is: try -flto first; if it does not work (because it
runs out of memory), try -fwhopr. I think another advantage of -flto is
that it is better tested, while -fwhopr has known issues.

Tobias

[1] http://gcc.gnu.org/ml/gcc/2009-10/msg00122.html
[2] http://moene.org/~toon/GCCSummit-2006.pdf


* Re: is LTO aimed for large programs?
  2009-11-09  9:44 ` Tobias Burnus
@ 2009-11-09  9:57   ` Richard Guenther
  2009-11-12 16:46   ` Jan Hubicka
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2009-11-09  9:57 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Basile STARYNKEVITCH, GCC Mailing List

On Mon, Nov 9, 2009 at 10:44 AM, Tobias Burnus <burnus@net-b.de> wrote:
> On 11/09/2009 12:03 AM, Basile STARYNKEVITCH wrote:
>> is gcc-trunk -flto -O2 aimed for medium sized programs (something like
>> bash), or for bigger ones (something like the linux kernel, the Xorg
>> server, the Qt or GTK graphical toolkit libraries, or bootstrapping GCC
>> itself.
>
> My understanding is that LTO aims at both, but that one needs to use
> -fwhopr for really large systems, as otherwise the memory usage, for
> example, may exceed the available memory. I don't know whether one can
> really estimate how much memory compilation needs. It is surely not a
> simple function of the number of code lines.
>
> I tried -flto successfully for our 100 kLoC Fortran code and there lto1
> needs <1/2 GB of RAM (370 MB if I recall correctly). (Thanks to
> especially Richard; initially more than 4 GB were needed and lto1
> crashed thus). Toon also used LTO [1] for their HIRLAM weather
> forecasting program, which has according to [2] 1.2 MLoC in Fortran and
> O(10 kLoC) in C.
>
> If I recall correctly, bootstrapping GCC also works in principle, except
> for problems when comparing stage2 with stage3.
>
>
>> Perhaps the question is when not to use -flto and use -fwhopr instead?
>
> My rule of thumb is: Try -flto first, if it does not work (running out
> of memory), try -fwhopr. I think the advantage of -flto is also that it
> is better tested, while -fwhopr has known issues.

Indeed, for 4.5 I wouldn't recommend using -fwhopr at all.

The compile-time penalty depends on the complexity of
the IPA passes we run, and I have no idea whether they are currently
worse than linear in the number of cgraph nodes or edges.

The main issue with LTO is memory usage, as you have to
keep the whole program in memory.

Another issue is that with excessive inlining (which we avoid
by default, but of course you can tweak params) you can
more easily hit non-linear time and memory complexity
in function-local passes.

Richard.


* Re: is LTO aimed for large programs?
  2009-11-08 23:03 is LTO aimed for large programs? Basile STARYNKEVITCH
  2009-11-08 23:31 ` Robert Dewar
  2009-11-09  9:44 ` Tobias Burnus
@ 2009-11-09 12:35 ` Diego Novillo
  2009-11-09 12:48   ` Richard Guenther
  2 siblings, 1 reply; 11+ messages in thread
From: Diego Novillo @ 2009-11-09 12:35 UTC (permalink / raw)
  To: Basile STARYNKEVITCH; +Cc: GCC Mailing List

On Sun, Nov 8, 2009 at 18:03, Basile STARYNKEVITCH
<basile@starynkevitch.net> wrote:

> Perhaps the question is when not to use -flto and use -fwhopr instead?

I don't think anyone has systematically tried to determine these
limits.  The original design tried to address a specific instance of a
program with about 400 million callgraph nodes.  At the time, -flto
was running out of virtual addressing space to hold it (the gcc binary
was 32 bits), but it could be processed with -fwhopr.

The current implementation of -fwhopr is incomplete, however.  It
needs fixes to the pass manager to properly apply all IPA passes (and
associated bug fixes).  I would not use it in 4.5.  Richi has made
numerous fixes to symbol/type handling, so -flto is now more memory
efficient than it was when I last tried it on a large application.

Diego.


* Re: is LTO aimed for large programs?
  2009-11-09 12:35 ` Diego Novillo
@ 2009-11-09 12:48   ` Richard Guenther
  2009-11-09 12:52     ` Diego Novillo
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2009-11-09 12:48 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Basile STARYNKEVITCH, GCC Mailing List

On Mon, Nov 9, 2009 at 1:35 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Sun, Nov 8, 2009 at 18:03, Basile STARYNKEVITCH
> <basile@starynkevitch.net> wrote:
>
>> Perhaps the question is when not to use -flto and use -fwhopr instead?
>
> I don't think anyone has systematically tried to determine these
> limits.  The original design tried to address a specific instance of a
> program with about 400 million callgraph nodes.  At the time, -flto
> was running out of virtual addressing space to hold it (the gcc binary
> was 32 bits), but it could be processed with -fwhopr.

Hm, as WPA needs the whole cgraph in memory and a single
cgraph node (not counting any edges or decls) is 256 bytes
large, that would require 97 GB of RAM for the cgraph nodes alone.
So I don't believe you ;)  Even with 400 thousand cgraph nodes
you'd run out of virtual memory on 32 bits unless the cgraph
node size on 32-bit is less than 10 bytes, which it is of course not ...
(btw, a function decl is also 240 bytes).

I think we can scale to a million cgraph nodes on a 64-bit
host with lots of memory (remember we need to pull in
all decls and types during the WPA phase).

Richard.
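
The arithmetic behind these estimates can be checked with a short sketch. It counts only the 256-byte cgraph nodes quoted in the message and ignores edges, decls and types, so it is a lower bound at best; the function names and the choice of a full 4 GiB (2**32 bytes) as the 32-bit address space are assumptions for illustration.

```python
CGRAPH_NODE_BYTES = 256        # per-node size quoted in the message
ADDRESS_SPACE_32BIT = 2**32    # assumed: the full 4 GiB of a 32-bit process


def cgraph_node_bytes(node_count, node_size=CGRAPH_NODE_BYTES):
    """Memory taken by the cgraph nodes alone (no edges, decls or types)."""
    return node_count * node_size


def max_node_size_for_32bit(node_count):
    """Largest per-node size that still fits node_count nodes in 4 GiB."""
    return ADDRESS_SPACE_32BIT / node_count


for count in (400_000, 1_000_000, 400_000_000):
    total = cgraph_node_bytes(count)
    print(f"{count:>11,} nodes: {total / 2**20:12.2f} MiB "
          f"({total / 2**30:.2f} GiB), "
          f"32-bit limit per node: {max_node_size_for_32bit(count):.2f} bytes")
```

With these inputs, 400 million nodes come to about 95 GiB (roughly 102 GB in decimal units), which matches the order of magnitude quoted above, and the "less than 10 bytes" per-node bound corresponds to that 400 million figure. By the same count, 400 thousand nodes take about 98 MiB for the nodes alone.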


* Re: is LTO aimed for large programs?
  2009-11-09 12:48   ` Richard Guenther
@ 2009-11-09 12:52     ` Diego Novillo
  0 siblings, 0 replies; 11+ messages in thread
From: Diego Novillo @ 2009-11-09 12:52 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Basile STARYNKEVITCH, GCC Mailing List

On Mon, Nov 9, 2009 at 07:47, Richard Guenther
<richard.guenther@gmail.com> wrote:

> So I don't believe you ;)  Even with 400 thousand cgraph nodes
> you'd run out of virtual memory on 32bits unless the cgraph
> node size on 32bit is less than 10 bytes which it is of course not ...

You are right.  I was thinking 400k, not 400m.  Sorry about that.


Diego.


* Re: is LTO aimed for large programs?
  2009-11-09  9:44 ` Tobias Burnus
  2009-11-09  9:57   ` Richard Guenther
@ 2009-11-12 16:46   ` Jan Hubicka
  2009-11-12 17:01     ` Robert Dewar
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2009-11-12 16:46 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: Basile STARYNKEVITCH, GCC Mailing List

> > Perhaps the question is when not to use -flto and use -fwhopr instead?
> 
> My rule of thumb is: Try -flto first, if it does not work (running out
> of memory), try -fwhopr. I think the advantage of -flto is also that it
> is better tested, while -fwhopr has known issues.

-fwhopr is quite broken in the current implementation and I am not sure
we can reasonably fix it in stage3, so -flto is the only choice for the
moment for larger programs. If it explodes in memory use, I would be
interested in having the testcase.

Honza
> 
> Tobias
> 
> [1] http://gcc.gnu.org/ml/gcc/2009-10/msg00122.html
> [2] http://moene.org/~toon/GCCSummit-2006.pdf


* Re: is LTO aimed for large programs?
  2009-11-12 16:46   ` Jan Hubicka
@ 2009-11-12 17:01     ` Robert Dewar
  0 siblings, 0 replies; 11+ messages in thread
From: Robert Dewar @ 2009-11-12 17:01 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Tobias Burnus, Basile STARYNKEVITCH, GCC Mailing List

Jan Hubicka wrote:
>>> Perhaps the question is when not to use -flto and use -fwhopr instead?
>> My rule of thumb is: Try -flto first, if it does not work (running out
>> of memory), try -fwhopr. I think the advantage of -flto is also that it
>> is better tested, while -fwhopr has known issues.
> 
> -fwhopr is quite broken in the current implementation and I am not sure
> we can reasonably fix it in stage3, so -flto is the only choice for the
> moment for larger programs. If it explodes in memory use, I would be
> interested in having the testcase.

It is always hard to know what "explode" means. We have customers with
programs large enough that normal -O2 optimization exceeds 32-bit address
limits, and they have had to move to 64-bit machines to complete compilation.
> 
> Honza
>> Tobias
>>
>> [1] http://gcc.gnu.org/ml/gcc/2009-10/msg00122.html
>> [2] http://moene.org/~toon/GCCSummit-2006.pdf


