public inbox for gcc@gcc.gnu.org
* Whole program optimization and functions-only-called-once.
@ 2009-11-04 19:19 Toon Moene
  2009-11-04 19:26 ` Richard Guenther
  0 siblings, 1 reply; 18+ messages in thread
From: Toon Moene @ 2009-11-04 19:19 UTC
  To: Jan Hubicka; +Cc: gcc mailing list

Jan,

I had some time to study the example I sent you a couple of weeks ago.

According to visual inspection of the source code, there are 5 
functions (subroutines in Fortran parlance) that are called once:

MAIN   calls
HLPROG calls
GEMINI calls
SL2TIM calls
PHCALL calls
PHTASK

I.e., the last five should be candidates for inlining of "functions only 
called once".

However, ccrPOljB.o.047i.inline says:

Deciding on functions called once:

Considering gemini_.clone.1 size 11443.
  Called once from hlprog 462 insns.
  Inlined into hlprog which now has 10728 size for a net change of 
-12620 size.

Considering hlprog size 10728.
  Called once from main 7 insns.
  Not inlined because --param large-function-growth limit reached.

Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size.

The dump option -fdump-ipa-all also gives me the call graph, of which I 
copy here the relevant part:

phcall_.clone.3/11(-1) @0x7fd198c16400 (clone of phcall/33) 
availability:local 8281 time, 972 benefit 1351 size, 291 benefit 984 
bytes stack usage reachable local finalized inlinable
   called by: sl2tim/49 (0.44 per call) sl2tim_.clone.0/16 (0.44 per call)
phtask_.clone.2/12(-1) @0x7fd198c16500 (clone of phtask/41) 
availability:local 26416 time, 4268 benefit 4541 size, 880 benefit 480 
bytes stack usage reachable local finalized inlinable
   called by: phcall_.clone.3/11 (3.52 per call) phcall/33 (3.52 per call)
sl2tim_.clone.0/16(-1) @0x7fd198c16900 (clone of sl2tim/49) 
availability:local 207312 time, 26617 benefit 5169 size, 941 benefit 
3856 bytes stack usage reachable local finalized inlinable
   called by: gemini_.clone.1/40 (1.00 per call) gemini/0 (1.00 per call)
gemini_.clone.1/40(-1) @0x7fd198c35000 (inline copy in hlprog/17) 
(clone of gemini/0) availability:local 147324 time, 2770 benefit 11443 
size, 1177 benefit 11635 bytes stack usage reachable local finalized 
inlinable
   called by: hlprog/17 (3.57 per call) (inlined)
phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268 
benefit 4541 size, 880 benefit 480 bytes stack usage reachable body 
local finalized inlinable
   called by:
phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972 benefit 
1351 size, 291 benefit 984 bytes stack usage reachable body local 
finalized inlinable
   called by:
hlprog/17(-1) @0x7fd198c16a00 availability:local 560 time, 10 benefit 
(516762 after inlining) 462 size, 1 benefit (10728 after inlining) 4216 
bytes stack usage 15851 bytes after inlining reachable body local 
finalized inlinable
   called by: main/29 (1.00 per call)
sl2tim/49(-1) @0x7fd198c35900 availability:local 207312 time, 26617 
benefit 5169 size, 941 benefit 3856 bytes stack usage reachable body 
local finalized inlinable
   called by:
gemini/0(-1) @0x7fd198bef800 availability:local 147324 time, 2770 
benefit 11443 size, 1177 benefit 11635 bytes stack usage reachable body 
local finalized inlinable
   called by:

So if we have to believe this summary,

HLPROG is called by MAIN, but is not suitable for inlining (I can live 
with that).
GEMINI is not called, but GEMINI.clone is (by HLPROG) and is inlined.
SL2TIM is not called, but SL2TIM.clone is called by GEMINI and 
GEMINI.clone; because it is called twice, it is not considered a 
function-only-called-once.
PHCALL is not called, but PHCALL.clone is called by SL2TIM and 
SL2TIM.clone; because it is called twice, it is not considered a 
function-only-called-once.
PHTASK is not called, but PHTASK.clone is called by PHCALL and 
PHCALL.clone; because it is called twice, it is not considered a 
function-only-called-once.

I don't think this is really what we want with 
functions-only-called-once: If only the .clone version of a function is 
used, then a function that's only called once *inside this clone* is a 
function-only-called-once.
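
To make the proposal concrete, here is a minimal sketch in C. It assumes a
hypothetical `struct node` with a plain list of callers; it is not GCC's
`struct cgraph_node`, and every name in it is invented for illustration. It
compares the naive caller count with one that ignores callers which are
themselves never called, using the SL2TIM edges from the dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_CALLERS = 4 };

/* Hypothetical, simplified call-graph node; not GCC's struct cgraph_node. */
struct node {
    const char *name;
    bool is_entry;                           /* true for the program entry */
    const struct node *callers[MAX_CALLERS];
    size_t n_callers;
};

/* Record that `caller` contains a call to `callee`. */
void add_caller(struct node *callee, const struct node *caller)
{
    assert(callee->n_callers < MAX_CALLERS);
    callee->callers[callee->n_callers++] = caller;
}

/* A node counts as called if it is the entry point or has any caller. */
bool is_called(const struct node *n)
{
    return n->is_entry || n->n_callers > 0;
}

/* Naive count: every incoming edge counts, even from uncalled originals. */
size_t count_callers(const struct node *n)
{
    return n->n_callers;
}

/* Filtered count: only edges from callers that are themselves called. */
size_t count_called_callers(const struct node *n)
{
    size_t count = 0;
    for (size_t i = 0; i < n->n_callers; i++)
        if (is_called(n->callers[i]))
            count++;
    return count;
}

/* Rebuild the SL2TIM situation from the dump and report both counts. */
void sl2tim_clone_counts(size_t *naive, size_t *filtered)
{
    struct node main_fn      = { "main",            true,  { NULL }, 0 };
    struct node hlprog       = { "hlprog",          false, { NULL }, 0 };
    struct node gemini       = { "gemini",          false, { NULL }, 0 };
    struct node gemini_clone = { "gemini_.clone.1", false, { NULL }, 0 };
    struct node sl2tim_clone = { "sl2tim_.clone.0", false, { NULL }, 0 };

    add_caller(&hlprog, &main_fn);
    add_caller(&gemini_clone, &hlprog);
    add_caller(&sl2tim_clone, &gemini_clone);
    add_caller(&sl2tim_clone, &gemini);   /* edge from the uncalled original */

    *naive = count_callers(&sl2tim_clone);
    *filtered = count_called_callers(&sl2tim_clone);
}
```

The one-level check is enough for this dump because every original has an
empty "called by" list; a longer chain of dead callers would need a
transitive notion of reachability.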

I hope this analysis helps,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html


* Re: Whole program optimization and functions-only-called-once.
  2009-11-04 19:19 Whole program optimization and functions-only-called-once Toon Moene
@ 2009-11-04 19:26 ` Richard Guenther
  2009-11-04 21:20   ` Toon Moene
  2009-11-12 16:16   ` Jan Hubicka
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Guenther @ 2009-11-04 19:26 UTC
  To: Toon Moene; +Cc: Jan Hubicka, gcc mailing list

I think the underlying issue is

phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
local finalized inlinable
 called by:
phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
local finalized inlinable
 called by:

that these are not called but still reachable (they should not be reachable
anymore, instead the clones are now reachable).  I think there already is
a bug about cloning not updating cgraph reachability and not reclaiming
nodes after IPA transform application.

Richard.


* Re: Whole program optimization and functions-only-called-once.
  2009-11-04 19:26 ` Richard Guenther
@ 2009-11-04 21:20   ` Toon Moene
  2009-11-04 21:30     ` Andrew Pinski
  2009-11-12 16:16   ` Jan Hubicka
  1 sibling, 1 reply; 18+ messages in thread
From: Toon Moene @ 2009-11-04 21:20 UTC
  To: Richard Guenther; +Cc: Jan Hubicka, gcc mailing list

Richard Guenther wrote:

> I think the underlying issue is
> 
> phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
> benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
> local finalized inlinable
>  called by:
> phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
> benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
> local finalized inlinable
>  called by:
> 
> that these are not called but still reachable (they should not be reachable
> anymore, instead the clones are now reachable).  I think there already is
> a bug about cloning not updating cgraph reachability and not reclaiming
> nodes after IPA transform application.

You don't happen to recall the bug number ?

The last time I did this sort of optimization was in 1992.

f2c (the Fortran-to-C compiler) gave me C equivalents of all Fortran 
code in the forecasting executable.

I spent a rainy Sunday afternoon pasting them into one giant source 
file, ordering them correctly (all called subroutines first) and then 
slapping "static inline" on them.

Subsequently, I compiled the (30,000 line) C file with gcc -O3.  The 
resulting executable was about 10 % faster than the original (which was 
also compiled by f2c - g77 didn't exist at that time).
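
As a rough illustration of that layout (not the actual f2c output), the
sketch below puts callees before their callers in a single file and marks
them `static inline`. The function names are borrowed from this thread, but
the bodies and signatures are made up.

```c
/* Hypothetical stand-ins for translated subroutines; arithmetic is made up. */

/* Innermost callee first, so its body is visible at every call site. */
static inline double phtask(double x)
{
    return 2.0 * x + 1.0;
}

/* Next level up: the compiler may inline phtask here. */
static inline double phcall(double x)
{
    return 0.5 * phtask(x);
}

/* Externally visible caller; the whole chain can collapse into it. */
double sl2tim(double x)
{
    return phcall(x);
}
```

With everything in one translation unit the compiler sees each body before
its use, which is roughly the visibility that whole-program optimization
provides without manual pasting.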

So my hopes for this optimization (when done right) are quite high :-)

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html


* Re: Whole program optimization and functions-only-called-once.
  2009-11-04 21:20   ` Toon Moene
@ 2009-11-04 21:30     ` Andrew Pinski
  2009-11-04 21:50       ` Richard Guenther
  2009-11-12 16:46       ` Jan Hubicka
  0 siblings, 2 replies; 18+ messages in thread
From: Andrew Pinski @ 2009-11-04 21:30 UTC
  To: Toon Moene; +Cc: Richard Guenther, Jan Hubicka, gcc mailing list

On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene <toon@moene.org> wrote:
> You don't happen to recall the bug number ?

It might be related to PR 41735 which I noticed when looking at the
generated assembly and trying to compare 4.5 to 4.4.

Thanks,
Andrew Pinski


* Re: Whole program optimization and functions-only-called-once.
  2009-11-04 21:30     ` Andrew Pinski
@ 2009-11-04 21:50       ` Richard Guenther
  2009-11-12 16:46       ` Jan Hubicka
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Guenther @ 2009-11-04 21:50 UTC
  To: Andrew Pinski; +Cc: Toon Moene, Jan Hubicka, gcc mailing list

On Wed, Nov 4, 2009 at 10:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene <toon@moene.org> wrote:
>> You don't happen to recall the bug number ?
>
> It might be related to PR 41735 which I noticed when looking at the
> generated assembly and trying to compare 4.5 to 4.4.

Yes indeed.  Honza may be able to explain why it is like it is and if it's easy
to fix.  He's on vacation though ;)

Richard.


* Re: Whole program optimization and functions-only-called-once.
  2009-11-04 19:26 ` Richard Guenther
  2009-11-04 21:20   ` Toon Moene
@ 2009-11-12 16:16   ` Jan Hubicka
  2009-11-14 12:55     ` Toon Moene
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2009-11-12 16:16 UTC
  To: Richard Guenther; +Cc: Toon Moene, Jan Hubicka, gcc mailing list

> I think the underlying issue is
> 
> phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
> benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
> local finalized inlinable
>  called by:
> phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
> benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
> local finalized inlinable
>  called by:
> 
> that these are not called but still reachable (they should not be reachable
> anymore, instead the clones are now reachable).  I think there already is
> a bug about cloning not updating cgraph reachability and not reclaiming
> nodes after IPA transform application.

The reachable flag is not kept up to date after the initial cgraph build;
only the code that removes unreachable functions computes it.

The actual problem here is uglier: the reachability pass cannot really
remove the original functions, since their clones still need to be
constructed, so the functions stay in the cgraph until this happens.
This confuses the called-once logic.

Hmm, I guess the called-once logic needs to be able to identify functions
that stay in the callgraph only because they are masters for clones still
to be materialized.  I guess we can make the reachability pass ignore this
issue (so reachability really is up to date) and make the inliner (and
other propagation passes) ignore those unreachable nodes.  Ugly, but
at the moment I don't see a better way around it :(
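
The idea can be sketched in C as follows. This is a toy model under stated
assumptions, not GCC's implementation: `struct graph`, the node indices and
all function names are hypothetical. A worklist walk from the entry point
recomputes reachability, and the caller count then skips unreachable nodes,
which stay in the graph only as masters for their clones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_NODES = 8, MAX_CALLEES = 4 };

/* Hypothetical miniature call graph; indices stand in for cgraph nodes. */
struct graph {
    size_t n_nodes;
    size_t callees[MAX_NODES][MAX_CALLEES];
    size_t n_callees[MAX_NODES];
    bool reachable[MAX_NODES];
};

/* Node indices for the example, named after the functions in the dump. */
enum { N_MAIN, N_HLPROG, N_GEMINI_CLONE, N_SL2TIM_CLONE,
       N_PHCALL_CLONE, N_PHTASK_CLONE, N_PHCALL_MASTER, N_COUNT };

void add_call(struct graph *g, size_t caller, size_t callee)
{
    assert(caller < g->n_nodes && callee < g->n_nodes);
    assert(g->n_callees[caller] < MAX_CALLEES);
    g->callees[caller][g->n_callees[caller]++] = callee;
}

/* Worklist walk from `entry`; each node is pushed at most once. */
void mark_reachable(struct graph *g, size_t entry)
{
    size_t worklist[MAX_NODES];
    size_t top = 0;

    for (size_t i = 0; i < g->n_nodes; i++)
        g->reachable[i] = false;

    g->reachable[entry] = true;
    worklist[top++] = entry;
    while (top > 0) {
        size_t n = worklist[--top];
        for (size_t i = 0; i < g->n_callees[n]; i++) {
            size_t c = g->callees[n][i];
            if (!g->reachable[c]) {
                g->reachable[c] = true;
                worklist[top++] = c;
            }
        }
    }
}

/* Count calls to `callee`, ignoring callers that are unreachable. */
size_t reachable_call_count(const struct graph *g, size_t callee)
{
    size_t count = 0;
    for (size_t n = 0; n < g->n_nodes; n++) {
        if (!g->reachable[n])
            continue;                 /* master kept only for its clones */
        for (size_t i = 0; i < g->n_callees[n]; i++)
            if (g->callees[n][i] == callee)
                count++;
    }
    return count;
}

/* Build the PHTASK chain and count calls to the PHTASK clone. */
size_t phtask_clone_call_count(void)
{
    struct graph g = { .n_nodes = N_COUNT };

    add_call(&g, N_MAIN, N_HLPROG);
    add_call(&g, N_HLPROG, N_GEMINI_CLONE);
    add_call(&g, N_GEMINI_CLONE, N_SL2TIM_CLONE);
    add_call(&g, N_SL2TIM_CLONE, N_PHCALL_CLONE);
    add_call(&g, N_PHCALL_CLONE, N_PHTASK_CLONE);
    add_call(&g, N_PHCALL_MASTER, N_PHTASK_CLONE);  /* stale master edge */

    mark_reachable(&g, N_MAIN);
    return reachable_call_count(&g, N_PHTASK_CLONE);
}
```

In this model the PHTASK clone has two incoming edges but only one from a
reachable caller, so a called-once test based on `reachable_call_count`
would accept it.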

-fno-ipa-cp should work around your problem for the time being.

Honza

* Re: Whole program optimization and functions-only-called-once.
  2009-11-04 21:30     ` Andrew Pinski
  2009-11-04 21:50       ` Richard Guenther
@ 2009-11-12 16:46       ` Jan Hubicka
  2009-11-12 21:41         ` Jan Hubicka
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Hubicka @ 2009-11-12 16:46 UTC
  To: Andrew Pinski; +Cc: Toon Moene, Richard Guenther, Jan Hubicka, gcc mailing list

> On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene <toon@moene.org> wrote:
> > You don't happen to recall the bug number ?
> 
> It might be related to PR 41735 which I noticed when looking at the
> generated assembly and trying to compare 4.5 to 4.4.

I fixed this bug today, so it might help.  But it is related to COMDAT
functions and I don't think Fortran actually produces them.
We do run reachability after cloning, so there must be an actual reason to
keep the clone; we need to debug it.
I will try to look into it tomorrow if other new issues don't stop me
(I just got profile feedback working with LTO; it would also be
interesting to see whether it helps your app).

Honza

* Re: Whole program optimization and functions-only-called-once.
  2009-11-12 16:46       ` Jan Hubicka
@ 2009-11-12 21:41         ` Jan Hubicka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Hubicka @ 2009-11-12 21:41 UTC
  To: Jan Hubicka
  Cc: Andrew Pinski, Toon Moene, Richard Guenther, Jan Hubicka,
	gcc mailing list

Hi,
this is a WIP patch to deal with the unreachable clones problem.  It
basically renders the clones as unanalyzed cgraph nodes (but with the
body still in) so IPA passes don't see them.

Honza

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 154127)
+++ cgraph.c	(working copy)
@@ -1132,7 +1132,7 @@ cgraph_release_function_body (struct cgr
       pop_cfun();
       gimple_set_body (node->decl, NULL);
       VEC_free (ipa_opt_pass, heap,
-      		DECL_STRUCT_FUNCTION (node->decl)->ipa_transforms_to_apply);
+      		node->ipa_transforms_to_apply);
       /* Struct function hangs a lot of data that would leak if we didn't
          removed all pointers to it.   */
       ggc_free (DECL_STRUCT_FUNCTION (node->decl));
@@ -1159,6 +1159,8 @@ cgraph_remove_node (struct cgraph_node *
   cgraph_call_node_removal_hooks (node);
   cgraph_node_remove_callers (node);
   cgraph_node_remove_callees (node);
+  VEC_free (ipa_opt_pass, heap,
+            node->ipa_transforms_to_apply);
 
   /* Incremental inlining access removed nodes stored in the postorder list.
      */
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 154127)
+++ cgraph.h	(working copy)
@@ -190,6 +190,11 @@ struct GTY((chain_next ("%h.next"), chai
 
   PTR GTY ((skip)) aux;
 
+  /* Interprocedural passes scheduled to have their transform functions
+     applied next time we execute local pass on them.  We maintain it
+     per-function in order to allow IPA passes to introduce new functions.  */
+  VEC(ipa_opt_pass,heap) * GTY((skip)) ipa_transforms_to_apply;
+
   struct cgraph_local_info local;
   struct cgraph_global_info global;
   struct cgraph_rtl_info rtl;
@@ -206,16 +211,24 @@ struct GTY((chain_next ("%h.next"), chai
      number of cfg nodes with -fprofile-generate and -fprofile-use */
   int pid;
 
-  /* Set when function must be output - it is externally visible
-     or its address is taken.  */
+  /* Set when function must be output for some reason.  The primary
+     use of this flag is to mark functions needed to be output for
+     non-standard reason.  Functions that are externally visible
+     or reachable from functions needed to be output are marked
+     by specialized flags.  */
   unsigned needed : 1;
-  /* Set when function has address taken.  */
+  /* Set when function has address taken.
+     In current implementation it imply needed flag. */
   unsigned address_taken : 1;
   /* Set when decl is an abstract function pointed to by the
      ABSTRACT_DECL_ORIGIN of a reachable function.  */
   unsigned abstract_and_needed : 1;
   /* Set when function is reachable by call from other function
-     that is either reachable or needed.  */
+     that is either reachable or needed.  
+     This flag is computed at original cgraph construction and then
+     updated in cgraph_remove_unreachable_nodes.  Note that after
+     cgraph_remove_unreachable_nodes cgraph still can contain unreachable
+     nodes when they are needed for virtual clone instantiation.  */
   unsigned reachable : 1;
   /* Set once the function is lowered (i.e. its CFG is built).  */
   unsigned lowered : 1;
Index: cgraphunit.c
===================================================================
--- cgraphunit.c	(revision 154127)
+++ cgraphunit.c	(working copy)
@@ -699,7 +699,7 @@ verify_cgraph_node (struct cgraph_node *
       error_found = true;
     }
 
-  if (node->analyzed && gimple_has_body_p (node->decl)
+  if (node->analyzed && node->reachable && gimple_has_body_p (node->decl)
       && !TREE_ASM_WRITTEN (node->decl)
       && (!DECL_EXTERNAL (node->decl) || node->global.inlined_to)
       && !flag_wpa)
@@ -1777,8 +1777,8 @@ save_inline_function_body (struct cgraph
   TREE_PUBLIC (first_clone->decl) = 0;
   DECL_COMDAT (first_clone->decl) = 0;
   VEC_free (ipa_opt_pass, heap,
-            DECL_STRUCT_FUNCTION (first_clone->decl)->ipa_transforms_to_apply);
-  DECL_STRUCT_FUNCTION (first_clone->decl)->ipa_transforms_to_apply = NULL;
+            first_clone->ipa_transforms_to_apply);
+  first_clone->ipa_transforms_to_apply = NULL;
 
 #ifdef ENABLE_CHECKING
   verify_cgraph_node (first_clone);
@@ -1810,6 +1810,8 @@ cgraph_materialize_clone (struct cgraph_
     node->clone_of->clones = node->next_sibling_clone;
   node->next_sibling_clone = NULL;
   node->prev_sibling_clone = NULL;
+  if (!node->clone_of->analyzed && !node->clone_of->clones)
+    cgraph_remove_node (node->clone_of);
   node->clone_of = NULL;
   bitmap_obstack_release (NULL);
 }
Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 154127)
+++ ipa-inline.c	(working copy)
@@ -1120,7 +1120,7 @@ cgraph_decide_inlining (void)
   max_count = 0;
   max_benefit = 0;
   for (node = cgraph_nodes; node; node = node->next)
-    if (node->analyzed)
+    if (node->reachable)
       {
 	struct cgraph_edge *e;
 
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 154127)
+++ lto-streamer-in.c	(working copy)
@@ -1476,6 +1476,7 @@ lto_read_body (struct lto_file_decl_data
       /* Restore decl state */
       file_data->current_decl_state = file_data->global_decl_state;
 
+#if 0
       /* FIXME: ipa_transforms_to_apply holds list of passes that have optimization
          summaries computed and needs to apply changes.  At the moment WHOPR only
          supports inlining, so we can push it here by hand.  In future we need to stream
@@ -1485,6 +1486,7 @@ lto_read_body (struct lto_file_decl_data
 	 VEC_safe_push (ipa_opt_pass, heap,
 			cfun->ipa_transforms_to_apply,
 			(ipa_opt_pass)&pass_ipa_inline);
+#endif
       pop_cfun ();
     }
   else 
Index: c-decl.c
===================================================================
--- c-decl.c	(revision 154127)
+++ c-decl.c	(working copy)
@@ -4497,6 +4497,7 @@ build_compound_literal (location_t loc, 
       set_compound_literal_name (decl);
       DECL_DEFER_OUTPUT (decl) = 1;
       DECL_COMDAT (decl) = 1;
+      TREE_PUBLIC (decl) = 1;
       DECL_ARTIFICIAL (decl) = 1;
       DECL_IGNORED_P (decl) = 1;
       pushdecl (decl);
Index: function.h
===================================================================
--- function.h	(revision 154127)
+++ function.h	(working copy)
@@ -522,11 +522,6 @@ struct GTY(()) function {
   unsigned int curr_properties;
   unsigned int last_verified;
 
-  /* Interprocedural passes scheduled to have their transform functions
-     applied next time we execute local pass on them.  We maintain it
-     per-function in order to allow IPA passes to introduce new functions.  */
-  VEC(ipa_opt_pass,heap) * GTY((skip)) ipa_transforms_to_apply;
-
   /* Non-null if the function does something that would prevent it from
      being copied; this applies to both versioning and inlining.  Set to
      a string describing the reason for failure.  */
Index: ipa.c
===================================================================
--- ipa.c	(revision 154128)
+++ ipa.c	(working copy)
@@ -121,6 +121,7 @@ bool
 cgraph_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
 {
   struct cgraph_node *first = (struct cgraph_node *) (void *) 1;
+  struct cgraph_node *processed = (struct cgraph_node *) (void *) 2;
   struct cgraph_node *node, *next;
   bool changed = false;
 
@@ -142,9 +143,13 @@ cgraph_remove_unreachable_nodes (bool be
         gcc_assert (!node->global.inlined_to);
 	node->aux = first;
 	first = node;
+	node->reachable = true;
       }
     else
-      gcc_assert (!node->aux);
+      {
+        gcc_assert (!node->aux);
+	node->reachable = false;
+      }
 
   /* Perform reachability analysis.  As a special case do not consider
      extern inline functions not inlined as live because we won't output
@@ -154,17 +159,26 @@ cgraph_remove_unreachable_nodes (bool be
       struct cgraph_edge *e;
       node = first;
       first = (struct cgraph_node *) first->aux;
+      node->aux = processed;
 
-      for (e = node->callees; e; e = e->next_callee)
-	if (!e->callee->aux
-	    && node->analyzed
-	    && (!e->inline_failed || !e->callee->analyzed
-		|| (!DECL_EXTERNAL (e->callee->decl))
-                || before_inlining_p))
-	  {
-	    e->callee->aux = first;
-	    first = e->callee;
-	  }
+      if (node->reachable)
+        for (e = node->callees; e; e = e->next_callee)
+	  if (!e->callee->reachable
+	      && node->analyzed
+	      && (!e->inline_failed || !e->callee->analyzed
+		  || (!DECL_EXTERNAL (e->callee->decl))
+                  || before_inlining_p))
+	    {
+	      bool prev_reachable = e->callee->reachable;
+	      e->callee->reachable |= node->reachable;
+	      if (!e->callee->aux
+	          || (e->callee->aux == processed
+		      && prev_reachable != e->callee->reachable))
+	        {
+	          e->callee->aux = first;
+	          first = e->callee;
+	        }
+	    }
       while (node->clone_of && !node->clone_of->aux && !gimple_has_body_p (node->decl))
         {
 	  node = node->clone_of;
@@ -184,13 +198,18 @@ cgraph_remove_unreachable_nodes (bool be
   for (node = cgraph_nodes; node; node = next)
     {
       next = node->next;
+      if (node->aux && !node->reachable)
+        {
+	  cgraph_node_remove_callees (node);
+	  node->analyzed = false;
+	  node->local.inlinable = false;
+	}
       if (!node->aux)
 	{
           node->global.inlined_to = NULL;
 	  if (file)
 	    fprintf (file, " %s", cgraph_node_name (node));
-	  if (!node->analyzed || !DECL_EXTERNAL (node->decl)
-	      || before_inlining_p)
+	  if (!node->analyzed || !DECL_EXTERNAL (node->decl) || before_inlining_p)
 	    cgraph_remove_node (node);
 	  else
 	    {
@@ -204,21 +223,16 @@ cgraph_remove_unreachable_nodes (bool be
 	      /* If so, we need to keep node in the callgraph.  */
 	      if (e || node->needed)
 		{
-		  struct cgraph_node *clone;
-
-		  /* If there are still clones, we must keep body around.
-		     Otherwise we can just remove the body but keep the clone.  */
-		  for (clone = node->clones; clone;
-		       clone = clone->next_sibling_clone)
-		    if (clone->aux)
-		      break;
-		  if (!clone)
-		    {
-		      cgraph_release_function_body (node);
-		      cgraph_node_remove_callees (node);
-		      node->analyzed = false;
-		      node->local.inlinable = false;
-		    }
+		  cgraph_release_function_body (node);
+		  cgraph_node_remove_callees (node);
+		  node->analyzed = false;
+		  node->local.inlinable = false;
+		  if (node->prev_sibling_clone)
+		    node->prev_sibling_clone->next_sibling_clone = node->next_sibling_clone;
+		  else if (node->clone_of)
+		    node->clone_of->clones = node->next_sibling_clone;
+		  if (node->next_sibling_clone)
+		    node->next_sibling_clone->prev_sibling_clone = node->prev_sibling_clone;
 		}
 	      else
 		cgraph_remove_node (node);
@@ -318,7 +332,7 @@ function_and_variable_visibility (bool w
     {
       if (!vnode->finalized)
         continue;
-      gcc_assert ((!DECL_WEAK (vnode->decl) && !DECL_COMMON (vnode->decl))
+      gcc_assert ((!DECL_WEAK (vnode->decl) && !DECL_COMMON (vnode->decl) && !DECL_COMDAT (vnode->decl))
       		  || TREE_PUBLIC (vnode->decl) || DECL_EXTERNAL (vnode->decl));
       if (vnode->needed
 	  && (DECL_COMDAT (vnode->decl) || TREE_PUBLIC (vnode->decl))
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 154127)
+++ tree-inline.c	(working copy)
@@ -1983,9 +1983,6 @@ initialize_cfun (tree new_fndecl, tree c
   cfun->function_end_locus = src_cfun->function_end_locus;
   cfun->curr_properties = src_cfun->curr_properties;
   cfun->last_verified = src_cfun->last_verified;
-  if (src_cfun->ipa_transforms_to_apply)
-    cfun->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
-					      src_cfun->ipa_transforms_to_apply);
   cfun->va_list_gpr_size = src_cfun->va_list_gpr_size;
   cfun->va_list_fpr_size = src_cfun->va_list_fpr_size;
   cfun->function_frequency = src_cfun->function_frequency;
@@ -3822,6 +3819,10 @@ expand_call_inline (basic_block bb, gimp
   (*debug_hooks->outlining_inline_function) (cg_edge->callee->decl);
 
   /* Update callgraph if needed.  */
+  if (cg_edge->callee->clone_of
+      && !cg_edge->callee->clone_of->next_sibling_clone
+      && !cg_edge->callee->analyzed)
+    cgraph_remove_node (cg_edge->callee);
   cgraph_remove_node (cg_edge->callee);
 
   id->block = NULL_TREE;
@@ -4848,6 +4849,19 @@ tree_function_versioning (tree old_decl,
   id.src_node = old_version_node;
   id.dst_node = new_version_node;
   id.src_cfun = DECL_STRUCT_FUNCTION (old_decl);
+  if (id.src_node->ipa_transforms_to_apply)
+    {
+      VEC(ipa_opt_pass,heap) * old_transforms_to_apply = id.dst_node->ipa_transforms_to_apply;
+      unsigned int i;
+
+      id.dst_node->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
+					               id.src_node->ipa_transforms_to_apply);
+      for (i = 0; i < VEC_length (ipa_opt_pass, old_transforms_to_apply); i++)
+        VEC_safe_push (ipa_opt_pass, heap, id.dst_node->ipa_transforms_to_apply,
+		       VEC_index (ipa_opt_pass,
+		       		  old_transforms_to_apply,
+				  i));
+    }
   
   id.copy_decl = copy_decl_no_change;
   id.transform_call_graph_edges
Index: passes.c
===================================================================
--- passes.c	(revision 154127)
+++ passes.c	(working copy)
@@ -1376,15 +1376,6 @@ update_properties_after_pass (void *data
 		           & ~pass->properties_destroyed;
 }
 
-/* Schedule IPA transform pass DATA for CFUN.  */
-
-static void
-add_ipa_transform_pass (void *data)
-{
-  struct ipa_opt_pass_d *ipa_pass = (struct ipa_opt_pass_d *) data;
-  VEC_safe_push (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply, ipa_pass);
-}
-
 /* Execute summary generation for all of the passes in IPA_PASS.  */
 
 void
@@ -1464,19 +1455,22 @@ execute_one_ipa_transform_pass (struct c
 void
 execute_all_ipa_transforms (void)
 {
-  if (cfun && cfun->ipa_transforms_to_apply)
+  struct cgraph_node *node;
+  if (!cfun)
+    return;
+  node = cgraph_node (current_function_decl);
+  if (node->ipa_transforms_to_apply)
     {
       unsigned int i;
-      struct cgraph_node *node = cgraph_node (current_function_decl);
 
-      for (i = 0; i < VEC_length (ipa_opt_pass, cfun->ipa_transforms_to_apply);
+      for (i = 0; i < VEC_length (ipa_opt_pass, node->ipa_transforms_to_apply);
 	   i++)
 	execute_one_ipa_transform_pass (node,
 					VEC_index (ipa_opt_pass,
-						   cfun->ipa_transforms_to_apply,
+						   node->ipa_transforms_to_apply,
 						   i));
-      VEC_free (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply);
-      cfun->ipa_transforms_to_apply = NULL;
+      VEC_free (ipa_opt_pass, heap, node->ipa_transforms_to_apply);
+      node->ipa_transforms_to_apply = NULL;
     }
 }
 
@@ -1551,7 +1545,13 @@ execute_one_pass (struct opt_pass *pass)
   execute_todo (todo_after | pass->todo_flags_finish);
   verify_interpass_invariants ();
   if (pass->type == IPA_PASS)
-    do_per_function (add_ipa_transform_pass, pass);
+    {
+      struct cgraph_node *node;
+      for (node = cgraph_nodes; node; node = node->next)
+        if (node->analyzed)
+          VEC_safe_push (ipa_opt_pass, heap, node->ipa_transforms_to_apply,
+			 (struct ipa_opt_pass_d *)pass);
+    }
 
   if (!current_function_decl)
     cgraph_process_new_functions ();
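
The hunks above move the pending-transform list from cfun to the cgraph node, and tree_function_versioning then gives a new version a copy of the source node's list followed by whatever the destination already had queued. A minimal sketch of that merge order, assuming toy types (struct pass_queue, struct toy_node and both helpers are hypothetical stand-ins, not GCC's VEC API):

```c
#include <stdlib.h>

/* Hypothetical stand-in for a VEC of pending IPA transform passes;
   pass identities are reduced to small integers.  */
struct pass_queue
{
  int *ids;
  size_t len;
};

/* Hypothetical stand-in for a cgraph node that owns its own queue.  */
struct toy_node
{
  struct pass_queue transforms;
};

/* Append one pass id; returns 0 on success, -1 on allocation failure.  */
static int
queue_push (struct pass_queue *q, int id)
{
  int *p = realloc (q->ids, (q->len + 1) * sizeof *p);
  if (!p)
    return -1;
  p[q->len] = id;
  q->ids = p;
  q->len++;
  return 0;
}

/* Mirror of the versioning hunk: DST first receives a copy of SRC's
   queue, then the entries DST held before are appended after it.
   Error paths are kept short and do not release partial results.  */
static int
inherit_transforms (struct toy_node *dst, const struct toy_node *src)
{
  struct pass_queue old = dst->transforms;
  size_t i;

  if (src->transforms.len == 0)
    return 0;			/* Nothing to inherit; DST is left alone.  */

  dst->transforms.ids = NULL;
  dst->transforms.len = 0;
  for (i = 0; i < src->transforms.len; i++)
    if (queue_push (&dst->transforms, src->transforms.ids[i]))
      return -1;
  for (i = 0; i < old.len; i++)
    if (queue_push (&dst->transforms, old.ids[i]))
      return -1;
  free (old.ids);
  return 0;
}
```

Keeping the queue on the node rather than on the function body means a clone that has no body of its own yet can still carry the list of transforms it owes.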

* Re: Whole program optimization and functions-only-called-once.
  2009-11-12 16:16   ` Jan Hubicka
@ 2009-11-14 12:55     ` Toon Moene
  2009-11-14 19:52       ` Richard Guenther
  0 siblings, 1 reply; 18+ messages in thread
From: Toon Moene @ 2009-11-14 12:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Guenther, Jan Hubicka, gcc mailing list

Jan Hubicka wrote:

> -fno-ipa-cp should work around your problem for time being.

Indeed it did. Some figures:

hlprog (the main forecast program):

link time optimization time: 3:20 minutes
top memory usage:            920  Mbyte

Inliner report:

Inlined 764 calls, eliminated 226 functions, size 260368 turned to 126882 size.

hirvda (the observation usage program):

link time optimization time: 10:05 minutes
top memory usage:            2.3 Gbyte

Inliner report:

Inlined 2518 calls, eliminated 608 functions, size 1187204 turned to 705838 size.

Of course, there still is:

Considering invlo6 size 1996.
  Called once from lowpass 530 insns.
  Inlined into lowpass which now has 2293 size for a net change of -2229 size.

Considering invlo4 size 1462.
  Called once from lowpass 2293 insns.
  Not inlined because --param large-function-growth limit reached.

Considering invlo2 size 933.
  Called once from lowpass 2293 insns.
  Not inlined because --param large-function-growth limit reached.

where the largest callee *does* get inlined, while two smaller ones 
don't (I agree with Jan that this would have been solved by training the 
inliner with profiling data, because only invlo4 gets called).

However, my endeavour is to boldly go where no inliner has gone before, 
and implement -falways-inline-functions-only-called-once, along the 
following lines:

$ svn diff ipa-inline.c
Index: ipa-inline.c
===================================================================
--- ipa-inline.c        (revision 153776)
+++ ipa-inline.c        (working copy)
@@ -1246,7 +1246,7 @@
                            node->callers->caller->global.size);
                 }

-             if (cgraph_check_inline_limits (node->callers->caller, node,
+             if (1 || cgraph_check_inline_limits (node->callers->caller, node,
                                               &reason, false))
                 {
                   cgraph_mark_inline (node->callers);

(Sugg. b. Rich. G.), because inlining functions that are only called 
once is always profitable (in number of instructions saved).
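
The dump above considers the called-once callees in decreasing size order, so the largest one uses up the caller's headroom before the smaller ones are looked at. A minimal sketch of that order dependence, assuming a toy model in which the caller may grow up to a fixed budget and sizes simply add (the function name, the budget and the additive estimate are all hypothetical; this is not the formula in cgraph_check_inline_limits, and the dump shows GCC's own estimate is smaller than the plain sum):

```c
#include <stddef.h>

/* Toy model: inline each called-once callee into the caller, in the
   given order, as long as the combined size stays within BUDGET.
   Returns the number of callees inlined and stores the resulting
   caller size in *FINAL_SIZE.  */
static size_t
inline_called_once (int caller_size, const int *callee_sizes, size_t n,
		    int budget, int *final_size)
{
  size_t inlined = 0, i;

  for (i = 0; i < n; i++)
    if (caller_size + callee_sizes[i] <= budget)
      {
	caller_size += callee_sizes[i];	/* Callee body joins the caller.  */
	inlined++;
      }
  *final_size = caller_size;
  return inlined;
}
```

With the sizes from the dump (caller 530; callees 1996, 1462, 933) and a hypothetical budget of 3000, largest-first inlines one callee, while smallest-first inlines two. Neither order knows which callee is actually executed, which is where profile data would help.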

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

* Re: Whole program optimization and functions-only-called-once.
  2009-11-14 12:55     ` Toon Moene
@ 2009-11-14 19:52       ` Richard Guenther
  2009-11-14 20:14         ` Steven Bosscher
  2009-11-14 22:05         ` Toon Moene
  0 siblings, 2 replies; 18+ messages in thread
From: Richard Guenther @ 2009-11-14 19:52 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jan Hubicka, Jan Hubicka, gcc mailing list

2009/11/14 Toon Moene <toon@moene.org>:
> Jan Hubicka wrote:
>
>> -fno-ipa-cp should work around your problem for time being.
>
> Indeed it did. Some figures:
>
> hlprog (the main forecast program):
>
> link time optimization time: 3:20 minutes
> top memory usage:            920  Mbyte
>
> Inliner report:
>
> Inlined 764 calls, eliminated 226 functions, size 260368 turned to 126882 size.
>
> hirvda (the observation usage program):
>
> link time optimization time: 10:05 minutes
> top memory usage:            2.3 Gbyte
>
> Inliner report:
>
> Inlined 2518 calls, eliminated 608 functions, size 1187204 turned to 705838 size.
>
> Of course, there still is:
>
> Considering invlo6 size 1996.
>  Called once from lowpass 530 insns.
>  Inlined into lowpass which now has 2293 size for a net change of -2229 size.
>
> Considering invlo4 size 1462.
>  Called once from lowpass 2293 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> Considering invlo2 size 933.
>  Called once from lowpass 2293 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> where the largest callee *does* get inlined, while two smaller ones don't (I
> agree with Jan that this would have been solved by training the inliner with
> profiling data, because only invlo4 gets called).
>
> However, my endeavour is to boldly go where no inliner has gone before, and
> implement -falways-inline-functions-only-called-once, along the following
> lines:
>
> $ svn diff ipa-inline.c
> Index: ipa-inline.c
> ===================================================================
> --- ipa-inline.c        (revision 153776)
> +++ ipa-inline.c        (working copy)
> @@ -1246,7 +1246,7 @@
>                           node->callers->caller->global.size);
>                }
>
> -             if (cgraph_check_inline_limits (node->callers->caller, node,
> +             if (1 || cgraph_check_inline_limits (node->callers->caller, node,
>                                              &reason, false))
>                {
>                  cgraph_mark_inline (node->callers);
>
> (Sugg. b. Rich. G.), because inlining functions that are only called once is
> always profitable (in number of instructions saved).

;)

Note that some optimizers (for example value-numbering) contain cut-offs
so that they are turned off for large functions as otherwise compile-time
issues appear as algorithms are non-linear in the size of the function.

So in the end it might not even be profitable, for size and speed reasons.
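
A minimal sketch of such a cut-off, assuming a toy pass whose cost is quadratic in the number of statements (TOY_MAX_STMTS and toy_redundancy_pass are hypothetical names; real passes take their limits from their own parameters):

```c
#include <stddef.h>

/* Hypothetical cut-off for this sketch only.  */
#define TOY_MAX_STMTS 1000

/* Toy "pass" with quadratic cost: counts pairs (i, j), i < j, of equal
   values, standing in for a search for redundant computations.
   Returns -1 without doing any work when the function is larger than
   the cut-off.  */
static long
toy_redundancy_pass (const int *stmts, size_t n)
{
  long found = 0;
  size_t i, j;

  if (n > TOY_MAX_STMTS)
    return -1;			/* Pass gives up: function too large.  */

  for (i = 0; i < n; i++)
    for (j = i + 1; j < n; j++)
      if (stmts[i] == stmts[j])
	found++;
  return found;
}
```

A function that crosses the limit only because another body was inlined into it loses the pass entirely, including for the code that was previously small enough to be optimized.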

Richard.

* Re: Whole program optimization and functions-only-called-once.
  2009-11-14 19:52       ` Richard Guenther
@ 2009-11-14 20:14         ` Steven Bosscher
  2009-11-14 22:13           ` Richard Guenther
  2009-11-14 22:05         ` Toon Moene
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2009-11-14 20:14 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Toon Moene, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sat, Nov 14, 2009 at 8:51 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Note that some optimizers (for example value-numbering) contain cut-offs
> so that they are turned off for large functions as otherwise compile-time
> issues appear as algorithms are non-linear in the size of the function.
>
> So in the end it might not even be profitable, for size and speed reasons.

...where one should keep in mind that this is one of those areas
where GCC is still at least a decade behind the best compilers in the
industry. Those optimizations that cut themselves off would work
just fine on regions instead of whole functions. Another thing that
might be helpful is partial inlining (e.g.
http://www.csc.villanova.edu/~tway/publications/wayPDPTA02.pdf
although I suspect that for the code from Toon only whole-function
inlining is useful...?).
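
A minimal sketch of what partial inlining does, written out by hand and assuming a callee that consists of a cheap guard followed by an expensive body (all function names here are hypothetical, and this is not the algorithm from the paper linked above): only the guard is copied into the caller, and the expensive part stays out of line.

```c
/* Expensive part of the callee; kept out of line in both variants.  */
static int
expensive_body (int x)
{
  int acc = 0, i;

  for (i = 0; i < 1000; i++)
    acc = (acc + x + i) % 9973;
  return acc;
}

/* Original callee: cheap early exit, then the expensive body.  */
static int
callee (int x)
{
  if (x >= 0)
    return x;
  return expensive_body (x);
}

/* Caller before the transformation: an ordinary call.  */
static int
caller_plain (int x)
{
  return callee (x) + 1;
}

/* Caller after partial inlining done by hand: the guard now lives in
   the caller, and only the cold path still pays for a call.  */
static int
caller_partial (int x)
{
  if (x >= 0)
    return x + 1;
  return expensive_body (x) + 1;
}
```

The caller grows by the size of the guard only, so size limits are far less of an obstacle than with whole-function inlining.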

Zadeck had code for structural analysis a couple of years ago. I don't
think anyone has seriously worked with that to experiment with region
based compilation. But I guess it will be the Next Big Challenge for
GCC, after LTO.

Ciao!
Steven

* Re: Whole program optimization and functions-only-called-once.
  2009-11-14 19:52       ` Richard Guenther
  2009-11-14 20:14         ` Steven Bosscher
@ 2009-11-14 22:05         ` Toon Moene
  1 sibling, 0 replies; 18+ messages in thread
From: Toon Moene @ 2009-11-14 22:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, Jan Hubicka, gcc mailing list

Richard Guenther wrote:

> 2009/11/14 Toon Moene <toon@moene.org>:

>> However, my endeavour is to boldly go where no inliner has gone before, and
>> implement -falways-inline-functions-only-called-once, along the following
>> lines:

   ...

>> (Sugg. b. Rich. G.), because inlining functions that are only called once is
>> always profitable (in number of instructions saved).
> 
> ;)
> 
> Note that some optimizers (for example value-numbering) contain cut-offs
> so that they are turned off for large functions as otherwise compile-time
> issues appear as algorithms are non-linear in the size of the function.

As you correctly note, this is a tongue-in-cheek remark. Anyway, we 
(meaning I) first have to find out why an executable thus constructed 
shows execution times per time step (the "unit-of-work") that vary 
between 61 and 94 seconds, when they should be nearly the same on every 
time step.

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

* Re: Whole program optimization and functions-only-called-once.
  2009-11-14 20:14         ` Steven Bosscher
@ 2009-11-14 22:13           ` Richard Guenther
  2009-11-15 10:31             ` Steven Bosscher
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Guenther @ 2009-11-14 22:13 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Toon Moene, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sat, Nov 14, 2009 at 2:13 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sat, Nov 14, 2009 at 8:51 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> Note that some optimizers (for example value-numbering) contain cut-offs
>> so that they are turned off for large functions as otherwise compile-time
>> issues appear as algorithms are non-linear in the size of the function.
>>
>> So in the end it might not even be profitable, for size and speed reasons.
>
> ...where one should keep in mind that this is one of those areas
> where GCC is still at least a decade behind the best compilers in the
> industry. Those optimizations that cut themselves off would work
> just fine on regions instead of whole functions. Another thing that
> might be helpful is partial inlining (e.g.
> http://www.csc.villanova.edu/~tway/publications/wayPDPTA02.pdf
> although I suspect that for the code from Toon only whole-function
> inlining is useful...?).

Indeed.  For Toon it shouldn't really matter whether the functions
are inlined or not - aliasing shouldn't be an issue here due to
Fortran semantics.  Maybe it's alignment ...

With IPA-PTA aliasing shouldn't be an issue for C or C++ either,
the alignment issue remains though.

> Zadeck had code for structural analysis a couple of years ago. I don't
> think anyone has seriously worked with that to experiment with region
> based compilation. But I guess it will be the Next Big Challenge for
> GCC, after LTO.

Yeah, I have some patches for the SSA propagators, but those are
not the problematic ones with respect to compile-time.  Value-numbering
cuts itself off at a certain SCC size, which I suspect cannot be easily
fixed with regions (regions probably can't really cross SCCs).

I don't even remember which other passes have this kind of cut-off ...

Richard.

> Ciao!
> Steven
>

* Re: Whole program optimization and functions-only-called-once.
  2009-11-14 22:13           ` Richard Guenther
@ 2009-11-15 10:31             ` Steven Bosscher
  2009-11-15 14:07               ` Toon Moene
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2009-11-15 10:31 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Toon Moene, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sat, Nov 14, 2009 at 11:12 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> I don't even remember which other passes have this kind of cut-off ...

At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
variable tracking.

Ciao!
Steven

* Re: Whole program optimization and functions-only-called-once.
  2009-11-15 10:31             ` Steven Bosscher
@ 2009-11-15 14:07               ` Toon Moene
  2009-11-15 14:44                 ` Richard Guenther
  0 siblings, 1 reply; 18+ messages in thread
From: Toon Moene @ 2009-11-15 14:07 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Richard Guenther, Jan Hubicka, Jan Hubicka, gcc mailing list

Steven Bosscher wrote:

> On Sat, Nov 14, 2009 at 11:12 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:

>> I don't even remember which other passes have this kind of cut-off ...
> 
> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
> variable tracking.

Are they covered by a --param ?  At least that way I could teach them to 
go on indefinitely ...

[ The practice with binaries (i.e., the results of builds up until
   binaries are produced) in my world is: compile once (no matter how
   much time it takes) and run each for about 18 hours out of every 24,
   until the next compilation - about a year later.

   So it doesn't really matter how much time a compile/link step takes. ]

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

* Re: Whole program optimization and functions-only-called-once.
  2009-11-15 14:07               ` Toon Moene
@ 2009-11-15 14:44                 ` Richard Guenther
  2009-11-15 14:58                   ` Toon Moene
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Guenther @ 2009-11-15 14:44 UTC (permalink / raw)
  To: Toon Moene; +Cc: Steven Bosscher, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sun, Nov 15, 2009 at 8:07 AM, Toon Moene <toon@moene.org> wrote:
> Steven Bosscher wrote:
>
>> On Sat, Nov 14, 2009 at 11:12 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>
>>> I don't even remember which other passes have this kind of cut-off ...
>>
>> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
>> variable tracking.
>
> Are they covered by a --param ?  At least that way I could teach them to go
> on indefinitely ...

I think most of them are.  Maybe we should diagnose the cases where
we hit these limits.
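
A minimal sketch of such a diagnostic, assuming a hypothetical helper that every size-limited pass would call before doing its work (within_limit, limit_hits and the message wording are inventions for illustration, not an existing GCC interface):

```c
#include <stdio.h>

/* Number of times any pass was stopped by its limit.  */
static unsigned limit_hits;

/* Returns 1 if a pass working on SIZE units may run under LIMIT.
   Otherwise writes a note to OUT naming the pass and the parameter
   that stopped it, counts the event, and returns 0.  */
static int
within_limit (FILE *out, const char *pass, const char *param,
	      long size, long limit)
{
  if (size <= limit)
    return 1;
  limit_hits++;
  fprintf (out, "note: %s skipped: size %ld exceeds --param %s=%ld\n",
	   pass, size, param, limit);
  return 0;
}
```

Naming the parameter in the note tells the user which knob to raise when compile time does not matter.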

Richard.

> [ The practice with binaries (i.e., the results of builds up until
>  binaries are produced) in my world is: compile once (no matter how
>  much time it takes) and run each for about 18 hours out of every 24,
>  until the next compilation - about a year later.
>
>  So it doesn't really matter how much time a compile/link step takes. ]
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
>

* Re: Whole program optimization and functions-only-called-once.
  2009-11-15 14:44                 ` Richard Guenther
@ 2009-11-15 14:58                   ` Toon Moene
  2009-11-15 20:01                     ` Tim Prince
  0 siblings, 1 reply; 18+ messages in thread
From: Toon Moene @ 2009-11-15 14:58 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Steven Bosscher, Jan Hubicka, Jan Hubicka, gcc mailing list

Richard Guenther wrote:

> On Sun, Nov 15, 2009 at 8:07 AM, Toon Moene <toon@moene.org> wrote:

>> Steven Bosscher wrote:

>>> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
>>> variable tracking.

>> Are they covered by a --param ?  At least that way I could teach them to go
>> on indefinitely ...

> I think most of them are.  Maybe we should diagnose the cases where
> we hit these limits.

That would be a good idea.  One other compiler I work with frequently 
(the Intel Fortran compiler) does just that.  However, either it doesn't 
have or their marketing department doesn't want you to know about knobs 
to tweak these decisions :-)

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

* Re: Whole program optimization and functions-only-called-once.
  2009-11-15 14:58                   ` Toon Moene
@ 2009-11-15 20:01                     ` Tim Prince
  0 siblings, 0 replies; 18+ messages in thread
From: Tim Prince @ 2009-11-15 20:01 UTC (permalink / raw)
  To: Toon Moene
  Cc: Richard Guenther, Steven Bosscher, Jan Hubicka, Jan Hubicka,
	gcc mailing list

Toon Moene wrote:
> Richard Guenther wrote:
> 
>> On Sun, Nov 15, 2009 at 8:07 AM, Toon Moene <toon@moene.org> wrote:
> 
>>> Steven Bosscher wrote:
> 
>>>> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
>>>> variable tracking.
> 
>>> Are they covered by a --param ?  At least that way I could teach them 
>>> to go
>>> on indefinitely ...
> 
>> I think most of them are.  Maybe we should diagnose the cases where
>> we hit these limits.
> 
> That would be a good idea.  One other compiler I work with frequently 
> (the Intel Fortran compiler) does just that.  However, either it doesn't 
> have or their marketing department doesn't want you to know about knobs 
> to tweak these decisions :-)
> 
Both gfortran and ifort have a much longer list of adjustable limits on 
in-lining than most customers are willing to study or test.

Thread overview: 18+ messages
2009-11-04 19:19 Whole program optimization and functions-only-called-once Toon Moene
2009-11-04 19:26 ` Richard Guenther
2009-11-04 21:20   ` Toon Moene
2009-11-04 21:30     ` Andrew Pinski
2009-11-04 21:50       ` Richard Guenther
2009-11-12 16:46       ` Jan Hubicka
2009-11-12 21:41         ` Jan Hubicka
2009-11-12 16:16   ` Jan Hubicka
2009-11-14 12:55     ` Toon Moene
2009-11-14 19:52       ` Richard Guenther
2009-11-14 20:14         ` Steven Bosscher
2009-11-14 22:13           ` Richard Guenther
2009-11-15 10:31             ` Steven Bosscher
2009-11-15 14:07               ` Toon Moene
2009-11-15 14:44                 ` Richard Guenther
2009-11-15 14:58                   ` Toon Moene
2009-11-15 20:01                     ` Tim Prince
2009-11-14 22:05         ` Toon Moene
