* Whole program optimization and functions-only-called-once.
From: Toon Moene @ 2009-11-04 19:19 UTC
To: Jan Hubicka; +Cc: gcc mailing list

Jan,

I had some time to study the example I sent you a couple of weeks ago.

According to visual inspection of the source code, there are 5 functions
(subroutines in Fortran parlance) that are called once:

MAIN calls
HLPROG calls
GEMINI calls
SL2TIM calls
PHCALL calls
PHTASK

I.e., the last five should be candidates for inlining of "functions only
called once".

However, ccrPOljB.o.047i.inline says:

Deciding on functions called once:

Considering gemini_.clone.1 size 11443.
 Called once from hlprog 462 insns.
 Inlined into hlprog which now has 10728 size for a net change of -12620 size.

Considering hlprog size 10728.
 Called once from main 7 insns.
 Not inlined because --param large-function-growth limit reached.

Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size.

The dump option -fdump-ipa-all also gives me the call graph, of which I copy
here the relevant part:

phcall_.clone.3/11(-1) @0x7fd198c16400 (clone of phcall/33)
  availability:local 8281 time, 972 benefit 1351 size, 291 benefit
  984 bytes stack usage reachable local finalized inlinable
  called by: sl2tim/49 (0.44 per call) sl2tim_.clone.0/16 (0.44 per call)
phtask_.clone.2/12(-1) @0x7fd198c16500 (clone of phtask/41)
  availability:local 26416 time, 4268 benefit 4541 size, 880 benefit
  480 bytes stack usage reachable local finalized inlinable
  called by: phcall_.clone.3/11 (3.52 per call) phcall/33 (3.52 per call)
sl2tim_.clone.0/16(-1) @0x7fd198c16900 (clone of sl2tim/49)
  availability:local 207312 time, 26617 benefit 5169 size, 941 benefit
  3856 bytes stack usage reachable local finalized inlinable
  called by: gemini_.clone.1/40 (1.00 per call) gemini/0 (1.00 per call)
gemini_.clone.1phtask/40(-1) @0x7fd198c35000 (inline copy in hlprog/17)
  (clone of gemini/0) availability:local 147324 time, 2770 benefit
  11443 size, 1177 benefit 11635 bytes stack usage reachable local
  finalized inlinable
  called by: hlprog/17 (3.57 per call) (inlined)
phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268 benefit
  4541 size, 880 benefit 480 bytes stack usage reachable body local
  finalized inlinable
  called by:
phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972 benefit
  1351 size, 291 benefit 984 bytes stack usage reachable body local
  finalized inlinable
  called by:
hlprog/17(-1) @0x7fd198c16a00 availability:local 560 time, 10 benefit
  (516762 after inlining) 462 size, 1 benefit (10728 after inlining)
  4216 bytes stack usage 15851 bytes after inlining reachable body local
  finalized inlinable
  called by: main/29 (1.00 per call)
sl2tim/49(-1) @0x7fd198c35900 availability:local 207312 time, 26617 benefit
  5169 size, 941 benefit 3856 bytes stack usage reachable body local
  finalized inlinable
  called by:
gemini/0(-1) @0x7fd198bef800 availability:local 147324 time, 2770 benefit
  11443 size, 1177 benefit 11635 bytes stack usage reachable body local
  finalized inlinable
  called by:

So if we have to believe this summary,

HLPROG is called by MAIN, but is not suitable for inlining (I can live with
that).
GEMINI is not called, but GEMINI.clone is (by HLPROG) and is inlined.
SL2TIM is not called, but SL2TIM.clone is called by GEMINI and GEMINI.clone;
because it is called twice, it is not considered a
function-only-called-once.
PHCALL is not called, but PHCALL.clone is called by SL2TIM and SL2TIM.clone;
because it is called twice, it is not considered a
function-only-called-once.
PHTASK is not called, but PHTASK.clone is called by PHCALL and PHCALL.clone;
because it is called twice, it is not considered a
function-only-called-once.

I don't think this is really what we want with functions-only-called-once:
If only the .clone version of a function is used, then a function that's
only called once *inside this clone* is a function-only-called-once.

I hope this analysis helps,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
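The effect described above can be reproduced with a very small model. The following is only a sketch under stated assumptions: `struct sketch_fn`, `sketch_add_call` and `sketch_called_once` are hypothetical names invented for illustration and are not GCC's data structures or its actual heuristic. It shows how a rule that merely counts entries in the "called by:" list rejects SL2TIM.clone, because the uncalled original GEMINI still contributes an edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call graph node; not GCC's cgraph_node. */
struct sketch_fn {
    const char *name;
    const struct sketch_fn *callers[4]; /* models the "called by:" list */
    size_t n_callers;
};

/* Record that CALLER contains a call to CALLEE. */
static void sketch_add_call(const struct sketch_fn *caller,
                            struct sketch_fn *callee)
{
    assert(callee->n_callers < 4);
    callee->callers[callee->n_callers++] = caller;
}

/* Naive rule: exactly one incoming edge, regardless of whether the
   caller itself is ever called. */
static bool sketch_called_once(const struct sketch_fn *fn)
{
    return fn->n_callers == 1;
}
```

With edges hlprog -> gemini_.clone.1, gemini -> sl2tim_.clone.0 and gemini_.clone.1 -> sl2tim_.clone.0, the GEMINI clone passes this test while the SL2TIM clone does not, even though the original gemini has no callers at all.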
* Re: Whole program optimization and functions-only-called-once.
From: Richard Guenther @ 2009-11-04 19:26 UTC
To: Toon Moene; +Cc: Jan Hubicka, gcc mailing list

On Wed, Nov 4, 2009 at 8:19 PM, Toon Moene <toon@moene.org> wrote:
> Jan,
>
> I had some time to study the example I sent you a couple of weeks ago.
>
> According to visible inspection of the source code, there are 5 functions
> (subroutines in Fortran parlance) that are called once:
>
> MAIN calls
> HLPROG calls
> GEMINI calls
> SL2TIM calls
> PHCALL calls
> PHTASK
>
> I.e., the last five should be candidates for inlining of "functions only
> called once".
>
> However, ccrPOljB.o.047i.inline says:
>
> Deciding on functions called once:
>
> Considering gemini_.clone.1 size 11443.
>  Called once from hlprog 462 insns.
>  Inlined into hlprog which now has 10728 size for a net change of -12620
> size.
>
> Considering hlprog size 10728.
>  Called once from main 7 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size.
>
> The dump option -fdump-ipa-all also gives me the call graph, of which I copy
> here the relevant part:
>
> phcall_.clone.3/11(-1) @0x7fd198c16400 (clone of phcall/33)
> availability:local 8281 time, 972 benefit 1351 size, 291 benefit 984 bytes
> stack usage reachable local finalized inlinable
>  called by: sl2tim/49 (0.44 per call) sl2tim_.clone.0/16 (0.44 per call)
> phtask_.clone.2/12(-1) @0x7fd198c16500 (clone of phtask/41)
> availability:local 26416 time, 4268 benefit 4541 size, 880 benefit 480 bytes
> stack usage reachable local finalized inlinable
>  called by: phcall_.clone.3/11 (3.52 per call) phcall/33 (3.52 per call)
> sl2tim_.clone.0/16(-1) @0x7fd198c16900 (clone of sl2tim/49)
> availability:local 207312 time, 26617 benefit 5169 size, 941 benefit 3856
> bytes stack usage reachable local finalized inlinable
>  called by: gemini_.clone.1/40 (1.00 per call) gemini/0 (1.00 per call)
> gemini_.clone.1phtask/40(-1) @0x7fd198c35000 (inline copy in hlprog/17)
> (clone of gemini/0) availability:local 147324 time, 2770 benefit 11443 size,
> 1177 benefit 11635 bytes stack usage reachable local finalized inlinable
>  called by: hlprog/17 (3.57 per call) (inlined)
> phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268 benefit
> 4541 size, 880 benefit 480 bytes stack usage reachable body local finalized
> inlinable
>  called by:
> phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972 benefit 1351
> size, 291 benefit 984 bytes stack usage reachable body local finalized
> inlinable
>  called by:
> hlprog/17(-1) @0x7fd198c16a00 availability:local 560 time, 10 benefit
> (516762 after inlining) 462 size, 1 benefit (10728 after inlining) 4216
> bytes stack usage 15851 bytes after inlining reachable body local finalized
> inlinable
>  called by: main/29 (1.00 per call)
> sl2tim/49(-1) @0x7fd198c35900 availability:local 207312 time, 26617 benefit
> 5169 size, 941 benefit 3856 bytes stack usage reachable body local finalized
> inlinable
>  called by:
> gemini/0(-1) @0x7fd198bef800 availability:local 147324 time, 2770 benefit
> 11443 size, 1177 benefit 11635 bytes stack usage reachable body local
> finalized inlinable
>  called by:
>
> So if we have to believe this summary,
>
> HLPROG is called by MAIN, but is not suitable for inlining (I can live with
> that).
> GEMINI is not called, but GEMINI.clone is (by HLPROG) and is inlined.
> SL2TIM is not called, but SL2TIM.clone is called by GEMINI and GEMINI.clone;
> because it is called twice, it is not considered a
> function-only-called-once.
> PHCALL is not called, but PHCALL.clone is called by SL2TIM and SL2TIM.clone;
> because it is called twice, it is not considered a
> function-only-called-once.
> PHTASK is not called, but PHTASK.clone is called by PHCALL and PHCALL.clone;
> because it is called twice, it is not considered a
> function-only-called-once.
>
> I don't think this is really what we want with functions-only-called-once:
> If only the .clone version of a function is used, than a function that's
> only called once *inside this clone* is a function-only-called-once.
>
> I hope this analysis helps,

I think the underlying issue is

phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
local finalized inlinable
  called by:
phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
local finalized inlinable
  called by:

that these are not called but still reachable (they should not be reachable
anymore, instead the clones are now reachable). I think there already is
a bug about cloning not updating cgraph reachability and not reclaiming
nodes after IPA transform application.

Richard.

> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
* Re: Whole program optimization and functions-only-called-once.
From: Toon Moene @ 2009-11-04 21:20 UTC
To: Richard Guenther; +Cc: Jan Hubicka, gcc mailing list

Richard Guenther wrote:

> I think the underlying issue is
>
> phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
> benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
> local finalized inlinable
>   called by:
> phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
> benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
> local finalized inlinable
>   called by:
>
> that these are not called but still reachable (they should not be reachable
> anymore, instead the clones are now reachable). I think there already is
> a bug about cloning not updating cgraph reachability and not reclaiming
> nodes after IPA transform application.

You don't happen to recall the bug number?

The last time I did this sort of optimization was in 1992. f2c (the
Fortran-to-C compiler) gave me C equivalents of all Fortran code in the
forecasting executable.

I spent a rainy Sunday afternoon pasting them into one giant source file,
ordering them correctly (all called subroutines first) and then slapping
"static inline" on them. Subsequently, I compiled the (30,000 line) C file
with gcc -O3.

The resulting executable was about 10 % faster than the original (which was
also compiled by f2c - g77 didn't exist at that time).

So my hopes for this optimization (when done right) are quite high :-)

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
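The single-file layout described in the message above can be illustrated as follows. This is a minimal sketch, not the 1992 code: the function names are borrowed from the call chain in this thread, while the bodies and the entry point `run_forecast` are made up for illustration. Each callee is defined before its caller, and every helper is marked `static inline` so the compiler sees the full body at the single call site.

```c
/* Callees first: each definition is complete before it is used. */
static inline int phtask(int x)
{
    return x * 2;            /* placeholder body, purely illustrative */
}

static inline int phcall(int x)
{
    return phtask(x) + 1;    /* the only call to phtask */
}

static inline int sl2tim(int x)
{
    return phcall(x) * 3;    /* the only call to phcall */
}

/* The one externally visible entry point (hypothetical name). */
int run_forecast(int x)
{
    return sl2tim(x);        /* the only call to sl2tim */
}
```

Because every helper is file-local and called exactly once, an optimizing compiler is free to fold the whole chain into `run_forecast` and emit no separate copies of the helpers.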
* Re: Whole program optimization and functions-only-called-once.
From: Andrew Pinski @ 2009-11-04 21:30 UTC
To: Toon Moene; +Cc: Richard Guenther, Jan Hubicka, gcc mailing list

On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene <toon@moene.org> wrote:
> You don't happen to recall the bug number ?

It might be related to PR 41735 which I noticed when looking at the
generated assembly and trying to compare 4.5 to 4.4.

Thanks,
Andrew Pinski
* Re: Whole program optimization and functions-only-called-once.
From: Richard Guenther @ 2009-11-04 21:50 UTC
To: Andrew Pinski; +Cc: Toon Moene, Jan Hubicka, gcc mailing list

On Wed, Nov 4, 2009 at 10:30 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene <toon@moene.org> wrote:
>> You don't happen to recall the bug number ?
>
> It might be related to PR 41735 which I noticed when looking at the
> generated assembly and trying to compare 4.5 to 4.4.

Yes indeed. Honza may be able to explain why it is like it is and
if it's easy to fix. He's on vacation though ;)

Richard.
* Re: Whole program optimization and functions-only-called-once.
From: Jan Hubicka @ 2009-11-12 16:46 UTC
To: Andrew Pinski; +Cc: Toon Moene, Richard Guenther, Jan Hubicka, gcc mailing list

> On Wed, Nov 4, 2009 at 1:20 PM, Toon Moene <toon@moene.org> wrote:
> > You don't happen to recall the bug number ?
>
> It might be related to PR 41735 which I noticed when looking at the
> generated assembly and trying to compare 4.5 to 4.4.

I fixed this bug today, so it might help. But it is related to COMDAT
functions and I don't think Fortran actually produces them.

We do reachability after cloning, so there must actually be a reason to keep
the clone; we need to debug it. I will try to look into it tomorrow if
other new issues don't stop me (I just got profile feedback working with
LTO; it might also be interesting to see whether it helps your app).

Honza

>
> Thanks,
> Andrew Pinski
* Re: Whole program optimization and functions-only-called-once.
From: Jan Hubicka @ 2009-11-12 21:41 UTC
To: Jan Hubicka
Cc: Andrew Pinski, Toon Moene, Richard Guenther, Jan Hubicka, gcc mailing list

Hi,
this is a WIP patch to deal with the unreachable clones problem. It basically
renders the clones as unanalyzed cgraph nodes (but with the body still in
place) so IPA passes don't see them.

Honza

Index: cgraph.c
===================================================================
--- cgraph.c (revision 154127)
+++ cgraph.c (working copy)
@@ -1132,7 +1132,7 @@ cgraph_release_function_body (struct cgr
       pop_cfun();
       gimple_set_body (node->decl, NULL);
       VEC_free (ipa_opt_pass, heap,
-                DECL_STRUCT_FUNCTION (node->decl)->ipa_transforms_to_apply);
+                node->ipa_transforms_to_apply);
       /* Struct function hangs a lot of data that would leak if we didn't
          removed all pointers to it.  */
       ggc_free (DECL_STRUCT_FUNCTION (node->decl));
@@ -1159,6 +1159,8 @@ cgraph_remove_node (struct cgraph_node *
   cgraph_call_node_removal_hooks (node);
   cgraph_node_remove_callers (node);
   cgraph_node_remove_callees (node);
+  VEC_free (ipa_opt_pass, heap,
+            node->ipa_transforms_to_apply);
 
   /* Incremental inlining access removed nodes stored in the postorder list.
      */
Index: cgraph.h
===================================================================
--- cgraph.h (revision 154127)
+++ cgraph.h (working copy)
@@ -190,6 +190,11 @@ struct GTY((chain_next ("%h.next"), chai
 
   PTR GTY ((skip)) aux;
 
+  /* Interprocedural passes scheduled to have their transform functions
+     applied next time we execute local pass on them.  We maintain it
+     per-function in order to allow IPA passes to introduce new functions.  */
+  VEC(ipa_opt_pass,heap) * GTY((skip)) ipa_transforms_to_apply;
+
   struct cgraph_local_info local;
   struct cgraph_global_info global;
   struct cgraph_rtl_info rtl;
@@ -206,16 +211,24 @@ struct GTY((chain_next ("%h.next"), chai
      number of cfg nodes with -fprofile-generate and -fprofile-use */
   int pid;
 
-  /* Set when function must be output - it is externally visible
-     or its address is taken.  */
+  /* Set when function must be output for some reason.  The primary
+     use of this flag is to mark functions needed to be output for
+     non-standard reason.  Functions that are externally visible
+     or reachable from functions needed to be output are marked
+     by specialized flags.  */
   unsigned needed : 1;
-  /* Set when function has address taken.  */
+  /* Set when function has address taken.
+     In current implementation it imply needed flag. */
   unsigned address_taken : 1;
   /* Set when decl is an abstract function pointed to by the
      ABSTRACT_DECL_ORIGIN of a reachable function.  */
   unsigned abstract_and_needed : 1;
   /* Set when function is reachable by call from other function
-     that is either reachable or needed.  */
+     that is either reachable or needed.
+     This flag is computed at original cgraph construction and then
+     updated in cgraph_remove_unreachable_nodes.  Note that after
+     cgraph_remove_unreachable_nodes cgraph still can contain unreachable
+     nodes when they are needed for virtual clone instantiation.  */
   unsigned reachable : 1;
   /* Set once the function is lowered (i.e. its CFG is built).  */
   unsigned lowered : 1;
Index: cgraphunit.c
===================================================================
--- cgraphunit.c (revision 154127)
+++ cgraphunit.c (working copy)
@@ -699,7 +699,7 @@ verify_cgraph_node (struct cgraph_node *
       error_found = true;
     }
 
-  if (node->analyzed && gimple_has_body_p (node->decl)
+  if (node->analyzed && node->reachable && gimple_has_body_p (node->decl)
       && !TREE_ASM_WRITTEN (node->decl)
       && (!DECL_EXTERNAL (node->decl) || node->global.inlined_to)
       && !flag_wpa)
@@ -1777,8 +1777,8 @@ save_inline_function_body (struct cgraph
   TREE_PUBLIC (first_clone->decl) = 0;
   DECL_COMDAT (first_clone->decl) = 0;
   VEC_free (ipa_opt_pass, heap,
-            DECL_STRUCT_FUNCTION (first_clone->decl)->ipa_transforms_to_apply);
-  DECL_STRUCT_FUNCTION (first_clone->decl)->ipa_transforms_to_apply = NULL;
+            first_clone->ipa_transforms_to_apply);
+  first_clone->ipa_transforms_to_apply = NULL;
 
 #ifdef ENABLE_CHECKING
   verify_cgraph_node (first_clone);
@@ -1810,6 +1810,8 @@ cgraph_materialize_clone (struct cgraph_
     node->clone_of->clones = node->next_sibling_clone;
   node->next_sibling_clone = NULL;
   node->prev_sibling_clone = NULL;
+  if (!node->clone_of->analyzed && !node->clone_of->clones)
+    cgraph_remove_node (node->clone_of);
   node->clone_of = NULL;
   bitmap_obstack_release (NULL);
 }
Index: ipa-inline.c
===================================================================
--- ipa-inline.c (revision 154127)
+++ ipa-inline.c (working copy)
@@ -1120,7 +1120,7 @@ cgraph_decide_inlining (void)
   max_count = 0;
   max_benefit = 0;
   for (node = cgraph_nodes; node; node = node->next)
-    if (node->analyzed)
+    if (node->reachable)
       {
         struct cgraph_edge *e;
 
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c (revision 154127)
+++ lto-streamer-in.c (working copy)
@@ -1476,6 +1476,7 @@ lto_read_body (struct lto_file_decl_data
       /* Restore decl state */
       file_data->current_decl_state = file_data->global_decl_state;
 
+#if 0
       /* FIXME: ipa_transforms_to_apply holds list of passes that have optimization
          summaries computed and needs to apply changes.  At the moment WHOPR only
          supports inlining, so we can push it here by hand.  In future we need to stream
@@ -1485,6 +1486,7 @@ lto_read_body (struct lto_file_decl_data
       VEC_safe_push (ipa_opt_pass, heap,
                      cfun->ipa_transforms_to_apply,
                      (ipa_opt_pass)&pass_ipa_inline);
+#endif
       pop_cfun ();
     }
   else
Index: c-decl.c
===================================================================
--- c-decl.c (revision 154127)
+++ c-decl.c (working copy)
@@ -4497,6 +4497,7 @@ build_compound_literal (location_t loc,
       set_compound_literal_name (decl);
       DECL_DEFER_OUTPUT (decl) = 1;
       DECL_COMDAT (decl) = 1;
+      TREE_PUBLIC (decl) = 1;
       DECL_ARTIFICIAL (decl) = 1;
       DECL_IGNORED_P (decl) = 1;
       pushdecl (decl);
Index: function.h
===================================================================
--- function.h (revision 154127)
+++ function.h (working copy)
@@ -522,11 +522,6 @@ struct GTY(()) function {
   unsigned int curr_properties;
   unsigned int last_verified;
 
-  /* Interprocedural passes scheduled to have their transform functions
-     applied next time we execute local pass on them.  We maintain it
-     per-function in order to allow IPA passes to introduce new functions.  */
-  VEC(ipa_opt_pass,heap) * GTY((skip)) ipa_transforms_to_apply;
-
   /* Non-null if the function does something that would prevent it from
      being copied; this applies to both versioning and inlining.  Set to
      a string describing the reason for failure.  */
Index: ipa.c
===================================================================
--- ipa.c (revision 154128)
+++ ipa.c (working copy)
@@ -121,6 +121,7 @@ bool
 cgraph_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
 {
   struct cgraph_node *first = (struct cgraph_node *) (void *) 1;
+  struct cgraph_node *processed = (struct cgraph_node *) (void *) 2;
   struct cgraph_node *node, *next;
   bool changed = false;
 
@@ -142,9 +143,13 @@ cgraph_remove_unreachable_nodes (bool be
         gcc_assert (!node->global.inlined_to);
         node->aux = first;
         first = node;
+        node->reachable = true;
       }
     else
-      gcc_assert (!node->aux);
+      {
+        gcc_assert (!node->aux);
+        node->reachable = false;
+      }
 
   /* Perform reachability analysis.  As a special case do not consider
      extern inline functions not inlined as live because we won't output
@@ -154,17 +159,26 @@ cgraph_remove_unreachable_nodes (bool be
       struct cgraph_edge *e;
       node = first;
       first = (struct cgraph_node *) first->aux;
+      node->aux = processed;
 
-      for (e = node->callees; e; e = e->next_callee)
-        if (!e->callee->aux
-            && node->analyzed
-            && (!e->inline_failed || !e->callee->analyzed
-                || (!DECL_EXTERNAL (e->callee->decl))
-                || before_inlining_p))
-          {
-            e->callee->aux = first;
-            first = e->callee;
-          }
+      if (node->reachable)
+        for (e = node->callees; e; e = e->next_callee)
+          if (!e->callee->reachable
+              && node->analyzed
+              && (!e->inline_failed || !e->callee->analyzed
+                  || (!DECL_EXTERNAL (e->callee->decl))
+                  || before_inlining_p))
+            {
+              bool prev_reachable = e->callee->reachable;
+              e->callee->reachable |= node->reachable;
+              if (!e->callee->aux
+                  || (e->callee->aux == processed
+                      && prev_reachable != e->callee->reachable))
+                {
+                  e->callee->aux = first;
+                  first = e->callee;
+                }
+            }
       while (node->clone_of && !node->clone_of->aux && !gimple_has_body_p (node->decl))
         {
           node = node->clone_of;
@@ -184,13 +198,18 @@ cgraph_remove_unreachable_nodes (bool be
   for (node = cgraph_nodes; node; node = next)
     {
       next = node->next;
+      if (node->aux && !node->reachable)
+        {
+          cgraph_node_remove_callees (node);
+          node->analyzed = false;
+          node->local.inlinable = false;
+        }
       if (!node->aux)
         {
           node->global.inlined_to = NULL;
           if (file)
             fprintf (file, " %s", cgraph_node_name (node));
-          if (!node->analyzed || !DECL_EXTERNAL (node->decl)
-              || before_inlining_p)
+          if (!node->analyzed || !DECL_EXTERNAL (node->decl) || before_inlining_p)
             cgraph_remove_node (node);
           else
             {
@@ -204,21 +223,16 @@ cgraph_remove_unreachable_nodes (bool be
               /* If so, we need to keep node in the callgraph.  */
               if (e || node->needed)
                 {
-                  struct cgraph_node *clone;
-
-                  /* If there are still clones, we must keep body around.
-                     Otherwise we can just remove the body but keep the clone.  */
-                  for (clone = node->clones; clone;
-                       clone = clone->next_sibling_clone)
-                    if (clone->aux)
-                      break;
-                  if (!clone)
-                    {
-                      cgraph_release_function_body (node);
-                      cgraph_node_remove_callees (node);
-                      node->analyzed = false;
-                      node->local.inlinable = false;
-                    }
+                  cgraph_release_function_body (node);
+                  cgraph_node_remove_callees (node);
+                  node->analyzed = false;
+                  node->local.inlinable = false;
+                  if (node->prev_sibling_clone)
+                    node->prev_sibling_clone->next_sibling_clone = node->next_sibling_clone;
+                  else if (node->clone_of)
+                    node->clone_of->clones = node->next_sibling_clone;
+                  if (node->next_sibling_clone)
+                    node->next_sibling_clone->prev_sibling_clone = node->prev_sibling_clone;
                 }
               else
                 cgraph_remove_node (node);
@@ -318,7 +332,7 @@ function_and_variable_visibility (bool w
     {
       if (!vnode->finalized)
         continue;
-      gcc_assert ((!DECL_WEAK (vnode->decl) && !DECL_COMMON (vnode->decl))
+      gcc_assert ((!DECL_WEAK (vnode->decl) && !DECL_COMMON (vnode->decl) && !DECL_COMDAT (vnode->decl))
                   || TREE_PUBLIC (vnode->decl) || DECL_EXTERNAL (vnode->decl));
       if (vnode->needed
           && (DECL_COMDAT (vnode->decl) || TREE_PUBLIC (vnode->decl))
Index: tree-inline.c
===================================================================
--- tree-inline.c (revision 154127)
+++ tree-inline.c (working copy)
@@ -1983,9 +1983,6 @@ initialize_cfun (tree new_fndecl, tree c
   cfun->function_end_locus = src_cfun->function_end_locus;
   cfun->curr_properties = src_cfun->curr_properties;
   cfun->last_verified = src_cfun->last_verified;
-  if (src_cfun->ipa_transforms_to_apply)
-    cfun->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
-                                              src_cfun->ipa_transforms_to_apply);
   cfun->va_list_gpr_size = src_cfun->va_list_gpr_size;
   cfun->va_list_fpr_size = src_cfun->va_list_fpr_size;
   cfun->function_frequency = src_cfun->function_frequency;
@@ -3822,6 +3819,10 @@ expand_call_inline (basic_block bb, gimp
   (*debug_hooks->outlining_inline_function) (cg_edge->callee->decl);
 
   /* Update callgraph if needed.  */
+  if (cg_edge->callee->clone_of
+      && !cg_edge->callee->clone_of->next_sibling_clone
+      && !cg_edge->callee->analyzed)
+    cgraph_remove_node (cg_edge->callee);
   cgraph_remove_node (cg_edge->callee);
 
   id->block = NULL_TREE;
@@ -4848,6 +4849,19 @@ tree_function_versioning (tree old_decl,
   id.src_node = old_version_node;
   id.dst_node = new_version_node;
   id.src_cfun = DECL_STRUCT_FUNCTION (old_decl);
+  if (id.src_node->ipa_transforms_to_apply)
+    {
+      VEC(ipa_opt_pass,heap) * old_transforms_to_apply = id.dst_node->ipa_transforms_to_apply;
+      unsigned int i;
+
+      id.dst_node->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
+                                                       id.src_node->ipa_transforms_to_apply);
+      for (i = 0; i < VEC_length (ipa_opt_pass, old_transforms_to_apply); i++)
+        VEC_safe_push (ipa_opt_pass, heap, id.dst_node->ipa_transforms_to_apply,
+                       VEC_index (ipa_opt_pass,
+                                  old_transforms_to_apply,
+                                  i));
+    }
 
   id.copy_decl = copy_decl_no_change;
   id.transform_call_graph_edges
Index: passes.c
===================================================================
--- passes.c (revision 154127)
+++ passes.c (working copy)
@@ -1376,15 +1376,6 @@ update_properties_after_pass (void *data
                            & ~pass->properties_destroyed;
 }
 
-/* Schedule IPA transform pass DATA for CFUN.  */
-
-static void
-add_ipa_transform_pass (void *data)
-{
-  struct ipa_opt_pass_d *ipa_pass = (struct ipa_opt_pass_d *) data;
-  VEC_safe_push (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply, ipa_pass);
-}
-
 /* Execute summary generation for all of the passes in IPA_PASS.  */
 
 void
@@ -1464,19 +1455,22 @@ execute_one_ipa_transform_pass (struct c
 void
 execute_all_ipa_transforms (void)
 {
-  if (cfun && cfun->ipa_transforms_to_apply)
+  struct cgraph_node *node;
+  if (!cfun)
+    return;
+  node = cgraph_node (current_function_decl);
+  if (node->ipa_transforms_to_apply)
     {
       unsigned int i;
-      struct cgraph_node *node = cgraph_node (current_function_decl);
 
-      for (i = 0; i < VEC_length (ipa_opt_pass, cfun->ipa_transforms_to_apply);
+      for (i = 0; i < VEC_length (ipa_opt_pass, node->ipa_transforms_to_apply);
            i++)
         execute_one_ipa_transform_pass (node,
                                         VEC_index (ipa_opt_pass,
-                                                   cfun->ipa_transforms_to_apply,
+                                                   node->ipa_transforms_to_apply,
                                                    i));
-      VEC_free (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply);
-      cfun->ipa_transforms_to_apply = NULL;
+      VEC_free (ipa_opt_pass, heap, node->ipa_transforms_to_apply);
+      node->ipa_transforms_to_apply = NULL;
     }
 }
 
@@ -1551,7 +1545,13 @@ execute_one_pass (struct opt_pass *pass)
   execute_todo (todo_after | pass->todo_flags_finish);
   verify_interpass_invariants ();
   if (pass->type == IPA_PASS)
-    do_per_function (add_ipa_transform_pass, pass);
+    {
+      struct cgraph_node *node;
+      for (node = cgraph_nodes; node; node = node->next)
+        if (node->analyzed)
+          VEC_safe_push (ipa_opt_pass, heap, node->ipa_transforms_to_apply,
+                         (struct ipa_opt_pass_d *)pass);
+    }
 
   if (!current_function_decl)
     cgraph_process_new_functions ();
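The core idea of the ipa.c hunks can be shown in a standalone form. What follows is a simplified sketch, not the patch itself: `struct toy_node`, `toy_mark_reachable` and `toy_is_master_only` are hypothetical names, the checks on `analyzed`, `inline_failed` and `DECL_EXTERNAL` are left out, and two static marker objects stand in for the patch's `(void *) 1` and `(void *) 2` sentinels. A worklist threaded through `aux` visits every node that must stay in the graph, while `reachable` is set only along real call edges; a master kept solely for its clone ends up visited but not reachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node; field names loosely follow the patch. */
struct toy_node {
    struct toy_node *callees[4];
    size_t n_callees;
    struct toy_node *clone_of;  /* master this clone was made from, or NULL */
    bool needed;                /* root of the walk */
    bool reachable;             /* computed: reached via real call edges */
    struct toy_node *aux;       /* worklist link; NULL means never visited */
};

static struct toy_node toy_list_end;   /* end-of-worklist marker */
static struct toy_node toy_processed;  /* "already popped" marker */

static void toy_mark_reachable(struct toy_node **nodes, size_t n)
{
    struct toy_node *first = &toy_list_end;

    for (size_t i = 0; i < n; i++) {
        nodes[i]->aux = NULL;
        nodes[i]->reachable = nodes[i]->needed;
        if (nodes[i]->needed) {          /* seed the worklist with roots */
            nodes[i]->aux = first;
            first = nodes[i];
        }
    }
    while (first != &toy_list_end) {
        struct toy_node *node = first;
        first = node->aux;
        node->aux = &toy_processed;

        if (node->reachable)
            for (size_t i = 0; i < node->n_callees; i++) {
                struct toy_node *c = node->callees[i];
                if (c->reachable)
                    continue;
                c->reachable = true;
                /* Queue unvisited nodes, and re-queue nodes that were
                   popped earlier while still unreachable. */
                if (c->aux == NULL || c->aux == &toy_processed) {
                    c->aux = first;
                    first = c;
                }
            }
        /* Keep the master of a clone in the graph without making it
           reachable, so its callees are not walked on its behalf. */
        if (node->clone_of && node->clone_of->aux == NULL) {
            node->clone_of->aux = first;
            first = node->clone_of;
        }
    }
}

/* True for nodes that survive only as masters of clones. */
static bool toy_is_master_only(const struct toy_node *n)
{
    return n->aux != NULL && !n->reachable;
}
```

In the patch, nodes in the state that `toy_is_master_only` detects are the ones that get their callee edges removed and their `analyzed` flag cleared, which is what hides them from the inliner.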
* Re: Whole program optimization and functions-only-called-once. 2009-11-04 19:26 ` Richard Guenther 2009-11-04 21:20 ` Toon Moene @ 2009-11-12 16:16 ` Jan Hubicka 2009-11-14 12:55 ` Toon Moene 1 sibling, 1 reply; 18+ messages in thread From: Jan Hubicka @ 2009-11-12 16:16 UTC (permalink / raw) To: Richard Guenther; +Cc: Toon Moene, Jan Hubicka, gcc mailing list > On Wed, Nov 4, 2009 at 8:19 PM, Toon Moene <toon@moene.org> wrote: > > Jan, > > > > I had some time to study the example I sent you a couple of weeks ago. > > > > According to visible inspection of the source code, there are 5 functions > > (subroutines in Fortran parlance) that are called once: > > > > MAIN Â calls > > HLPROG calls > > GEMINI calls > > SL2TIM calls > > PHCALL calls > > PHTASK > > > > I.e., the last five should be candidates for inlining of "functions only > > called once". > > > > However, ccrPOljB.o.047i.inline says: > > > > Deciding on functions called once: > > > > Considering gemini_.clone.1 size 11443. > > Â Called once from hlprog 462 insns. > > Â Inlined into hlprog which now has 10728 size for a net change of -12620 > > size. > > > > Considering hlprog size 10728. > > Â Called once from main 7 insns. > > Â Not inlined because --param large-function-growth limit reached. > > > > Inlined 1 calls, eliminated 1 functions, size 45477 turned to 32857 size. 
> > > > The dump option -fdump-ipa-all also gives me the call graph, of which I copy > > here the relevant part: > > > > phcall_.clone.3/11(-1) @0x7fd198c16400 (clone of phcall/33) > > availability:local 8281 time, 972 benefit 1351 size, 291 benefit 984 bytes > > stack usage reachable local finalized inlinable > > Â called by: sl2tim/49 (0.44 per call) sl2tim_.clone.0/16 (0.44 per call) > > phtask_.clone.2/12(-1) @0x7fd198c16500 (clone of phtask/41) > > availability:local 26416 time, 4268 benefit 4541 size, 880 benefit 480 bytes > > stack usage reachable local finalized inlinable > > Â called by: phcall_.clone.3/11 (3.52 per call) phcall/33 (3.52 per call) > > sl2tim_.clone.0/16(-1) @0x7fd198c16900 (clone of sl2tim/49) > > availability:local 207312 time, 26617 benefit 5169 size, 941 benefit 3856 > > bytes stack usage reachable local finalized inlinable > > Â called by: gemini_.clone.1/40 (1.00 per call) gemini/0 (1.00 per call) > > gemini_.clone.1phtask/40(-1) @0x7fd198c35000 (inline copy in hlprog/17) > > (clone of gemini/0) availability:local 147324 time, 2770 benefit 11443 size, > > 1177 benefit 11635 bytes stack usage reachable local finalized inlinable > > Â called by: hlprog/17 (3.57 per call) (inlined) > > phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268 benefit > > 4541 size, 880 benefit 480 bytes stack usage reachable body local finalized > > inlinable > > Â called by: > > phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972 benefit 1351 > > size, 291 benefit 984 bytes stack usage reachable body local finalized > > inlinable > > Â called by: > > hlprog/17(-1) @0x7fd198c16a00 availability:local 560 time, 10 benefit > > (516762 after inlining) 462 size, 1 benefit (10728 after inlining) 4216 > > bytes stack usage 15851 bytes after inlining reachable body local finalized > > inlinable > > Â called by: main/29 (1.00 per call) > > sl2tim/49(-1) @0x7fd198c35900 availability:local 207312 time, 26617 benefit > > 5169 size, 941 benefit 
> > 3856 bytes stack usage reachable body local finalized
> > inlinable
> >   called by:
> > gemini/0(-1) @0x7fd198bef800 availability:local 147324 time, 2770 benefit
> > 11443 size, 1177 benefit 11635 bytes stack usage reachable body local
> > finalized inlinable
> >   called by:
> >
> > So if we have to believe this summary,
> >
> > HLPROG is called by MAIN, but is not suitable for inlining (I can live with
> > that).
> > GEMINI is not called, but GEMINI.clone is (by HLPROG) and is inlined.
> > SL2TIM is not called, but SL2TIM.clone is called by GEMINI and GEMINI.clone;
> > because it is called twice, it is not considered a
> > function-only-called-once.
> > PHCALL is not called, but PHCALL.clone is called by SL2TIM and SL2TIM.clone;
> > because it is called twice, it is not considered a
> > function-only-called-once.
> > PHTASK is not called, but PHTASK.clone is called by PHCALL and PHCALL.clone;
> > because it is called twice, it is not considered a
> > function-only-called-once.
> >
> > I don't think this is really what we want with functions-only-called-once:
> > If only the .clone version of a function is used, then a function that's
> > only called once *inside this clone* is a function-only-called-once.
> >
> > I hope this analysis helps,
>
> I think the underlying issue is
>
> phtask/41(-1) @0x7fd198c35100 availability:local 26416 time, 4268
> benefit 4541 size, 880 benefit 480 bytes stack usage reachable body
> local finalized inlinable
>   called by:
> phcall/33(-1) @0x7fd198c33a00 availability:local 8281 time, 972
> benefit 1351 size, 291 benefit 984 bytes stack usage reachable body
> local finalized inlinable
>   called by:
>
> that these are not called but still reachable (they should not be reachable
> anymore, instead the clones are now reachable). I think there already is
> a bug about cloning not updating cgraph reachability and not reclaiming
> nodes after IPA transform application.
The reachable flag is not kept up to date after the initial cgraph build;
only the code that removes unreachable functions computes it. The actual
problem here is uglier: the reachability pass cannot really remove the
original functions, since their clones still need to be constructed, so
the functions stay in the cgraph until this happens. This confuses the
called-once logic.

Hmm, I guess we need the called-once logic to be able to figure out which
functions stay in the callgraph only because they are masters for clones
still to be materialized. I guess we can make the reachability pass ignore
this issue (so that reachability really is up to date) and make the inliner
(and other propagation passes) ignore those unreachable nodes. Ugly, but at
the moment I don't see a better way around :(

-fno-ipa-cp should work around your problem for the time being.

Honza
>
> Richard.
>
> > --
> > Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> > Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> > At home: http://moene.org/~toon/
> > Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
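The workaround Honza describes (let the called-once logic ignore nodes that survive only as masters for clones) can be illustrated with a toy call graph. This is only a sketch under stated assumptions: `struct toy_node`, `naive_caller_count`, `live_caller_count` and `called_once` are hypothetical names invented here and are not GCC's cgraph data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CALLERS 4

/* Toy call-graph node. Hypothetical; NOT GCC's struct cgraph_node. */
struct toy_node {
    const char *name;
    bool reachable;   /* false: kept only as the master of a clone */
    const struct toy_node *callers[TOY_MAX_CALLERS];
    size_t n_callers;
};

/* Naive rule: every caller edge counts, even one from a dead master. */
size_t naive_caller_count(const struct toy_node *n)
{
    return n->n_callers;
}

/* Sketched rule: edges from unreachable nodes are ignored. */
size_t live_caller_count(const struct toy_node *n)
{
    size_t live = 0;

    assert(n->n_callers <= TOY_MAX_CALLERS);
    for (size_t i = 0; i < n->n_callers; i++)
        if (n->callers[i]->reachable)
            live++;
    return live;
}

/* A function is "called once" if exactly one live caller remains. */
bool called_once(const struct toy_node *n)
{
    return live_caller_count(n) == 1;
}
```

With a node for phcall_.clone.3 whose callers are sl2tim (marked unreachable) and sl2tim_.clone.0 (reachable), the naive count is 2, while the filtered count is 1, so the clone would qualify as called once.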
* Re: Whole program optimization and functions-only-called-once.
From: Toon Moene @ 2009-11-14 12:55 UTC
To: Jan Hubicka; +Cc: Richard Guenther, Jan Hubicka, gcc mailing list

Jan Hubicka wrote:

> -fno-ipa-cp should work around your problem for time being.

Indeed it did. Some figures:

hlprog (the main forecast program):

  link time optimization time: 3:20 minutes
  top memory usage: 920 Mbyte

  Inliner report:

  Inlined 764 calls, eliminated 226 functions, size 260368 turned to
  126882 size.

hirvda (the observation usage program):

  link time optimization time: 10:05 minutes
  top memory usage: 2.3 Gbyte

  Inliner report:

  Inlined 2518 calls, eliminated 608 functions, size 1187204 turned to
  705838 size.

Of course, there still is:

Considering invlo6 size 1996.
  Called once from lowpass 530 insns.
  Inlined into lowpass which now has 2293 size for a net change of -2229
  size.

Considering invlo4 size 1462.
  Called once from lowpass 2293 insns.
  Not inlined because --param large-function-growth limit reached.

Considering invlo2 size 933.
  Called once from lowpass 2293 insns.
  Not inlined because --param large-function-growth limit reached.

where the largest callee *does* get inlined, while two smaller ones don't
(I agree with Jan that this would have been solved by training the inliner
with profiling data, because only invlo4 gets called).
However, my endeavour is to boldly go where no inliner has gone before, and
implement -falways-inline-functions-only-called-once, along the following
lines:

$ svn diff ipa-inline.c
Index: ipa-inline.c
===================================================================
--- ipa-inline.c        (revision 153776)
+++ ipa-inline.c        (working copy)
@@ -1246,7 +1246,7 @@
                           node->callers->caller->global.size);
                }

-             if (cgraph_check_inline_limits (node->callers->caller, node,
+             if (1 || cgraph_check_inline_limits (node->callers->caller, node,
                                              &reason, false))
                {
                  cgraph_mark_inline (node->callers);

(Sugg. b. Rich. G.), because inlining functions that are only called once
is always profitable (in number of instructions saved).

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
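One possible reading of why invlo6 is inlined into lowpass while the smaller invlo4 and invlo2 are refused can be sketched as a size check. Everything here is an assumption for illustration: the function `within_growth_limit` is hypothetical, the rule (limit derived from the larger of the caller's own size and the callee's size, only enforced above a "large function" threshold) is a simplified model and not the body of cgraph_check_inline_limits, and the thread does not confirm it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of a large-function-growth check.
   It is NOT GCC's cgraph_check_inline_limits. */
bool within_growth_limit(int caller_self_size, int callee_size,
                         int size_after_inlining,
                         int large_function_insns,
                         int large_function_growth_pct)
{
    assert(caller_self_size >= 0 && callee_size >= 0);
    assert(large_function_growth_pct >= 0);

    /* Assumed base: the larger of the two bodies before inlining. */
    int base = caller_self_size > callee_size ? caller_self_size
                                              : callee_size;
    int limit = base + base * large_function_growth_pct / 100;

    /* Results that stay below the "large" threshold always pass. */
    if (size_after_inlining <= large_function_insns)
        return true;

    /* Large results must stay within the allowed growth. */
    return size_after_inlining <= limit;
}
```

Under this model, with an assumed threshold of 2700 and an assumed growth of 100 percent, the first step (caller 530, callee 1996, result 2293 as reported in the dump) passes because the result is still small. A later step that would push lowpass to roughly 2293 + 1462 = 3755 (an upper estimate, not a number from the dump) fails, since the limit is computed from sizes that do not include the already inlined invlo6. Processing the largest callee first therefore uses up the budget for the rest.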
* Re: Whole program optimization and functions-only-called-once.
From: Richard Guenther @ 2009-11-14 19:52 UTC
To: Toon Moene; +Cc: Jan Hubicka, Jan Hubicka, gcc mailing list

2009/11/14 Toon Moene <toon@moene.org>:
> Jan Hubicka wrote:
>
>> -fno-ipa-cp should work around your problem for time being.
>
> Indeed it did. Some figures:
>
> hlprog (the main forecast program):
>
> link time optimization time: 3:20 minutes
> top memory usage: 920 Mbyte
>
> Inliner report:
>
> Inlined 764 calls, eliminated 226 functions, size 260368 turned to 126882
> size.
>
> hirvda (the observation usage program):
>
> link time optimization time: 10:05 minutes
> top memory usage: 2.3 Gbyte
>
> Inliner report:
>
> Inlined 2518 calls, eliminated 608 functions, size 1187204 turned to 705838
> size.
>
> Of course, there still is:
>
> Considering invlo6 size 1996.
>  Called once from lowpass 530 insns.
>  Inlined into lowpass which now has 2293 size for a net change of -2229
> size.
>
> Considering invlo4 size 1462.
>  Called once from lowpass 2293 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> Considering invlo2 size 933.
>  Called once from lowpass 2293 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> where the largest callee *does* get inlined, while two smaller ones don't (I
> agree with Jan that this would have been solved by training the inliner with
> profiling data, because only invlo4 gets called).
>
> However, my endeavour is to boldly go where no inliner has gone before, and
> implement -falways-inline-functions-only-called-once, along the following
> lines:
>
> $ svn diff ipa-inline.c
> Index: ipa-inline.c
> ===================================================================
> --- ipa-inline.c        (revision 153776)
> +++ ipa-inline.c        (working copy)
> @@ -1246,7 +1246,7 @@
>                           node->callers->caller->global.size);
>                }
>
> -             if (cgraph_check_inline_limits (node->callers->caller, node,
> +             if (1 || cgraph_check_inline_limits (node->callers->caller,
> node,
>                                              &reason, false))
>                {
>                  cgraph_mark_inline (node->callers);
>
> (Sugg. b. Rich. G.), because inlining functions that are only called once is
> always profitable (in number of instructions saved).

;)

Note that some optimizers (for example value-numbering) contain cut-offs
so that they are turned off for large functions as otherwise compile-time
issues appear as algorithms are non-linear in the size of the function.

So it might even be not profitable in the end for size and speed reasons.

Richard.
* Re: Whole program optimization and functions-only-called-once.
From: Steven Bosscher @ 2009-11-14 20:14 UTC
To: Richard Guenther; +Cc: Toon Moene, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sat, Nov 14, 2009 at 8:51 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Note that some optimizers (for example value-numbering) contain cut-offs
> so that they are turned off for large functions as otherwise compile-time
> issues appear as algorithms are non-linear in the size of the function.
>
> So it might even be not profitable in the end for size and speed reasons.

...where one should keep in mind that this is one of those areas
where GCC is still at least a decade behind the best compilers in the
industry. Those optimizations that cut themselves off would work
just fine on regions instead of whole functions. Another thing that
might be helpful is partial inlining (e.g.
http://www.csc.villanova.edu/~tway/publications/wayPDPTA02.pdf
although I suspect that for the code from Toon only whole-function
inlining is useful...?).

Zadeck had code for structural analysis a couple of years ago. I don't
think anyone has seriously worked with that to experiment with region
based compilation. But I guess it will be the Next Big Challenge for
GCC, after LTO.

Ciao!
Steven
* Re: Whole program optimization and functions-only-called-once.
From: Richard Guenther @ 2009-11-14 22:13 UTC
To: Steven Bosscher; +Cc: Toon Moene, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sat, Nov 14, 2009 at 2:13 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sat, Nov 14, 2009 at 8:51 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> Note that some optimizers (for example value-numbering) contain cut-offs
>> so that they are turned off for large functions as otherwise compile-time
>> issues appear as algorithms are non-linear in the size of the function.
>>
>> So it might even be not profitable in the end for size and speed reasons.
>
> ...where one should keep in mind, that this is one of those areas
> where GCC is still at least a decade behind the best compilers in the
> industry. Those optimizations, that cut themselves off, would work
> just fine on regions instead of whole functions. Another thing that
> might be helpful, is partial inlining (e.g.
> http://www.csc.villanova.edu/~tway/publications/wayPDPTA02.pdf
> although I suspect that for the code from Toon only whole-function
> inlining is useful...?).

Indeed. For Toon it shouldn't really matter whether the functions are
inlined or not - aliasing shouldn't be an issue here due to Fortran
semantics. Maybe it's alignment ...

With IPA-PTA aliasing shouldn't be an issue for C or C++ either; the
alignment issue remains though.

> Zadeck had code for structural analysis a couple of years ago. I don't
> think anyone has seriously worked with that to experiment with region
> based compilation. But I guess it will be the Next Big Challenge for
> GCC, after LTO.

Yeah, I have some patches for the SSA propagators, but those are not
the problematic ones with respect to compile-time. Value-numbering
cuts itself off at a certain SCC size, which I suspect cannot be easily
fixed with regions (regions probably can't really cross SCCs).

I don't even remember which other passes have this kind of cut-offs ..

Richard.

> Ciao!
> Steven
>
* Re: Whole program optimization and functions-only-called-once.
From: Steven Bosscher @ 2009-11-15 10:31 UTC
To: Richard Guenther; +Cc: Toon Moene, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sat, Nov 14, 2009 at 11:12 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> I don't even remember which other passes have this kind of cut-offs ..

At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
variable tracking.

Ciao!
Steven
* Re: Whole program optimization and functions-only-called-once.
From: Toon Moene @ 2009-11-15 14:07 UTC
To: Steven Bosscher
Cc: Richard Guenther, Jan Hubicka, Jan Hubicka, gcc mailing list

Steven Bosscher wrote:

> On Sat, Nov 14, 2009 at 11:12 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:

>> I don't even remember which other passes have this kind of cut-offs ..
>
> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
> variable tracking.

Are they covered by a --param ? At least that way I could teach them to
go on indefinitely ...

[ The practice with binaries (i.e., the results of builds up until
  binaries are produced) in my world is: compile once (no matter how
  much time it takes) and run about 18 hours of a 24 hour period each,
  until the next compilation - about a year later.

  So it doesn't really matter how much time a compile/link step takes. ]

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
* Re: Whole program optimization and functions-only-called-once.
From: Richard Guenther @ 2009-11-15 14:44 UTC
To: Toon Moene; +Cc: Steven Bosscher, Jan Hubicka, Jan Hubicka, gcc mailing list

On Sun, Nov 15, 2009 at 8:07 AM, Toon Moene <toon@moene.org> wrote:
> Steven Bosscher wrote:
>
>> On Sat, Nov 14, 2009 at 11:12 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>
>>> I don't even remember which other passes have this kind of cut-offs ..
>>
>> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
>> variable tracking.
>
> Are they covered by a --param ? At least that way I could teach them to go
> on indefinitely ...

I think most of them are. Maybe we should diagnose the cases where
we hit these limits.

Richard.

> [ The practice with binaries (i.e., the results of builds up until
>   binaries are produced) in my world is: compile once (no matter how
>   much time it takes) and run about 18 hours of a 24 hour period each,
>   until the next compilation - about a year later.
>
>   So it doesn't really matter how much time a compile/link step takes. ]
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
>
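Richard's suggestion to diagnose the cases where a pass hits its cut-off could look roughly like the guard below. This is a sketch only: `pass_within_limit`, its parameters and the message format are hypothetical and do not correspond to any existing GCC function or diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical guard: decide whether a size-limited pass should run
   and, if it is cut off, say so instead of skipping silently. */
bool pass_within_limit(const char *pass, const char *function,
                       unsigned long size, unsigned long limit,
                       char *msg, size_t msg_len)
{
    assert(msg != NULL && msg_len > 0);

    if (size <= limit) {
        msg[0] = '\0';   /* nothing to report */
        return true;
    }

    /* Record why the pass was skipped so the user can raise the limit. */
    snprintf(msg, msg_len,
             "note: %s skipped for %s: size %lu exceeds limit %lu",
             pass, function, size, limit);
    return false;
}
```

A caller would run the pass only when the function returns true and otherwise print the message, which tells the user which limit to raise for a compile-once, run-for-a-year workload like the one Toon describes.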
* Re: Whole program optimization and functions-only-called-once.
From: Toon Moene @ 2009-11-15 14:58 UTC
To: Richard Guenther
Cc: Steven Bosscher, Jan Hubicka, Jan Hubicka, gcc mailing list

Richard Guenther wrote:

> On Sun, Nov 15, 2009 at 8:07 AM, Toon Moene <toon@moene.org> wrote:

>> Steven Bosscher wrote:

>>> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
>>> variable tracking.

>> Are they covered by a --param ? At least that way I could teach them to go
>> on indefinitely ...

> I think most of them are. Maybe we should diagnose the cases where
> we hit these limits.

That would be a good idea. One other compiler I work with frequently
(the Intel Fortran compiler) does just that. However, either it doesn't
have or their marketing department doesn't want you to know about knobs
to tweak these decisions :-)

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
* Re: Whole program optimization and functions-only-called-once.
From: Tim Prince @ 2009-11-15 20:01 UTC
To: Toon Moene
Cc: Richard Guenther, Steven Bosscher, Jan Hubicka, Jan Hubicka, gcc mailing list

Toon Moene wrote:
> Richard Guenther wrote:
>
>> On Sun, Nov 15, 2009 at 8:07 AM, Toon Moene <toon@moene.org> wrote:
>
>>> Steven Bosscher wrote:
>
>>>> At least CPROP, LCM-PRE, and HOIST (i.e. all passes in gcse.c), and
>>>> variable tracking.
>
>>> Are they covered by a --param ? At least that way I could teach them
>>> to go
>>> on indefinitely ...
>
>> I think most of them are. Maybe we should diagnose the cases where
>> we hit these limits.
>
> That would be a good idea. One other compiler I work with frequently
> (the Intel Fortran compiler) does just that. However, either it doesn't
> have or their marketing department doesn't want you to know about knobs
> to tweak these decisions :-)
>
Both gfortran and ifort have a much longer list of adjustable limits on
in-lining than most customers are willing to study or test.
* Re: Whole program optimization and functions-only-called-once.
From: Toon Moene @ 2009-11-14 22:05 UTC
To: Richard Guenther; +Cc: Jan Hubicka, Jan Hubicka, gcc mailing list

Richard Guenther wrote:

> 2009/11/14 Toon Moene <toon@moene.org>:

>> However, my endeavour is to boldly go where no inliner has gone before, and
>> implement -falways-inline-functions-only-called-once, along the following
>> lines:

...

>> (Sugg. b. Rich. G.), because inlining functions that are only called once is
>> always profitable (in number of instructions saved).
>
> ;)
>
> Note that some optimizers (for example value-numbering) contain cut-offs
> so that they are turned off for large functions as otherwise compile-time
> issues appear as algorithms are non-linear in the size of the function.

As you correctly note, this is a tongue-in-cheek remark. Anyway, we
(meaning, I) first have to find out why an executable thus constructed
shows execution times per time step (the "unit-of-work") of between 61
and 94 seconds, when they should be close to the same on every time step.

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html