Subject: Re: [PATCH] Add capability to run several iterations of early optimizations
From: Maxim Kuvyrkov <Maxim_Kuvyrkov@mentor.com>
To: Richard Guenther
Cc: GCC Patches, Matt
Date: Fri, 28 Oct 2011 23:07:00 -0000

On 28/10/2011, at 11:43 PM, Richard Guenther wrote:

> On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov wrote:
>>
>> OK for trunk?
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index f056d3d..4738b28 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>        tree_lowering_passes (fndecl);
>        bitmap_obstack_initialize (NULL);
>        if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -       execute_pass_list (pass_early_local_passes.pass.sub);
> +       execute_early_local_passes_for_current_function ();
>        bitmap_obstack_release (NULL);
>        pop_cfun ();
>        current_function_decl = NULL;
> @@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>        gimple_register_cfg_hooks ();
>        bitmap_obstack_initialize (NULL);
>        if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -       execute_pass_list (pass_early_local_passes.pass.sub);
> +       execute_early_local_passes_for_current_function ();
>
> I think these should only execute the lowering pieces of early local passes,
> let me see if that's properly split ...
>
> @@ -255,7 +255,7 @@ cgraph_process_new_functions (void)
>               /* When not optimizing, be sure we run early local passes anyway
>                  to expand OMP.  */
>               || !optimize)
> -           execute_pass_list (pass_early_local_passes.pass.sub);
> +           execute_early_local_passes_for_current_function ();
>
> similar.

In the case of tree_lowering_passes, I agree.  But for the calls to early_local_passes from cgraph_add_new_function/cgraph_process_new_functions we ought to run the full list of optimizations, to make the bodies of new functions ready for inlining into existing functions.  Note that cgraph_add_new_function() explicitly calls tree_lowering_passes and then invokes early_local_passes.

What do you think?  If we want to pursue this, it should be in a separate patch anyway.
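For reference, a minimal sketch (an illustration, not the code from the patch) of what the new hook amounts to at these call sites when no extra iterations are requested: it simply forwards to the sub-pass list that the old code ran directly.

  /* Sketch only: the real implementation also has to drive the
     iterated sub-passes; with iteration disabled it reduces to the
     code it replaces at the call sites above.  */
  void
  execute_early_local_passes_for_current_function (void)
  {
    execute_pass_list (pass_early_local_passes.pass.sub);
  }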
>
> About all this -suffix stuff, I'd like to have the iterations simply re-use
> the existing dump-files, thus statically sub-divide pass_early_local_passes
> like

I considered this when I was implementing and debugging the iterative support, and it is better to keep the dumps for the main and iterative parts separate.  These are two different IPA passes, and their sub-passes should use separate dumps.

Given that the set of sub-passes of early_local_passes_main is substantially different from the set of sub-passes of early_local_passes_iter (the former has a bunch of init/expand/lower passes in it), mixing the dumps would be mighty confusing.  Without knowing the exact ordering of the passes one would not be able to construct a chronological picture of the optimizations.  Separating the dumps of sub-passes of different IPA passes makes the optimization ordering much easier to follow.

>
>       NEXT_PASS (pass_early_local_lowering_passes);
>       {
>         NEXT_PASS (pass_fixup_cfg);
>         NEXT_PASS (pass_init_datastructures);
>         NEXT_PASS (pass_expand_omp);
>
>         NEXT_PASS (pass_referenced_vars);
>         NEXT_PASS (pass_build_ssa);
>         NEXT_PASS (pass_lower_vector);
>         NEXT_PASS (pass_early_warn_uninitialized);
>       }
>       /* The following you maybe iterate.  */
>       NEXT_PASS (pass_early_local_optimizations);
>       {
>         NEXT_PASS (pass_rebuild_cgraph_edges);
>         NEXT_PASS (pass_inline_parameters);
>         NEXT_PASS (pass_early_inline);
>         NEXT_PASS (pass_all_early_optimizations);
>         {
>           struct opt_pass **p = &pass_all_early_optimizations.pass.sub;
>           NEXT_PASS (pass_remove_cgraph_callee_edges);
>           NEXT_PASS (pass_rename_ssa_copies);
>           NEXT_PASS (pass_ccp);
>           NEXT_PASS (pass_forwprop);
>           /* pass_build_ealias is a dummy pass that ensures that we
>              execute TODO_rebuild_alias at this point.  Re-building
>              alias information also rewrites no longer addressed
>              locals into SSA form if possible.  */
>           NEXT_PASS (pass_build_ealias);
>           NEXT_PASS (pass_sra_early);
>           NEXT_PASS (pass_fre);
>           NEXT_PASS (pass_copy_prop);
>           NEXT_PASS (pass_merge_phi);
>           NEXT_PASS (pass_cd_dce);
>           NEXT_PASS (pass_early_ipa_sra);
>           NEXT_PASS (pass_tail_recursion);
>           NEXT_PASS (pass_convert_switch);
>           NEXT_PASS (pass_cleanup_eh);
>           NEXT_PASS (pass_profile);
>           NEXT_PASS (pass_local_pure_const);
>         }
>       }
>       NEXT_PASS (pass_late_early_local_optimizations)
>       {
>         /* Split functions creates parts that are not run through
>            early optimizations again.  It is thus good idea to do this
>            late.  */
>         NEXT_PASS (pass_split_functions);
>         NEXT_PASS (pass_release_ssa_names);
>         NEXT_PASS (pass_rebuild_cgraph_edges);
>         NEXT_PASS (pass_inline_parameters);
>       }

This simple division into 3 stages will not work.  On the one hand, one needs to add fixup/cleanup passes at the beginning and end of every SIMPLE_IPA pass; without those cgraph_verify() gets upset very easily.  On the other hand, when iterative optimizations are not enabled and there is no need to decouple the main and late parts, these additional fixup/cleanup passes would be unnecessary overhead.

Also, the above division effectively creates 3 separate SIMPLE_IPA passes, and I don't know what the effect of that would be.

The pass formation and scheduling in the patch I posted may not look as pretty as what you describe above, but it is reasonably clean and easy to maintain.  One only needs to change the passes in the main part, and those changes are automatically picked up in the iterative and late parts.
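To illustrate the dump point with made-up instance numbers: when the main and iterative parts are distinct IPA passes, each statically scheduled sub-pass instance gets its own numbered dump file, so the chronological order stays visible from the names alone, for example:

  foo.c.029t.fre1    (sub-pass of early_local_passes_main)
  foo.c.041t.fre2    (same sub-pass, first run of early_local_passes_iter)
  foo.c.053t.fre3    (second run of early_local_passes_iter)

Re-using one set of dump files for all iterations would instead interleave output from different IPA passes under a single name.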
>
> +++ b/gcc/toplev.c
> @@ -1228,7 +1228,6 @@ general_init (const char *argv0)
>    /* This must be done after global_init_params but before argument
>       processing.  */
>    init_ggc_heuristics ();
> -  init_optimization_passes ();
>    statistics_early_init ();
>    finish_params ();
>  }
> @@ -1989,6 +1988,8 @@ toplev_main (int argc, char **argv)
>                           save_decoded_options, save_decoded_options_count,
>                           UNKNOWN_LOCATION, global_dc);
>
> +  init_optimization_passes ();
> +
>    handle_common_deferred_options ();
>
>    init_local_tick ();
>
> any reason for this?

Yes, this moves init_optimization_passes() after the processing of --param options.  This is needed to be able to use PARAM_VALUE in pass initialization.

>
> +  /* Don't recurse or wonder on to the next pass when running
> +     execute_ipa_pass_list below.  */
> +  current->execute = NULL;
> +  current->next = NULL;
>
> that looks awkward ;)  Shouldn't instead
>
> +      execute_ipa_pass_list (current);
>
> not do what it does?  Thus, split that into two pieces, one that we
> can iterate w/o the fiddling with current->?

The choice is between the above and duplicating the post-processing logic of execute_ipa_pass_list (i.e., the call to cgraph_process_new_functions) in the iteration loop.  I agree that the above is a tad awkward, but it is cleaner than copy-pasting the final part of execute_ipa_pass_list.

>
> I like this variant a lot better than the last one - still it lacks any
> analysis-based justification for iteration (see my reply to Matt on
> what I discussed with Honza).

Yes, having a way to tell whether a function has changed significantly would be awesome.  My approach here would be to make inline_parameters output feedback on how much the size/time metrics have changed for a function since the previous run.  If the change is above X%, then queue the function's callers for more optimizations.  Similarly, Martin's rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue new direct callees and the current function for another iteration if new direct edges were resolved.

But such analysis is well beyond the scope of this patch, which adds the infrastructure needed to make iterative optimizations possible and reliable.

> Thus, I don't think we want to
> merge this in its current form or in this stage1.

What is the benefit of pushing this to a later release?  If anything, merging the support for iterative optimizations now will allow us to consider adding the wonderful smartness to it later.  In the meantime, substituting that smartness with a knob is still a great alternative.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics