Subject: Re: [PATCH] Add capability to run several iterations of early optimizations
From: Maxim Kuvyrkov <Maxim_Kuvyrkov@mentor.com>
To: Richard Guenther
Cc: GCC Patches, Matt
Date: Fri, 28 Oct 2011 23:07:00 -0000

On 28/10/2011, at 11:43 PM, Richard Guenther wrote:

> On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov wrote:
>>
>> OK for trunk?
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index f056d3d..4738b28 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>        tree_lowering_passes (fndecl);
>        bitmap_obstack_initialize (NULL);
>        if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -       execute_pass_list (pass_early_local_passes.pass.sub);
> +       execute_early_local_passes_for_current_function ();
>        bitmap_obstack_release (NULL);
>        pop_cfun ();
>        current_function_decl = NULL;
> @@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>        gimple_register_cfg_hooks ();
>        bitmap_obstack_initialize (NULL);
>        if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -       execute_pass_list (pass_early_local_passes.pass.sub);
> +       execute_early_local_passes_for_current_function ();
>
> I think these should only execute the lowering pieces of early local passes,
> let me see if that's properly split ...
>
> @@ -255,7 +255,7 @@ cgraph_process_new_functions (void)
>               /* When not optimizing, be sure we run early local passes anyway
>                  to expand OMP.  */
>               || !optimize)
> -           execute_pass_list (pass_early_local_passes.pass.sub);
> +           execute_early_local_passes_for_current_function ();
>
> similar.

In the case of tree_lowering_passes, I agree.  But for the calls to early_local_passes from cgraph_add_new_function/cgraph_process_new_functions we ought to run the full list of optimizations, to make the bodies of new functions ready for inlining into existing functions.  Note that cgraph_add_new_function() explicitly calls tree_lowering_passes and then invokes early_local_passes.

What do you think?  If we want to pursue this, it should be in a separate patch anyway.
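For reference, a minimal sketch (an illustration, not the code from the patch) of what the new hook amounts to at these call sites when no extra iterations are requested: it simply forwards to the sub-pass list that the old code ran directly.

  /* Sketch only: the real implementation also has to drive the
     iterated sub-passes; with iteration disabled it reduces to the
     code it replaces at the call sites above.  */
  void
  execute_early_local_passes_for_current_function (void)
  {
    execute_pass_list (pass_early_local_passes.pass.sub);
  }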
>
> About all this -suffix stuff, I'd like to have the iterations simply re-use
> the existing dump-files, thus statically sub-divide pass_early_local_passes
> like

I considered this when I was implementing and debugging the iterative support, and it is better to keep the dumps for the main and iterative parts separate.  These are two different IPA passes, and their sub-passes should use separate dumps.

Given that the set of sub-passes of early_local_passes_main is substantially different from the set of sub-passes of early_local_passes_iter (the former has a bunch of init/expand/lower passes in it), mixing the dumps would be mighty confusing.  Without knowing the exact ordering of the passes one would not be able to construct a chronological picture of the optimizations.  Separating the dumps of sub-passes of different IPA passes makes the optimization ordering much easier to follow.

>
>       NEXT_PASS (pass_early_local_lowering_passes);
>       {
>         NEXT_PASS (pass_fixup_cfg);
>         NEXT_PASS (pass_init_datastructures);
>         NEXT_PASS (pass_expand_omp);
>
>         NEXT_PASS (pass_referenced_vars);
>         NEXT_PASS (pass_build_ssa);
>         NEXT_PASS (pass_lower_vector);
>         NEXT_PASS (pass_early_warn_uninitialized);
>       }
>       /* The following you maybe iterate.  */
>       NEXT_PASS (pass_early_local_optimizations);
>       {
>         NEXT_PASS (pass_rebuild_cgraph_edges);
>         NEXT_PASS (pass_inline_parameters);
>         NEXT_PASS (pass_early_inline);
>         NEXT_PASS (pass_all_early_optimizations);
>         {
>           struct opt_pass **p = &pass_all_early_optimizations.pass.sub;
>           NEXT_PASS (pass_remove_cgraph_callee_edges);
>           NEXT_PASS (pass_rename_ssa_copies);
>           NEXT_PASS (pass_ccp);
>           NEXT_PASS (pass_forwprop);
>           /* pass_build_ealias is a dummy pass that ensures that we
>              execute TODO_rebuild_alias at this point.  Re-building
>              alias information also rewrites no longer addressed
>              locals into SSA form if possible.  */
>           NEXT_PASS (pass_build_ealias);
>           NEXT_PASS (pass_sra_early);
>           NEXT_PASS (pass_fre);
>           NEXT_PASS (pass_copy_prop);
>           NEXT_PASS (pass_merge_phi);
>           NEXT_PASS (pass_cd_dce);
>           NEXT_PASS (pass_early_ipa_sra);
>           NEXT_PASS (pass_tail_recursion);
>           NEXT_PASS (pass_convert_switch);
>           NEXT_PASS (pass_cleanup_eh);
>           NEXT_PASS (pass_profile);
>           NEXT_PASS (pass_local_pure_const);
>         }
>       }
>       NEXT_PASS (pass_late_early_local_optimizations)
>       {
>         /* Split functions creates parts that are not run through
>            early optimizations again.  It is thus good idea to do this
>            late.  */
>         NEXT_PASS (pass_split_functions);
>         NEXT_PASS (pass_release_ssa_names);
>         NEXT_PASS (pass_rebuild_cgraph_edges);
>         NEXT_PASS (pass_inline_parameters);
>       }

This simple division into 3 stages will not work.  On the one hand, one needs to add fixup/cleanup passes at the beginning and end of every SIMPLE_IPA pass; without those cgraph_verify() gets upset very easily.  On the other hand, when iterative optimizations are not enabled and there is no need to decouple the main and late parts, these additional fixup/cleanup passes would be unnecessary overhead.

Also, the above division effectively creates 3 separate SIMPLE_IPA passes, and I don't know what the effect of that would be.

The pass formation and scheduling in the patch I posted may not look as pretty as what you describe above, but it is reasonably clean and easy to maintain.  One only needs to change the passes in the main part, and those changes are automatically picked up in the iterative and late parts.
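To illustrate the dump point with made-up instance numbers: when the main and iterative parts are distinct IPA passes, each statically scheduled sub-pass instance gets its own numbered dump file, so the chronological order stays visible from the names alone, for example:

  foo.c.029t.fre1    (sub-pass of early_local_passes_main)
  foo.c.041t.fre2    (same sub-pass, first run of early_local_passes_iter)
  foo.c.053t.fre3    (second run of early_local_passes_iter)

Re-using one set of dump files for all iterations would instead interleave output from different IPA passes under a single name.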
>
> +++ b/gcc/toplev.c
> @@ -1228,7 +1228,6 @@ general_init (const char *argv0)
>    /* This must be done after global_init_params but before argument
>       processing.  */
>    init_ggc_heuristics ();
> -  init_optimization_passes ();
>    statistics_early_init ();
>    finish_params ();
>  }
> @@ -1989,6 +1988,8 @@ toplev_main (int argc, char **argv)
>                           save_decoded_options, save_decoded_options_count,
>                           UNKNOWN_LOCATION, global_dc);
>
> +  init_optimization_passes ();
> +
>    handle_common_deferred_options ();
>
>    init_local_tick ();
>
> any reason for this?

Yes, this moves init_optimization_passes() after the processing of --param options.  This is needed to be able to use PARAM_VALUE in pass initialization.

>
> +  /* Don't recurse or wonder on to the next pass when running
> +     execute_ipa_pass_list below.  */
> +  current->execute = NULL;
> +  current->next = NULL;
>
> that looks awkward ;)  Shouldn't instead
>
> +      execute_ipa_pass_list (current);
>
> not do what it does?  Thus, split that into two pieces, one that we
> can iterate w/o the fiddling with current->?

The choice is between the above and duplicating the post-processing logic of execute_ipa_pass_list (i.e., the call to cgraph_process_new_functions) in the iteration loop.  I agree that the above is a tad awkward, but it is cleaner than copy-pasting the final part of execute_ipa_pass_list.

>
> I like this variant a lot better than the last one - still it lacks any
> analysis-based justification for iteration (see my reply to Matt on
> what I discussed with Honza).

Yes, having a way to tell whether a function has changed significantly would be awesome.  My approach here would be to make inline_parameters output feedback on how much the size/time metrics have changed for a function since the previous run.  If the change is above X%, then queue the function's callers for more optimizations.  Similarly, Martin's rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue new direct callees and the current function for another iteration if new direct edges were resolved.

But such analysis is well beyond the scope of this patch, which adds the infrastructure needed to make iterative optimizations possible and reliable.

> Thus, I don't think we want to
> merge this in its current form or in this stage1.

What is the benefit of pushing this to a later release?  If anything, merging the support for iterative optimizations now will allow us to consider adding the wonderful smartness to it later.  In the meantime, substituting that smartness with a knob is still a great alternative.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics