public inbox for gcc-patches@gcc.gnu.org
* PR tree-optimize/49373 (IPA-PTA regression)
@ 2011-06-23  0:03 Jan Hubicka
  2011-06-23 10:06 ` Richard Guenther
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2011-06-23  0:03 UTC (permalink / raw)
  To: gcc-patches, hp, rguenther

Hi,
this patch moves ipa-pta into a new queue of simple IPA passes executed
after the regular IPA passes. The reason is that IPA-PTA is really implemented
as a simple IPA pass (i.e. it looks into function bodies at its propagate stage
and does not support WHOPR mode) and I have planned to have a place for such
passes for a while.  Until now, however, there has not been a reason.

The patch fixes a regression I introduced by my alias reorg that triggered a
latent problem in IPA-PTA not really expecting to see a cgraph with redirected
edges.  In the longer term we want a full IPA IPA-PTA, but we are simply not
there yet.  Having a place for late small IPA passes is however convenient for
other reasons: we can remove functions whose references are optimized out, or
we can re-do some of the early optimizations, including ipa-sra, late, which
might be interesting for LTO.

The change needed quite a bit more untangling of the old code than I would like
and I was not fully successful at it.  The reason is that until now we
executed the transform stage of all IPA passes just before the first local
pass of late compilation (i.e. all_passes).

With this patch the IPA transforms can take place either at the beginning of
all_passes (when all late IPA passes are disabled) or just before the first
late IPA pass.
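
As a rough illustration of the rule above (a toy model, not GCC code; the enum
and function names are made up for this sketch), the point at which pending IPA
transforms get applied depends only on whether any late IPA pass is enabled:

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only.  */
enum transform_point
{
  BEFORE_FIRST_LATE_IPA_PASS,
  AT_START_OF_ALL_PASSES
};

/* LATE_PASS_ENABLED holds the gate result of each late IPA pass.
   Return where the pending IPA transforms would be applied.  */
static enum transform_point
where_transforms_apply (const bool *late_pass_enabled, int n_late_passes)
{
  for (int i = 0; i < n_late_passes; i++)
    if (late_pass_enabled[i])
      return BEFORE_FIRST_LATE_IPA_PASS;

  /* No late IPA pass runs, so transforms stay lazy and are applied
     function by function when late compilation starts.  */
  return AT_START_OF_ALL_PASSES;
}
```

With ipa-pta as the only late pass, this means the transforms stay lazy unless
ipa-pta is enabled.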

The original motivation for applying all the transforms at the time of late
compilation was the "half-WHOPR" compilation model I originally developed
cgraph for in 2003-2006.  The idea was that IPA passes would have function body
summaries, just like in our current WHOPR implementation, but I did not intend
to implement the second streaming (WPA->LTRANS). Parallel compilation
was not that much of a concern at that time. I simply expected the compiler to
produce final assembly in the same process as running WPA, while avoiding the
need to load all function bodies into memory at once.  The late compilation
was expected to load function bodies one by one, optimize them and output them
to assembly (modulo preloading of inlined functions).

With WHOPR this is not really so important (while it is theoretically possible
with -flto -flto-partition=none, it is just not implemented this way: lto.c
proactively loads all function bodies at early stages of LTO compilation).

This "half-WHOPR" makes such late optimization passes impossible. WHOPR solves
the problem by restricting late optimization passes to partitions and thus makes
late IPA passes reasonable.

Note that the trick reduces memory usage even w/o LTO because the program after
inline decisions are applied is bigger than before.  Currently we do not apply
inline decisions "unit at a time".  This growth is however more or less bounded,
since the inliner should not expand the unit by more than the
inline-unit-growth limit.
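
To make the bound concrete, here is a small arithmetic sketch.  It assumes the
inline-unit-growth limit is read as a percentage of the unit size before
inlining; the function name is hypothetical and the real inliner has further
conditions that are not modelled here:

```c
/* Upper bound on the unit size after inlining, assuming
   UNIT_GROWTH_PERCENT is a percentage of SIZE_BEFORE.
   Hypothetical helper, not part of GCC.  */
static long
max_unit_size_after_inlining (long size_before, int unit_growth_percent)
{
  return size_before + size_before * unit_growth_percent / 100;
}
```

For example, a unit of 1000 instructions with an illustrative limit of 30 would
be allowed to grow to at most 1300 instructions.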

For these reasons I don't really want to move the application of IPA transforms
to "unit at a time" by default until we have compelling reasons to do so (i.e.
a late IPA pass enabled by default that pays for itself).

As a result I have a bit of a problem with cfg fixup:  cfg fixup is needed after
IPA passes because ipa-pure-const can turn functions into non-throwing, pure or
const ones, and those need compensation on the caller side that can't be done
by a proper IPA pass.  We also need it at the beginning of all_passes because
RTL code and local-pure-const can do the same (i.e. turn functions into
non-EH/pure/const).

Because we run cfg verifiers in between IPA transforms, we now need to run fixup
one extra time: once after inlining and once at the beginning of all_passes.

I guess it is the pass manager's job to keep track of this, but the pass manager
is currently a bit too inflexible since all its properties are static.

There are alternatives, like fixing up the cfg from the late pure-const and RTL
EH code, but they seem just as ugly as one extra pass through the statements.

The patch also arranges for the cgraph to be valid after IPA transforms in the
case some late IPA pass is run.  This is done by simply rebuilding cgraph edges
since we do not preserve them through the inline transform (it does cleanup_cfg
and also we do not really maintain ipa references).

Finally we can now also disable ipa-inline at -O0.

I've bootstrapped/regtested x86_64-linux and verified that it fixes the regression.
I've also bootstrapped with ipa-pta enabled by default with c, c++ and fortran.
Libjava compiles forever with ipa-pta.  There are some units needing about 3 hours
to complete.

OK?
Honza

	PR tree-optimize/49373
	* tree-pass.h (all_late_ipa_passes): Declare.
	* cgraphunit.c (init_lowered_empty_function): Fix properties.
	(cgraph_optimize): Execute late passes; remove unreachable functions after
	materialization.
	* ipa-inline.c (gate_ipa_inline): Enable only when optimizing or LTOing.
	* passes.c (all_late_ipa_passes): Declare.
	(dump_passes, register_pass): Handle late ipa passes.
	(init_optimization_passes): Move ipa_pta to late passes; schedule fixup_cfg
	at beginning of all_passes.
	(apply_ipa_transforms): New function.
	(execute_one_pass): When doing simple ipa pass, apply all transforms.
Index: tree-pass.h
===================================================================
*** tree-pass.h	(revision 175293)
--- tree-pass.h	(working copy)
*************** extern struct gimple_opt_pass pass_conve
*** 577,583 ****
  
  /* The root of the compilation pass tree, once constructed.  */
  extern struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!                        *all_regular_ipa_passes, *all_lto_gen_passes;
  
  /* Define a list of pass lists so that both passes.c and plugins can easily
     find all the pass lists.  */
--- 577,583 ----
  
  /* The root of the compilation pass tree, once constructed.  */
  extern struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!                        *all_regular_ipa_passes, *all_lto_gen_passes, *all_late_ipa_passes;
  
  /* Define a list of pass lists so that both passes.c and plugins can easily
     find all the pass lists.  */
Index: cgraphunit.c
===================================================================
*** cgraphunit.c	(revision 175293)
--- cgraphunit.c	(working copy)
*************** init_lowered_empty_function (tree decl)
*** 1420,1426 ****
    DECL_SAVED_TREE (decl) = error_mark_node;
    cfun->curr_properties |=
      (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
!      PROP_ssa);
  
    /* Create BB for body of the function and connect it properly.  */
    bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR);
--- 1420,1426 ----
    DECL_SAVED_TREE (decl) = error_mark_node;
    cfun->curr_properties |=
      (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
!      PROP_ssa | PROP_gimple_any);
  
    /* Create BB for body of the function and connect it properly.  */
    bb = create_basic_block (NULL, (void *) 0, ENTRY_BLOCK_PTR);
*************** cgraph_optimize (void)
*** 2101,2106 ****
--- 2101,2113 ----
  #endif
  
    cgraph_materialize_all_clones ();
+   bitmap_obstack_initialize (NULL);
+   execute_ipa_pass_list (all_late_ipa_passes);
+   cgraph_remove_unreachable_nodes (true, dump_file);
+ #ifdef ENABLE_CHECKING
+   verify_cgraph ();
+ #endif
+   bitmap_obstack_release (NULL);
    cgraph_mark_functions_to_output ();
  
    cgraph_state = CGRAPH_STATE_EXPANSION;
Index: ipa-inline.c
===================================================================
*** ipa-inline.c	(revision 175293)
--- ipa-inline.c	(working copy)
*************** struct gimple_opt_pass pass_early_inline
*** 1972,1988 ****
  
  
  /* When to run IPA inlining.  Inlining of always-inline functions
!    happens during early inlining.  */
  
  static bool
  gate_ipa_inline (void)
  {
!   /* ???  We'd like to skip this if not optimizing or not inlining as
!      all always-inline functions have been processed by early
!      inlining already.  But this at least breaks EH with C++ as
!      we need to unconditionally run fixup_cfg even at -O0.
!      So leave it on unconditionally for now.  */
!   return 1;
  }
  
  struct ipa_opt_pass_d pass_ipa_inline =
--- 1972,1986 ----
  
  
  /* When to run IPA inlining.  Inlining of always-inline functions
!    happens during early inlining.
! 
!    Enable inlining unconditionally at -flto.  We need size estimates to
!    drive partitioning.  */
  
  static bool
  gate_ipa_inline (void)
  {
!   return optimize || flag_lto || flag_wpa;
  }
  
  struct ipa_opt_pass_d pass_ipa_inline =
Index: passes.c
===================================================================
*** passes.c	(revision 175293)
--- passes.c	(working copy)
*************** struct rtl_opt_pass pass_postreload =
*** 332,338 ****
  
  /* The root of the compilation pass tree, once constructed.  */
  struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!   *all_regular_ipa_passes, *all_lto_gen_passes;
  
  /* This is used by plugins, and should also be used in register_pass.  */
  #define DEF_PASS_LIST(LIST) &LIST,
--- 332,338 ----
  
  /* The root of the compilation pass tree, once constructed.  */
  struct opt_pass *all_passes, *all_small_ipa_passes, *all_lowering_passes,
!   *all_regular_ipa_passes, *all_late_ipa_passes, *all_lto_gen_passes;
  
  /* This is used by plugins, and should also be used in register_pass.  */
  #define DEF_PASS_LIST(LIST) &LIST,
*************** dump_passes (void)
*** 617,622 ****
--- 617,623 ----
    dump_pass_list (all_small_ipa_passes, 1);
    dump_pass_list (all_regular_ipa_passes, 1);
    dump_pass_list (all_lto_gen_passes, 1);
+   dump_pass_list (all_late_ipa_passes, 1);
    dump_pass_list (all_passes, 1);
  
    pop_cfun ();
*************** register_pass (struct register_pass_info
*** 1103,1108 ****
--- 1104,1111 ----
    if (!success || all_instances)
      success |= position_pass (pass_info, &all_lto_gen_passes);
    if (!success || all_instances)
+     success |= position_pass (pass_info, &all_late_ipa_passes);
+   if (!success || all_instances)
      success |= position_pass (pass_info, &all_passes);
    if (!success)
      fatal_error
*************** init_optimization_passes (void)
*** 1249,1255 ****
    NEXT_PASS (pass_ipa_inline);
    NEXT_PASS (pass_ipa_pure_const);
    NEXT_PASS (pass_ipa_reference);
-   NEXT_PASS (pass_ipa_pta);
    *p = NULL;
  
    p = &all_lto_gen_passes;
--- 1252,1257 ----
*************** init_optimization_passes (void)
*** 1257,1265 ****
--- 1259,1274 ----
    NEXT_PASS (pass_ipa_lto_finish_out);  /* This must be the last LTO pass.  */
    *p = NULL;
  
+   /* Simple IPA passes executed after the regular passes.  In WHOPR mode the
+      passes are executed after partitioning and thus see just parts of the
+      compiled unit.  */
+   p = &all_late_ipa_passes;
+   NEXT_PASS (pass_ipa_pta);
+   *p = NULL;
    /* These passes are run after IPA passes on every function that is being
       output to the assembler file.  */
    p = &all_passes;
+   NEXT_PASS (pass_fixup_cfg);
    NEXT_PASS (pass_lower_eh_dispatch);
    NEXT_PASS (pass_all_optimizations);
      {
*************** init_optimization_passes (void)
*** 1517,1522 ****
--- 1526,1534 ----
    register_dump_files (all_lto_gen_passes,
  		       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
  		       | PROP_cfg);
+   register_dump_files (all_late_ipa_passes,
+ 		       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
+ 		       | PROP_cfg);
    register_dump_files (all_passes,
  		       PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh
  		       | PROP_cfg);
*************** execute_all_ipa_transforms (void)
*** 1935,1940 ****
--- 1947,1966 ----
      }
  }
  
+ /* Callback for do_per_function to apply all IPA transforms.  */
+ 
+ static void
+ apply_ipa_transforms (void *data)
+ {
+   struct cgraph_node *node = cgraph_get_node (current_function_decl);
+   if (!node->global.inlined_to && node->ipa_transforms_to_apply)
+     {
+       *(bool *)data = true;
+       execute_all_ipa_transforms ();
+       rebuild_cgraph_edges ();
+     }
+ }
+ 
  /* Check if PASS is explicitly disabled or enabled and return
     the gate status.  FUNC is the function to be processed, and
     GATE_STATUS is the gate status determined by pass manager by
*************** execute_one_pass (struct opt_pass *pass)
*** 1996,2001 ****
--- 2022,2037 ----
       executed.  */
    invoke_plugin_callbacks (PLUGIN_PASS_EXECUTION, pass);
  
+   /* SIMPLE IPA passes do not handle callgraphs with IPA transforms in it.
+      Apply all transforms first.  */
+   if (pass->type == SIMPLE_IPA_PASS)
+     {
+       bool applied = false;
+       do_per_function (apply_ipa_transforms, (void *)&applied);
+       if (applied)
+         cgraph_remove_unreachable_nodes (true, dump_file);
+     }
+ 
    if (!quiet_flag && !cfun)
      fprintf (stderr, " <%s>", pass->name ? pass->name : "");
  


* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-23  0:03 PR tree-optimize/49373 (IPA-PTA regression) Jan Hubicka
@ 2011-06-23 10:06 ` Richard Guenther
  2011-06-23 11:57   ` Jan Hubicka
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Guenther @ 2011-06-23 10:06 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, hp, rguenther

On Thu, Jun 23, 2011 at 1:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> OK?

Ok, but please change the IPA inline gate to honor flag_no_inline
(thus, (optimize && !flag_no_inline) || flag_lto || flag_wpa).
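
To spell out how the two gates differ, here is a small stand-alone sketch.  The
struct is a hypothetical stand-in for GCC's option variables of the same names;
only the two boolean expressions come from this thread:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant option variables.  */
struct gate_opts
{
  int optimize;          /* -O level */
  bool flag_no_inline;   /* -fno-inline */
  bool flag_lto;         /* -flto */
  bool flag_wpa;         /* WPA stage of LTO */
};

/* Gate as posted in the patch.  */
static bool
gate_as_posted (const struct gate_opts *o)
{
  return o->optimize || o->flag_lto || o->flag_wpa;
}

/* Gate with flag_no_inline honored, as suggested above.  */
static bool
gate_as_suggested (const struct gate_opts *o)
{
  return (o->optimize && !o->flag_no_inline) || o->flag_lto || o->flag_wpa;
}
```

The only case where the two disagree is optimizing with -fno-inline and without
LTO: the posted gate still runs the pass, the suggested one skips it.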

Thanks for working on this, I'll look into some followup cleanups
for PTA.  Now, when it works on LTRANS units we have to do
some adjustments (like not disabling it in opts.c ;)) - do we know
whether a function is only called from within a ltrans unit somehow?

Thanks,
Richard.



* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-23 10:06 ` Richard Guenther
@ 2011-06-23 11:57   ` Jan Hubicka
  2011-06-23 13:14     ` Jan Hubicka
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2011-06-23 11:57 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, gcc-patches, hp, rguenther

> Ok, but please change the IPA inline gate to honor flag_no_inline
> (thus, (optimize && !flag_no_inline) || flag_lto || flag_wpa).
OK.
> 
> Thanks for working on this, I'll look to some followup cleanups
> for PTA.  Now, when it works on LTRANS units we have to do
> some adjustments (like not disable it in opts.c ;)) - do we know

Yep, I decided that it can go as a followup. Thanks for working on this!
BTW the PTA solving time seems rather high now not only for libjava, but
also for tramp3d and other bigger units I tested.

> whether a function is only called from within a ltrans unit somehow?

When you look at the cgraph, the flags are set as they were at WPA time.
I.e. if a function is local to the program it has externally_visible 0,
and then you have the used_from_other_partition/in_other_partition flags
saying how the other ltrans partitions relate to your function.

If you decide to ignore the cgraph (which is probably not the coolest idea),
you have the usual PUBLIC flag that is set for all objects used across the
ltrans boundary (since they are now hidden public symbols).

You also have the address-taken info shipped from WPA, so you know whether
other units read/write the objects or also take their address, which probably
comes in handy.
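
A minimal sketch of such a check, assuming only the two flags named above
matter.  The struct is a hypothetical model, not the real cgraph_node layout,
and calls through a taken address are not modelled:

```c
#include <stdbool.h>

/* Hypothetical model of the per-function flags mentioned above.  */
struct node_flags
{
  bool externally_visible;         /* visible outside the whole program */
  bool used_from_other_partition;  /* referenced by another ltrans unit */
};

/* True when every direct caller must live in the current ltrans
   partition, under the assumptions stated in the lead-in.  */
static bool
only_called_within_partition (const struct node_flags *n)
{
  return !n->externally_visible && !n->used_from_other_partition;
}
```

A function that is local to the program but used from another partition would
fail this test even though it is not externally visible.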

Honza


* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-23 11:57   ` Jan Hubicka
@ 2011-06-23 13:14     ` Jan Hubicka
  2011-06-23 14:08       ` Richard Guenther
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2011-06-23 13:14 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Guenther, gcc-patches, hp, rguenther

> > Ok, but please change the IPA inline gate to honor flag_no_inline
> > (thus, (optimize && !flag_no_inline) || flag_lto || flag_wpa).
> OK.
Actually it won't work, since the results of inline analysis are used by most
of the other IPA passes (e.g. ipa-cp and ipa-sra for cloning decisions, etc.).
As we briefly chatted about at the summit, perhaps it would make sense to declare
the jump-functions and ipa-inline analysis to be independent analysis passes (aka ipa-pta):
one computing jump functions and the other deciding on size/time estimates.
But definitely incrementally.

Honza


* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-23 13:14     ` Jan Hubicka
@ 2011-06-23 14:08       ` Richard Guenther
  2011-06-25 20:19         ` Jan Hubicka
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Guenther @ 2011-06-23 14:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, hp, rguenther

On Thu, Jun 23, 2011 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > Ok, but please change the IPA inline gate to honor flag_no_inline
>> > (thus, (optimize && !flag_no_inline) || flag_lto || flag_wpa).
>> OK.
> Actually it won't work, since the results of inline analysis are used by most of the other
> IPA passes (i.e. ipa-cp and ipa-sra for cloning decisions, etc.).
> As we briefly chatted about at the summit, perhaps it would make sense to declare
> the jump functions and ipa-inline analysis to be independent analysis passes (aka ipa-pta),
> one computing jump functions and the other deciding on size/time estimates.
> But definitely incrementally.

Ok, fair enough.

Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-23 14:08       ` Richard Guenther
@ 2011-06-25 20:19         ` Jan Hubicka
  2011-06-27 10:08           ` Richard Guenther
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2011-06-25 20:19 UTC (permalink / raw)
  To: Richard Guenther, mjambor; +Cc: Jan Hubicka, gcc-patches, hp, rguenther

Hi,
just for those who are interested, this is a quick&dirty patch adding another
round of local optimization passes at WPA time.  I've added the early inliner and
IPA-SRA because I was curious how many optimization opportunities we are
missing by limiting those to the early pass.

With early inlining it seems to be very little. We inline one extra call when
building Mozilla in LTO mode.

IPA-SRA is a different story.  While we do 579 IPA-SRA clones in the early pass,
the late pass produces 13014 clones (22 times more ;) suggesting that the pass
might be interesting at IPA level after all.

There are 78686 functions after inlining in Mozilla, so one out of 7 functions
is touched.

The size difference of libxul is not great, about 100Kb reduction. I will try
benchmarking it eventually, too.

Honza


Index: cgraph.c
===================================================================
*** cgraph.c	(revision 175350)
--- cgraph.c	(working copy)
*************** cgraph_release_function_body (struct cgr
*** 1389,1396 ****
  	}
        if (cfun->cfg)
  	{
! 	  gcc_assert (dom_computed[0] == DOM_NONE);
! 	  gcc_assert (dom_computed[1] == DOM_NONE);
  	  clear_edges ();
  	}
        if (cfun->value_histograms)
--- 1393,1403 ----
  	}
        if (cfun->cfg)
  	{
! 	  /*gcc_assert (dom_computed[0] == DOM_NONE);
! 	  gcc_assert (dom_computed[1] == DOM_NONE);*/
! free_dominance_info (CDI_DOMINATORS);
! free_dominance_info (CDI_POST_DOMINATORS);
! 
  	  clear_edges ();
  	}
        if (cfun->value_histograms)
Index: tree-pass.h
===================================================================
*** tree-pass.h	(revision 175350)
--- tree-pass.h	(working copy)
*************** extern struct simple_ipa_opt_pass pass_i
*** 452,458 ****
  extern struct simple_ipa_opt_pass pass_ipa_function_and_variable_visibility;
  extern struct simple_ipa_opt_pass pass_ipa_tree_profile;
  
! extern struct simple_ipa_opt_pass pass_early_local_passes;
  
  extern struct ipa_opt_pass_d pass_ipa_whole_program_visibility;
  extern struct ipa_opt_pass_d pass_ipa_lto_gimple_out;
--- 452,458 ----
  extern struct simple_ipa_opt_pass pass_ipa_function_and_variable_visibility;
  extern struct simple_ipa_opt_pass pass_ipa_tree_profile;
  
! extern struct simple_ipa_opt_pass pass_early_local_passes, pass_late_local_passes, pass_late_local_passes2;
  
  extern struct ipa_opt_pass_d pass_ipa_whole_program_visibility;
  extern struct ipa_opt_pass_d pass_ipa_lto_gimple_out;
Index: ipa-inline-analysis.c
===================================================================
*** ipa-inline-analysis.c	(revision 175350)
--- ipa-inline-analysis.c	(working copy)
*************** estimate_function_body_sizes (struct cgr
*** 1535,1542 ****
  		  edge->call_stmt_cannot_inline_p = true;
  		  gimple_call_set_cannot_inline (stmt, true);
  		}
! 	      else
! 		gcc_assert (!gimple_call_cannot_inline_p (stmt));
  	    }
  
  	  /* TODO: When conditional jump or swithc is known to be constant, but
--- 1535,1542 ----
  		  edge->call_stmt_cannot_inline_p = true;
  		  gimple_call_set_cannot_inline (stmt, true);
  		}
! 	      /*else
! 		gcc_assert (!gimple_call_cannot_inline_p (stmt));*/
  	    }
  
  	  /* TODO: When conditional jump or swithc is known to be constant, but
Index: tree-inline.c
===================================================================
*** tree-inline.c	(revision 175350)
--- tree-inline.c	(working copy)
*************** expand_call_inline (basic_block bb, gimp
*** 3891,3897 ****
    id->src_cfun = DECL_STRUCT_FUNCTION (fn);
    id->gimple_call = stmt;
  
!   gcc_assert (!id->src_cfun->after_inlining);
  
    id->entry_bb = bb;
    if (lookup_attribute ("cold", DECL_ATTRIBUTES (fn)))
--- 3891,3897 ----
    id->src_cfun = DECL_STRUCT_FUNCTION (fn);
    id->gimple_call = stmt;
  
!   /*gcc_assert (!id->src_cfun->after_inlining);*/
  
    id->entry_bb = bb;
    if (lookup_attribute ("cold", DECL_ATTRIBUTES (fn)))
Index: tree-optimize.c
===================================================================
*** tree-optimize.c	(revision 175350)
--- tree-optimize.c	(working copy)
*************** struct simple_ipa_opt_pass pass_early_lo
*** 123,128 ****
--- 123,189 ----
  /* Gate: execute, or not, all of the non-trivial optimizations.  */
  
  static bool
+ gate_all_late_local_passes (void)
+ {
+ 	  /* Don't bother doing anything if the program has errors.  */
+   return (!seen_error () && optimize);
+ }
+ 
+ static unsigned int
+ execute_all_late_local_passes (void)
+ {
+   /* Once this pass (and its sub-passes) are complete, all functions
+      will be in SSA form.  Technically this state change is happening
+      a tad late, since the sub-passes have not yet run, but since
+      none of the sub-passes are IPA passes and do not create new
+      functions, this is ok.  We're setting this value for the benefit
+      of IPA passes that follow.  */
+   if (cgraph_state < CGRAPH_STATE_IPA_SSA)
+     cgraph_state = CGRAPH_STATE_IPA_SSA;
+   return 0;
+ }
+ 
+ struct simple_ipa_opt_pass pass_late_local_passes =
+ {
+  {
+   SIMPLE_IPA_PASS,
+   "late_local_cleanups",		/* name */
+   gate_all_late_local_passes,		/* gate */
+   execute_all_late_local_passes,	/* execute */
+   NULL,					/* sub */
+   NULL,					/* next */
+   0,					/* static_pass_number */
+   TV_EARLY_LOCAL,			/* tv_id */
+   0,					/* properties_required */
+   0,					/* properties_provided */
+   0,					/* properties_destroyed */
+   0,					/* todo_flags_start */
+   TODO_remove_functions	 		/* todo_flags_finish */
+  }
+ };
+ 
+ struct simple_ipa_opt_pass pass_late_local_passes2 =
+ {
+  {
+   SIMPLE_IPA_PASS,
+   "late_local_cleanups2",		/* name */
+   gate_all_late_local_passes,		/* gate */
+   execute_all_late_local_passes,	/* execute */
+   NULL,					/* sub */
+   NULL,					/* next */
+   0,					/* static_pass_number */
+   TV_EARLY_LOCAL,			/* tv_id */
+   0,					/* properties_required */
+   0,					/* properties_provided */
+   0,					/* properties_destroyed */
+   0,					/* todo_flags_start */
+   TODO_remove_functions	 		/* todo_flags_finish */
+  }
+ };
+ 
+ /* Gate: execute, or not, all of the non-trivial optimizations.  */
+ 
+ static bool
  gate_all_early_optimizations (void)
  {
    return (optimize >= 1
Index: passes.c
===================================================================
*** passes.c	(revision 175350)
--- passes.c	(working copy)
*************** init_optimization_passes (void)
*** 1263,1268 ****
--- 1263,1288 ----
       passes are executed after partitioning and thus see just parts of the
       compiled unit.  */
    p = &all_late_ipa_passes;
+   NEXT_PASS (pass_late_local_passes);
+     {
+       struct opt_pass **p = &pass_late_local_passes.pass.sub;
+       NEXT_PASS (pass_inline_parameters);
+       NEXT_PASS (pass_release_ssa_names);
+     }
+   NEXT_PASS (pass_late_local_passes2);
+     {
+       struct opt_pass **p = &pass_late_local_passes2.pass.sub;
+       NEXT_PASS (pass_early_inline);
+       NEXT_PASS (pass_remove_cgraph_callee_edges);
+       NEXT_PASS (pass_ccp);
+       NEXT_PASS (pass_forwprop);
+       NEXT_PASS (pass_fre);
+       NEXT_PASS (pass_cd_dce);
+       NEXT_PASS (pass_early_ipa_sra);
+       NEXT_PASS (pass_release_ssa_names);
+       NEXT_PASS (pass_rebuild_cgraph_edges);
+       NEXT_PASS (pass_inline_parameters);
+     }
    NEXT_PASS (pass_ipa_pta);
    *p = NULL;
    /* These passes are run after IPA passes on every function that is being
Index: statistics.c
===================================================================
*** statistics.c	(revision 175350)
--- statistics.c	(working copy)
*************** statistics_fini_pass_3 (void **slot, voi
*** 171,176 ****
--- 171,178 ----
  void
  statistics_fini_pass (void)
  {
+   if (!current_pass)
+     return;
    if (current_pass->static_pass_number == -1)
      return;
  

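The passes.c hunk above relies on the way NEXT_PASS threads passes into a linked list, with a shadowed local `p` pointing at a parent's `sub` slot to build the nested list. A minimal sketch of that shape follows, under stated assumptions: `struct toy_pass`, `append_pass`, `count_passes` and `build_late_ipa_chain` are hypothetical names invented for this illustration, and the real pass machinery does considerably more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down pass descriptor: only the name and the
   two links that matter for chaining.  */
struct toy_pass
{
  const char *name;
  struct toy_pass *sub;   /* First sub-pass, or NULL.  */
  struct toy_pass *next;  /* Next pass on the same level, or NULL.  */
};

/* Store PASS in the slot P and return the slot for the pass that
   follows it, so that callers can write p = append_pass (p, ...).  */
static struct toy_pass **
append_pass (struct toy_pass **p, struct toy_pass *pass)
{
  assert (p != NULL && pass != NULL);
  *p = pass;
  return &pass->next;
}

/* Count the passes reachable from PASS, sub-passes included.  */
static int
count_passes (const struct toy_pass *pass)
{
  int n = 0;
  for (; pass; pass = pass->next)
    n += 1 + count_passes (pass->sub);
  return n;
}

/* Build a chain shaped like the first group in the hunk: one container
   pass with two sub-passes, followed by ipa_pta on the top level.  */
static struct toy_pass *
build_late_ipa_chain (void)
{
  static struct toy_pass late_local = { "late_local_cleanups", NULL, NULL };
  static struct toy_pass inline_params = { "inline_parameters", NULL, NULL };
  static struct toy_pass release_ssa = { "release_ssa_names", NULL, NULL };
  static struct toy_pass ipa_pta = { "ipa_pta", NULL, NULL };
  static struct toy_pass *all_late_ipa = NULL;

  struct toy_pass **p = &all_late_ipa;
  p = append_pass (p, &late_local);
    {
      /* Shadow p so the same idiom fills the sub-pass list.  */
      struct toy_pass **p = &late_local.sub;
      p = append_pass (p, &inline_params);
      p = append_pass (p, &release_ssa);
      *p = NULL;
    }
  p = append_pass (p, &ipa_pta);
  *p = NULL;
  return all_late_ipa;
}
```

The sketch uses one object per list position; placing the same object in two lists would overwrite its `next` link, so the toy version avoids doing that.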
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-25 20:19         ` Jan Hubicka
@ 2011-06-27 10:08           ` Richard Guenther
  2011-06-27 10:27             ` Jan Hubicka
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Guenther @ 2011-06-27 10:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Guenther, mjambor, gcc-patches, hp

On Sat, 25 Jun 2011, Jan Hubicka wrote:

> Hi,
> just for those who are interested, this is a quick&dirty patch adding another
> round of local optimization passes at WPA time.  I've added the early inliner and
> IPA-SRA because I was curious how many optimization opportunities we are
> missing by limiting those to the early pass.

At WPA time?  I thought we don't have function bodies around.

Richard.

> With early inlining it seems to be very little. We inline one extra call when
> building Mozilla in LTO mode.
> 
> IPA-SRA is a different story.  While we do 579 IPA-SRA clones in the early pass,
> the late pass produces 13014 clones (22 times more ;) suggesting that the pass
> might be interesting at IPA level after all.
> 
> There are 78686 functions after inlining in Mozilla, so one out of 7 functions
> is touched.
> 
> The size difference of libxul is not great, about 100Kb reduction. I will try
> benchmarking it eventually, too.
> 
> Honza

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-27 10:08           ` Richard Guenther
@ 2011-06-27 10:27             ` Jan Hubicka
  2011-06-27 11:08               ` Richard Guenther
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2011-06-27 10:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, Richard Guenther, mjambor, gcc-patches, hp

> On Sat, 25 Jun 2011, Jan Hubicka wrote:
> 
> > Hi,
> > just for those who are interested, this is a quick&dirty patch adding another
> > round of local optimization passes at WPA time.  I've added the early inliner and
> > IPA-SRA because I was curious how many optimization opportunities we are
> > missing by limiting those to the early pass.
> 
> At WPA time?  I thought we don't have function bodies around.

I meant LTRANS time, indeed.
Anyway, the tests were made with -flto-partition=none.

Honza
> 
> Richard.
> 
> > With early inlining it seems to be very little. We inline one extra call when
> > building Mozilla in LTO mode.
> > 
> > IPA-SRA is a different story.  While we do 579 IPA-SRA clones in the early pass,
> > the late pass produces 13014 clones (22 times more ;) suggesting that the pass
> > might be interesting at IPA level after all.
> > 
> > There are 78686 functions after inlining in Mozilla, so one out of 7 functions
> > is touched.
> > 
> > The size difference of libxul is not great, about 100Kb reduction. I will try
> > benchmarking it eventually, too.
> > 
> > Honza
> -- 
> Richard Guenther <rguenther@suse.de>
> Novell / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR tree-optimize/49373 (IPA-PTA regression)
  2011-06-27 10:27             ` Jan Hubicka
@ 2011-06-27 11:08               ` Richard Guenther
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Guenther @ 2011-06-27 11:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Guenther, mjambor, gcc-patches, hp

On Mon, 27 Jun 2011, Jan Hubicka wrote:

> > On Sat, 25 Jun 2011, Jan Hubicka wrote:
> > 
> > > Hi,
> > > just for those who are interested, this is a quick&dirty patch adding another
> > > round of local optimization passes at WPA time.  I've added the early inliner and
> > > IPA-SRA because I was curious how many optimization opportunities we are
> > > missing by limiting those to the early pass.
> > 
> > At WPA time?  I thought we don't have function bodies around.
> 
> I meant LTRANS time, indeed.
> Anyway, the tests were made with -flto-partition=none.

Ok, I see.  I'd have expected early inlining to make no difference.
But yes, IPA SRA should be a real IPA pass anyway ... - the rest of the
passes you put in there repeat what we do after inlining anyway, so
I'm not sure what your point was in adding them (just to help the other
early inlining?)

Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-06-27 10:20 UTC | newest]

Thread overview: 9+ messages
2011-06-23  0:03 PR tree-optimize/49373 (IPA-PTA regression) Jan Hubicka
2011-06-23 10:06 ` Richard Guenther
2011-06-23 11:57   ` Jan Hubicka
2011-06-23 13:14     ` Jan Hubicka
2011-06-23 14:08       ` Richard Guenther
2011-06-25 20:19         ` Jan Hubicka
2011-06-27 10:08           ` Richard Guenther
2011-06-27 10:27             ` Jan Hubicka
2011-06-27 11:08               ` Richard Guenther
