public inbox for gcc-patches@gcc.gnu.org
* Re: Whole program optimization and functions-only-called-once.
       [not found]     ` <4AFEA884.9030003@moene.org>
@ 2009-11-16 10:15       ` Jan Hubicka
  2009-11-16 14:28         ` Richard Guenther
                           ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Jan Hubicka @ 2009-11-16 10:15 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jan Hubicka, Richard Guenther, Jan Hubicka, gcc-patches

> Jan Hubicka wrote:
>
>> -fno-ipa-cp should work around your problem for time being.
>
> Indeed it did. Some figures:

Thanks for confirmation!

> Considering invlo4 size 1462.
>  Called once from lowpass 2293 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> Considering invlo2 size 933.
>  Called once from lowpass 2293 insns.
>  Not inlined because --param large-function-growth limit reached.
>
> where the largest callee *does* get inlined, while two smaller ones  
> don't (I agree with Jan that this would have been solved by training the  
> inliner with profiling data, because only invlo4 gets called).

Using profiling data does not really make the inliner bypass
large-function-growth.  We can experiment with tweaking large-function-growth.
So far I haven't seen any testcase where this limit results in a runtime
regression.
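
To illustrate why a percentage-based growth cap can accept the largest
callee and then reject smaller ones, here is a minimal sketch.  It is a
simplified model, not GCC's actual limit check: the struct, the function
name and all numbers are hypothetical, and the assumption that the cap is
computed from the caller's size before any inlining is mine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knobs.  The names echo the --param spelling in the dump
   above, but the values are supplied by the caller, not taken from GCC.  */
struct growth_params
{
  int large_function_insns;   /* results at or below this size are never capped */
  int large_function_growth;  /* allowed growth of the base size, in percent */
};

/* Return true when inlining a callee of CALLEE_SIZE would be rejected.
   BASE_SIZE is the caller's size before any inlining (an assumption of
   this sketch); CURRENT_SIZE includes callees already inlined.  */
static bool
growth_limit_reached (const struct growth_params *p, int base_size,
                      int current_size, int callee_size)
{
  assert (p->large_function_growth >= 0);
  int new_size = current_size + callee_size;
  int limit = base_size + base_size * p->large_function_growth / 100;
  return new_size > p->large_function_insns && new_size > limit;
}
```

In this model the budget is shared by all callees of one caller, so whichever
callee is inlined first can use it up and leave later, smaller candidates
rejected.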

This is the patch I intend to commit after re-testing on x86_64-linux,
following some last-minute changes.

	* cgraph.c (cgraph_release_function_body): Update use of
	ipa_transforms_to_apply.
	(cgraph_remove_node): Remove ipa_transforms_to_apply.
	* cgraph.h (struct cgraph_node): Add ipa_transforms_to_apply.
	* cgraphunit.c (save_inline_function_body): Clear ipa_transforms for
	copied body.
	(cgraph_materialize_clone): Remove original if dead.
	* lto-streamer-in.c (lto_read_body): Remove FIXME and
	ipa_transforms_to_apply hack.
	* function.h (struct function): Add ipa_transforms_to_apply.
	* ipa.c (cgraph_remove_unreachable_nodes): Handle dead clone originals.
	* tree-inline.c (copy_bb): Update sanity check.
	(initialize_cfun): Do not copy ipa_transforms_to_apply.
	(expand_call_inline): Remove dead clone originals.
	(tree_function_versioning): Merge transformation queues.
	* passes.c (add_ipa_transform_pass): Remove.
	(execute_one_ipa_transform_pass): Update ipa_transforms_to_apply
	tracking.
	(execute_all_ipa_transforms): Update.
	(execute_one_pass): Update.

	* lto.c (read_cgraph_and_symbols): Set also ipa_transforms_to_apply.
Index: cgraph.c
===================================================================
*** cgraph.c	(revision 154198)
--- cgraph.c	(working copy)
*************** cgraph_release_function_body (struct cgr
*** 1132,1138 ****
        pop_cfun();
        gimple_set_body (node->decl, NULL);
        VEC_free (ipa_opt_pass, heap,
!       		DECL_STRUCT_FUNCTION (node->decl)->ipa_transforms_to_apply);
        /* Struct function hangs a lot of data that would leak if we didn't
           removed all pointers to it.   */
        ggc_free (DECL_STRUCT_FUNCTION (node->decl));
--- 1132,1138 ----
        pop_cfun();
        gimple_set_body (node->decl, NULL);
        VEC_free (ipa_opt_pass, heap,
!       		node->ipa_transforms_to_apply);
        /* Struct function hangs a lot of data that would leak if we didn't
           removed all pointers to it.   */
        ggc_free (DECL_STRUCT_FUNCTION (node->decl));
*************** cgraph_remove_node (struct cgraph_node *
*** 1159,1164 ****
--- 1159,1166 ----
    cgraph_call_node_removal_hooks (node);
    cgraph_node_remove_callers (node);
    cgraph_node_remove_callees (node);
+   VEC_free (ipa_opt_pass, heap,
+             node->ipa_transforms_to_apply);
  
    /* Incremental inlining access removed nodes stored in the postorder list.
       */
Index: cgraph.h
===================================================================
*** cgraph.h	(revision 154198)
--- cgraph.h	(working copy)
*************** struct GTY((chain_next ("%h.next"), chai
*** 190,195 ****
--- 190,200 ----
  
    PTR GTY ((skip)) aux;
  
+   /* Interprocedural passes scheduled to have their transform functions
+      applied next time we execute local pass on them.  We maintain it
+      per-function in order to allow IPA passes to introduce new functions.  */
+   VEC(ipa_opt_pass,heap) * GTY((skip)) ipa_transforms_to_apply;
+ 
    struct cgraph_local_info local;
    struct cgraph_global_info global;
    struct cgraph_rtl_info rtl;
*************** struct GTY((chain_next ("%h.next"), chai
*** 206,221 ****
       number of cfg nodes with -fprofile-generate and -fprofile-use */
    int pid;
  
!   /* Set when function must be output - it is externally visible
!      or its address is taken.  */
    unsigned needed : 1;
!   /* Set when function has address taken.  */
    unsigned address_taken : 1;
    /* Set when decl is an abstract function pointed to by the
       ABSTRACT_DECL_ORIGIN of a reachable function.  */
    unsigned abstract_and_needed : 1;
    /* Set when function is reachable by call from other function
!      that is either reachable or needed.  */
    unsigned reachable : 1;
    /* Set once the function is lowered (i.e. its CFG is built).  */
    unsigned lowered : 1;
--- 211,234 ----
       number of cfg nodes with -fprofile-generate and -fprofile-use */
    int pid;
  
!   /* Set when function must be output for some reason.  The primary
!      use of this flag is to mark functions needed to be output for
!      non-standard reason.  Functions that are externally visible
!      or reachable from functions needed to be output are marked
!      by specialized flags.  */
    unsigned needed : 1;
!   /* Set when function has address taken.
!      In current implementation it imply needed flag. */
    unsigned address_taken : 1;
    /* Set when decl is an abstract function pointed to by the
       ABSTRACT_DECL_ORIGIN of a reachable function.  */
    unsigned abstract_and_needed : 1;
    /* Set when function is reachable by call from other function
!      that is either reachable or needed.  
!      This flag is computed at original cgraph construction and then
!      updated in cgraph_remove_unreachable_nodes.  Note that after
!      cgraph_remove_unreachable_nodes cgraph still can contain unreachable
!      nodes when they are needed for virtual clone instantiation.  */
    unsigned reachable : 1;
    /* Set once the function is lowered (i.e. its CFG is built).  */
    unsigned lowered : 1;
Index: cgraphunit.c
===================================================================
*** cgraphunit.c	(revision 154198)
--- cgraphunit.c	(working copy)
*************** save_inline_function_body (struct cgraph
*** 1777,1784 ****
    TREE_PUBLIC (first_clone->decl) = 0;
    DECL_COMDAT (first_clone->decl) = 0;
    VEC_free (ipa_opt_pass, heap,
!             DECL_STRUCT_FUNCTION (first_clone->decl)->ipa_transforms_to_apply);
!   DECL_STRUCT_FUNCTION (first_clone->decl)->ipa_transforms_to_apply = NULL;
  
  #ifdef ENABLE_CHECKING
    verify_cgraph_node (first_clone);
--- 1777,1784 ----
    TREE_PUBLIC (first_clone->decl) = 0;
    DECL_COMDAT (first_clone->decl) = 0;
    VEC_free (ipa_opt_pass, heap,
!             first_clone->ipa_transforms_to_apply);
!   first_clone->ipa_transforms_to_apply = NULL;
  
  #ifdef ENABLE_CHECKING
    verify_cgraph_node (first_clone);
*************** cgraph_materialize_clone (struct cgraph_
*** 1810,1815 ****
--- 1810,1817 ----
      node->clone_of->clones = node->next_sibling_clone;
    node->next_sibling_clone = NULL;
    node->prev_sibling_clone = NULL;
+   if (!node->clone_of->analyzed && !node->clone_of->clones)
+     cgraph_remove_node (node->clone_of);
    node->clone_of = NULL;
    bitmap_obstack_release (NULL);
  }
Index: lto-streamer-in.c
===================================================================
*** lto-streamer-in.c	(revision 154198)
--- lto-streamer-in.c	(working copy)
*************** lto_read_body (struct lto_file_decl_data
*** 1476,1490 ****
        /* Restore decl state */
        file_data->current_decl_state = file_data->global_decl_state;
  
-       /* FIXME: ipa_transforms_to_apply holds list of passes that have optimization
-          summaries computed and needs to apply changes.  At the moment WHOPR only
-          supports inlining, so we can push it here by hand.  In future we need to stream
-          this field into ltrans compilation.  This will also need to move the field
- 	 from struct function into cgraph node where it belongs.  */
-       if (flag_ltrans && !cgraph_node (fn_decl)->global.inlined_to)
- 	 VEC_safe_push (ipa_opt_pass, heap,
- 			cfun->ipa_transforms_to_apply,
- 			(ipa_opt_pass)&pass_ipa_inline);
        pop_cfun ();
      }
    else 
--- 1476,1481 ----
Index: function.h
===================================================================
*** function.h	(revision 154198)
--- function.h	(working copy)
*************** struct GTY(()) function {
*** 522,532 ****
    unsigned int curr_properties;
    unsigned int last_verified;
  
-   /* Interprocedural passes scheduled to have their transform functions
-      applied next time we execute local pass on them.  We maintain it
-      per-function in order to allow IPA passes to introduce new functions.  */
-   VEC(ipa_opt_pass,heap) * GTY((skip)) ipa_transforms_to_apply;
- 
    /* Non-null if the function does something that would prevent it from
       being copied; this applies to both versioning and inlining.  Set to
       a string describing the reason for failure.  */
--- 522,527 ----
Index: ipa.c
===================================================================
*** ipa.c	(revision 154198)
--- ipa.c	(working copy)
*************** bool
*** 121,126 ****
--- 121,127 ----
  cgraph_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
  {
    struct cgraph_node *first = (struct cgraph_node *) (void *) 1;
+   struct cgraph_node *processed = (struct cgraph_node *) (void *) 2;
    struct cgraph_node *node, *next;
    bool changed = false;
  
*************** cgraph_remove_unreachable_nodes (bool be
*** 142,150 ****
          gcc_assert (!node->global.inlined_to);
  	node->aux = first;
  	first = node;
        }
      else
!       gcc_assert (!node->aux);
  
    /* Perform reachability analysis.  As a special case do not consider
       extern inline functions not inlined as live because we won't output
--- 143,155 ----
          gcc_assert (!node->global.inlined_to);
  	node->aux = first;
  	first = node;
+ 	node->reachable = true;
        }
      else
!       {
!         gcc_assert (!node->aux);
! 	node->reachable = false;
!       }
  
    /* Perform reachability analysis.  As a special case do not consider
       extern inline functions not inlined as live because we won't output
*************** cgraph_remove_unreachable_nodes (bool be
*** 154,170 ****
        struct cgraph_edge *e;
        node = first;
        first = (struct cgraph_node *) first->aux;
  
!       for (e = node->callees; e; e = e->next_callee)
! 	if (!e->callee->aux
! 	    && node->analyzed
! 	    && (!e->inline_failed || !e->callee->analyzed
! 		|| (!DECL_EXTERNAL (e->callee->decl))
!                 || before_inlining_p))
! 	  {
! 	    e->callee->aux = first;
! 	    first = e->callee;
! 	  }
        while (node->clone_of && !node->clone_of->aux && !gimple_has_body_p (node->decl))
          {
  	  node = node->clone_of;
--- 159,184 ----
        struct cgraph_edge *e;
        node = first;
        first = (struct cgraph_node *) first->aux;
+       node->aux = processed;
  
!       if (node->reachable)
!         for (e = node->callees; e; e = e->next_callee)
! 	  if (!e->callee->reachable
! 	      && node->analyzed
! 	      && (!e->inline_failed || !e->callee->analyzed
! 		  || (!DECL_EXTERNAL (e->callee->decl))
!                   || before_inlining_p))
! 	    {
! 	      bool prev_reachable = e->callee->reachable;
! 	      e->callee->reachable |= node->reachable;
! 	      if (!e->callee->aux
! 	          || (e->callee->aux == processed
! 		      && prev_reachable != e->callee->reachable))
! 	        {
! 	          e->callee->aux = first;
! 	          first = e->callee;
! 	        }
! 	    }
        while (node->clone_of && !node->clone_of->aux && !gimple_has_body_p (node->decl))
          {
  	  node = node->clone_of;
*************** cgraph_remove_unreachable_nodes (bool be
*** 184,196 ****
    for (node = cgraph_nodes; node; node = next)
      {
        next = node->next;
        if (!node->aux)
  	{
            node->global.inlined_to = NULL;
  	  if (file)
  	    fprintf (file, " %s", cgraph_node_name (node));
! 	  if (!node->analyzed || !DECL_EXTERNAL (node->decl)
! 	      || before_inlining_p)
  	    cgraph_remove_node (node);
  	  else
  	    {
--- 198,215 ----
    for (node = cgraph_nodes; node; node = next)
      {
        next = node->next;
+       if (node->aux && !node->reachable)
+         {
+ 	  cgraph_node_remove_callees (node);
+ 	  node->analyzed = false;
+ 	  node->local.inlinable = false;
+ 	}
        if (!node->aux)
  	{
            node->global.inlined_to = NULL;
  	  if (file)
  	    fprintf (file, " %s", cgraph_node_name (node));
! 	  if (!node->analyzed || !DECL_EXTERNAL (node->decl) || before_inlining_p)
  	    cgraph_remove_node (node);
  	  else
  	    {
*************** cgraph_remove_unreachable_nodes (bool be
*** 219,224 ****
--- 238,249 ----
  		      node->analyzed = false;
  		      node->local.inlinable = false;
  		    }
+ 		  if (node->prev_sibling_clone)
+ 		    node->prev_sibling_clone->next_sibling_clone = node->next_sibling_clone;
+ 		  else if (node->clone_of)
+ 		    node->clone_of->clones = node->next_sibling_clone;
+ 		  if (node->next_sibling_clone)
+ 		    node->next_sibling_clone->prev_sibling_clone = node->prev_sibling_clone;
  		}
  	      else
  		cgraph_remove_node (node);
Index: lto/lto.c
===================================================================
*** lto/lto.c	(revision 154198)
--- lto/lto.c	(working copy)
*************** read_cgraph_and_symbols (unsigned nfiles
*** 1826,1834 ****
       phase. */
    if (flag_ltrans)
      for (node = cgraph_nodes; node; node = node->next)
!       if (!node->global.inlined_to
! 	  && cgraph_decide_is_function_needed (node, node->decl))
!         cgraph_mark_needed_node (node);
  
    timevar_push (TV_IPA_LTO_DECL_IO);
  
--- 1826,1844 ----
       phase. */
    if (flag_ltrans)
      for (node = cgraph_nodes; node; node = node->next)
!       {
!         if (!node->global.inlined_to
! 	    && cgraph_decide_is_function_needed (node, node->decl))
!           cgraph_mark_needed_node (node);
! 	/* FIXME: ipa_transforms_to_apply holds list of passes that have optimization
! 	   summaries computed and needs to apply changes.  At the moment WHOPR only
! 	   supports inlining, so we can push it here by hand.  In future we need to stream
! 	   this field into ltrans compilation.  */
! 	if (node->analyzed)
! 	  VEC_safe_push (ipa_opt_pass, heap,
! 			 node->ipa_transforms_to_apply,
! 			 (ipa_opt_pass)&pass_ipa_inline);
!       }
  
    timevar_push (TV_IPA_LTO_DECL_IO);
  
Index: tree-inline.c
===================================================================
*** tree-inline.c	(revision 154198)
--- tree-inline.c	(working copy)
*************** copy_bb (copy_body_data *id, basic_block
*** 1665,1674 ****
  
  		  /* We have missing edge in the callgraph.  This can happen
  		     when previous inlining turned an indirect call into a
! 		     direct call by constant propagating arguments.  In all
  		     other cases we hit a bug (incorrect node sharing is the
  		     most common reason for missing edges).  */
! 		  gcc_assert (dest->needed || !dest->analyzed);
  		  if (id->transform_call_graph_edges == CB_CGE_MOVE_CLONES)
  		    cgraph_create_edge_including_clones
  		      (id->dst_node, dest, stmt, bb->count,
--- 1665,1676 ----
  
  		  /* We have missing edge in the callgraph.  This can happen
  		     when previous inlining turned an indirect call into a
! 		     direct call by constant propagating arguments or we are
! 		     producing dead clone (for further clonning).  In all
  		     other cases we hit a bug (incorrect node sharing is the
  		     most common reason for missing edges).  */
! 		  gcc_assert (dest->needed || !dest->analyzed
! 		  	      || !id->src_node->analyzed);
  		  if (id->transform_call_graph_edges == CB_CGE_MOVE_CLONES)
  		    cgraph_create_edge_including_clones
  		      (id->dst_node, dest, stmt, bb->count,
*************** initialize_cfun (tree new_fndecl, tree c
*** 1983,1991 ****
    cfun->function_end_locus = src_cfun->function_end_locus;
    cfun->curr_properties = src_cfun->curr_properties;
    cfun->last_verified = src_cfun->last_verified;
-   if (src_cfun->ipa_transforms_to_apply)
-     cfun->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
- 					      src_cfun->ipa_transforms_to_apply);
    cfun->va_list_gpr_size = src_cfun->va_list_gpr_size;
    cfun->va_list_fpr_size = src_cfun->va_list_fpr_size;
    cfun->function_frequency = src_cfun->function_frequency;
--- 1985,1990 ----
*************** expand_call_inline (basic_block bb, gimp
*** 3822,3827 ****
--- 3821,3830 ----
    (*debug_hooks->outlining_inline_function) (cg_edge->callee->decl);
  
    /* Update callgraph if needed.  */
+   if (cg_edge->callee->clone_of
+       && !cg_edge->callee->clone_of->next_sibling_clone
+       && !cg_edge->callee->analyzed)
+     cgraph_remove_node (cg_edge->callee);
    cgraph_remove_node (cg_edge->callee);
  
    id->block = NULL_TREE;
*************** tree_function_versioning (tree old_decl,
*** 4848,4853 ****
--- 4851,4869 ----
    id.src_node = old_version_node;
    id.dst_node = new_version_node;
    id.src_cfun = DECL_STRUCT_FUNCTION (old_decl);
+   if (id.src_node->ipa_transforms_to_apply)
+     {
+       VEC(ipa_opt_pass,heap) * old_transforms_to_apply = id.dst_node->ipa_transforms_to_apply;
+       unsigned int i;
+ 
+       id.dst_node->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
+ 					               id.src_node->ipa_transforms_to_apply);
+       for (i = 0; i < VEC_length (ipa_opt_pass, old_transforms_to_apply); i++)
+         VEC_safe_push (ipa_opt_pass, heap, id.dst_node->ipa_transforms_to_apply,
+ 		       VEC_index (ipa_opt_pass,
+ 		       		  old_transforms_to_apply,
+ 				  i));
+     }
    
    id.copy_decl = copy_decl_no_change;
    id.transform_call_graph_edges
Index: passes.c
===================================================================
*** passes.c	(revision 154198)
--- passes.c	(working copy)
*************** update_properties_after_pass (void *data
*** 1376,1390 ****
  		           & ~pass->properties_destroyed;
  }
  
- /* Schedule IPA transform pass DATA for CFUN.  */
- 
- static void
- add_ipa_transform_pass (void *data)
- {
-   struct ipa_opt_pass_d *ipa_pass = (struct ipa_opt_pass_d *) data;
-   VEC_safe_push (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply, ipa_pass);
- }
- 
  /* Execute summary generation for all of the passes in IPA_PASS.  */
  
  void
--- 1376,1381 ----
*************** execute_one_ipa_transform_pass (struct c
*** 1464,1482 ****
  void
  execute_all_ipa_transforms (void)
  {
!   if (cfun && cfun->ipa_transforms_to_apply)
      {
        unsigned int i;
-       struct cgraph_node *node = cgraph_node (current_function_decl);
  
!       for (i = 0; i < VEC_length (ipa_opt_pass, cfun->ipa_transforms_to_apply);
  	   i++)
  	execute_one_ipa_transform_pass (node,
  					VEC_index (ipa_opt_pass,
! 						   cfun->ipa_transforms_to_apply,
  						   i));
!       VEC_free (ipa_opt_pass, heap, cfun->ipa_transforms_to_apply);
!       cfun->ipa_transforms_to_apply = NULL;
      }
  }
  
--- 1455,1476 ----
  void
  execute_all_ipa_transforms (void)
  {
!   struct cgraph_node *node;
!   if (!cfun)
!     return;
!   node = cgraph_node (current_function_decl);
!   if (node->ipa_transforms_to_apply)
      {
        unsigned int i;
  
!       for (i = 0; i < VEC_length (ipa_opt_pass, node->ipa_transforms_to_apply);
  	   i++)
  	execute_one_ipa_transform_pass (node,
  					VEC_index (ipa_opt_pass,
! 						   node->ipa_transforms_to_apply,
  						   i));
!       VEC_free (ipa_opt_pass, heap, node->ipa_transforms_to_apply);
!       node->ipa_transforms_to_apply = NULL;
      }
  }
  
*************** execute_one_pass (struct opt_pass *pass)
*** 1551,1557 ****
    execute_todo (todo_after | pass->todo_flags_finish);
    verify_interpass_invariants ();
    if (pass->type == IPA_PASS)
!     do_per_function (add_ipa_transform_pass, pass);
  
    if (!current_function_decl)
      cgraph_process_new_functions ();
--- 1545,1557 ----
    execute_todo (todo_after | pass->todo_flags_finish);
    verify_interpass_invariants ();
    if (pass->type == IPA_PASS)
!     {
!       struct cgraph_node *node;
!       for (node = cgraph_nodes; node; node = node->next)
!         if (node->analyzed)
!           VEC_safe_push (ipa_opt_pass, heap, node->ipa_transforms_to_apply,
! 			 (struct ipa_opt_pass_d *)pass);
!     }
  
    if (!current_function_decl)
      cgraph_process_new_functions ();
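
The ipa.c hunks above thread a worklist through each node's aux pointer and
use small non-NULL constants as markers.  The following is a minimal sketch
of that idiom on a toy graph.  The toy_node type, MAX_CALLEES and
mark_reachable are hypothetical names, and the sketch leaves out the
"processed" marker and the clone handling that the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Toy stand-in for a call graph node; not GCC's struct cgraph_node.  */
struct toy_node
{
  struct toy_node *callees[MAX_CALLEES]; /* NULL-terminated unless full */
  void *aux;       /* NULL = never queued, otherwise next worklist entry */
  bool needed;     /* root of the walk */
  bool reachable;
};

/* Non-NULL sentinel for the end of the worklist, so that "queued as the
   last element" can be told apart from "never queued" (aux == NULL).  */
#define WORKLIST_END ((void *) 1)

static void
mark_reachable (struct toy_node **nodes, size_t n)
{
  void *first = WORKLIST_END;
  size_t i;

  assert (nodes != NULL);

  /* Seed the worklist with the needed nodes.  */
  for (i = 0; i < n; i++)
    {
      nodes[i]->reachable = nodes[i]->needed;
      nodes[i]->aux = NULL;
      if (nodes[i]->needed)
        {
          nodes[i]->aux = first;
          first = nodes[i];
        }
    }

  /* Pop a node and push every callee that has not been queued yet.  A
     popped node keeps its non-NULL aux, which doubles as a visited mark.  */
  while (first != WORKLIST_END)
    {
      struct toy_node *node = first;
      first = node->aux;
      for (i = 0; i < MAX_CALLEES && node->callees[i]; i++)
        {
          struct toy_node *callee = node->callees[i];
          if (!callee->aux)
            {
              callee->reachable = true;
              callee->aux = first;
              first = callee;
            }
        }
    }
}
```

Storing the list link in the node avoids allocating a separate queue, at the
cost of having to reset aux before the field is reused.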

* Re: Whole program optimization and functions-only-called-once.
  2009-11-16 10:15       ` Whole program optimization and functions-only-called-once Jan Hubicka
@ 2009-11-16 14:28         ` Richard Guenther
  2009-11-16 14:39           ` Jan Hubicka
  2009-11-16 18:25         ` Toon Moene
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Richard Guenther @ 2009-11-16 14:28 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Toon Moene, Jan Hubicka, gcc-patches

On Mon, Nov 16, 2009 at 3:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Jan Hubicka wrote:
>>
>>> -fno-ipa-cp should work around your problem for time being.
>>
>> Indeed it did. Some figures:
>
> Thanks for confirmation!
>
>> Considering invlo4 size 1462.
>>  Called once from lowpass 2293 insns.
>>  Not inlined because --param large-function-growth limit reached.
>>
>> Considering invlo2 size 933.
>>  Called once from lowpass 2293 insns.
>>  Not inlined because --param large-function-growth limit reached.
>>
>> where the largest callee *does* get inlined, while two smaller ones
>> don't (I agree with Jan that this would have been solved by training the
>> inliner with profiling data, because only invlo4 gets called).
>
> Using profiling data does not really make the inliner bypass
> large-function-growth.  We can experiment with tweaking large-function-growth.
> So far I haven't seen any testcase where this limit results in a runtime
> regression.

I think we shouldn't bypass large-function-growth.  We might want to scale
it somewhat for hot callgraph edges (and at the same time avoid inlining
once-called cold functions).  Or even better sort the fibheap according to
the callgraph edge frequency, not only according to sizes.

Richard.

* Re: Whole program optimization and functions-only-called-once.
  2009-11-16 14:28         ` Richard Guenther
@ 2009-11-16 14:39           ` Jan Hubicka
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Hubicka @ 2009-11-16 14:39 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, Toon Moene, Jan Hubicka, gcc-patches

> 
> I think we shouldn't bypass large-function-growth.  We might want to scale
> it somewhat for hot callgraph edges (and at the same time avoid inlining
> once-called cold functions).  Or even better sort the fibheap according to
> the callgraph edge frequency, not only according to sizes.

We have been doing that for a while.  The fibheap is sorted by benefit, which is
pretty much the estimated speedup divided by the estimated size cost.  Speedups are
scaled by counts when profile info is available and by frequencies otherwise.
(I even had it on slides last week ;))

Honza
> 
> Richard.
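
The ordering described in this message can be sketched as follows.  This is
only an illustration of "speedup scaled by a weight, divided by size cost":
the toy_edge fields and function names are hypothetical, qsort stands in for
the Fibonacci heap, and GCC's real badness computation is more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy inline candidate; not GCC's struct cgraph_edge.  */
struct toy_edge
{
  const char *callee;
  double time_saved;  /* estimated time saved per call */
  double weight;      /* profile count, or a static frequency estimate */
  int size_growth;    /* estimated size cost, must be > 0 */
};

/* Benefit = weighted speedup per unit of code growth.  */
static double
edge_benefit (const struct toy_edge *e)
{
  assert (e->size_growth > 0);
  return e->time_saved * e->weight / e->size_growth;
}

/* qsort comparator: highest benefit first.  */
static int
by_benefit_desc (const void *a, const void *b)
{
  double ba = edge_benefit (a), bb = edge_benefit (b);
  return (ba < bb) - (ba > bb);
}

static void
order_candidates (struct toy_edge *edges, size_t n)
{
  qsort (edges, n, sizeof *edges, by_benefit_desc);
}
```

With such a key, a small callee on a hot edge sorts ahead of a large callee
on a cold edge even when the latter saves more time per call.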

* Re: Whole program optimization and functions-only-called-once.
  2009-11-16 10:15       ` Whole program optimization and functions-only-called-once Jan Hubicka
  2009-11-16 14:28         ` Richard Guenther
@ 2009-11-16 18:25         ` Toon Moene
  2009-11-17 14:14           ` Toon Moene
  2009-11-25 16:44         ` Martin Jambor
  2010-11-08 17:40         ` H.J. Lu
  3 siblings, 1 reply; 38+ messages in thread
From: Toon Moene @ 2009-11-16 18:25 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Guenther, Jan Hubicka, gcc-patches

Jan Hubicka wrote:

> This is the patch I intend to commit after re-testing on x86_64-linux,
> following some last-minute changes.

Unfortunately, I can't help you for the time being.  Yesterday I updated 
the gcc trunk on my system from revision 153775 to 154195 and now simple 
things are failing:

The simple six-file example I sent you cannot be compiled with

$ /usr/snp/bin/gfortran  -O3 -flto -fwhole-program -fuse-linker-plugin 
-fdump-ipa-all -fno-ipa-cp *.f

anymore.  It returns:

/usr/snp/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../x86_64-unknown-linux-gnu/bin/ld: 
/tmp/ccPFwq8r.lto.o: in function main:ccfzQ7Cz.o(.text+0x1b): error: 
undefined reference to 'hlprog_'
collect2: ld returned 1 exit status

and in my compilation of the weather forecasting code I get:

gfortran -O3 -ffast-math -fbacktrace -march=native -mtune=native -flto 
-fno-ipa-cp -I/scratch/hirlam/hl_home/EXP/lib/src/linuxgfortran/hirmod 
-ffixed-form -c  checkoptions.f
gfortran checkoptions.o 
/scratch/hirlam/hl_home/EXP/lib/src/linuxgfortran/lib/util.a 
/scratch/hirlam/hl_home/EXP/lib/src/linuxgfortran/lib/port.a -o 
/scratch/hirlam/hl_home/EXP/lib/src/linuxgfortran/bin/checkoptions.x -O3 
-fuse-linker-plugin -ffast-math -fbacktrace -march=native -mtune=native 
-flto -fwhole-program -fdump-ipa-all -fno-ipa-cp  -llapack -lblas -lfftw3
lto1: internal compiler error: Invalid line in the resolution file.
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
lto-wrapper: gfortran returned 1 exit status
/usr/snp/lib/gcc/x86_64-unknown-linux-gnu/4.5.0/../../../../x86_64-unknown-linux-gnu/bin/ld: 
fatal error: lto-wrapper failed
collect2: ld returned 1 exit status
make[1]: *** 
[/scratch/hirlam/hl_home/EXP/lib/src/linuxgfortran/bin/checkoptions.x] 
Error 1
make[1]: Leaving directory 
`/scratch/hirlam/hl_home/EXP/lib/src/linuxgfortran/mainsrc'
make: *** [checkoptions.x] Error 2

Perhaps the compiler itself is miscompiled (this is on):

$ /usr/snp/bin/gfortran -v
Using built-in specs.
COLLECT_GCC=/usr/snp/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/snp/libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-checking=release 
--prefix=/usr/snp --with-libelf=/usr/local --enable-gold 
--enable-plugins --disable-multilib --disable-nls --with-arch-64=core2 
--with-tune-64=core2 --enable-languages=fortran,c++ 
--enable-stage1-languages=c++
Thread model: posix
gcc version 4.5.0 20091116 (experimental) (GCC)

[ You say you want a resolution ....
   well, you can count me out. ]

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

* Re: Whole program optimization and functions-only-called-once.
  2009-11-16 18:25         ` Toon Moene
@ 2009-11-17 14:14           ` Toon Moene
  2009-11-17 14:57             ` Rafael Espindola
  0 siblings, 1 reply; 38+ messages in thread
From: Toon Moene @ 2009-11-17 14:14 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Guenther, Jan Hubicka, gcc-patches

Toon Moene wrote:

> [ You say you want a resolution ....
>   well, you can count me out. ]

What I get is this:

resolution = 209 <AC><C0>N^B, t = 0
lto1: internal compiler error: Invalid line in the resolution file.

after adding the following code to lto/lto.c:

$ svn diff
Index: lto.c
===================================================================
--- lto.c       (revision 154244)
+++ lto.c       (working copy)
@@ -294,7 +294,10 @@

        t = fscanf (resolution, "%u %26s %*[^\n]\n", &index, r_str);
        if (t != 2)
+       {
+        fprintf (stderr, "resolution = %u %s, t = %d\n", index, r_str, t);
          internal_error ("Invalid line in the resolution file.");
+       }
        if (index > max_index)
         max_index = index;

Hope this helps,
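
A note on reading the debug output above: with t = 0 the scanf call assigned
nothing, so the printed index and string are leftovers rather than data from
the offending line.  A minimal sketch of a parser that makes this visible by
clearing its outputs first is shown below.  The function name and the
constant are hypothetical, and the record layout is only inferred from the
fscanf format in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define RESOLUTION_NAME_MAX 26

/* Parse one "<index> <resolution> ..." record from LINE.  Outputs are
   cleared first, so a failed match never leaves stale values behind.
   Returns true only when both fields were converted.  */
static bool
parse_resolution_line (const char *line, unsigned *index,
                       char name[RESOLUTION_NAME_MAX + 1])
{
  assert (line && index && name);
  *index = 0;
  name[0] = '\0';
  /* sscanf returns the number of successful conversions; %26s writes at
     most 26 characters plus the terminating NUL.  */
  return sscanf (line, "%u %26s", index, name) == 2;
}
```

A line whose first token is not a decimal number makes %u fail immediately,
which yields a return value of 0 as in the report.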

* Re: Whole program optimization and functions-only-called-once.
  2009-11-17 14:14           ` Toon Moene
@ 2009-11-17 14:57             ` Rafael Espindola
  2009-11-17 15:10               ` Toon Moene
  0 siblings, 1 reply; 38+ messages in thread
From: Rafael Espindola @ 2009-11-17 14:57 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jan Hubicka, Richard Guenther, Jan Hubicka, gcc-patches

2009/11/17 Toon Moene <toon@moene.org>:
> Toon Moene wrote:
>
>> [ You say you want a resolution ....
>>  well, you can count me out. ]
>
> What I get is this:
>
> resolution = 209 <AC><C0>N^B, t = 0
> lto1: internal compiler error: Invalid line in the resolution file.

I think I have a patch for that, which I should commit in 1h or so. Search for

[patch] Skip offsets when reading resolution files

The problem was caused by the change to pass archive offset to lto1.
Sorry about that.

Cheers,
-- 
Rafael Ávila de Espíndola

* Re: Whole program optimization and functions-only-called-once.
  2009-11-17 14:57             ` Rafael Espindola
@ 2009-11-17 15:10               ` Toon Moene
  2009-11-17 16:06                 ` Rafael Espindola
  0 siblings, 1 reply; 38+ messages in thread
From: Toon Moene @ 2009-11-17 15:10 UTC (permalink / raw)
  To: Rafael Espindola; +Cc: Jan Hubicka, Richard Guenther, Jan Hubicka, gcc-patches

Rafael Espindola wrote:

> 2009/11/17 Toon Moene <toon@moene.org>:

>> resolution = 209 <AC><C0>N^B, t = 0
>> lto1: internal compiler error: Invalid line in the resolution file.
> 
> I think I have a patch for that, which I should commit in 1h or so. Search for
> 
> [patch] Skip offsets when reading resolution files
> 
> The problem was caused by the change to pass archive offset to lto1.
> Sorry about that.

If you manage to commit this before 6 o'clock UTC, I'll be able to test 
it tonight.

Thanks,

* Re: Whole program optimization and functions-only-called-once.
  2009-11-17 15:10               ` Toon Moene
@ 2009-11-17 16:06                 ` Rafael Espindola
  2009-11-17 17:20                   ` Rafael Espindola
  0 siblings, 1 reply; 38+ messages in thread
From: Rafael Espindola @ 2009-11-17 16:06 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jan Hubicka, Richard Guenther, Jan Hubicka, gcc-patches

> If you manage to commit this before 6 o'clock UTC, I'll be able to test it
> tonight.

Done. You will very likely hit another bug when using a resolution
file. I am testing a patch for it.

The thread subject is:

[patch] Merge cgraph nodes when using a resolution file


> Thanks,

Cheers,
-- 
Rafael Ávila de Espíndola

* Re: Whole program optimization and functions-only-called-once.
  2009-11-17 16:06                 ` Rafael Espindola
@ 2009-11-17 17:20                   ` Rafael Espindola
  2009-11-18 19:54                     ` Toon Moene
  0 siblings, 1 reply; 38+ messages in thread
From: Rafael Espindola @ 2009-11-17 17:20 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jan Hubicka, Richard Guenther, Jan Hubicka, gcc-patches

> Done. You will very likely hit another bug when using a resolution
> file. I am testing a patch for it.

Done. It should now have only the old bugs :-)

Cheers,
-- 
Rafael Ávila de Espíndola

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-17 17:20                   ` Rafael Espindola
@ 2009-11-18 19:54                     ` Toon Moene
  2009-11-18 20:49                       ` Jan Hubicka
  0 siblings, 1 reply; 38+ messages in thread
From: Toon Moene @ 2009-11-18 19:54 UTC (permalink / raw)
  To: Rafael Espindola; +Cc: Jan Hubicka, Richard Guenther, Jan Hubicka, gcc-patches

Rafael Espindola wrote:

>> Done. You will very likely hit another bug when using a resolution
>> file. I am testing a patch for it.
> 
> Done. It should now have only the old bugs :-)

Yep, that worked.  Inlining all functions-only-called-once doesn't give 
any speed-up, however (on the contrary), but it was nice to see the link 
time optimization of the largest executable being terminated by the OOM 
killer (after 10 minutes of CPU time, when reaching 12.5 Gbytes of memory).

At least it gives the impression that some real work is put into 
optimizing our code :-)

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-18 19:54                     ` Toon Moene
@ 2009-11-18 20:49                       ` Jan Hubicka
  2009-11-18 20:56                         ` Toon Moene
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Hubicka @ 2009-11-18 20:49 UTC (permalink / raw)
  To: Toon Moene
  Cc: Rafael Espindola, Jan Hubicka, Richard Guenther, Jan Hubicka,
	gcc-patches

> Rafael Espindola wrote:
>
>>> Done. You will very likely hit another bug when using a resolution
>>> file. I am testing a patch for it.
>>
>> Done. It should now have only the old bugs :-)
>
> Yep, that worked.  Inlining all functions-only-called-once doesn't get  
> any speed up however (on the contrary). but it was nice to see the link  
> time optimization of the largest executable being terminated by the OOM  
> killer (after 10 minutes of CPU time, when reaching 12.5 Gbytes of 
> memory).
>
> At least it gives the impression that some real work is put into  
> optimizing our code :-)

:) It would be nice to know what caused the OOM.  Is it just one of the passes
exploding in the presence of very large function bodies?

Honza
>
> Kind regards,
>
> -- 
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-18 20:49                       ` Jan Hubicka
@ 2009-11-18 20:56                         ` Toon Moene
  2009-11-18 21:05                           ` Richard Guenther
  2009-11-21 11:57                           ` Toon Moene
  0 siblings, 2 replies; 38+ messages in thread
From: Toon Moene @ 2009-11-18 20:56 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Rafael Espindola, Richard Guenther, Jan Hubicka, gcc-patches

Jan Hubicka wrote:

 > I wrote:

>> but it was nice to see the link  
>> time optimization of the largest executable being terminated by the OOM  
>> killer (after 10 minutes of CPU time, when reaching 12.5 Gbytes of 
>> memory).
>>
>> At least it gives the impression that some real work is put into  
>> optimizing our code :-)
> 
> :) It would be nice to know what caused the OOM.  Is just one of passes exploding
> on presence of very large bodies?

I'll try to figure this out over the weekend (sorry, don't have more 
spare time).

It's most probably a single pass, because the memory requirements kept 
creeping up to 12.5 Gbytes from 10, slowly increasing all the time over 
several minutes.

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-18 20:56                         ` Toon Moene
@ 2009-11-18 21:05                           ` Richard Guenther
  2009-11-18 21:16                             ` Toon Moene
  2009-11-21 11:57                           ` Toon Moene
  1 sibling, 1 reply; 38+ messages in thread
From: Richard Guenther @ 2009-11-18 21:05 UTC (permalink / raw)
  To: Toon Moene; +Cc: Jan Hubicka, Rafael Espindola, Jan Hubicka, gcc-patches

On Wed, Nov 18, 2009 at 9:50 PM, Toon Moene <toon@moene.org> wrote:
> Jan Hubicka wrote:
>
>> I wrote:
>
>>> but it was nice to see the link  time optimization of the largest
>>> executable being terminated by the OOM  killer (after 10 minutes of CPU
>>> time, when reaching 12.5 Gbytes of memory).
>>>
>>> At least it gives the impression that some real work is put into
>>>  optimizing our code :-)
>>
>> :) It would be nice to know what caused the OOM.  Is just one of passes
>> exploding
>> on presence of very large bodies?
>
> I'll try to figure this out over the weekend (sorry, don't have more spare
> time).
>
> It's most probably a single pass, because the memory requirements kept
> creeping up to 12.5 Gbytes from 10, slowly increasing all the time over
> several minutes.

The obvious bet is anything using DF on the RTL side or var-tracking if
you enabled -g.

Richard.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-18 21:05                           ` Richard Guenther
@ 2009-11-18 21:16                             ` Toon Moene
  2009-11-18 22:10                               ` Eric Botcazou
  0 siblings, 1 reply; 38+ messages in thread
From: Toon Moene @ 2009-11-18 21:16 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jan Hubicka, Rafael Espindola, Jan Hubicka, gcc-patches

Richard Guenther wrote:

> On Wed, Nov 18, 2009 at 9:50 PM, Toon Moene <toon@moene.org> wrote:

>> It's most probably a single pass, because the memory requirements kept
>> creeping up to 12.5 Gbytes from 10, slowly increasing all the time over
>> several minutes.
> 
> The obvious bet is anything using DF on the RTL side or var-tracking if
> you enabled -g.

It is without -g, because several weeks ago I got ICEs when trying to 
use -flto with -g (it might be that this is fixed now - I just didn't 
bother to change my compile flags).

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-18 21:16                             ` Toon Moene
@ 2009-11-18 22:10                               ` Eric Botcazou
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Botcazou @ 2009-11-18 22:10 UTC (permalink / raw)
  To: Toon Moene
  Cc: gcc-patches, Richard Guenther, Jan Hubicka, Rafael Espindola,
	Jan Hubicka

> It is without -g, because several weeks ago I got ICEs when trying to
> use -flto with -g (it might be that this is fixed now - I just didn't
> bother to change my compile flags).

Try -fno-forward-propagate.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-18 20:56                         ` Toon Moene
  2009-11-18 21:05                           ` Richard Guenther
@ 2009-11-21 11:57                           ` Toon Moene
  2009-11-21 15:37                             ` Jan Hubicka
  2009-11-22  7:20                             ` Vladimir Makarov
  1 sibling, 2 replies; 38+ messages in thread
From: Toon Moene @ 2009-11-21 11:57 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Rafael Espindola, Richard Guenther, Jan Hubicka, gcc-patches

Toon Moene wrote:

> Jan Hubicka wrote:

>> :) It would be nice to know what caused the OOM.  Is just one of 
>> passes exploding
>> on presence of very large bodies?
> 
> I'll try to figure this out over the weekend (sorry, don't have more 
> spare time).
> 
> It's most probably a single pass, because the memory requirements kept 
> creeping up to 12.5 Gbytes from 10, slowly increasing all the time over 
> several minutes.

Here are the tracebacks from gdb attached to the lto1 process, while it 
was expanding from 7 to 12 Gb:

(gdb) where
#0  0x00002b961290491e in memset () from /lib/libc.so.6
#1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
../../gcc/gcc/ira-build.c:155
#2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1522
#6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1577
#7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
../../gcc/gcc/passes.c:1578
#8  0x0000000000656e1c in tree_rest_of_compilation 
(fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
#9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
at ../../gcc/gcc/cgraphunit.c:1178
#10 0x00000000007835ed in cgraph_expand_all_functions () at 
../../gcc/gcc/cgraphunit.c:1245
#11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
../../gcc/gcc/lto/lto.c:2054
#13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
../../gcc/gcc/toplev.c:1049
#14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

(gdb) where
#0  0x00002b961290490f in memset () from /lib/libc.so.6
#1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
../../gcc/gcc/ira-build.c:155
#2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1522
#6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1577
#7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
../../gcc/gcc/passes.c:1578
#8  0x0000000000656e1c in tree_rest_of_compilation 
(fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
#9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
at ../../gcc/gcc/cgraphunit.c:1178
#10 0x00000000007835ed in cgraph_expand_all_functions () at 
../../gcc/gcc/cgraphunit.c:1245
#11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
../../gcc/gcc/lto/lto.c:2054
#13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
../../gcc/gcc/toplev.c:1049
#14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

(gdb) where
#0  0x00002b96128fc26c in ?? () from /lib/libc.so.6
#1  0x00002b96128fde24 in calloc () from /lib/libc.so.6
#2  0x0000000000a6ea7a in xcalloc (nelem=19, elsize=8) at 
../../gcc/libiberty/xmalloc.c:162
#3  0x000000000099d8c0 in get_loop_body (loop=0x2b966e38ecf0) at 
../../gcc/gcc/cfgloop.c:819
#4  0x000000000099e14c in get_loop_exit_edges (loop=0x2b966e38ecf0) at 
../../gcc/gcc/cfgloop.c:1157
#5  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
../../gcc/gcc/ira-build.c:155
#6  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#7  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#8  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#9  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1522
#10 0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1577
#11 0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
../../gcc/gcc/passes.c:1578
#12 0x0000000000656e1c in tree_rest_of_compilation 
(fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
#13 0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
at ../../gcc/gcc/cgraphunit.c:1178
#14 0x00000000007835ed in cgraph_expand_all_functions () at 
../../gcc/gcc/cgraphunit.c:1245
#15 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#16 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
../../gcc/gcc/lto/lto.c:2054
#17 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
../../gcc/gcc/toplev.c:1049
#18 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#19 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#20 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#21 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

(gdb) where
#0  0x00002b961290491e in memset () from /lib/libc.so.6
#1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
../../gcc/gcc/ira-build.c:155
#2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
#3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
#4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
#5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1522
#6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
../../gcc/gcc/passes.c:1577
#7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
../../gcc/gcc/passes.c:1578
#8  0x0000000000656e1c in tree_rest_of_compilation 
(fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
#9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
at ../../gcc/gcc/cgraphunit.c:1178
#10 0x00000000007835ed in cgraph_expand_all_functions () at 
../../gcc/gcc/cgraphunit.c:1245
#11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
#12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
../../gcc/gcc/lto/lto.c:2054
#13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
../../gcc/gcc/toplev.c:1049
#14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
#15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
#16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
#17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113

So it seems to be stuck in a part of the IRA pass ...

Hope this helps (it's close to impossible to build a test case out of 
this, because the program consists of around 3/4 of our ~ 1 million 
lines of Fortran code).

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-21 11:57                           ` Toon Moene
@ 2009-11-21 15:37                             ` Jan Hubicka
  2009-11-21 17:01                               ` Vladimir Makarov
  2009-11-22  7:20                             ` Vladimir Makarov
  1 sibling, 1 reply; 38+ messages in thread
From: Jan Hubicka @ 2009-11-21 15:37 UTC (permalink / raw)
  To: Toon Moene, vmakarov
  Cc: Jan Hubicka, Rafael Espindola, Richard Guenther, Jan Hubicka,
	gcc-patches

> Toon Moene wrote:
>
>> Jan Hubicka wrote:
>
>>> :) It would be nice to know what caused the OOM.  Is just one of  
>>> passes exploding
>>> on presence of very large bodies?
>>
>> I'll try to figure this out over the weekend (sorry, don't have more  
>> spare time).
>>
>> It's most probably a single pass, because the memory requirements kept  
>> creeping up to 12.5 Gbytes from 10, slowly increasing all the time over 
>> several minutes.
>
> Here are the tracebacks from gdb attached to the lto1 process, while it  
> was expanding from 7 to 12 Gb:

Hopefully Vladimir will know the answer here.  Do you have any idea how many
BBs and loops are in the large function?

From the inliner's POV, all I can do is ask you for its size estimate (it is
in the -fdump-ipa-inline dump) and adjust the large-function-insns and
large-function-growth parameters to outlaw whatever inlining happens there, if
there is really no better way to cure this.
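
To make the role of the two parameters concrete, here is a minimal sketch of
the kind of size check they could control.  It is an illustration under
assumptions, not GCC's actual code: the struct, the function name
within_growth_limits, the formula, and the numbers in check_examples are all
hypothetical.  Only the two parameter names come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two --param values. */
struct inline_limits
{
  int large_function_insns;   /* size above which a function counts as large */
  int large_function_growth;  /* allowed growth of a large function, percent */
};

/* Return true if inlining may proceed.  CALLER_SIZE and CALLEE_SIZE are
   the sizes before inlining, NEW_SIZE the estimated size afterwards.
   Assumed rule: results that stay small are always fine; larger results
   may not exceed the bigger of the two bodies by more than the growth
   percentage.  */
bool
within_growth_limits (const struct inline_limits *p, int caller_size,
                      int callee_size, int new_size)
{
  int base = caller_size > callee_size ? caller_size : callee_size;
  int limit = base + base * p->large_function_growth / 100;

  if (new_size <= p->large_function_insns)
    return true;
  return new_size <= limit;
}

/* Usage example with made-up limits: 2700 insns, 50% growth.  */
void
check_examples (void)
{
  struct inline_limits p = { 2700, 50 };

  assert (within_growth_limits (&p, 1000, 500, 1500));   /* stays small */
  assert (within_growth_limits (&p, 3000, 1000, 4000));  /* under 4500 */
  assert (!within_growth_limits (&p, 3000, 2000, 5000)); /* over 4500 */
}
```

Under this assumed rule, lowering either value rejects more inlining into an
already large caller, which is the "outlaw" effect described above.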

This is the first time I have seen an explosion in loop-related code.  It
might also be possible to throttle the inliner based on the size of the loop
structure, but that might be a bad idea: it is usually good to produce huge
loop trees if they appear in the program, for LNO (which we don't really have,
and graphite is exponential here too, so who knows ;)

> #3  0x000000000099d8c0 in get_loop_body (loop=0x2b966e38ecf0) at  
> ../../gcc/gcc/cfgloop.c:819
> #4  0x000000000099e14c in get_loop_exit_edges (loop=0x2b966e38ecf0) at  
> ../../gcc/gcc/cfgloop.c:1157
> #5  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at  
> ../../gcc/gcc/ira-build.c:155

Looking at the implementation of get_loop_exit_edges, we would probably be more
efficient here if IRA built the loop structure with LOOPS_HAVE_RECORDED_EXITS.
There seem to be a couple of places where IRA is looking for loop exits.

Honza

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-21 15:37                             ` Jan Hubicka
@ 2009-11-21 17:01                               ` Vladimir Makarov
  2009-11-21 17:48                                 ` Jan Hubicka
  2009-11-21 18:09                                 ` Toon Moene
  0 siblings, 2 replies; 38+ messages in thread
From: Vladimir Makarov @ 2009-11-21 17:01 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Toon Moene, Rafael Espindola, Richard Guenther, Jan Hubicka, gcc-patches

Jan Hubicka wrote:
>> Toon Moene wrote:
>>
>>     
>>> Jan Hubicka wrote:
>>>       
>>>> :) It would be nice to know what caused the OOM.  Is just one of  
>>>> passes exploding
>>>> on presence of very large bodies?
>>>>         
>>> I'll try to figure this out over the weekend (sorry, don't have more  
>>> spare time).
>>>
>>> It's most probably a single pass, because the memory requirements kept  
>>> creeping up to 12.5 Gbytes from 10, slowly increasing all the time over 
>>> several minutes.
>>>       
>> Here are the tracebacks from gdb attached to the lto1 process, while it  
>> was expanding from 7 to 12 Gb:
>>     
>
> Hopefully Vladimir will know answer here.  Do you have any idea how many BBs
> and loops are in the large function?
>
>   
I have seen some functions that could slow down regional IRA a lot, for
example a small function with 50 loops one after another.  Therefore IRA
already has code (decreasing the # of regions/loops under consideration)
to deal with this situation.

The worst case I saw was a few years ago: one customer's system generated 
functions with 100K blocks and >1M pseudos.  But fortunately they 
started to split the code into several functions.
> From inliner POV, all I can do is to ask you about size estimate of it (that is
> in the -fdump-ipa-inline dump) and adjust large-function-insns and
> large-function-growth parameters to outlaw whatever inlining happens there if
> there is really no better way to cure this.
>
> This is first time I see explossion in loop related code, it might be also
> possible to trottle inliner about size of loop structure, but that might be bad
> idea - it is usually good to produce huge loop trees if they appear in program
> for LNO (that we don't really have and graphite is exponential here too, so who
> knows ;)
>
>   

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-21 17:01                               ` Vladimir Makarov
@ 2009-11-21 17:48                                 ` Jan Hubicka
  2009-11-21 17:56                                   ` Richard Guenther
  2009-11-21 18:09                                 ` Toon Moene
  1 sibling, 1 reply; 38+ messages in thread
From: Jan Hubicka @ 2009-11-21 17:48 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Jan Hubicka, Toon Moene, Rafael Espindola, Richard Guenther,
	Jan Hubicka, gcc-patches

> Jan Hubicka wrote:
>>> Toon Moene wrote:
>>>
>>>     
>>>> Jan Hubicka wrote:
>>>>       
>>>>> :) It would be nice to know what caused the OOM.  Is just one of  
>>>>> passes exploding
>>>>> on presence of very large bodies?
>>>>>         
>>>> I'll try to figure this out over the weekend (sorry, don't have 
>>>> more  spare time).
>>>>
>>>> It's most probably a single pass, because the memory requirements 
>>>> kept  creeping up to 12.5 Gbytes from 10, slowly increasing all the 
>>>> time over several minutes.
>>>>       
>>> Here are the tracebacks from gdb attached to the lto1 process, while 
>>> it  was expanding from 7 to 12 Gb:
>>>     
>>
>> Hopefully Vladimir will know answer here.  Do you have any idea how many BBs
>> and loops are in the large function?
>>
>>   
> I saw some functions which could have slowed down regional IRA a lot.   
> For example, a small function with 50 loops one after another.   
> Therefore IRA has already code (decreasing # of regions/loops for  
> consideration) to deal with the situation.

I know little about weather prediction, but as with the usual simulation code
I would not expect it to have very deep loop nests.  Toon, would it be possible
to get the loop tree of the function in question?
>
> The worst what I saw a few years ago.  One customer system generated  
> functions with 100K blocks and >1M pseudos.   But fortunately they  
> started to split the code into several functions.

What about the following patch, which avoids the inefficiency of the loop exit
enumeration code?  Or is IRA enumerating the exits of every loop just once
during the analysis?

Honza

Index: ira.c
===================================================================
--- ira.c	(revision 154387)
+++ ira.c	(working copy)
@@ -3172,6 +3172,7 @@ ira (FILE *f)
   
   ira_assert (current_loops == NULL);
   flow_loops_find (&ira_loops);
+  record_loop_exits ();
   current_loops = &ira_loops;
       
   if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
@@ -3215,6 +3216,7 @@ ira (FILE *f)
 	  df_analyze ();
 	  
 	  flow_loops_find (&ira_loops);
+	  record_loop_exits ();
 	  current_loops = &ira_loops;
 
 	  setup_allocno_assignment_flags ();

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-21 17:48                                 ` Jan Hubicka
@ 2009-11-21 17:56                                   ` Richard Guenther
  2009-11-21 17:57                                     ` Jan Hubicka
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Guenther @ 2009-11-21 17:56 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Vladimir Makarov, Toon Moene, Rafael Espindola, Jan Hubicka, gcc-patches

On Sat, Nov 21, 2009 at 6:36 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Jan Hubicka wrote:
>>>> Toon Moene wrote:
>>>>
>>>>
>>>>> Jan Hubicka wrote:
>>>>>
>>>>>> :) It would be nice to know what caused the OOM.  Is just one of
>>>>>> passes exploding
>>>>>> on presence of very large bodies?
>>>>>>
>>>>> I'll try to figure this out over the weekend (sorry, don't have
>>>>> more  spare time).
>>>>>
>>>>> It's most probably a single pass, because the memory requirements
>>>>> kept  creeping up to 12.5 Gbytes from 10, slowly increasing all the
>>>>> time over several minutes.
>>>>>
>>>> Here are the tracebacks from gdb attached to the lto1 process, while
>>>> it  was expanding from 7 to 12 Gb:
>>>>
>>>
>>> Hopefully Vladimir will know answer here.  Do you have any idea how many BBs
>>> and loops are in the large function?
>>>
>>>
>> I saw some functions which could have slowed down regional IRA a lot.
>> For example, a small function with 50 loops one after another.
>> Therefore IRA has already code (decreasing # of regions/loops for
>> consideration) to deal with the situation.
>
> I know little about weather prediction, but as usual simulation I would not
> expect it to have very deep loop nests. Toon, would be possible to have loop tree
> of the function in question?
>>
>> The worst what I saw a few years ago.  One customer system generated
>> functions with 100K blocks and >1M pseudos.   But fortunately they
>> started to split the code into several functions.
>
> What about the following patch that avoid ineffectivity of the loop exists
> enumerating code? Or is IRA enumerating exists in every loop just once during
> the analysis?

I think Toon patched GCC to inline every called-once function into a gigantic
main function, and the inefficiency comes not from recording exit edges but
from allocating the IRA bitmaps for each loop in advance (whereas for BBs we
seem to simply set them to NULL).
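
The contrast between allocating in advance and starting from NULL can be
sketched as follows.  This is a minimal illustration, not IRA's data
structures: struct node_info, the helper names, and the flat word-array
"bitmap" are all hypothetical, and whether ira-build.c really behaves like the
eager variant is exactly what is being guessed at above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-node record holding one bitmap of liveness bits.  */
struct node_info
{
  unsigned long *live_bits;  /* NULL until allocated */
};

/* Eager: give every node a zeroed bitmap up front.  Returns the number of
   words allocated, which is n_nodes * words_per_map regardless of how many
   nodes are ever looked at.  */
size_t
eager_init (struct node_info *nodes, size_t n_nodes, size_t words_per_map)
{
  size_t allocated = 0;

  for (size_t i = 0; i < n_nodes; i++)
    {
      nodes[i].live_bits = calloc (words_per_map, sizeof (unsigned long));
      if (nodes[i].live_bits == NULL && words_per_map != 0)
        abort ();
      allocated += words_per_map;
    }
  return allocated;
}

/* Lazy: only mark every node as "no bitmap yet".  */
void
lazy_init (struct node_info *nodes, size_t n_nodes)
{
  for (size_t i = 0; i < n_nodes; i++)
    nodes[i].live_bits = NULL;
}

/* Lazy: allocate on first use and add the cost to *ALLOCATED.  */
unsigned long *
lazy_get (struct node_info *node, size_t words_per_map, size_t *allocated)
{
  if (node->live_bits == NULL)
    {
      node->live_bits = calloc (words_per_map, sizeof (unsigned long));
      if (node->live_bits == NULL && words_per_map != 0)
        abort ();
      *allocated += words_per_map;
    }
  return node->live_bits;
}

/* Free all bitmaps; free (NULL) is a no-op, so this works for both.  */
void
release_all (struct node_info *nodes, size_t n_nodes)
{
  for (size_t i = 0; i < n_nodes; i++)
    {
      free (nodes[i].live_bits);
      nodes[i].live_bits = NULL;
    }
}
```

With many nodes and a bitmap sized by the number of pseudos, the eager variant
pays for every node at once, while the lazy one pays only for the nodes that
are actually touched.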

Richard.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-21 17:56                                   ` Richard Guenther
@ 2009-11-21 17:57                                     ` Jan Hubicka
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Hubicka @ 2009-11-21 17:57 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Jan Hubicka, Vladimir Makarov, Toon Moene, Rafael Espindola,
	Jan Hubicka, gcc-patches

> 
> I think Tom patched GCC to inline every called-once function into a gigantic
> main function and the ineffiency is not from recording exit edges but from
> allocating the IRA bitmaps for each loop in advance (compared to for
> BBs where we seem to set them simply to NULL).

Yep, I don't expect the patch to solve the problem; it just seemed a natural
thing to do, since enumeration of exits is quite inefficient otherwise (O(loop
body size) instead of O(num exits)).
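
The cost difference mentioned above can be sketched on a toy CFG.  This is a
minimal illustration under assumptions: struct block, struct exit_list, and
the three functions are hypothetical and far simpler than GCC's cfgloop code,
and the fixed capacity MAX_EXITS exists only to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXITS 16  /* hypothetical cap, for the sketch only */

/* Toy CFG block: at most two successors, plus a loop membership flag.  */
struct block
{
  int succ[2];  /* successor block indices, -1 if absent */
  int in_loop;  /* nonzero if the block belongs to the loop */
};

struct exit_edge { int src, dest; };

/* Exits recorded once, so later queries need not rescan the body.  */
struct exit_list
{
  struct exit_edge edges[MAX_EXITS];
  size_t n;
};

/* Without recorded exits: every query walks all blocks and their
   successors, so the cost grows with the size of the body.  */
size_t
count_exits_by_scan (const struct block *blocks, size_t n_blocks)
{
  size_t exits = 0;

  for (size_t i = 0; i < n_blocks; i++)
    {
      if (!blocks[i].in_loop)
        continue;
      for (int s = 0; s < 2; s++)
        {
          int t = blocks[i].succ[s];
          if (t >= 0 && !blocks[t].in_loop)
            exits++;
        }
    }
  return exits;
}

/* With recorded exits: scan once and remember the edges (up to
   MAX_EXITS of them).  */
void
record_exits (const struct block *blocks, size_t n_blocks,
              struct exit_list *out)
{
  out->n = 0;
  for (size_t i = 0; i < n_blocks; i++)
    {
      if (!blocks[i].in_loop)
        continue;
      for (int s = 0; s < 2; s++)
        {
          int t = blocks[i].succ[s];
          if (t >= 0 && !blocks[t].in_loop && out->n < MAX_EXITS)
            {
              out->edges[out->n].src = (int) i;
              out->edges[out->n].dest = t;
              out->n++;
            }
        }
    }
}

/* Later queries only touch the recorded list: O(num exits).  */
size_t
count_recorded_exits (const struct exit_list *list)
{
  return list->n;
}
```

The one-time scan in record_exits costs the same as a single
count_exits_by_scan call; the saving only shows up when exits are queried
repeatedly for the same loop.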
> 
> Richard.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-21 17:01                               ` Vladimir Makarov
  2009-11-21 17:48                                 ` Jan Hubicka
@ 2009-11-21 18:09                                 ` Toon Moene
  1 sibling, 0 replies; 38+ messages in thread
From: Toon Moene @ 2009-11-21 18:09 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Jan Hubicka, Rafael Espindola, Richard Guenther, Jan Hubicka,
	gcc-patches

Vladimir Makarov wrote:

> The worst what I saw a few years ago.  One customer system generated 
> functions with 100K blocks and >1M pseudos.   But fortunately they 
> started to split the code into several functions.

Very good point.  Note that I did the reverse: I ordered the compiler to 
inline all functions only-called-once.

This might have produced a gigantic "first function" with - indeed - > 
100K blocks and > 1M pseudos.

To put this into perspective:  I am not surprised that this might fail - 
I simply try to figure out if this is an error in some pass, or a design 
limitation ...

Cheers,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-21 11:57                           ` Toon Moene
  2009-11-21 15:37                             ` Jan Hubicka
@ 2009-11-22  7:20                             ` Vladimir Makarov
  2009-11-22 14:19                               ` Toon Moene
  2009-11-22 18:01                               ` Richard Guenther
  1 sibling, 2 replies; 38+ messages in thread
From: Vladimir Makarov @ 2009-11-22  7:20 UTC (permalink / raw)
  To: Toon Moene
  Cc: Jan Hubicka, Rafael Espindola, Richard Guenther, Jan Hubicka,
	gcc-patches

Toon Moene wrote:
> Toon Moene wrote:
>
>> Jan Hubicka wrote:
>
>>> :) It would be nice to know what caused the OOM.  Is just one of 
>>> passes exploding
>>> on presence of very large bodies?
>>
>> I'll try to figure this out over the weekend (sorry, don't have more 
>> spare time).
>>
>> It's most probably a single pass, because the memory requirements 
>> kept creeping up to 12.5 Gbytes from 10, slowly increasing all the 
>> time over several minutes.
>
> Here are the tracebacks from gdb attached to the lto1 process, while 
> it was expanding from 7 to 12 Gb:
>
> (gdb) where
> #0  0x00002b961290491e in memset () from /lib/libc.so.6
> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
> ../../gcc/gcc/ira-build.c:155
> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1522
> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1577
> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
> ../../gcc/gcc/passes.c:1578
> #8  0x0000000000656e1c in tree_rest_of_compilation 
> (fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
> at ../../gcc/gcc/cgraphunit.c:1178
> #10 0x00000000007835ed in cgraph_expand_all_functions () at 
> ../../gcc/gcc/cgraphunit.c:1245
> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
> ../../gcc/gcc/lto/lto.c:2054
> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:1049
> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
> #15 toplev_main (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:2446
> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>
> (gdb) where
> #0  0x00002b961290490f in memset () from /lib/libc.so.6
> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
> ../../gcc/gcc/ira-build.c:155
> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1522
> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1577
> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
> ../../gcc/gcc/passes.c:1578
> #8  0x0000000000656e1c in tree_rest_of_compilation 
> (fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
> at ../../gcc/gcc/cgraphunit.c:1178
> #10 0x00000000007835ed in cgraph_expand_all_functions () at 
> ../../gcc/gcc/cgraphunit.c:1245
> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
> ../../gcc/gcc/lto/lto.c:2054
> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:1049
> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
> #15 toplev_main (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:2446
> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>
> (gdb) where
> #0  0x00002b96128fc26c in ?? () from /lib/libc.so.6
> #1  0x00002b96128fde24 in calloc () from /lib/libc.so.6
> #2  0x0000000000a6ea7a in xcalloc (nelem=19, elsize=8) at 
> ../../gcc/libiberty/xmalloc.c:162
> #3  0x000000000099d8c0 in get_loop_body (loop=0x2b966e38ecf0) at 
> ../../gcc/gcc/cfgloop.c:819
> #4  0x000000000099e14c in get_loop_exit_edges (loop=0x2b966e38ecf0) at 
> ../../gcc/gcc/cfgloop.c:1157
> #5  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
> ../../gcc/gcc/ira-build.c:155
> #6  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
> #7  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
> #8  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
> #9  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1522
> #10 0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1577
> #11 0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
> ../../gcc/gcc/passes.c:1578
> #12 0x0000000000656e1c in tree_rest_of_compilation 
> (fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
> #13 0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
> at ../../gcc/gcc/cgraphunit.c:1178
> #14 0x00000000007835ed in cgraph_expand_all_functions () at 
> ../../gcc/gcc/cgraphunit.c:1245
> #15 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
> #16 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
> ../../gcc/gcc/lto/lto.c:2054
> #17 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:1049
> #18 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
> #19 toplev_main (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:2446
> #20 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
> #21 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>
> (gdb) where
> #0  0x00002b961290491e in memset () from /lib/libc.so.6
> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at 
> ../../gcc/gcc/ira-build.c:155
> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1522
> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at 
> ../../gcc/gcc/passes.c:1577
> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at 
> ../../gcc/gcc/passes.c:1578
> #8  0x0000000000656e1c in tree_rest_of_compilation 
> (fndecl=0x2b961f5d6a00) at ../../gcc/gcc/tree-optimize.c:407
> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) 
> at ../../gcc/gcc/cgraphunit.c:1178
> #10 0x00000000007835ed in cgraph_expand_all_functions () at 
> ../../gcc/gcc/cgraphunit.c:1245
> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at 
> ../../gcc/gcc/lto/lto.c:2054
> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:1049
> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
> #15 toplev_main (argc=1244, argv=0x291cfb0) at 
> ../../gcc/gcc/toplev.c:2446
> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>
> So it seems to be stuck in a part of the IRA pass ...
>
> Hope this helps (it's close to impossible to build a test case out of 
> this, because the programs consists of around 3/4 of our ~ 1 million 
> lines of Fortran code.
>
I'd recommend trying -fira-region=one to see what the memory
requirements would be.

For such a big function the conflict table would be very big.  This is a
common problem for register allocators that use a conflict table.  IRA
already uses a sophisticated algorithm for conflict table compression,
and I have no idea right now how to improve it.  IRA has a parameter,
ira-max-conflict-table-size, which affects the decision whether to use
the conflict table.  If the conflict table is not used, the quality of
RA worsens.  The default value is 1GB, but I guess the conflict table
in your case would be bigger, so you need to play with this parameter.

If -fira-region=one works for you, we could prohibit regional allocation
for functions whose number of basic blocks exceeds some threshold.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-22  7:20                             ` Vladimir Makarov
@ 2009-11-22 14:19                               ` Toon Moene
  2009-11-23 23:27                                 ` Peter Bergner
  2009-11-22 18:01                               ` Richard Guenther
  1 sibling, 1 reply; 38+ messages in thread
From: Toon Moene @ 2009-11-22 14:19 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Jan Hubicka, Rafael Espindola, Richard Guenther, Jan Hubicka,
	gcc-patches

Vladimir Makarov wrote:

> Toon Moene wrote:

>> So it seems to be stuck in a part of the IRA pass ...
>>
>> Hope this helps (it's close to impossible to build a test case out of 
>> this, because the programs consists of around 3/4 of our ~ 1 million 
>> lines of Fortran code.
>>
> I'd recommend to try -fira-region=one and to see what memory 
> requirements would be.

Yep, that worked.  CPU time about 18 minutes, top memory usage 3.5 Gbyte.

> For such big function the conflict table would be very big.  This is a 
> common problem for RA using the conflict table.  IRA uses sophisticated 
> algorithm for conflict table compression.  I even have no idea now how 
> to improve it.  IRA has a parameter ira-max-conflict-table-size which 
> affects the decision to use the conflict table.  If the conflict table 
> is decided not to be used, the quality of RA worsens.  The default value 
> is 1GB.  But I guess the conflict table in your case would be bigger.  
> So you need to play with this parameter.

[ This might be wrong-headed ] Is it really the case that every pseudo
*has* to (possibly) conflict with *every* other one (i.e., that the
conflict table has as many elements as the square of the number of
pseudos in a function)?

Intuitively one would think that no pseudo lives over the entire range 
of such a large function, so there might be "pockets" of conflict, but 
not the whole function ...

> If -fira-region work for you, we could prohibit regional allocation for 
> functions containing basic blocks which number is more than some threshold.

Well, it worked (see above).

Cheers,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-22  7:20                             ` Vladimir Makarov
  2009-11-22 14:19                               ` Toon Moene
@ 2009-11-22 18:01                               ` Richard Guenther
  2009-11-23 16:29                                 ` Vladimir Makarov
  1 sibling, 1 reply; 38+ messages in thread
From: Richard Guenther @ 2009-11-22 18:01 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Toon Moene, Jan Hubicka, Rafael Espindola, Jan Hubicka, gcc-patches

On Sun, Nov 22, 2009 at 5:18 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Toon Moene wrote:
>>
>> Toon Moene wrote:
>>
>>> Jan Hubicka wrote:
>>
>>>> :) It would be nice to know what caused the OOM.  Is just one of passes
>>>> exploding
>>>> on presence of very large bodies?
>>>
>>> I'll try to figure this out over the weekend (sorry, don't have more
>>> spare time).
>>>
>>> It's most probably a single pass, because the memory requirements kept
>>> creeping up to 12.5 Gbytes from 10, slowly increasing all the time over
>>> several minutes.
>>
>> Here are the tracebacks from gdb attached to the lto1 process, while it
>> was expanding from 7 to 12 Gb:
>>
>> (gdb) where
>> #0  0x00002b961290491e in memset () from /lib/libc.so.6
>> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>> ../../gcc/gcc/ira-build.c:155
>> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1522
>> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1577
>> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>> ../../gcc/gcc/passes.c:1578
>> #8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>> at ../../gcc/gcc/tree-optimize.c:407
>> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>> ../../gcc/gcc/cgraphunit.c:1178
>> #10 0x00000000007835ed in cgraph_expand_all_functions () at
>> ../../gcc/gcc/cgraphunit.c:1245
>> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>> ../../gcc/gcc/lto/lto.c:2054
>> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>> ../../gcc/gcc/toplev.c:1049
>> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>> #15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>
>> (gdb) where
>> #0  0x00002b961290490f in memset () from /lib/libc.so.6
>> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>> ../../gcc/gcc/ira-build.c:155
>> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1522
>> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1577
>> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>> ../../gcc/gcc/passes.c:1578
>> #8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>> at ../../gcc/gcc/tree-optimize.c:407
>> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>> ../../gcc/gcc/cgraphunit.c:1178
>> #10 0x00000000007835ed in cgraph_expand_all_functions () at
>> ../../gcc/gcc/cgraphunit.c:1245
>> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>> ../../gcc/gcc/lto/lto.c:2054
>> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>> ../../gcc/gcc/toplev.c:1049
>> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>> #15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>
>> (gdb) where
>> #0  0x00002b96128fc26c in ?? () from /lib/libc.so.6
>> #1  0x00002b96128fde24 in calloc () from /lib/libc.so.6
>> #2  0x0000000000a6ea7a in xcalloc (nelem=19, elsize=8) at
>> ../../gcc/libiberty/xmalloc.c:162
>> #3  0x000000000099d8c0 in get_loop_body (loop=0x2b966e38ecf0) at
>> ../../gcc/gcc/cfgloop.c:819
>> #4  0x000000000099e14c in get_loop_exit_edges (loop=0x2b966e38ecf0) at
>> ../../gcc/gcc/cfgloop.c:1157
>> #5  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>> ../../gcc/gcc/ira-build.c:155
>> #6  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>> #7  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>> #8  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>> #9  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1522
>> #10 0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1577
>> #11 0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>> ../../gcc/gcc/passes.c:1578
>> #12 0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>> at ../../gcc/gcc/tree-optimize.c:407
>> #13 0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>> ../../gcc/gcc/cgraphunit.c:1178
>> #14 0x00000000007835ed in cgraph_expand_all_functions () at
>> ../../gcc/gcc/cgraphunit.c:1245
>> #15 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>> #16 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>> ../../gcc/gcc/lto/lto.c:2054
>> #17 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>> ../../gcc/gcc/toplev.c:1049
>> #18 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>> #19 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>> #20 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>> #21 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>
>> (gdb) where
>> #0  0x00002b961290491e in memset () from /lib/libc.so.6
>> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>> ../../gcc/gcc/ira-build.c:155
>> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1522
>> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>> ../../gcc/gcc/passes.c:1577
>> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>> ../../gcc/gcc/passes.c:1578
>> #8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>> at ../../gcc/gcc/tree-optimize.c:407
>> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>> ../../gcc/gcc/cgraphunit.c:1178
>> #10 0x00000000007835ed in cgraph_expand_all_functions () at
>> ../../gcc/gcc/cgraphunit.c:1245
>> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>> ../../gcc/gcc/lto/lto.c:2054
>> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>> ../../gcc/gcc/toplev.c:1049
>> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>> #15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>
>> So it seems to be stuck in a part of the IRA pass ...
>>
>> Hope this helps (it's close to impossible to build a test case out of
>> this, because the programs consists of around 3/4 of our ~ 1 million lines
>> of Fortran code.
>>
> I'd recommend to try -fira-region=one and to see what memory requirements
> would be.
>
> For such big function the conflict table would be very big.  This is a
> common problem for RA using the conflict table.  IRA uses sophisticated
> algorithm for conflict table compression.  I even have no idea now how to
> improve it.  IRA has a parameter ira-max-conflict-table-size which affects
> the decision to use the conflict table.  If the conflict table is decided
> not to be used, the quality of RA worsens.  The default value is 1GB.  But I
> guess the conflict table in your case would be bigger.  So you need to play
> with this parameter.
>
> If -fira-region work for you, we could prohibit regional allocation for
> functions containing basic blocks which number is more than some threshold.

Can't we split a function at points with a minimal number of live pseudos
and allocate the resulting regions independently?  Of course there would be
hard constraints on the entry of each such region, just like we have on
function entry for parameters.
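
A rough sketch of the idea in Python (this is not GCC code: the program
points, the live-range dictionary and the function names are made up for
illustration, and breaking ties in favour of the more balanced cut is
just one possible heuristic):

```python
def live_counts(live_ranges, n_points):
    """Number of pseudos live at each program point 0..n_points-1.

    live_ranges maps a pseudo name to an inclusive (first, last) range.
    """
    delta = [0] * (n_points + 1)
    for first, last in live_ranges.values():
        delta[first] += 1
        delta[last + 1] -= 1
    counts, live = [], 0
    for point in range(n_points):
        live += delta[point]
        counts.append(live)
    return counts


def best_split(live_ranges, n_points, min_part=1):
    """Point with the fewest live pseudos, leaving min_part points per side.

    Requires n_points > 2 * min_part.  Ties go to the point nearest the
    middle, so the two parts stay roughly the same size.
    """
    counts = live_counts(live_ranges, n_points)
    candidates = range(min_part, n_points - min_part)
    return min(candidates, key=lambda p: (counts[p], abs(p - n_points // 2)))


# Toy function with ten program points and six pseudos.
ranges = {"a": (0, 3), "b": (1, 4), "c": (2, 3),
          "d": (5, 8), "e": (6, 9), "f": (7, 9)}
split = best_split(ranges, 10)
# Pseudos live at the split are the ones needing fixed locations on entry
# to the second region.
boundary = sorted(p for p, (first, last) in ranges.items()
                  if first <= split <= last)
print(split, boundary)
```

The pseudos that are still live at the chosen point play the role that
parameters play on function entry.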

No idea if that would help in practice, of course.

Richard.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-22 18:01                               ` Richard Guenther
@ 2009-11-23 16:29                                 ` Vladimir Makarov
  2009-11-23 16:33                                   ` Jan Hubicka
                                                     ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Vladimir Makarov @ 2009-11-23 16:29 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Toon Moene, Jan Hubicka, Rafael Espindola, Jan Hubicka, gcc-patches

Richard Guenther wrote:
> On Sun, Nov 22, 2009 at 5:18 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   
>> Toon Moene wrote:
>>     
>>> Toon Moene wrote:
>>>
>>>       
>>>> Jan Hubicka wrote:
>>>>         
>>>>> :) It would be nice to know what caused the OOM.  Is just one of passes
>>>>> exploding
>>>>> on presence of very large bodies?
>>>>>           
>>>> I'll try to figure this out over the weekend (sorry, don't have more
>>>> spare time).
>>>>
>>>> It's most probably a single pass, because the memory requirements kept
>>>> creeping up to 12.5 Gbytes from 10, slowly increasing all the time over
>>>> several minutes.
>>>>         
>>> Here are the tracebacks from gdb attached to the lto1 process, while it
>>> was expanding from 7 to 12 Gb:
>>>
>>> (gdb) where
>>> #0  0x00002b961290491e in memset () from /lib/libc.so.6
>>> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>>> ../../gcc/gcc/ira-build.c:155
>>> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>>> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>>> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>>> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1522
>>> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1577
>>> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>>> ../../gcc/gcc/passes.c:1578
>>> #8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>>> at ../../gcc/gcc/tree-optimize.c:407
>>> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>>> ../../gcc/gcc/cgraphunit.c:1178
>>> #10 0x00000000007835ed in cgraph_expand_all_functions () at
>>> ../../gcc/gcc/cgraphunit.c:1245
>>> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>>> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>>> ../../gcc/gcc/lto/lto.c:2054
>>> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>>> ../../gcc/gcc/toplev.c:1049
>>> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>>> #15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>>> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>>> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>>
>>> (gdb) where
>>> #0  0x00002b961290490f in memset () from /lib/libc.so.6
>>> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>>> ../../gcc/gcc/ira-build.c:155
>>> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>>> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>>> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>>> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1522
>>> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1577
>>> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>>> ../../gcc/gcc/passes.c:1578
>>> #8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>>> at ../../gcc/gcc/tree-optimize.c:407
>>> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>>> ../../gcc/gcc/cgraphunit.c:1178
>>> #10 0x00000000007835ed in cgraph_expand_all_functions () at
>>> ../../gcc/gcc/cgraphunit.c:1245
>>> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>>> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>>> ../../gcc/gcc/lto/lto.c:2054
>>> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>>> ../../gcc/gcc/toplev.c:1049
>>> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>>> #15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>>> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>>> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>>
>>> (gdb) where
>>> #0  0x00002b96128fc26c in ?? () from /lib/libc.so.6
>>> #1  0x00002b96128fde24 in calloc () from /lib/libc.so.6
>>> #2  0x0000000000a6ea7a in xcalloc (nelem=19, elsize=8) at
>>> ../../gcc/libiberty/xmalloc.c:162
>>> #3  0x000000000099d8c0 in get_loop_body (loop=0x2b966e38ecf0) at
>>> ../../gcc/gcc/cfgloop.c:819
>>> #4  0x000000000099e14c in get_loop_exit_edges (loop=0x2b966e38ecf0) at
>>> ../../gcc/gcc/cfgloop.c:1157
>>> #5  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>>> ../../gcc/gcc/ira-build.c:155
>>> #6  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>>> #7  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>>> #8  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>>> #9  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1522
>>> #10 0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1577
>>> #11 0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>>> ../../gcc/gcc/passes.c:1578
>>> #12 0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>>> at ../../gcc/gcc/tree-optimize.c:407
>>> #13 0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>>> ../../gcc/gcc/cgraphunit.c:1178
>>> #14 0x00000000007835ed in cgraph_expand_all_functions () at
>>> ../../gcc/gcc/cgraphunit.c:1245
>>> #15 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>>> #16 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>>> ../../gcc/gcc/lto/lto.c:2054
>>> #17 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>>> ../../gcc/gcc/toplev.c:1049
>>> #18 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>>> #19 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>>> #20 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>>> #21 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>>
>>> (gdb) where
>>> #0  0x00002b961290491e in memset () from /lib/libc.so.6
>>> #1  0x0000000000530632 in create_loop_tree_nodes (loops_p=1 '\001') at
>>> ../../gcc/gcc/ira-build.c:155
>>> #2  ira_build (loops_p=1 '\001') at ../../gcc/gcc/ira-build.c:2773
>>> #3  0x000000000052a3db in ira () at ../../gcc/gcc/ira.c:3179
>>> #4  rest_of_handle_ira () at ../../gcc/gcc/ira.c:3350
>>> #5  0x00000000005867ff in execute_one_pass (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1522
>>> #6  0x0000000000586a75 in execute_pass_list (pass=0xd2f500) at
>>> ../../gcc/gcc/passes.c:1577
>>> #7  0x0000000000586a87 in execute_pass_list (pass=0xdb0a20) at
>>> ../../gcc/gcc/passes.c:1578
>>> #8  0x0000000000656e1c in tree_rest_of_compilation (fndecl=0x2b961f5d6a00)
>>> at ../../gcc/gcc/tree-optimize.c:407
>>> #9  0x0000000000781c8c in cgraph_expand_function (node=0x2b9618367000) at
>>> ../../gcc/gcc/cgraphunit.c:1178
>>> #10 0x00000000007835ed in cgraph_expand_all_functions () at
>>> ../../gcc/gcc/cgraphunit.c:1245
>>> #11 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1491
>>> #12 0x00000000004165cf in lto_main (debug_p=<value optimized out>) at
>>> ../../gcc/gcc/lto/lto.c:2054
>>> #13 0x000000000061a28e in compile_file (argc=1244, argv=0x291cfb0) at
>>> ../../gcc/gcc/toplev.c:1049
>>> #14 do_compile (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2404
>>> #15 toplev_main (argc=1244, argv=0x291cfb0) at ../../gcc/gcc/toplev.c:2446
>>> #16 0x00002b96128a7a8d in __libc_start_main () from /lib/libc.so.6
>>> #17 0x0000000000400249 in _start () at ../sysdeps/x86_64/elf/start.S:113
>>>
>>> So it seems to be stuck in a part of the IRA pass ...
>>>
>>> Hope this helps (it's close to impossible to build a test case out of
>>> this, because the programs consists of around 3/4 of our ~ 1 million lines
>>> of Fortran code.
>>>
>>>       
>> I'd recommend to try -fira-region=one and to see what memory requirements
>> would be.
>>
>> For such big function the conflict table would be very big.  This is a
>> common problem for RA using the conflict table.  IRA uses sophisticated
>> algorithm for conflict table compression.  I even have no idea now how to
>> improve it.  IRA has a parameter ira-max-conflict-table-size which affects
>> the decision to use the conflict table.  If the conflict table is decided
>> not to be used, the quality of RA worsens.  The default value is 1GB.  But I
>> guess the conflict table in your case would be bigger.  So you need to play
>> with this parameter.
>>
>> If -fira-region work for you, we could prohibit regional allocation for
>> functions containing basic blocks which number is more than some threshold.
>>     
>
> Can't we split a function at points of minimal # of life pseudos and allocate
> the resulting regions independently?  Of course there would be hard
> constraints on the entry of each such region, just like we have on
> function entry
> for parameters.
>
> No idea if that would help in practice, of course.
>   
That is an interesting proposal, Richard.  I think it could help.
There are a lot of questions about heuristics (the point with the
minimal number of live pseudos may divide the function into two very
different parts, one very small and one very big).  Also, I'd prefer to
implement something like non-regional allocation first, and then split
live ranges across regions for pseudos that are transparent over the
region and that got a hard register.  By my estimate, that could
decrease the resource demands of regional allocation a lot.

In any case, both solutions need some time for implementation and
evaluation.  Meanwhile, I'll submit a patch preventing regional
allocation when there are a lot of BBs.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-23 16:29                                 ` Vladimir Makarov
@ 2009-11-23 16:33                                   ` Jan Hubicka
  2009-11-23 17:05                                     ` Vladimir Makarov
  2009-11-24 10:21                                   ` Paolo Bonzini
  2009-11-24 15:18                                   ` Peter Bergner
  2 siblings, 1 reply; 38+ messages in thread
From: Jan Hubicka @ 2009-11-23 16:33 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Richard Guenther, Toon Moene, Jan Hubicka, Rafael Espindola,
	Jan Hubicka, gcc-patches

> That is an interesting proposal, Richard.  I think it could help.    
> There are a lot questions  about  heuristics (# min number may divide in  
> two very different parts -- one very small and one very big).  Also I'd  
> prefer to implement something like non-regional allocation first and  
> than splitting live ranges over regions for pseudos transparent over the  
> region and which got hard-register.  On my estimation it could decrease  
> demand for resources for regional allocation a lot.
>
> In any case, the both solutions need sometime in implementation and  
> evaluation.  Meanwhile, I'll submit a patch preventing regional  
> allocation where there are a lot of BBs.

Just out of curiosity, and because we might want to synchronize your cutoff
with the inliner's idea of large functions (even if the BB counts seen by the
inliner do not match the BB counts seen by RA very closely): what code
quality degradation do you think one can expect from this?

Honza

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-23 16:33                                   ` Jan Hubicka
@ 2009-11-23 17:05                                     ` Vladimir Makarov
  2009-11-23 17:07                                       ` Jan Hubicka
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Makarov @ 2009-11-23 17:05 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Richard Guenther, Toon Moene, Rafael Espindola, Jan Hubicka, gcc-patches

Jan Hubicka wrote:
>> That is an interesting proposal, Richard.  I think it could help.    
>> There are a lot questions  about  heuristics (# min number may divide in  
>> two very different parts -- one very small and one very big).  Also I'd  
>> prefer to implement something like non-regional allocation first and  
>> than splitting live ranges over regions for pseudos transparent over the  
>> region and which got hard-register.  On my estimation it could decrease  
>> demand for resources for regional allocation a lot.
>>
>> In any case, the both solutions need sometime in implementation and  
>> evaluation.  Meanwhile, I'll submit a patch preventing regional  
>> allocation where there are a lot of BBs.
>>     
>
> Just out of curiosity and because we might want to synchronize your cutoff with
> inliner's idea of large functions (even if BB counts seen by inliner not really
> match very closely BB counts seen by RA).  What code quality degradation do you
> think one can expect by this?
>
>   
I'd guess 1-2% for an average SPEC2000-like program.  But I think
that is compensated for by improvements from other optimizations
working on bigger functions.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-23 17:05                                     ` Vladimir Makarov
@ 2009-11-23 17:07                                       ` Jan Hubicka
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Hubicka @ 2009-11-23 17:07 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Jan Hubicka, Richard Guenther, Toon Moene, Rafael Espindola,
	Jan Hubicka, gcc-patches

> I'd guess 1-2% if it was in average SPEC2000 like program.  But I think 
> it is compensated by improvements from other optimizations working on 
> bigger functions.

Agreed, this sounds like quite a reasonable tradeoff.  I just wondered
whether it would slow everything down by 20% or so, because then we would
get into the usual unexplainable issues with more inlining suddenly making
everything a lot slower.

Honza

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Whole program optimization and functions-only-called-once.
  2009-11-22 14:19                               ` Toon Moene
@ 2009-11-23 23:27                                 ` Peter Bergner
  2009-11-24  6:48                                   ` Vladimir Makarov
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Bergner @ 2009-11-23 23:27 UTC (permalink / raw)
  To: Toon Moene
  Cc: Vladimir Makarov, Jan Hubicka, Rafael Espindola,
	Richard Guenther, Jan Hubicka, gcc-patches

On Sun, 2009-11-22 at 12:41 +0100, Toon Moene wrote:
> [ This might be wrong-headed ] Is it so that every pseudo *has* to 
> (possibly) conflict with *every* other one (i.e., that the conflict 
> table has as many elements as the square of the number of pseudo's in a 
> function) ?
> 
> Intuitively one would think that no pseudo lives over the entire range 
> of such a large function, so there might be "pockets" of conflict, but 
> not the whole function ...

This is exactly what I did for the 4.3 (ie, pre IRA) compiler and I saw
some good reductions in the size of the interference graph, so no, you're
not wrong-headed. :)

    http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00529.html

Basically, if you ever ask whether two pseudos interfere, then you need to
reserve space in the interference graph to hold their interference info.
The way Chaitin/Briggs and GCC's (4.3) implementations work, we only "ask"
whether two pseudos conflict if they are live simultaneously (actually,
due to copy coalescing, we have to relax this to being live in the same
instruction).  The patch above, which is in the FSF 4.3 sources, eliminated
the space for pairs of live ranges that live in separate blocks.

For SPEC2000, we use roughly 84% less space than the old square bit
matrix and 68% less space than a conventional triangular bit matrix.
In some cases, like 177.mesa/get.c:gl_GetBooleanv (it's basically just a
huge switch statement), we use a LOT less (99.6%).  I'll note that the
-fdump-rtl-greg output displays how much space the old square matrix would
use, how much space a conventional triangular bit matrix would use and how
much we used for our compressed triangular bit matrix.  It'd be interesting
to see how much space is used compiling with the 4.3 compiler.

The patch above does not help with single basic block functions (like the
old SPEC92 fpppp routine) or where all live ranges are global.  I'll note
I do have code to help the fpppp case, but it didn't make the 4.3 timeframe
and now that we've switched to IRA, it's moot.

I'll also note that I have completely ignored the fact that by switching
to a triangular bit matrix (compressed or conventional), we had to
introduce an adjacency list which does take some additional space, but
allows much faster access to one's neighbors.
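
As a rough illustration of the conventional triangular layout mentioned
above (not the compressed variant, and not the actual GCC code), a minimal
C sketch might look like the following.  All names here (tri_matrix,
tri_index and friends) are hypothetical and chosen only for this example.
The point is that an unordered pair of pseudos needs one bit, so N pseudos
need N*(N-1)/2 bits instead of N*N.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration only; these are not GCC data structures.  */
typedef struct
{
  size_t n;             /* Number of pseudos.  */
  unsigned char *bits;  /* Packed storage for n*(n-1)/2 conflict bits.  */
} tri_matrix;

/* Bits needed by a square and by a triangular matrix for N pseudos.  */
static size_t square_bits (size_t n) { return n * n; }
static size_t triangular_bits (size_t n) { return n * (n - 1) / 2; }

/* Map the unordered pair (A, B), A != B, to its bit position.  */
static size_t
tri_index (size_t a, size_t b)
{
  assert (a != b);
  size_t lo = a < b ? a : b;
  size_t hi = a < b ? b : a;
  return hi * (hi - 1) / 2 + lo;
}

/* Allocate an all-zero matrix for N pseudos; return NULL on failure.  */
static tri_matrix *
tri_create (size_t n)
{
  tri_matrix *m = malloc (sizeof *m);
  if (!m)
    return NULL;
  m->n = n;
  /* Round up to whole bytes; the extra byte avoids a zero-sized request.  */
  m->bits = calloc ((triangular_bits (n) + CHAR_BIT - 1) / CHAR_BIT + 1, 1);
  if (!m->bits)
    {
      free (m);
      return NULL;
    }
  return m;
}

/* Record that pseudos A and B conflict.  */
static void
tri_set_conflict (tri_matrix *m, size_t a, size_t b)
{
  assert (a < m->n && b < m->n);
  size_t i = tri_index (a, b);
  m->bits[i / CHAR_BIT] |= (unsigned char) (1u << (i % CHAR_BIT));
}

/* Return nonzero if pseudos A and B were recorded as conflicting.  */
static int
tri_conflict_p (const tri_matrix *m, size_t a, size_t b)
{
  if (a == b)
    return 0;
  size_t i = tri_index (a, b);
  return (m->bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

static void
tri_free (tri_matrix *m)
{
  free (m->bits);
  free (m);
}
```

Note that finding all neighbors of one pseudo in this layout means probing
every other pseudo, which is why an adjacency list is kept alongside it.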

Peter



* Re: Whole program optimization and functions-only-called-once.
  2009-11-23 23:27                                 ` Peter Bergner
@ 2009-11-24  6:48                                   ` Vladimir Makarov
  2009-11-24 15:29                                     ` Peter Bergner
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Makarov @ 2009-11-24  6:48 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Toon Moene, Jan Hubicka, Rafael Espindola, Richard Guenther,
	Jan Hubicka, gcc-patches

Peter Bergner wrote:
> On Sun, 2009-11-22 at 12:41 +0100, Toon Moene wrote:
>   
>> [ This might be wrong-headed ] Is it so that every pseudo *has* to 
>> (possibly) conflict with *every* other one (i.e., that the conflict 
>> table has as many elements as the square of the number of pseudo's in a 
>> function) ?
>>
>> Intuitively one would think that no pseudo lives over the entire range 
>> of such a large function, so there might be "pockets" of conflict, but 
>> not the whole function ...
>>     
>
> This is exactly what I did for the 4.3 (ie, pre IRA) compiler and I saw
> some good reductions in the size of the interference graph, so no, you're
> not wrong-headed. :)
>
>     http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00529.html
>
> Basically, if you ever ask whether two pseudos interfere, then you need to
> reserve space in the interference graph to hold their interference info.
> The way Chaitin/Briggs and GCC's (4.3) implementations work, we only "ask"
> whether two pseudos conflict if they are live simultaneously (actually,
> due to copy coalescing, we have to relax this to being live in the same
> instruction).  The patch above which is in the FSF 4.3 sources, eliminated
> the space for live ranges that live in separate blocks.
>
> For SPEC2000, we roughly use 84% less space than the old square bit
> matrix and 68% less space than a conventional triangular bit matrix.
> In some cases, like 177.mesa/get.c:gl_GetBooleanv (it's basically just a
> huge switch statement), we use a LOT less (99.6%).  I'll note that the
> -fdump-rtl-greg output displays how much space the old square matrix would
> use, how much space a conventional triangular bit matrix would use and how
> much we used for our compressed triangular bit matrix.  It'd be interesting
> to see how much space is used compiling with the 4.3 compiler.
>
> The patch above does not help with single basic block functions (like the
> old SPEC92 fpppp routine) or where all live ranges are global.  I'll note
> I do have code to help the fpppp case, but it didn't make the 4.3 timeframe
> and now that we've switched to IRA, it's moot.
>
>   
Peter, code analogous to what you mentioned is actually in IRA.  Your
approach has even been improved, because it works on live ranges rather
than on a per-BB basis.  Therefore it helps even for fpppp and for
compressing the conflicts of global pseudos.
> I'll also note that I have completely ignored the fact that by switching
> to a triangular bit matrix (compressed or conventional), we had to
> introduce an adjacency list which does take some additional space, but
> allows much faster access to one's neighbors.
>
>
>   
That has been done in IRA too.  IRA decides which representation is best
for the adjacency list.

The problem with IRA regional allocation is that #allocnos might be much
larger than #pseudos.

* Re: Whole program optimization and functions-only-called-once.
  2009-11-23 16:29                                 ` Vladimir Makarov
  2009-11-23 16:33                                   ` Jan Hubicka
@ 2009-11-24 10:21                                   ` Paolo Bonzini
  2009-11-24 15:18                                   ` Peter Bergner
  2 siblings, 0 replies; 38+ messages in thread
From: Paolo Bonzini @ 2009-11-24 10:21 UTC (permalink / raw)
  To: gcc-patches

On 11/23/2009 05:21 PM, Vladimir Makarov wrote:
> There are a lot questions  about  heuristics (# min number may divide in
> two very different parts -- one very small and one very big).

You can find blocks that postdominate the entry BB and dominate the exit 
BB, and split at the middle one.
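
A minimal sketch of that idea in C, under some simplifying assumptions: the
CFG is a small adjacency matrix, and instead of real dominator trees it
uses the observation that a block both dominates the exit and postdominates
the entry exactly when every entry-to-exit path passes through it, i.e.
when the exit becomes unreachable once the block is removed.  The types and
function names (cfg, reachable_avoiding, find_split_candidates) are made up
for this example and are not GCC's.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 64

/* Hypothetical CFG: succ[i][j] is true when there is an edge i -> j.  */
typedef struct
{
  int n_blocks;
  bool succ[MAX_BLOCKS][MAX_BLOCKS];
} cfg;

/* Return true if TO is reachable from FROM without visiting SKIP.
   Pass SKIP == -1 to avoid nothing.  */
static bool
reachable_avoiding (const cfg *g, int from, int to, int skip)
{
  bool seen[MAX_BLOCKS] = { false };
  int stack[MAX_BLOCKS];
  int top = 0;

  if (from == skip || to == skip)
    return false;
  stack[top++] = from;
  seen[from] = true;
  while (top > 0)
    {
      int b = stack[--top];
      if (b == to)
        return true;
      for (int s = 0; s < g->n_blocks; s++)
        if (g->succ[b][s] && !seen[s] && s != skip)
          {
            seen[s] = true;   /* Each block is pushed at most once.  */
            stack[top++] = s;
          }
    }
  return false;
}

/* Store in OUT, in dominance order, the blocks (other than ENTRY and
   EXIT_BB) that lie on every ENTRY -> EXIT_BB path.  Return their count.  */
static int
find_split_candidates (const cfg *g, int entry, int exit_bb, int *out)
{
  int cand[MAX_BLOCKS];
  int n = 0;

  assert (g->n_blocks <= MAX_BLOCKS);
  if (!reachable_avoiding (g, entry, exit_bb, -1))
    return 0;
  for (int b = 0; b < g->n_blocks; b++)
    if (b != entry && b != exit_bb
        && !reachable_avoiding (g, entry, exit_bb, b))
      cand[n++] = b;

  /* A candidate's position is the number of candidates dominating it.  */
  for (int i = 0; i < n; i++)
    {
      int rank = 0;
      for (int j = 0; j < n; j++)
        if (j != i && !reachable_avoiding (g, entry, cand[i], cand[j]))
          rank++;
      out[rank] = cand[i];
    }
  return n;
}
```

With N candidates returned, out[N / 2] would be the "middle one".  This
brute-force version is quadratic or worse and only meant to show the
criterion; a real implementation would presumably reuse existing dominance
information.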

Paolo

* Re: Whole program optimization and functions-only-called-once.
  2009-11-23 16:29                                 ` Vladimir Makarov
  2009-11-23 16:33                                   ` Jan Hubicka
  2009-11-24 10:21                                   ` Paolo Bonzini
@ 2009-11-24 15:18                                   ` Peter Bergner
  2 siblings, 0 replies; 38+ messages in thread
From: Peter Bergner @ 2009-11-24 15:18 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Richard Guenther, Toon Moene, Jan Hubicka, Rafael Espindola,
	Jan Hubicka, gcc-patches

Richard Guenther wrote:
> Can't we split a function at points of minimal # of live pseudos and allocate
> the resulting regions independently?  Of course there would be hard
> constraints on the entry of each such region, just like we have on
> function entry for parameters.

Splitting at points where the # of live pseudos is minimal is not what you
want, since points where we have minimal number of live pseudos might end
up being on highly executed paths.  What you want is to find the locations
where the split cost is minimal.



Vladimir Makarov wrote:
> Also I'd prefer to implement something like non-regional allocation first
> and then splitting live ranges over regions ...

I prefer this type of solution too.

Peter




* Re: Whole program optimization and functions-only-called-once.
  2009-11-24  6:48                                   ` Vladimir Makarov
@ 2009-11-24 15:29                                     ` Peter Bergner
  2009-11-25  4:32                                       ` Vladimir Makarov
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Bergner @ 2009-11-24 15:29 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Toon Moene, Jan Hubicka, Rafael Espindola, Richard Guenther,
	Jan Hubicka, gcc-patches

On Mon, 2009-11-23 at 22:56 -0500, Vladimir Makarov wrote:
> The problem with IRA regional allocation is that #allocnos might be much 
> more #pseudos.

Remind me, unlike the 4.3 and earlier code where allocnos were just a
compressed list (ie, no unused numbers) of pseudo numbers so there was
a one-to-one mapping between pseudo numbers and allocno numbers, with
the new code, we actually have multiple allocnos per pseudo?

Peter




* Re: Whole program optimization and functions-only-called-once.
  2009-11-24 15:29                                     ` Peter Bergner
@ 2009-11-25  4:32                                       ` Vladimir Makarov
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Makarov @ 2009-11-25  4:32 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Toon Moene, Jan Hubicka, Rafael Espindola, Richard Guenther,
	Jan Hubicka, gcc-patches

Peter Bergner wrote:
> On Mon, 2009-11-23 at 22:56 -0500, Vladimir Makarov wrote:
>   
>> The problem with IRA regional allocation is that #allocnos might be much 
>> more #pseudos.
>>     
>
> Remind me, unlike the 4.3 and earlier code where allocnos were just a
> compressed list (ie, no unused numbers) of pseudo numbers so there was
> a one-to-one mapping between pseudo numbers and allocno numbers, with
> the new code, we actually have multiple allocnos per pseudo?
>
>
>   
In IRA, an allocno is a pseudo's live range within a region.  Therefore,
the more regions there are (regions are nested and form a tree whose top
region corresponds to the whole function), the more allocnos.  If there is
only one region, an allocno can be considered a whole pseudo.

* Re: Whole program optimization and functions-only-called-once.
  2009-11-16 10:15       ` Whole program optimization and functions-only-called-once Jan Hubicka
  2009-11-16 14:28         ` Richard Guenther
  2009-11-16 18:25         ` Toon Moene
@ 2009-11-25 16:44         ` Martin Jambor
  2009-11-27 16:23           ` Jan Hubicka
  2010-11-08 17:40         ` H.J. Lu
  3 siblings, 1 reply; 38+ messages in thread
From: Martin Jambor @ 2009-11-25 16:44 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Toon Moene, Richard Guenther, Jan Hubicka, gcc-patches

Hi,


I have two comments and a question about this one too.  Sorry for
going through it this late.

On Mon, Nov 16, 2009 at 10:54:35AM +0100, Jan Hubicka wrote:
> This is patch I intend to commit after re-testing at x86_64-linux after
> some last minute changes.
> 
> 	* cgraph.c (cgraph_release_function_body): Update use of
> 	ipa_transforms_to_apply.
> 	(cgraph_remove_node): Remove ipa_transforms_to_apply.
> 	* cgraph.h (struct cgraph_node): Add ipa_transforms_to_apply.
> 	* cgraphunit.c (save_inline_function_body): Clear ipa_transforms for
> 	copied body.
> 	(cgraph_materialize_clone): Remove original if dead.
> 	* lto-streamer-in.c (lto_read_body): Remove FIXME and
> 	ipa_transforms_to_apply hack.
> 	* function.h (struct function): Add ipa_transforms_to_apply.
> 	* ipa.c (cgraph_remove_unreachable_nodes): Handle dead clone originals.
> 	* tree-inline.c (copy_bb): Update sanity check.
> 	(initialize_cfun): Do not copy ipa_transforms_to_apply.
> 	(expand_call_inline): remove dead clone originals.
> 	(tree_function_versioning): Merge transformation queues.
> 	* passes.c (add_ipa_transform_pass): Remove.
> 	(execute_one_ipa_transform_pass): Update ipa_transforms_to_apply
> 	tracking.
> 	(execute_all_ipa_transforms): Update.
> 	(execute_one_pass): Update.
> 
> 	* lto.c (read_cgraph_and_symbols): Set also ipa_transforms_to_apply.

...

> Index: tree-inline.c
> ===================================================================
> *** tree-inline.c	(revision 154198)
> --- tree-inline.c	(working copy)
> *************** expand_call_inline (basic_block bb, gimp
> *** 3822,3827 ****
> --- 3821,3830 ----
>     (*debug_hooks->outlining_inline_function) (cg_edge->callee->decl);
>   
>     /* Update callgraph if needed.  */
> +   if (cg_edge->callee->clone_of
> +       && !cg_edge->callee->clone_of->next_sibling_clone
> +       && !cg_edge->callee->analyzed)
> +     cgraph_remove_node (cg_edge->callee);
>     cgraph_remove_node (cg_edge->callee);

I guess you forgot to remove the unconditional call to
cgraph_remove_node.  The conditional one was probably never executed
when testing, by the way.

>   
>     id->block = NULL_TREE;
> *************** tree_function_versioning (tree old_decl,
> *** 4848,4853 ****
> --- 4851,4869 ----
>     id.src_node = old_version_node;
>     id.dst_node = new_version_node;
>     id.src_cfun = DECL_STRUCT_FUNCTION (old_decl);
> +   if (id.src_node->ipa_transforms_to_apply)
> +     {
> +       VEC(ipa_opt_pass,heap) * old_transforms_to_apply = id.dst_node->ipa_transforms_to_apply;
> +       unsigned int i;
> + 
> +       id.dst_node->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
> + 					               id.src_node->ipa_transforms_to_apply);
> +       for (i = 0; i < VEC_length (ipa_opt_pass, old_transforms_to_apply); i++)
> +         VEC_safe_push (ipa_opt_pass, heap, id.dst_node->ipa_transforms_to_apply,
> + 		       VEC_index (ipa_opt_pass,
> + 		       		  old_transforms_to_apply,
> + 				  i));
> +     }

This really looks like doubling the contents of the vector, rather
than copying it once.  Or am I missing something?

By the way, is it OK to copy it when cloning the body rather than when
creating virtual clones?  (If so, I wonder whether it is necessary at
all.)

Thanks,

Martin

* Re: Whole program optimization and functions-only-called-once.
  2009-11-25 16:44         ` Martin Jambor
@ 2009-11-27 16:23           ` Jan Hubicka
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Hubicka @ 2009-11-27 16:23 UTC (permalink / raw)
  To: Jan Hubicka, Toon Moene, Richard Guenther, Jan Hubicka, gcc-patches

> > Index: tree-inline.c
> > ===================================================================
> > *** tree-inline.c	(revision 154198)
> > --- tree-inline.c	(working copy)
> > *************** expand_call_inline (basic_block bb, gimp
> > *** 3822,3827 ****
> > --- 3821,3830 ----
> >     (*debug_hooks->outlining_inline_function) (cg_edge->callee->decl);
> >   
> >     /* Update callgraph if needed.  */
> > +   if (cg_edge->callee->clone_of
> > +       && !cg_edge->callee->clone_of->next_sibling_clone
> > +       && !cg_edge->callee->analyzed)
> > +     cgraph_remove_node (cg_edge->callee);
> >     cgraph_remove_node (cg_edge->callee);
> 
> I guess you forgot to remove the unconditional call to
> cgraph_remove_node.  The conditional one was probably never executed
> when testing, by the way.

Ah, it should be cg_edge->callee->clone_of (I was wondering about the situation
where the master of a clone is kept around just to fulfil inlining, but perhaps this
is not really needed, since we turn the masters into inline clones in unreachable
function removal when the masters are unreachable).
So I will just drop this hunk; obviously we would ICE here if it ever matched.
> 
> >   
> >     id->block = NULL_TREE;
> > *************** tree_function_versioning (tree old_decl,
> > *** 4848,4853 ****
> > --- 4851,4869 ----
> >     id.src_node = old_version_node;
> >     id.dst_node = new_version_node;
> >     id.src_cfun = DECL_STRUCT_FUNCTION (old_decl);
> > +   if (id.src_node->ipa_transforms_to_apply)
> > +     {
> > +       VEC(ipa_opt_pass,heap) * old_transforms_to_apply = id.dst_node->ipa_transforms_to_apply;
> > +       unsigned int i;
> > + 
> > +       id.dst_node->ipa_transforms_to_apply = VEC_copy (ipa_opt_pass, heap,
> > + 					               id.src_node->ipa_transforms_to_apply);
> > +       for (i = 0; i < VEC_length (ipa_opt_pass, old_transforms_to_apply); i++)
> > +         VEC_safe_push (ipa_opt_pass, heap, id.dst_node->ipa_transforms_to_apply,
> > + 		       VEC_index (ipa_opt_pass,
> > + 		       		  old_transforms_to_apply,
> > + 				  i));
> > +     }
> 
> This really looks like doubling the contents of the vector, rather
> than copying it once.  Or am I missing something?

Ha, you are right, the VEC_copy was a leftover from the previous implementation.  Will fix that too.
(We need to concatenate the original and new queues, since the clone might now have
transformations already enqueued.)  I guess this only passes by chance, since we don't really
have any transform except for apply_inline.
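
To spell out what "concatenate" means here, a small stand-alone C sketch
follows.  It deliberately uses a plain array type (pass_queue, a made-up
name) instead of GCC's VEC API, and integers instead of ipa_opt_pass
entries, so it shows only the intended result: every entry of both queues
appears exactly once, source entries first.  It is not the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a queue of pending IPA transforms.  */
typedef struct
{
  int *elts;
  size_t len;
} pass_queue;

/* Return a newly allocated queue holding FIRST's entries followed by
   SECOND's.  The result is empty (elts == NULL, len == 0) if both inputs
   are empty or if allocation fails.  */
static pass_queue
queue_concat (const pass_queue *first, const pass_queue *second)
{
  pass_queue out = { NULL, 0 };
  size_t total = first->len + second->len;

  if (total == 0)
    return out;
  out.elts = malloc (total * sizeof *out.elts);
  if (!out.elts)
    return out;
  if (first->len)
    memcpy (out.elts, first->elts, first->len * sizeof *out.elts);
  if (second->len)
    memcpy (out.elts + first->len, second->elts,
            second->len * sizeof *out.elts);
  out.len = total;
  return out;
}
```

The caller owns the returned elts array and releases it with free ().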

Honza
> 
> By the way, is it OK to copy it when cloning the body rather than when
> creating virtual clones?  (If so, I wonder whether it is necessary at
> all.)
> 
> Thanks,
> 
> Martin

* Re: Whole program optimization and functions-only-called-once.
  2009-11-16 10:15       ` Whole program optimization and functions-only-called-once Jan Hubicka
                           ` (2 preceding siblings ...)
  2009-11-25 16:44         ` Martin Jambor
@ 2010-11-08 17:40         ` H.J. Lu
  3 siblings, 0 replies; 38+ messages in thread
From: H.J. Lu @ 2010-11-08 17:40 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Toon Moene, Richard Guenther, Jan Hubicka, gcc-patches

On Mon, Nov 16, 2009 at 1:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Jan Hubicka wrote:
>>
>>> -fno-ipa-cp should work around your problem for time being.
>>
>> Indeed it did. Some figures:
>
> Thanks for confirmation!
>
>> Considering invlo4 size 1462.
>>  Called once from lowpass 2293 insns.
>>  Not inlined because --param large-function-growth limit reached.
>>
>> Considering invlo2 size 933.
>>  Called once from lowpass 2293 insns.
>>  Not inlined because --param large-function-growth limit reached.
>>
>> where the largest callee *does* get inlined, while two smaller ones
>> don't (I agree with Jan that this would have been solved by training the
>> inliner with profiling data, because only invlo4 gets called).
>
> Using profiling data does not really make inliner to bypass
> large-function-growth.  We can experiment with large-function-growth tweaking.
> So far i didn't see any testcase where this limit would result in runtime
> regression.
>
> This is patch I intend to commit after re-testing at x86_64-linux after
> some last minute changes.
>
>        * cgraph.c (cgraph_release_function_body): Update use of
>        ipa_transforms_to_apply.
>        (cgraph_remove_node): Remove ipa_transforms_to_apply.
>        * cgraph.h (struct cgraph_node): Add ipa_transforms_to_apply.
>        * cgraphunit.c (save_inline_function_body): Clear ipa_transforms for
>        copied body.
>        (cgraph_materialize_clone): Remove original if dead.
>        * lto-streamer-in.c (lto_read_body): Remove FIXME and
>        ipa_transforms_to_apply hack.
>        * function.h (struct function): Add ipa_transforms_to_apply.
>        * ipa.c (cgraph_remove_unreachable_nodes): Handle dead clone originals.
>        * tree-inline.c (copy_bb): Update sanity check.
>        (initialize_cfun): Do not copy ipa_transforms_to_apply.
>        (expand_call_inline): remove dead clone originals.
>        (tree_function_versioning): Merge transformation queues.
>        * passes.c (add_ipa_transform_pass): Remove.
>        (execute_one_ipa_transform_pass): Update ipa_transforms_to_apply
>        tracking.
>        (execute_all_ipa_transforms): Update.
>        (execute_one_pass): Update.
>
>        * lto.c (read_cgraph_and_symbols): Set also ipa_transforms_to_apply.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46364


H.J.

Thread overview: 38+ messages
     [not found] <4AF1D3C2.3000001@moene.org>
     [not found] ` <84fc9c000911041126x14ce9226w9dfb781ea284de6b@mail.gmail.com>
     [not found]   ` <20091112161638.GC11341@atrey.karlin.mff.cuni.cz>
     [not found]     ` <4AFEA884.9030003@moene.org>
2009-11-16 10:15       ` Whole program optimization and functions-only-called-once Jan Hubicka
2009-11-16 14:28         ` Richard Guenther
2009-11-16 14:39           ` Jan Hubicka
2009-11-16 18:25         ` Toon Moene
2009-11-17 14:14           ` Toon Moene
2009-11-17 14:57             ` Rafael Espindola
2009-11-17 15:10               ` Toon Moene
2009-11-17 16:06                 ` Rafael Espindola
2009-11-17 17:20                   ` Rafael Espindola
2009-11-18 19:54                     ` Toon Moene
2009-11-18 20:49                       ` Jan Hubicka
2009-11-18 20:56                         ` Toon Moene
2009-11-18 21:05                           ` Richard Guenther
2009-11-18 21:16                             ` Toon Moene
2009-11-18 22:10                               ` Eric Botcazou
2009-11-21 11:57                           ` Toon Moene
2009-11-21 15:37                             ` Jan Hubicka
2009-11-21 17:01                               ` Vladimir Makarov
2009-11-21 17:48                                 ` Jan Hubicka
2009-11-21 17:56                                   ` Richard Guenther
2009-11-21 17:57                                     ` Jan Hubicka
2009-11-21 18:09                                 ` Toon Moene
2009-11-22  7:20                             ` Vladimir Makarov
2009-11-22 14:19                               ` Toon Moene
2009-11-23 23:27                                 ` Peter Bergner
2009-11-24  6:48                                   ` Vladimir Makarov
2009-11-24 15:29                                     ` Peter Bergner
2009-11-25  4:32                                       ` Vladimir Makarov
2009-11-22 18:01                               ` Richard Guenther
2009-11-23 16:29                                 ` Vladimir Makarov
2009-11-23 16:33                                   ` Jan Hubicka
2009-11-23 17:05                                     ` Vladimir Makarov
2009-11-23 17:07                                       ` Jan Hubicka
2009-11-24 10:21                                   ` Paolo Bonzini
2009-11-24 15:18                                   ` Peter Bergner
2009-11-25 16:44         ` Martin Jambor
2009-11-27 16:23           ` Jan Hubicka
2010-11-08 17:40         ` H.J. Lu
