public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Account for devirtualization opportunities in inliner
@ 2011-09-28  7:53 Maxim Kuvyrkov
  2011-10-19 22:21 ` Maxim Kuvyrkov
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Kuvyrkov @ 2011-09-28  7:53 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Martin Jambor

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

Jan,

The following patch starts a series of patches which improve devirtualization optimizations in GCC.

This patch builds on ipa-cp.c and ipa-prop.c infrastructure for analyzing parameters and jump functions and adds basic estimation of devirtualization benefit from inlining an edge.  E.g., if inlining A across edge E into B will allow some of the indirect edges of A to be resolved, then inlining cost of edge E is reduced.

The patch was bootstrapped and regtested on x86_64-linux-gnu on both -m32 and -m64 multilibs.

OK to commit?

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



[-- Attachment #2: fsf-gcc-devirt-account.ChangeLog --]
[-- Type: application/octet-stream, Size: 843 bytes --]

2011-09-28  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* ipa-cp.c (ipa_value_from_jfunc): Make global.
	(ipa_cst_from_jfunc): Remove, use ipa_value_from_jfunc instead.
	(get_indirect_edge_target): Rename, make global.
	(devirtualization_time_bonus, estimate_local_effects,)
	(ipcp_discover_new_direct_edges): Update.
	* ipa-inline-analysis.c (evaluate_conditions_for_edge):
	Generalize to also handle types.  Rename to ...
	(evaluate_conditions_vals_binfos_for_edge): Use instead of
	evaluate_conditions_for_edge.
	(estimate_edge_devirt_benefit): New function.
	(estimate_calls_size_and_time): Use it.
	(estimate_node_size_and_time, estimate_ipcp_clone_size_and_time,)
	(inline_merge_summary):	Update.
	(do_estimate_edge_time, do_estimate_edge_growth): Update.  Calculate
	parameter information at the call site and pass it on to subroutines.

[-- Attachment #3: fsf-gcc-devirt-account.patch --]
[-- Type: application/octet-stream, Size: 16185 bytes --]

commit d073388fe354eceb2e5b6f9ec2287b71f4ae6ccc
Author: Maxim Kuvyrkov <maxim@codesourcery.com>
Date:   Wed Sep 21 16:27:25 2011 -0700

    Account for devirtualization opportunities in inlining.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 45cb00b..10e1834 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -693,7 +693,7 @@ ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc)
    describes the caller node so that pass-through jump functions can be
    evaluated.  */
 
-static tree
+tree
 ipa_value_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
 {
   if (jfunc->type == IPA_JF_CONST)
@@ -753,21 +753,6 @@ ipa_value_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
     return NULL_TREE;
 }
 
-/* Determine whether JFUNC evaluates to a constant and if so, return it.
-   Otherwise return NULL. INFO describes the caller node so that pass-through
-   jump functions can be evaluated.  */
-
-tree
-ipa_cst_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
-{
-  tree res = ipa_value_from_jfunc (info, jfunc);
-
-  if (res && TREE_CODE (res) == TREE_BINFO)
-    return NULL_TREE;
-  else
-    return res;
-}
-
 
 /* If checking is enabled, verify that no lattice is in the TOP state, i.e. not
    bottom, not containing a variable component and without any known value at
@@ -1112,10 +1097,10 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
    (which can contain both constants and binfos) or KNOWN_BINFOS (which can be
    NULL) return the destination.  */
 
-static tree
-get_indirect_edge_target (struct cgraph_edge *ie,
-			  VEC (tree, heap) *known_vals,
-			  VEC (tree, heap) *known_binfos)
+tree
+ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
 {
   int param_index = ie->indirect_info->param_index;
   HOST_WIDE_INT token, anc_offset;
@@ -1185,7 +1170,7 @@ devirtualization_time_bonus (struct cgraph_node *node,
       struct inline_summary *isummary;
       tree target;
 
-      target = get_indirect_edge_target (ie, known_csts, known_binfos);
+      target = ipa_get_indirect_edge_target (ie, known_csts, known_binfos);
       if (!target)
 	continue;
 
@@ -1344,7 +1329,8 @@ estimate_local_effects (struct cgraph_node *node)
 
       init_caller_stats (&stats);
       cgraph_for_node_and_aliases (node, gather_caller_stats, &stats, false);
-      estimate_ipcp_clone_size_and_time (node, known_csts, &size, &time);
+      estimate_ipcp_clone_size_and_time (node, known_csts, known_binfos,
+					 &size, &time);
       time -= devirtualization_time_bonus (node, known_csts, known_binfos);
       time -= removable_params_cost;
       size -= stats.n_calls * removable_params_cost;
@@ -1415,7 +1401,8 @@ estimate_local_effects (struct cgraph_node *node)
 	  else
 	    continue;
 
-	  estimate_ipcp_clone_size_and_time (node, known_csts, &size, &time);
+	  estimate_ipcp_clone_size_and_time (node, known_csts, known_binfos,
+					     &size, &time);
 	  time_benefit = base_time - time
 	    + devirtualization_time_bonus (node, known_csts, known_binfos)
 	    + removable_params_cost + emc;
@@ -1673,7 +1660,7 @@ ipcp_discover_new_direct_edges (struct cgraph_node *node,
       tree target;
 
       next_ie = ie->next_callee;
-      target = get_indirect_edge_target (ie, known_vals, NULL);
+      target = ipa_get_indirect_edge_target (ie, known_vals, NULL);
       if (target)
 	ipa_make_edge_direct_to_target (ie, target);
     }
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index bd4d2ea..5e88c2d 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -711,14 +711,23 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
 /* Work out what conditions might be true at invocation of E.  */
 
 static clause_t
-evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
+evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
+					  bool inline_p,
+					  VEC (tree, heap) **known_vals_ptr,
+					  VEC (tree, heap) **known_binfos_ptr)
 {
   clause_t clause = inline_p ? 0 : 1 << predicate_not_inlined_condition;
   struct cgraph_node *callee = cgraph_function_or_thunk_node (e->callee, NULL);
   struct inline_summary *info = inline_summary (callee);
   int i;
 
-  if (ipa_node_params_vector && info->conds)
+  if (known_vals_ptr)
+    *known_vals_ptr = NULL;
+  if (known_binfos_ptr)
+    *known_binfos_ptr = NULL;
+
+  if (ipa_node_params_vector
+      && (info->conds || known_vals_ptr || known_binfos_ptr))
     {
       struct ipa_node_params *parms_info;
       struct ipa_edge_args *args = IPA_EDGE_REF (e);
@@ -731,25 +740,40 @@ evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
       else
         parms_info = IPA_NODE_REF (e->caller);
 
-      if (count)
-        VEC_safe_grow_cleared (tree, heap, known_vals, count);
+      if (count && (info->conds || known_vals_ptr))
+	VEC_safe_grow_cleared (tree, heap, known_vals, count);
+      if (count && known_binfos_ptr)
+	VEC_safe_grow_cleared (tree, heap, *known_binfos_ptr, count);
+
       for (i = 0; i < count; i++)
 	{
-	  tree cst = ipa_cst_from_jfunc (parms_info,
-					 ipa_get_ith_jump_func (args, i));
+	  tree cst = ipa_value_from_jfunc (parms_info,
+					   ipa_get_ith_jump_func (args, i));
 	  if (cst)
-	    VEC_replace (tree, known_vals, i, cst);
+	    {
+	      if (info->conds && TREE_CODE (cst) != TREE_BINFO)
+		VEC_replace (tree, known_vals, i, cst);
+	      else if (known_binfos_ptr != NULL)
+		VEC_replace (tree, *known_binfos_ptr, i, cst);
+	    }
 	  else if (inline_p
 		   && !VEC_index (inline_param_summary_t,
 				  es->param,
 				  i)->change_prob)
 	    VEC_replace (tree, known_vals, i, error_mark_node);
 	}
-      clause = evaluate_conditions_for_known_args (callee,
-						   inline_p, known_vals);
-      VEC_free (tree, heap, known_vals);
+
+      if (info->conds)
+	clause = evaluate_conditions_for_known_args (callee, inline_p,
+						     known_vals);
+
+      if (known_vals_ptr)
+	*known_vals_ptr = known_vals;
+      else
+	VEC_free (tree, heap, known_vals);
     }
-  else
+
+  if (!info->conds)
     for (i = 0; i < (int)VEC_length (condition, info->conds); i++)
       clause |= 1 << (i + predicate_first_dynamic_condition);
 
@@ -2105,11 +2129,69 @@ estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *time,
 }
 
 
-/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
+/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
+   KNOWN_BINFOS.  */
+
+static void
+estimate_edge_devirt_benefit (struct cgraph_edge *ie,
+			      int *size, int *time, int prob,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
+{
+  tree target;
+  struct cgraph_node *callee;
+  struct inline_summary *isummary;
+  int edge_size = 0, edge_time = 0;
+
+  if (!known_vals || !known_binfos)
+    return;
+
+  target = ipa_get_indirect_edge_target (ie, known_vals, known_binfos);
+  if (!target)
+    return;
+
+  /* Account for difference in cost between indirect and direct calls.  */
+  *size -= 2 * INLINE_SIZE_SCALE;
+  *time -= 2 * INLINE_TIME_SCALE * prob / REG_BR_PROB_BASE;
+
+  callee = cgraph_get_node (target);
+  if (!callee || !callee->analyzed)
+    return;
+  isummary = inline_summary (callee);
+  if (!isummary->inlinable)
+    return;
+
+  estimate_edge_size_and_time (ie, &edge_size, &edge_time, prob);
+
+  /* Count benefit only from functions that definitely will be inlined
+     if additional context from NODE's caller were available.  */
+  if (edge_size >= isummary->size * INLINE_SIZE_SCALE)
+    {
+      /* Subtract size and time that we just added for edge IE.  */
+      *size -= edge_size;
+      *time -= edge_time;
+
+      /* Subtract benefit from inlining devirtualized call.  */
+      *size -= edge_size - isummary->size * INLINE_SIZE_SCALE;
+      *time -= edge_time - (isummary->time * INLINE_TIME_SCALE * prob
+			    / REG_BR_PROB_BASE);
+
+      /* TODO: estimate benefit from optimizing CALLEE's body provided
+	 additional context from IE call site.
+	 For insipiration see ipa-cp.c: devirtualization_time_bonus().  */
+    }
+}
+
+
+/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.
+   POSSIBLE_TRUTHS, KNOWN_VALS and KNOWN_BINFOS describe context of the call
+   site.  */
 
 static void
 estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
-			      clause_t possible_truths)
+			      clause_t possible_truths,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
 {
   struct cgraph_edge *e;
   for (e = node->callees; e; e = e->next_callee)
@@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
 	    }
 	  else
 	    estimate_calls_size_and_time (e->callee, size, time,
-					  possible_truths);
+					  possible_truths,
+					  /* TODO: remap KNOWN_VALS and
+					     KNOWN_BINFOS to E->CALLEE
+					     parameters, and use them.  */
+					  NULL, NULL);
 	}
     }
-  /* TODO: look for devirtualizing oppurtunities.  */
   for (e = node->indirect_calls; e; e = e->next_callee)
     {
       struct inline_edge_summary *es = inline_edge_summary (e);
       if (!es->predicate || evaluate_predicate (es->predicate, possible_truths))
-        estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE);
+	{
+	  estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE);
+	  estimate_edge_devirt_benefit (e, size, time, REG_BR_PROB_BASE,
+					known_vals, known_binfos);
+	}
     }
 }
 
 
 /* Estimate size and time needed to execute NODE assuming
-   POSSIBLE_TRUTHS clause. */
+   POSSIBLE_TRUTHS clause, and KNOWN_VALS and KNOWN_BINFOS information
+   about NODE's arguments. */
 
 static void
 estimate_node_size_and_time (struct cgraph_node *node,
 			     clause_t possible_truths,
+			     VEC (tree, heap) *known_vals,
+			     VEC (tree, heap) *known_binfos,
 		       	     int *ret_size, int *ret_time,
 			     VEC (inline_param_summary_t, heap)
 			       *inline_param_summary)
@@ -2194,7 +2286,8 @@ estimate_node_size_and_time (struct cgraph_node *node,
   if (time > MAX_TIME * INLINE_TIME_SCALE)
     time = MAX_TIME * INLINE_TIME_SCALE;
 
-  estimate_calls_size_and_time (node, &size, &time, possible_truths);
+  estimate_calls_size_and_time (node, &size, &time, possible_truths,
+				known_vals, known_binfos);
   time = (time + INLINE_TIME_SCALE / 2) / INLINE_TIME_SCALE;
   size = (size + INLINE_SIZE_SCALE / 2) / INLINE_SIZE_SCALE;
 
@@ -2212,17 +2305,20 @@ estimate_node_size_and_time (struct cgraph_node *node,
 
 /* Estimate size and time needed to execute callee of EDGE assuming that
    parameters known to be constant at caller of EDGE are propagated.
-   KNOWN_VALs is a vector of assumed known constant values for parameters.  */
+   KNOWN_VALS and KNOWN_BINFOS are vectors of assumed known constant values
+   and types for parameters.  */
 
 void
 estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
 				   VEC (tree, heap) *known_vals,
+				   VEC (tree, heap) *known_binfos,
 		                   int *ret_size, int *ret_time)
 {
   clause_t clause;
 
   clause = evaluate_conditions_for_known_args (node, false, known_vals);
-  estimate_node_size_and_time (node, clause, ret_size, ret_time,
+  estimate_node_size_and_time (node, clause, known_vals, known_binfos,
+			       ret_size, ret_time,
 			       NULL);
 }
 
@@ -2478,9 +2574,10 @@ inline_merge_summary (struct cgraph_edge *edge)
       int count = ipa_get_cs_argument_count (args);
       int i;
 
-      clause = evaluate_conditions_for_edge (edge, true);
+      clause = evaluate_conditions_vals_binfos_for_edge (edge, true,
+							 NULL, NULL);
       if (count)
-        VEC_safe_grow_cleared (int, heap, operand_map, count);
+	VEC_safe_grow_cleared (int, heap, operand_map, count);
       for (i = 0; i < count; i++)
 	{
 	  struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, i);
@@ -2524,7 +2621,8 @@ inline_merge_summary (struct cgraph_edge *edge)
   for (i = 0; VEC_iterate (size_time_entry, info->entry, i, e); i++)
     info->size += e->size, info->time += e->time;
   estimate_calls_size_and_time (to, &info->size, &info->time,
-				~(clause_t)(1 << predicate_false_condition));
+				~(clause_t)(1 << predicate_false_condition),
+				NULL, NULL);
 
   inline_update_callee_summaries (edge->callee,
 				  inline_edge_summary (edge)->loop_depth);
@@ -2552,13 +2650,21 @@ do_estimate_edge_time (struct cgraph_edge *edge)
   int time;
   int size;
   gcov_type ret;
+  struct cgraph_node *callee;
+  clause_t clause;
+  VEC (tree, heap) *known_vals;
+  VEC (tree, heap) *known_binfos;
   struct inline_edge_summary *es = inline_edge_summary (edge);
 
+  callee = cgraph_function_or_thunk_node (edge->callee, NULL);
+
   gcc_checking_assert (edge->inline_failed);
-  estimate_node_size_and_time (cgraph_function_or_thunk_node (edge->callee,
-							      NULL),
-			       evaluate_conditions_for_edge (edge, true),
+  clause = evaluate_conditions_vals_binfos_for_edge (edge, true, &known_vals,
+						     &known_binfos);
+  estimate_node_size_and_time (callee, clause, known_vals, known_binfos,
 			       &size, &time, es->param);
+  VEC_free (tree, heap, known_vals);
+  VEC_free (tree, heap, known_binfos);
 
   ret = (((gcov_type)time
 	   - es->call_stmt_time) * edge->frequency
@@ -2592,6 +2698,9 @@ do_estimate_edge_growth (struct cgraph_edge *edge)
 {
   int size;
   struct cgraph_node *callee;
+  clause_t clause;
+  VEC (tree, heap) *known_vals;
+  VEC (tree, heap) *known_binfos;
 
   /* When we do caching, use do_estimate_edge_time to populate the entry.  */
 
@@ -2604,13 +2713,17 @@ do_estimate_edge_growth (struct cgraph_edge *edge)
       gcc_checking_assert (size);
       return size - (size > 0);
     }
+
   callee = cgraph_function_or_thunk_node (edge->callee, NULL);
 
   /* Early inliner runs without caching, go ahead and do the dirty work.  */
   gcc_checking_assert (edge->inline_failed);
-  estimate_node_size_and_time (callee,
-			       evaluate_conditions_for_edge (edge, true),
+  clause = evaluate_conditions_vals_binfos_for_edge (edge, true, &known_vals,
+						     &known_binfos);
+  estimate_node_size_and_time (callee, clause, known_vals, known_binfos,
 			       &size, NULL, NULL);
+  VEC_free (tree, heap, known_vals);
+  VEC_free (tree, heap, known_binfos);
   gcc_checking_assert (inline_edge_summary (edge)->call_stmt_size);
   return size - inline_edge_summary (edge)->call_stmt_size;
 }
diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index 6df7867..a2c6cac 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -169,6 +169,7 @@ int estimate_time_after_inlining (struct cgraph_node *, struct cgraph_edge *);
 int estimate_size_after_inlining (struct cgraph_node *, struct cgraph_edge *);
 void estimate_ipcp_clone_size_and_time (struct cgraph_node *,
 					VEC (tree, heap) *known_vals,
+					VEC (tree, heap) *known_binfos,
 					int *, int *);
 int do_estimate_growth (struct cgraph_node *);
 void inline_merge_summary (struct cgraph_edge *edge);
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 58caa92..1a609a5 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -346,6 +346,9 @@ bool ipa_propagate_indirect_call_infos (struct cgraph_edge *cs,
 					VEC (cgraph_edge_p, heap) **new_edges);
 
 /* Indirect edge and binfo processing.  */
+tree ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+				   VEC (tree, heap) *known_csts,
+				   VEC (tree, heap) *known_binfs);
 struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, tree);
 
 /* Functions related to both.  */
@@ -437,8 +440,8 @@ void ipa_prop_write_jump_functions (cgraph_node_set set);
 void ipa_prop_read_jump_functions (void);
 void ipa_update_after_lto_read (void);
 int ipa_get_param_decl_index (struct ipa_node_params *, tree);
-tree ipa_cst_from_jfunc (struct ipa_node_params *info,
-			 struct ipa_jump_func *jfunc);
+tree ipa_value_from_jfunc (struct ipa_node_params *info,
+			   struct ipa_jump_func *jfunc);
 
 
 /* From tree-sra.c:  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Account for devirtualization opportunities in inliner
  2011-09-28  7:53 [PATCH] Account for devirtualization opportunities in inliner Maxim Kuvyrkov
@ 2011-10-19 22:21 ` Maxim Kuvyrkov
  2011-10-20  8:30   ` Richard Guenther
  2011-10-20  9:31   ` Jan Hubicka
  0 siblings, 2 replies; 8+ messages in thread
From: Maxim Kuvyrkov @ 2011-10-19 22:21 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Martin Jambor

[-- Attachment #1: Type: text/plain, Size: 1221 bytes --]

On 28/09/2011, at 4:56 PM, Maxim Kuvyrkov wrote:

> Jan,
> 
> The following patch starts a series of patches which improve devirtualization optimizations in GCC.
> 
> This patch builds on ipa-cp.c and ipa-prop.c infrastructure for analyzing parameters and jump functions and adds basic estimation of devirtualization benefit from inlining an edge.  E.g., if inlining A across edge E into B will allow some of the indirect edges of A to be resolved, then inlining cost of edge E is reduced.
> 
> The patch was bootstrapped and regtested on x86_64-linux-gnu on both -m32 and -m64 multilibs.
> 
> OK to commit?

Ping.

The primary change of this patch is to make evaluate_conditions_for_edge to output KNOWN_VALS and KNOWN_BINFOS arrays in addition to conditions for a callsite.  KNOWN_VALS and KNOWN_BINFOS are then passed on to a subroutine of estimate_calls_size_and_time, which uses ipa-prop.c infrastructure to check if it will be possible to devirtualize any of the indirect edged within callee.  If possible, then *size and *time returned by estimate_calls_size_and_time are reduced to account for the devirtualization benefits.

OK for trunk?

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


[-- Attachment #2: fsf-gcc-devirt-account.ChangeLog --]
[-- Type: application/octet-stream, Size: 843 bytes --]

2011-09-28  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* ipa-cp.c (ipa_value_from_jfunc): Make global.
	(ipa_cst_from_jfunc): Remove, use ipa_value_from_jfunc instead.
	(get_indirect_edge_target): Rename, make global.
	(devirtualization_time_bonus, estimate_local_effects,)
	(ipcp_discover_new_direct_edges): Update.
	* ipa-inline-analysis.c (evaluate_conditions_for_edge):
	Generalize to also handle types.  Rename to ...
	(evaluate_conditions_vals_binfos_for_edge): Use instead of
	evaluate_conditions_for_edge.
	(estimate_edge_devirt_benefit): New function.
	(estimate_calls_size_and_time): Use it.
	(estimate_node_size_and_time, estimate_ipcp_clone_size_and_time,)
	(inline_merge_summary):	Update.
	(do_estimate_edge_time, do_estimate_edge_growth): Update.  Calculate
	parameter information at the call site and pass it on to subroutines.

[-- Attachment #3: fsf-gcc-devirt-account.patch --]
[-- Type: application/octet-stream, Size: 16185 bytes --]

commit d073388fe354eceb2e5b6f9ec2287b71f4ae6ccc
Author: Maxim Kuvyrkov <maxim@codesourcery.com>
Date:   Wed Sep 21 16:27:25 2011 -0700

    Account for devirtualization opportunities in inlining.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 45cb00b..10e1834 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -693,7 +693,7 @@ ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc)
    describes the caller node so that pass-through jump functions can be
    evaluated.  */
 
-static tree
+tree
 ipa_value_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
 {
   if (jfunc->type == IPA_JF_CONST)
@@ -753,21 +753,6 @@ ipa_value_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
     return NULL_TREE;
 }
 
-/* Determine whether JFUNC evaluates to a constant and if so, return it.
-   Otherwise return NULL. INFO describes the caller node so that pass-through
-   jump functions can be evaluated.  */
-
-tree
-ipa_cst_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
-{
-  tree res = ipa_value_from_jfunc (info, jfunc);
-
-  if (res && TREE_CODE (res) == TREE_BINFO)
-    return NULL_TREE;
-  else
-    return res;
-}
-
 
 /* If checking is enabled, verify that no lattice is in the TOP state, i.e. not
    bottom, not containing a variable component and without any known value at
@@ -1112,10 +1097,10 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
    (which can contain both constants and binfos) or KNOWN_BINFOS (which can be
    NULL) return the destination.  */
 
-static tree
-get_indirect_edge_target (struct cgraph_edge *ie,
-			  VEC (tree, heap) *known_vals,
-			  VEC (tree, heap) *known_binfos)
+tree
+ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
 {
   int param_index = ie->indirect_info->param_index;
   HOST_WIDE_INT token, anc_offset;
@@ -1185,7 +1170,7 @@ devirtualization_time_bonus (struct cgraph_node *node,
       struct inline_summary *isummary;
       tree target;
 
-      target = get_indirect_edge_target (ie, known_csts, known_binfos);
+      target = ipa_get_indirect_edge_target (ie, known_csts, known_binfos);
       if (!target)
 	continue;
 
@@ -1344,7 +1329,8 @@ estimate_local_effects (struct cgraph_node *node)
 
       init_caller_stats (&stats);
       cgraph_for_node_and_aliases (node, gather_caller_stats, &stats, false);
-      estimate_ipcp_clone_size_and_time (node, known_csts, &size, &time);
+      estimate_ipcp_clone_size_and_time (node, known_csts, known_binfos,
+					 &size, &time);
       time -= devirtualization_time_bonus (node, known_csts, known_binfos);
       time -= removable_params_cost;
       size -= stats.n_calls * removable_params_cost;
@@ -1415,7 +1401,8 @@ estimate_local_effects (struct cgraph_node *node)
 	  else
 	    continue;
 
-	  estimate_ipcp_clone_size_and_time (node, known_csts, &size, &time);
+	  estimate_ipcp_clone_size_and_time (node, known_csts, known_binfos,
+					     &size, &time);
 	  time_benefit = base_time - time
 	    + devirtualization_time_bonus (node, known_csts, known_binfos)
 	    + removable_params_cost + emc;
@@ -1673,7 +1660,7 @@ ipcp_discover_new_direct_edges (struct cgraph_node *node,
       tree target;
 
       next_ie = ie->next_callee;
-      target = get_indirect_edge_target (ie, known_vals, NULL);
+      target = ipa_get_indirect_edge_target (ie, known_vals, NULL);
       if (target)
 	ipa_make_edge_direct_to_target (ie, target);
     }
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index bd4d2ea..5e88c2d 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -711,14 +711,23 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
 /* Work out what conditions might be true at invocation of E.  */
 
 static clause_t
-evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
+evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
+					  bool inline_p,
+					  VEC (tree, heap) **known_vals_ptr,
+					  VEC (tree, heap) **known_binfos_ptr)
 {
   clause_t clause = inline_p ? 0 : 1 << predicate_not_inlined_condition;
   struct cgraph_node *callee = cgraph_function_or_thunk_node (e->callee, NULL);
   struct inline_summary *info = inline_summary (callee);
   int i;
 
-  if (ipa_node_params_vector && info->conds)
+  if (known_vals_ptr)
+    *known_vals_ptr = NULL;
+  if (known_binfos_ptr)
+    *known_binfos_ptr = NULL;
+
+  if (ipa_node_params_vector
+      && (info->conds || known_vals_ptr || known_binfos_ptr))
     {
       struct ipa_node_params *parms_info;
       struct ipa_edge_args *args = IPA_EDGE_REF (e);
@@ -731,25 +740,40 @@ evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
       else
         parms_info = IPA_NODE_REF (e->caller);
 
-      if (count)
-        VEC_safe_grow_cleared (tree, heap, known_vals, count);
+      if (count && (info->conds || known_vals_ptr))
+	VEC_safe_grow_cleared (tree, heap, known_vals, count);
+      if (count && known_binfos_ptr)
+	VEC_safe_grow_cleared (tree, heap, *known_binfos_ptr, count);
+
       for (i = 0; i < count; i++)
 	{
-	  tree cst = ipa_cst_from_jfunc (parms_info,
-					 ipa_get_ith_jump_func (args, i));
+	  tree cst = ipa_value_from_jfunc (parms_info,
+					   ipa_get_ith_jump_func (args, i));
 	  if (cst)
-	    VEC_replace (tree, known_vals, i, cst);
+	    {
+	      if (info->conds && TREE_CODE (cst) != TREE_BINFO)
+		VEC_replace (tree, known_vals, i, cst);
+	      else if (known_binfos_ptr != NULL)
+		VEC_replace (tree, *known_binfos_ptr, i, cst);
+	    }
 	  else if (inline_p
 		   && !VEC_index (inline_param_summary_t,
 				  es->param,
 				  i)->change_prob)
 	    VEC_replace (tree, known_vals, i, error_mark_node);
 	}
-      clause = evaluate_conditions_for_known_args (callee,
-						   inline_p, known_vals);
-      VEC_free (tree, heap, known_vals);
+
+      if (info->conds)
+	clause = evaluate_conditions_for_known_args (callee, inline_p,
+						     known_vals);
+
+      if (known_vals_ptr)
+	*known_vals_ptr = known_vals;
+      else
+	VEC_free (tree, heap, known_vals);
     }
-  else
+
+  if (!info->conds)
     for (i = 0; i < (int)VEC_length (condition, info->conds); i++)
       clause |= 1 << (i + predicate_first_dynamic_condition);
 
@@ -2105,11 +2129,69 @@ estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *time,
 }
 
 
-/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
+/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
+   KNOWN_BINFOS.  */
+
+static void
+estimate_edge_devirt_benefit (struct cgraph_edge *ie,
+			      int *size, int *time, int prob,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
+{
+  tree target;
+  struct cgraph_node *callee;
+  struct inline_summary *isummary;
+  int edge_size = 0, edge_time = 0;
+
+  if (!known_vals || !known_binfos)
+    return;
+
+  target = ipa_get_indirect_edge_target (ie, known_vals, known_binfos);
+  if (!target)
+    return;
+
+  /* Account for difference in cost between indirect and direct calls.  */
+  *size -= 2 * INLINE_SIZE_SCALE;
+  *time -= 2 * INLINE_TIME_SCALE * prob / REG_BR_PROB_BASE;
+
+  callee = cgraph_get_node (target);
+  if (!callee || !callee->analyzed)
+    return;
+  isummary = inline_summary (callee);
+  if (!isummary->inlinable)
+    return;
+
+  estimate_edge_size_and_time (ie, &edge_size, &edge_time, prob);
+
+  /* Count benefit only from functions that definitely will be inlined
+     if additional context from NODE's caller were available.  */
+  if (edge_size >= isummary->size * INLINE_SIZE_SCALE)
+    {
+      /* Subtract size and time that we just added for edge IE.  */
+      *size -= edge_size;
+      *time -= edge_time;
+
+      /* Subtract benefit from inlining devirtualized call.  */
+      *size -= edge_size - isummary->size * INLINE_SIZE_SCALE;
+      *time -= edge_time - (isummary->time * INLINE_TIME_SCALE * prob
+			    / REG_BR_PROB_BASE);
+
+      /* TODO: estimate benefit from optimizing CALLEE's body provided
+	 additional context from IE call site.
+	 For insipiration see ipa-cp.c: devirtualization_time_bonus().  */
+    }
+}
+
+
+/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.
+   POSSIBLE_TRUTHS, KNOWN_VALS and KNOWN_BINFOS describe context of the call
+   site.  */
 
 static void
 estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
-			      clause_t possible_truths)
+			      clause_t possible_truths,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
 {
   struct cgraph_edge *e;
   for (e = node->callees; e; e = e->next_callee)
@@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
 	    }
 	  else
 	    estimate_calls_size_and_time (e->callee, size, time,
-					  possible_truths);
+					  possible_truths,
+					  /* TODO: remap KNOWN_VALS and
+					     KNOWN_BINFOS to E->CALLEE
+					     parameters, and use them.  */
+					  NULL, NULL);
 	}
     }
-  /* TODO: look for devirtualizing oppurtunities.  */
   for (e = node->indirect_calls; e; e = e->next_callee)
     {
       struct inline_edge_summary *es = inline_edge_summary (e);
       if (!es->predicate || evaluate_predicate (es->predicate, possible_truths))
-        estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE);
+	{
+	  estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE);
+	  estimate_edge_devirt_benefit (e, size, time, REG_BR_PROB_BASE,
+					known_vals, known_binfos);
+	}
     }
 }
 
 
 /* Estimate size and time needed to execute NODE assuming
-   POSSIBLE_TRUTHS clause. */
+   POSSIBLE_TRUTHS clause, and KNOWN_VALS and KNOWN_BINFOS information
+   about NODE's arguments. */
 
 static void
 estimate_node_size_and_time (struct cgraph_node *node,
 			     clause_t possible_truths,
+			     VEC (tree, heap) *known_vals,
+			     VEC (tree, heap) *known_binfos,
 		       	     int *ret_size, int *ret_time,
 			     VEC (inline_param_summary_t, heap)
 			       *inline_param_summary)
@@ -2194,7 +2286,8 @@ estimate_node_size_and_time (struct cgraph_node *node,
   if (time > MAX_TIME * INLINE_TIME_SCALE)
     time = MAX_TIME * INLINE_TIME_SCALE;
 
-  estimate_calls_size_and_time (node, &size, &time, possible_truths);
+  estimate_calls_size_and_time (node, &size, &time, possible_truths,
+				known_vals, known_binfos);
   time = (time + INLINE_TIME_SCALE / 2) / INLINE_TIME_SCALE;
   size = (size + INLINE_SIZE_SCALE / 2) / INLINE_SIZE_SCALE;
 
@@ -2212,17 +2305,20 @@ estimate_node_size_and_time (struct cgraph_node *node,
 
 /* Estimate size and time needed to execute callee of EDGE assuming that
    parameters known to be constant at caller of EDGE are propagated.
-   KNOWN_VALs is a vector of assumed known constant values for parameters.  */
+   KNOWN_VALS and KNOWN_BINFOS are vectors of assumed known constant values
+   and types for parameters.  */
 
 void
 estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
 				   VEC (tree, heap) *known_vals,
+				   VEC (tree, heap) *known_binfos,
 		                   int *ret_size, int *ret_time)
 {
   clause_t clause;
 
   clause = evaluate_conditions_for_known_args (node, false, known_vals);
-  estimate_node_size_and_time (node, clause, ret_size, ret_time,
+  estimate_node_size_and_time (node, clause, known_vals, known_binfos,
+			       ret_size, ret_time,
 			       NULL);
 }
 
@@ -2478,9 +2574,10 @@ inline_merge_summary (struct cgraph_edge *edge)
       int count = ipa_get_cs_argument_count (args);
       int i;
 
-      clause = evaluate_conditions_for_edge (edge, true);
+      clause = evaluate_conditions_vals_binfos_for_edge (edge, true,
+							 NULL, NULL);
       if (count)
-        VEC_safe_grow_cleared (int, heap, operand_map, count);
+	VEC_safe_grow_cleared (int, heap, operand_map, count);
       for (i = 0; i < count; i++)
 	{
 	  struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, i);
@@ -2524,7 +2621,8 @@ inline_merge_summary (struct cgraph_edge *edge)
   for (i = 0; VEC_iterate (size_time_entry, info->entry, i, e); i++)
     info->size += e->size, info->time += e->time;
   estimate_calls_size_and_time (to, &info->size, &info->time,
-				~(clause_t)(1 << predicate_false_condition));
+				~(clause_t)(1 << predicate_false_condition),
+				NULL, NULL);
 
   inline_update_callee_summaries (edge->callee,
 				  inline_edge_summary (edge)->loop_depth);
@@ -2552,13 +2650,21 @@ do_estimate_edge_time (struct cgraph_edge *edge)
   int time;
   int size;
   gcov_type ret;
+  struct cgraph_node *callee;
+  clause_t clause;
+  VEC (tree, heap) *known_vals;
+  VEC (tree, heap) *known_binfos;
   struct inline_edge_summary *es = inline_edge_summary (edge);
 
+  callee = cgraph_function_or_thunk_node (edge->callee, NULL);
+
   gcc_checking_assert (edge->inline_failed);
-  estimate_node_size_and_time (cgraph_function_or_thunk_node (edge->callee,
-							      NULL),
-			       evaluate_conditions_for_edge (edge, true),
+  clause = evaluate_conditions_vals_binfos_for_edge (edge, true, &known_vals,
+						     &known_binfos);
+  estimate_node_size_and_time (callee, clause, known_vals, known_binfos,
 			       &size, &time, es->param);
+  VEC_free (tree, heap, known_vals);
+  VEC_free (tree, heap, known_binfos);
 
   ret = (((gcov_type)time
 	   - es->call_stmt_time) * edge->frequency
@@ -2592,6 +2698,9 @@ do_estimate_edge_growth (struct cgraph_edge *edge)
 {
   int size;
   struct cgraph_node *callee;
+  clause_t clause;
+  VEC (tree, heap) *known_vals;
+  VEC (tree, heap) *known_binfos;
 
   /* When we do caching, use do_estimate_edge_time to populate the entry.  */
 
@@ -2604,13 +2713,17 @@ do_estimate_edge_growth (struct cgraph_edge *edge)
       gcc_checking_assert (size);
       return size - (size > 0);
     }
+
   callee = cgraph_function_or_thunk_node (edge->callee, NULL);
 
   /* Early inliner runs without caching, go ahead and do the dirty work.  */
   gcc_checking_assert (edge->inline_failed);
-  estimate_node_size_and_time (callee,
-			       evaluate_conditions_for_edge (edge, true),
+  clause = evaluate_conditions_vals_binfos_for_edge (edge, true, &known_vals,
+						     &known_binfos);
+  estimate_node_size_and_time (callee, clause, known_vals, known_binfos,
 			       &size, NULL, NULL);
+  VEC_free (tree, heap, known_vals);
+  VEC_free (tree, heap, known_binfos);
   gcc_checking_assert (inline_edge_summary (edge)->call_stmt_size);
   return size - inline_edge_summary (edge)->call_stmt_size;
 }
diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index 6df7867..a2c6cac 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -169,6 +169,7 @@ int estimate_time_after_inlining (struct cgraph_node *, struct cgraph_edge *);
 int estimate_size_after_inlining (struct cgraph_node *, struct cgraph_edge *);
 void estimate_ipcp_clone_size_and_time (struct cgraph_node *,
 					VEC (tree, heap) *known_vals,
+					VEC (tree, heap) *known_binfos,
 					int *, int *);
 int do_estimate_growth (struct cgraph_node *);
 void inline_merge_summary (struct cgraph_edge *edge);
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 58caa92..1a609a5 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -346,6 +346,9 @@ bool ipa_propagate_indirect_call_infos (struct cgraph_edge *cs,
 					VEC (cgraph_edge_p, heap) **new_edges);
 
 /* Indirect edge and binfo processing.  */
+tree ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+				   VEC (tree, heap) *known_csts,
+				   VEC (tree, heap) *known_binfs);
 struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, tree);
 
 /* Functions related to both.  */
@@ -437,8 +440,8 @@ void ipa_prop_write_jump_functions (cgraph_node_set set);
 void ipa_prop_read_jump_functions (void);
 void ipa_update_after_lto_read (void);
 int ipa_get_param_decl_index (struct ipa_node_params *, tree);
-tree ipa_cst_from_jfunc (struct ipa_node_params *info,
-			 struct ipa_jump_func *jfunc);
+tree ipa_value_from_jfunc (struct ipa_node_params *info,
+			   struct ipa_jump_func *jfunc);
 
 
 /* From tree-sra.c:  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Account for devirtualization opportunities in inliner
  2011-10-19 22:21 ` Maxim Kuvyrkov
@ 2011-10-20  8:30   ` Richard Guenther
  2011-10-20  9:31   ` Jan Hubicka
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Guenther @ 2011-10-20  8:30 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jan Hubicka, GCC Patches, Martin Jambor

On Wed, Oct 19, 2011 at 11:59 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote:
> On 28/09/2011, at 4:56 PM, Maxim Kuvyrkov wrote:
>
>> Jan,
>>
>> The following patch starts a series of patches which improve devirtualization optimizations in GCC.
>>
>> This patch builds on ipa-cp.c and ipa-prop.c infrastructure for analyzing parameters and jump functions and adds basic estimation of devirtualization benefit from inlining an edge.  E.g., if inlining A across edge E into B will allow some of the indirect edges of A to be resolved, then inlining cost of edge E is reduced.
>>
>> The patch was bootstrapped and regtested on x86_64-linux-gnu on both -m32 and -m64 multilibs.
>>
>> OK to commit?
>
> Ping.
>
> The primary change of this patch is to make evaluate_conditions_for_edge to output KNOWN_VALS and KNOWN_BINFOS arrays in addition to conditions for a callsite.  KNOWN_VALS and KNOWN_BINFOS are then passed on to a subroutine of estimate_calls_size_and_time, which uses ipa-prop.c infrastructure to check if it will be possible to devirtualize any of the indirect edged within callee.  If possible, then *size and *time returned by estimate_calls_size_and_time are reduced to account for the devirtualization benefits.
>
> OK for trunk?

I miss testcase(s).  Any assesment on how this improves devirtualization
in practice (for example for Mozilla)?

Thanks,
Richard.

> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Account for devirtualization opportunities in inliner
  2011-10-19 22:21 ` Maxim Kuvyrkov
  2011-10-20  8:30   ` Richard Guenther
@ 2011-10-20  9:31   ` Jan Hubicka
  2011-10-28  4:08     ` Maxim Kuvyrkov
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Hubicka @ 2011-10-20  9:31 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jan Hubicka, GCC Patches, Martin Jambor

Hi,
sorry for delayed review.  I am still trying to get ipa-inline-analysis to behave well on real codebases
and make my mind around how to get more advanced hints, like this one, into it.

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index bd4d2ea..5e88c2d 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -711,14 +711,23 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
 /* Work out what conditions might be true at invocation of E.  */
 
 static clause_t
-evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
+evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
+					  bool inline_p,
+					  VEC (tree, heap) **known_vals_ptr,
+					  VEC (tree, heap) **known_binfos_ptr)

Hmm, I would make clause also returned by reference to be sonsistent and perhaps
call it something like edge_properties
since it is not really only about evaulating the clause anymore.

-/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
+/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
+   KNOWN_BINFOS.  */
+
+static void
+estimate_edge_devirt_benefit (struct cgraph_edge *ie,
+			      int *size, int *time, int prob,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)

I think this whole logic should go into estimate_edge_time_and_size.  This way
we will save all the duplication of scaling logic
Just add the known_vals/binfos arguments.

I am not quite sure how to estimate the actual benefits.  estimate_num_insns
doesn't really make a difference in between direct and indirect calls.

I see it is good idea to inline more then the destination is known & inlinable.
This is an example when we have additional knowledge that we want to mix into
badness metric that does not directly translate to time/size.  There are multiple
cases like this.  I was thinking of adding kind of bonus metric for this purpose,
but I would suggest doing this incrementally.

What about
 1) extending estimate_num_insns wieghts to account direct calls differently
    from indirect calls (i.e. adding indirect_call cost value into eni wights)
    I would set it 2 for size metrics and 15 for time metrics for start
 2) make estimate_edge_time_and_size to subtract difference of those two metrics
    from edge costs when destination is direct.

Incrementally we can think of how to extra prioritize direct calls with inlinable
targets.
+/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.
+   POSSIBLE_TRUTHS, KNOWN_VALS and KNOWN_BINFOS describe context of the call
+   site.  */
 
 static void
 estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
-			      clause_t possible_truths)
+			      clause_t possible_truths,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
 {
   struct cgraph_edge *e;
   for (e = node->callees; e; e = e->next_callee)
@@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
 	    }
 	  else
 	    estimate_calls_size_and_time (e->callee, size, time,
-					  possible_truths);
+					  possible_truths,
+					  /* TODO: remap KNOWN_VALS and
+					     KNOWN_BINFOS to E->CALLEE
+					     parameters, and use them.  */
+					  NULL, NULL);

Remapping should not be needed here - the jump functions are merged after marking edge inline, so jump
functions in inlined functions actually reffer to the parameters of the function they are inlined to.

Honza

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Account for devirtualization opportunities in inliner
  2011-10-20  9:31   ` Jan Hubicka
@ 2011-10-28  4:08     ` Maxim Kuvyrkov
  2011-10-28  8:47       ` Maxim Kuvyrkov
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Kuvyrkov @ 2011-10-28  4:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Martin Jambor

On 20/10/2011, at 10:11 PM, Jan Hubicka wrote:
> static clause_t
> -evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
> +evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
> +					  bool inline_p,
> +					  VEC (tree, heap) **known_vals_ptr,
> +					  VEC (tree, heap) **known_binfos_ptr)
> 
> Hmm, I would make clause also returned by reference to be sonsistent and perhaps
> call it something like edge_properties
> since it is not really only about evaulating the clause anymore.

Agree.

> 
> -/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
> +/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
> +   KNOWN_BINFOS.  */
> +
> +static void
> +estimate_edge_devirt_benefit (struct cgraph_edge *ie,
> +			      int *size, int *time, int prob,
> +			      VEC (tree, heap) *known_vals,
> +			      VEC (tree, heap) *known_binfos)
> 
> I think this whole logic should go into estimate_edge_time_and_size.  This way
> we will save all the duplication of scaling logic
> Just add the known_vals/binfos arguments.

Then devirtualization benefit will not be available through estimate_node_size_and_time, which is the primary interface for users of ipa-inline-analysis other than the inliner itself.  I.e., estimate_ipcp_clone_size_and_time, which is the only other user of the analysis at the moment, will not see devirtualization benefit.

> 
> I am not quite sure how to estimate the actual benefits.  estimate_num_insns
> doesn't really make a difference in between direct and indirect calls.
> 
> I see it is good idea to inline more then the destination is known & inlinable.
> This is an example when we have additional knowledge that we want to mix into
> badness metric that does not directly translate to time/size.  There are multiple
> cases like this.  I was thinking of adding kind of bonus metric for this purpose,
> but I would suggest doing this incrementally.

I too thought about this, and decided to keep the bonus metric part to bare minimum in this patch.

> 
> What about
> 1) extending estimate_num_insns wieghts to account direct calls differently
>    from indirect calls (i.e. adding indirect_call cost value into eni wights)
>    I would set it 2 for size metrics and 15 for time metrics for start
> 2) make estimate_edge_time_and_size to subtract difference of those two metrics
>    from edge costs when destination is direct.

OK, I'll try this.

>  @@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
> 	    }
> 	  else
> 	    estimate_calls_size_and_time (e->callee, size, time,
> -					  possible_truths);
> +					  possible_truths,
> +					  /* TODO: remap KNOWN_VALS and
> +					     KNOWN_BINFOS to E->CALLEE
> +					     parameters, and use them.  */
> +					  NULL, NULL);
> 
> Remapping should not be needed here - the jump functions are merged after marking edge inline, so jump
> functions in inlined functions actually reffer to the parameters of the function they are inlined to.

I remember it crashing on some testcase and thought the lack of remapping was the cause.  I'll look into this.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Account for devirtualization opportunities in inliner
  2011-10-28  4:08     ` Maxim Kuvyrkov
@ 2011-10-28  8:47       ` Maxim Kuvyrkov
  2011-11-09  4:34         ` Maxim Kuvyrkov
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Kuvyrkov @ 2011-10-28  8:47 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Martin Jambor

[-- Attachment #1: Type: text/plain, Size: 4251 bytes --]

Jan,

Attached is the updated patch.  The only major change is the addition of indirect_call_cost to size and time weights.  I've set the size cost of indirect call to 3, which is what I remember calculating when I looked into costs couple of months ago: one call instruction for the call itself, one memory instruction to pull the call address out of vtable, and one ALU instruction to calculate the address inside vtable.  On architectures with base+offset addressing the above can be shrunk into 2 instructions.

The remapping of the known_vals and known_binfos did indeed turned out to work just fine.  Probably, that was a bug that was fixed since.

The patch bootstraps and passes regtest.

Comments?  OK for trunk?

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



On 28/10/2011, at 3:43 PM, Maxim Kuvyrkov wrote:

> On 20/10/2011, at 10:11 PM, Jan Hubicka wrote:
>> static clause_t
>> -evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
>> +evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
>> +					  bool inline_p,
>> +					  VEC (tree, heap) **known_vals_ptr,
>> +					  VEC (tree, heap) **known_binfos_ptr)
>> 
>> Hmm, I would make clause also returned by reference to be sonsistent and perhaps
>> call it something like edge_properties
>> since it is not really only about evaulating the clause anymore.
> 
> Agree.
> 
>> 
>> -/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
>> +/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
>> +   KNOWN_BINFOS.  */
>> +
>> +static void
>> +estimate_edge_devirt_benefit (struct cgraph_edge *ie,
>> +			      int *size, int *time, int prob,
>> +			      VEC (tree, heap) *known_vals,
>> +			      VEC (tree, heap) *known_binfos)
>> 
>> I think this whole logic should go into estimate_edge_time_and_size.  This way
>> we will save all the duplication of scaling logic
>> Just add the known_vals/binfos arguments.
> 
> Then devirtualization benefit will not be available through estimate_node_size_and_time, which is the primary interface for users of ipa-inline-analysis other than the inliner itself.  I.e., estimate_ipcp_clone_size_and_time, which is the only other user of the analysis at the moment, will not see devirtualization benefit.
> 
>> 
>> I am not quite sure how to estimate the actual benefits.  estimate_num_insns
>> doesn't really make a difference in between direct and indirect calls.
>> 
>> I see it is good idea to inline more then the destination is known & inlinable.
>> This is an example when we have additional knowledge that we want to mix into
>> badness metric that does not directly translate to time/size.  There are multiple
>> cases like this.  I was thinking of adding kind of bonus metric for this purpose,
>> but I would suggest doing this incrementally.
> 
> I too thought about this, and decided to keep the bonus metric part to bare minimum in this patch.
> 
>> 
>> What about
>> 1) extending estimate_num_insns wieghts to account direct calls differently
>>   from indirect calls (i.e. adding indirect_call cost value into eni wights)
>>   I would set it 2 for size metrics and 15 for time metrics for start
>> 2) make estimate_edge_time_and_size to subtract difference of those two metrics
>>   from edge costs when destination is direct.
> 
> OK, I'll try this.
> 
>> @@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
>> 	    }
>> 	  else
>> 	    estimate_calls_size_and_time (e->callee, size, time,
>> -					  possible_truths);
>> +					  possible_truths,
>> +					  /* TODO: remap KNOWN_VALS and
>> +					     KNOWN_BINFOS to E->CALLEE
>> +					     parameters, and use them.  */
>> +					  NULL, NULL);
>> 
>> Remapping should not be needed here - the jump functions are merged after marking edge inline, so jump
>> functions in inlined functions actually reffer to the parameters of the function they are inlined to.
> 
> I remember it crashing on some testcase and thought the lack of remapping was the cause.  I'll look into this.
> 
> Thank you,
> 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
> 


[-- Attachment #2: fsf-gcc-devirt-account-2.ChangeLog --]
[-- Type: application/octet-stream, Size: 1038 bytes --]

2011-10-28  Maxim Kuvyrkov  <maxim@codesourcery.com>

	* ipa-cp.c (ipa_value_from_jfunc): Make global.
	(ipa_cst_from_jfunc): Remove, use ipa_value_from_jfunc instead.
	(get_indirect_edge_target): Rename, make global.
	(devirtualization_time_bonus, estimate_local_effects,)
	(ipcp_discover_new_direct_edges): Update.
	* ipa-inline-analysis.c (evaluate_conditions_for_edge):
	Generalize to also handle types.  Rename to ...
	(evaluate_properties_for_edge): Use instead of
	evaluate_conditions_for_edge.
	(estimate_edge_devirt_benefit): New function.
	(estimate_calls_size_and_time): Use it.
	(estimate_node_size_and_time, estimate_ipcp_clone_size_and_time,)
	(inline_merge_summary):	Update.
	(do_estimate_edge_time, do_estimate_edge_growth): Update.  Calculate
	parameter information at the call site and pass it on to subroutines.
	* tree-inline.c (estimate_num_insns): Distinguish between direct and
	indirect calls.
	(init_inline_once): Set size and time costs or indirect calls.
	* tree-inline.h (eni_weights): Add indirect_call_cost.

[-- Attachment #3: fsf-gcc-devirt-account-2.patch --]
[-- Type: application/octet-stream, Size: 18286 bytes --]

commit 5d3ec86fd62ff7cee19da8f2fa99fdf7611eb4e1
Author: Maxim Kuvyrkov <maxim@codesourcery.com>
Date:   Wed Sep 21 16:27:25 2011 -0700

    Account for devirtualization opportunities in inlining.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 45cb00b..10e1834 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -693,7 +693,7 @@ ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc)
    describes the caller node so that pass-through jump functions can be
    evaluated.  */
 
-static tree
+tree
 ipa_value_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
 {
   if (jfunc->type == IPA_JF_CONST)
@@ -753,21 +753,6 @@ ipa_value_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
     return NULL_TREE;
 }
 
-/* Determine whether JFUNC evaluates to a constant and if so, return it.
-   Otherwise return NULL. INFO describes the caller node so that pass-through
-   jump functions can be evaluated.  */
-
-tree
-ipa_cst_from_jfunc (struct ipa_node_params *info, struct ipa_jump_func *jfunc)
-{
-  tree res = ipa_value_from_jfunc (info, jfunc);
-
-  if (res && TREE_CODE (res) == TREE_BINFO)
-    return NULL_TREE;
-  else
-    return res;
-}
-
 
 /* If checking is enabled, verify that no lattice is in the TOP state, i.e. not
    bottom, not containing a variable component and without any known value at
@@ -1112,10 +1097,10 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
    (which can contain both constants and binfos) or KNOWN_BINFOS (which can be
    NULL) return the destination.  */
 
-static tree
-get_indirect_edge_target (struct cgraph_edge *ie,
-			  VEC (tree, heap) *known_vals,
-			  VEC (tree, heap) *known_binfos)
+tree
+ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
 {
   int param_index = ie->indirect_info->param_index;
   HOST_WIDE_INT token, anc_offset;
@@ -1185,7 +1170,7 @@ devirtualization_time_bonus (struct cgraph_node *node,
       struct inline_summary *isummary;
       tree target;
 
-      target = get_indirect_edge_target (ie, known_csts, known_binfos);
+      target = ipa_get_indirect_edge_target (ie, known_csts, known_binfos);
       if (!target)
 	continue;
 
@@ -1344,7 +1329,8 @@ estimate_local_effects (struct cgraph_node *node)
 
       init_caller_stats (&stats);
       cgraph_for_node_and_aliases (node, gather_caller_stats, &stats, false);
-      estimate_ipcp_clone_size_and_time (node, known_csts, &size, &time);
+      estimate_ipcp_clone_size_and_time (node, known_csts, known_binfos,
+					 &size, &time);
       time -= devirtualization_time_bonus (node, known_csts, known_binfos);
       time -= removable_params_cost;
       size -= stats.n_calls * removable_params_cost;
@@ -1415,7 +1401,8 @@ estimate_local_effects (struct cgraph_node *node)
 	  else
 	    continue;
 
-	  estimate_ipcp_clone_size_and_time (node, known_csts, &size, &time);
+	  estimate_ipcp_clone_size_and_time (node, known_csts, known_binfos,
+					     &size, &time);
 	  time_benefit = base_time - time
 	    + devirtualization_time_bonus (node, known_csts, known_binfos)
 	    + removable_params_cost + emc;
@@ -1673,7 +1660,7 @@ ipcp_discover_new_direct_edges (struct cgraph_node *node,
       tree target;
 
       next_ie = ie->next_callee;
-      target = get_indirect_edge_target (ie, known_vals, NULL);
+      target = ipa_get_indirect_edge_target (ie, known_vals, NULL);
       if (target)
 	ipa_make_edge_direct_to_target (ie, target);
     }
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 1820b0c..85f5f6b 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -710,15 +710,25 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
 
 /* Work out what conditions might be true at invocation of E.  */
 
-static clause_t
-evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
+static void
+evaluate_properties_for_edge (struct cgraph_edge *e, bool inline_p,
+			      clause_t *clause_ptr,
+			      VEC (tree, heap) **known_vals_ptr,
+			      VEC (tree, heap) **known_binfos_ptr)
 {
-  clause_t clause = inline_p ? 0 : 1 << predicate_not_inlined_condition;
   struct cgraph_node *callee = cgraph_function_or_thunk_node (e->callee, NULL);
   struct inline_summary *info = inline_summary (callee);
   int i;
 
-  if (ipa_node_params_vector && info->conds)
+  if (clause_ptr)
+    *clause_ptr = inline_p ? 0 : 1 << predicate_not_inlined_condition;
+  if (known_vals_ptr)
+    *known_vals_ptr = NULL;
+  if (known_binfos_ptr)
+    *known_binfos_ptr = NULL;
+
+  if (ipa_node_params_vector
+      && ((clause_ptr && info->conds) || known_vals_ptr || known_binfos_ptr))
     {
       struct ipa_node_params *parms_info;
       struct ipa_edge_args *args = IPA_EDGE_REF (e);
@@ -731,29 +741,42 @@ evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
       else
         parms_info = IPA_NODE_REF (e->caller);
 
-      if (count)
-        VEC_safe_grow_cleared (tree, heap, known_vals, count);
+      if (count && (info->conds || known_vals_ptr))
+	VEC_safe_grow_cleared (tree, heap, known_vals, count);
+      if (count && known_binfos_ptr)
+	VEC_safe_grow_cleared (tree, heap, *known_binfos_ptr, count);
+
       for (i = 0; i < count; i++)
 	{
-	  tree cst = ipa_cst_from_jfunc (parms_info,
-					 ipa_get_ith_jump_func (args, i));
+	  tree cst = ipa_value_from_jfunc (parms_info,
+					   ipa_get_ith_jump_func (args, i));
 	  if (cst)
-	    VEC_replace (tree, known_vals, i, cst);
+	    {
+	      if (info->conds && TREE_CODE (cst) != TREE_BINFO)
+		VEC_replace (tree, known_vals, i, cst);
+	      else if (known_binfos_ptr != NULL)
+		VEC_replace (tree, *known_binfos_ptr, i, cst);
+	    }
 	  else if (inline_p
 		   && !VEC_index (inline_param_summary_t,
 				  es->param,
 				  i)->change_prob)
 	    VEC_replace (tree, known_vals, i, error_mark_node);
 	}
-      clause = evaluate_conditions_for_known_args (callee,
-						   inline_p, known_vals);
-      VEC_free (tree, heap, known_vals);
+
+      if (clause_ptr && info->conds)
+	*clause_ptr = evaluate_conditions_for_known_args (callee, inline_p,
+							  known_vals);
+
+      if (known_vals_ptr)
+	*known_vals_ptr = known_vals;
+      else
+	VEC_free (tree, heap, known_vals);
     }
-  else
-    for (i = 0; i < (int)VEC_length (condition, info->conds); i++)
-      clause |= 1 << (i + predicate_first_dynamic_condition);
 
-  return clause;
+  if (clause_ptr && !info->conds)
+    for (i = 0; i < (int)VEC_length (condition, info->conds); i++)
+      *clause_ptr |= 1 << (i + predicate_first_dynamic_condition);
 }
 
 
@@ -2183,11 +2206,71 @@ estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *time,
 }
 
 
-/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
+/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
+   KNOWN_BINFOS.  */
+
+static void
+estimate_edge_devirt_benefit (struct cgraph_edge *ie,
+			      int *size, int *time, int prob,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
+{
+  tree target;
+  struct cgraph_node *callee;
+  struct inline_summary *isummary;
+  int edge_size = 0, edge_time = 0;
+
+  if (!known_vals || !known_binfos)
+    return;
+
+  target = ipa_get_indirect_edge_target (ie, known_vals, known_binfos);
+  if (!target)
+    return;
+
+  /* Account for difference in cost between indirect and direct calls.  */
+  *size -= ((eni_size_weights.indirect_call_cost - eni_size_weights.call_cost)
+	    * INLINE_SIZE_SCALE);
+  *time -= ((eni_time_weights.indirect_call_cost - eni_time_weights.call_cost)
+	    * INLINE_TIME_SCALE * prob / REG_BR_PROB_BASE);
+
+  callee = cgraph_get_node (target);
+  if (!callee || !callee->analyzed)
+    return;
+  isummary = inline_summary (callee);
+  if (!isummary->inlinable)
+    return;
+
+  estimate_edge_size_and_time (ie, &edge_size, &edge_time, prob);
+
+  /* Count benefit only from functions that definitely will be inlined
+     if additional context from NODE's caller were available.  */
+  if (edge_size >= isummary->size * INLINE_SIZE_SCALE)
+    {
+      /* Subtract size and time that we added for edge IE.  */
+      *size -= edge_size;
+      *time -= edge_time;
+
+      /* Subtract benefit from inlining devirtualized call.  */
+      *size -= edge_size - isummary->size * INLINE_SIZE_SCALE;
+      *time -= edge_time - (isummary->time * INLINE_TIME_SCALE * prob
+			    / REG_BR_PROB_BASE);
+
+      /* TODO: estimate benefit from optimizing CALLEE's body provided
+	 additional context from IE call site.
+	 For insipiration see ipa-cp.c: devirtualization_time_bonus().  */
+    }
+}
+
+
+/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.
+   POSSIBLE_TRUTHS, KNOWN_VALS and KNOWN_BINFOS describe context of the call
+   site.  */
 
 static void
 estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
-			      clause_t possible_truths)
+			      clause_t possible_truths,
+			      VEC (tree, heap) *known_vals,
+			      VEC (tree, heap) *known_binfos)
 {
   struct cgraph_edge *e;
   for (e = node->callees; e; e = e->next_callee)
@@ -2203,25 +2286,32 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
 	    }
 	  else
 	    estimate_calls_size_and_time (e->callee, size, time,
-					  possible_truths);
+					  possible_truths,
+					  known_vals, known_binfos);
 	}
     }
-  /* TODO: look for devirtualizing oppurtunities.  */
   for (e = node->indirect_calls; e; e = e->next_callee)
     {
       struct inline_edge_summary *es = inline_edge_summary (e);
       if (!es->predicate || evaluate_predicate (es->predicate, possible_truths))
-        estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE);
+	{
+	  estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE);
+	  estimate_edge_devirt_benefit (e, size, time, REG_BR_PROB_BASE,
+					known_vals, known_binfos);
+	}
     }
 }
 
 
 /* Estimate size and time needed to execute NODE assuming
-   POSSIBLE_TRUTHS clause. */
+   POSSIBLE_TRUTHS clause, and KNOWN_VALS and KNOWN_BINFOS information
+   about NODE's arguments. */
 
 static void
 estimate_node_size_and_time (struct cgraph_node *node,
 			     clause_t possible_truths,
+			     VEC (tree, heap) *known_vals,
+			     VEC (tree, heap) *known_binfos,
 		       	     int *ret_size, int *ret_time,
 			     VEC (inline_param_summary_t, heap)
 			       *inline_param_summary)
@@ -2272,7 +2362,8 @@ estimate_node_size_and_time (struct cgraph_node *node,
   if (time > MAX_TIME * INLINE_TIME_SCALE)
     time = MAX_TIME * INLINE_TIME_SCALE;
 
-  estimate_calls_size_and_time (node, &size, &time, possible_truths);
+  estimate_calls_size_and_time (node, &size, &time, possible_truths,
+				known_vals, known_binfos);
   time = (time + INLINE_TIME_SCALE / 2) / INLINE_TIME_SCALE;
   size = (size + INLINE_SIZE_SCALE / 2) / INLINE_SIZE_SCALE;
 
@@ -2290,17 +2381,20 @@ estimate_node_size_and_time (struct cgraph_node *node,
 
 /* Estimate size and time needed to execute callee of EDGE assuming that
    parameters known to be constant at caller of EDGE are propagated.
-   KNOWN_VALs is a vector of assumed known constant values for parameters.  */
+   KNOWN_VALS and KNOWN_BINFOS are vectors of assumed known constant values
+   and types for parameters.  */
 
 void
 estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
 				   VEC (tree, heap) *known_vals,
+				   VEC (tree, heap) *known_binfos,
 		                   int *ret_size, int *ret_time)
 {
   clause_t clause;
 
   clause = evaluate_conditions_for_known_args (node, false, known_vals);
-  estimate_node_size_and_time (node, clause, ret_size, ret_time,
+  estimate_node_size_and_time (node, clause, known_vals, known_binfos,
+			       ret_size, ret_time,
 			       NULL);
 }
 
@@ -2556,9 +2650,9 @@ inline_merge_summary (struct cgraph_edge *edge)
       int count = ipa_get_cs_argument_count (args);
       int i;
 
-      clause = evaluate_conditions_for_edge (edge, true);
+      evaluate_properties_for_edge (edge, true, &clause, NULL, NULL);
       if (count)
-        VEC_safe_grow_cleared (int, heap, operand_map, count);
+	VEC_safe_grow_cleared (int, heap, operand_map, count);
       for (i = 0; i < count; i++)
 	{
 	  struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, i);
@@ -2602,7 +2696,8 @@ inline_merge_summary (struct cgraph_edge *edge)
   for (i = 0; VEC_iterate (size_time_entry, info->entry, i, e); i++)
     info->size += e->size, info->time += e->time;
   estimate_calls_size_and_time (to, &info->size, &info->time,
-				~(clause_t)(1 << predicate_false_condition));
+				~(clause_t)(1 << predicate_false_condition),
+				NULL, NULL);
 
   inline_update_callee_summaries (edge->callee,
 				  inline_edge_summary (edge)->loop_depth);
@@ -2630,13 +2725,21 @@ do_estimate_edge_time (struct cgraph_edge *edge)
   int time;
   int size;
   gcov_type ret;
+  struct cgraph_node *callee;
+  clause_t clause;
+  VEC (tree, heap) *known_vals;
+  VEC (tree, heap) *known_binfos;
   struct inline_edge_summary *es = inline_edge_summary (edge);
 
+  callee = cgraph_function_or_thunk_node (edge->callee, NULL);
+
   gcc_checking_assert (edge->inline_failed);
-  estimate_node_size_and_time (cgraph_function_or_thunk_node (edge->callee,
-							      NULL),
-			       evaluate_conditions_for_edge (edge, true),
+  evaluate_properties_for_edge (edge, true,
+				&clause, &known_vals, &known_binfos);
+  estimate_node_size_and_time (callee, clause, known_vals, known_binfos,
 			       &size, &time, es->param);
+  VEC_free (tree, heap, known_vals);
+  VEC_free (tree, heap, known_binfos);
 
   ret = (((gcov_type)time
 	   - es->call_stmt_time) * edge->frequency
@@ -2670,6 +2773,9 @@ do_estimate_edge_growth (struct cgraph_edge *edge)
 {
   int size;
   struct cgraph_node *callee;
+  clause_t clause;
+  VEC (tree, heap) *known_vals;
+  VEC (tree, heap) *known_binfos;
 
   /* When we do caching, use do_estimate_edge_time to populate the entry.  */
 
@@ -2682,13 +2788,17 @@ do_estimate_edge_growth (struct cgraph_edge *edge)
       gcc_checking_assert (size);
       return size - (size > 0);
     }
+
   callee = cgraph_function_or_thunk_node (edge->callee, NULL);
 
   /* Early inliner runs without caching, go ahead and do the dirty work.  */
   gcc_checking_assert (edge->inline_failed);
-  estimate_node_size_and_time (callee,
-			       evaluate_conditions_for_edge (edge, true),
+  evaluate_properties_for_edge (edge, true,
+				&clause, &known_vals, &known_binfos);
+  estimate_node_size_and_time (callee, clause, known_vals, known_binfos,
 			       &size, NULL, NULL);
+  VEC_free (tree, heap, known_vals);
+  VEC_free (tree, heap, known_binfos);
   gcc_checking_assert (inline_edge_summary (edge)->call_stmt_size);
   return size - inline_edge_summary (edge)->call_stmt_size;
 }
diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index 6df7867..a2c6cac 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -169,6 +169,7 @@ int estimate_time_after_inlining (struct cgraph_node *, struct cgraph_edge *);
 int estimate_size_after_inlining (struct cgraph_node *, struct cgraph_edge *);
 void estimate_ipcp_clone_size_and_time (struct cgraph_node *,
 					VEC (tree, heap) *known_vals,
+					VEC (tree, heap) *known_binfos,
 					int *, int *);
 int do_estimate_growth (struct cgraph_node *);
 void inline_merge_summary (struct cgraph_edge *edge);
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 58caa92..1a609a5 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -346,6 +346,9 @@ bool ipa_propagate_indirect_call_infos (struct cgraph_edge *cs,
 					VEC (cgraph_edge_p, heap) **new_edges);
 
 /* Indirect edge and binfo processing.  */
+tree ipa_get_indirect_edge_target (struct cgraph_edge *ie,
+				   VEC (tree, heap) *known_csts,
+				   VEC (tree, heap) *known_binfs);
 struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, tree);
 
 /* Functions related to both.  */
@@ -437,8 +440,8 @@ void ipa_prop_write_jump_functions (cgraph_node_set set);
 void ipa_prop_read_jump_functions (void);
 void ipa_update_after_lto_read (void);
 int ipa_get_param_decl_index (struct ipa_node_params *, tree);
-tree ipa_cst_from_jfunc (struct ipa_node_params *info,
-			 struct ipa_jump_func *jfunc);
+tree ipa_value_from_jfunc (struct ipa_node_params *info,
+			   struct ipa_jump_func *jfunc);
 
 
 /* From tree-sra.c:  */
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 11be8d0..3f33f72 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3488,7 +3488,7 @@ estimate_num_insns (gimple stmt, eni_weights *weights)
     case GIMPLE_CALL:
       {
 	tree decl = gimple_call_fndecl (stmt);
-	struct cgraph_node *node;
+	struct cgraph_node *node = NULL;
 
 	/* Do not special case builtins where we see the body.
 	   This just confuse inliner.  */
@@ -3523,7 +3523,7 @@ estimate_num_insns (gimple stmt, eni_weights *weights)
 	      }
 	  }
 
-	cost = weights->call_cost;
+	cost = node ? weights->call_cost : weights->indirect_call_cost;
 	if (gimple_call_lhs (stmt))
 	  cost += estimate_move_cost (TREE_TYPE (gimple_call_lhs (stmt)));
 	for (i = 0; i < gimple_call_num_args (stmt); i++)
@@ -3636,6 +3636,7 @@ void
 init_inline_once (void)
 {
   eni_size_weights.call_cost = 1;
+  eni_size_weights.indirect_call_cost = 3;
   eni_size_weights.target_builtin_call_cost = 1;
   eni_size_weights.div_mod_cost = 1;
   eni_size_weights.omp_cost = 40;
@@ -3647,6 +3648,7 @@ init_inline_once (void)
      underestimating the cost does less harm than overestimating it, so
      we choose a rather small value here.  */
   eni_time_weights.call_cost = 10;
+  eni_time_weights.indirect_call_cost = 15;
   eni_time_weights.target_builtin_call_cost = 1;
   eni_time_weights.div_mod_cost = 10;
   eni_time_weights.omp_cost = 40;
diff --git a/gcc/tree-inline.h b/gcc/tree-inline.h
index fb039e3..5508e09 100644
--- a/gcc/tree-inline.h
+++ b/gcc/tree-inline.h
@@ -135,6 +135,9 @@ typedef struct eni_weights_d
   /* Cost per call.  */
   unsigned call_cost;
 
+  /* Cost per indirect call.  */
+  unsigned indirect_call_cost;
+
   /* Cost per call to a target specific builtin */
   unsigned target_builtin_call_cost;
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Account for devirtualization opportunities in inliner
  2011-10-28  8:47       ` Maxim Kuvyrkov
@ 2011-11-09  4:34         ` Maxim Kuvyrkov
  2011-11-14 17:48           ` Jan Hubicka
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Kuvyrkov @ 2011-11-09  4:34 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Martin Jambor

Ping.

--
Maxim Kuvyrkov
www.trivialbugs.com



On 28/10/2011, at 9:01 PM, Maxim Kuvyrkov wrote:

> Jan,
> 
> Attached is the updated patch.  The only major change is the addition of indirect_call_cost to size and time weights.  I've set the size cost of indirect call to 3, which is what I remember calculating when I looked into costs couple of months ago: one call instruction for the call itself, one memory instruction to pull the call address out of vtable, and one ALU instruction to calculate the address inside vtable.  On architectures with base+offset addressing the above can be shrunk into 2 instructions.
> 
> The remapping of the known_vals and known_binfos did indeed turned out to work just fine.  Probably, that was a bug that was fixed since.
> 
> The patch bootstraps and passes regtest.
> 
> Comments?  OK for trunk?
> 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
> 
> 
> 
> On 28/10/2011, at 3:43 PM, Maxim Kuvyrkov wrote:
> 
>> On 20/10/2011, at 10:11 PM, Jan Hubicka wrote:
>>> static clause_t
>>> -evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
>>> +evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
>>> +					  bool inline_p,
>>> +					  VEC (tree, heap) **known_vals_ptr,
>>> +					  VEC (tree, heap) **known_binfos_ptr)
>>> 
>>> Hmm, I would make clause also returned by reference to be sonsistent and perhaps
>>> call it something like edge_properties
>>> since it is not really only about evaulating the clause anymore.
>> 
>> Agree.
>> 
>>> 
>>> -/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
>>> +/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
>>> +   KNOWN_BINFOS.  */
>>> +
>>> +static void
>>> +estimate_edge_devirt_benefit (struct cgraph_edge *ie,
>>> +			      int *size, int *time, int prob,
>>> +			      VEC (tree, heap) *known_vals,
>>> +			      VEC (tree, heap) *known_binfos)
>>> 
>>> I think this whole logic should go into estimate_edge_time_and_size.  This way
>>> we will save all the duplication of scaling logic
>>> Just add the known_vals/binfos arguments.
>> 
>> Then devirtualization benefit will not be available through estimate_node_size_and_time, which is the primary interface for users of ipa-inline-analysis other than the inliner itself.  I.e., estimate_ipcp_clone_size_and_time, which is the only other user of the analysis at the moment, will not see devirtualization benefit.
>> 
>>> 
>>> I am not quite sure how to estimate the actual benefits.  estimate_num_insns
>>> doesn't really make a difference in between direct and indirect calls.
>>> 
>>> I see it is good idea to inline more then the destination is known & inlinable.
>>> This is an example when we have additional knowledge that we want to mix into
>>> badness metric that does not directly translate to time/size.  There are multiple
>>> cases like this.  I was thinking of adding kind of bonus metric for this purpose,
>>> but I would suggest doing this incrementally.
>> 
>> I too thought about this, and decided to keep the bonus metric part to bare minimum in this patch.
>> 
>>> 
>>> What about
>>> 1) extending estimate_num_insns wieghts to account direct calls differently
>>>  from indirect calls (i.e. adding indirect_call cost value into eni wights)
>>>  I would set it 2 for size metrics and 15 for time metrics for start
>>> 2) make estimate_edge_time_and_size to subtract difference of those two metrics
>>>  from edge costs when destination is direct.
>> 
>> OK, I'll try this.
>> 
>>> @@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
>>> 	    }
>>> 	  else
>>> 	    estimate_calls_size_and_time (e->callee, size, time,
>>> -					  possible_truths);
>>> +					  possible_truths,
>>> +					  /* TODO: remap KNOWN_VALS and
>>> +					     KNOWN_BINFOS to E->CALLEE
>>> +					     parameters, and use them.  */
>>> +					  NULL, NULL);
>>> 
>>> Remapping should not be needed here - the jump functions are merged after marking edge inline, so jump
>>> functions in inlined functions actually reffer to the parameters of the function they are inlined to.
>> 
>> I remember it crashing on some testcase and thought the lack of remapping was the cause.  I'll look into this.
>> 
>> Thank you,
>> 
>> --
>> Maxim Kuvyrkov
>> CodeSourcery / Mentor Graphics
>> 
> 
> <fsf-gcc-devirt-account-2.ChangeLog><fsf-gcc-devirt-account-2.patch>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Account for devirtualization opportunities in inliner
  2011-11-09  4:34         ` Maxim Kuvyrkov
@ 2011-11-14 17:48           ` Jan Hubicka
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Hubicka @ 2011-11-14 17:48 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Jan Hubicka, GCC Patches, Martin Jambor

Hi,
the patch is OK. Thanks!
Honza

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-11-14 17:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-28  7:53 [PATCH] Account for devirtualization opportunities in inliner Maxim Kuvyrkov
2011-10-19 22:21 ` Maxim Kuvyrkov
2011-10-20  8:30   ` Richard Guenther
2011-10-20  9:31   ` Jan Hubicka
2011-10-28  4:08     ` Maxim Kuvyrkov
2011-10-28  8:47       ` Maxim Kuvyrkov
2011-11-09  4:34         ` Maxim Kuvyrkov
2011-11-14 17:48           ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).