* [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
From: Xiong Hu Luo @ 2019-06-18  1:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: hubicka, mliska, segher, wschmidt, luoxhu, Xiong Hu Luo

This patch aims to fix PR69678, which is caused by bugs in PGO indirect call
profiling.  Currently the default instrumentation can only find an indirect
target that is called more than 50% of the time, and it returns an incorrect
count for it.  This patch leverages "--param indir-call-topn-profile=1" to
enable profiling of multiple indirect targets and uses that profile in the
LTO WPA and LTO LTRANS stages.  As a result, function specialization,
profiling, partial devirtualization, inlining and cloning can be done
successfully based on it.
On simple tests, performance improves by more than 3x (1.7 sec -> 0.4 sec).
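
For illustration, the kind of rewrite this enables on the testcase's loop
looks roughly like the hand-written C below.  It is only a sketch of the idea,
not compiler output; run_indirect, run_speculated and check_equivalence are
names made up for this example.

```c
#include <assert.h>

typedef int (*fptr) (int);

static int one (int a) { (void) a; return 1; }
static int two (int a) { (void) a; return 0; }

static fptr table[] = { &one, &two };

/* Original form: every iteration makes an opaque indirect call.  */
static int
run_indirect (int iterations)
{
  int x = 0;
  fptr p = &one;
  for (int i = 0; i < iterations; i++)
    {
      x = (*p) (3);
      p = table[x];
    }
  return x;
}

/* Hand-written equivalent of speculating on both profiled targets:
   compare the pointer against each known callee, call it directly when
   it matches, and keep the indirect call as the fallback.  */
static int
run_speculated (int iterations)
{
  int x = 0;
  fptr p = &one;
  for (int i = 0; i < iterations; i++)
    {
      if (p == &one)
	x = one (3);		/* direct call, can be inlined */
      else if (p == &two)
	x = two (3);		/* direct call, can be inlined */
      else
	x = (*p) (3);		/* fallback for any other target */
      p = table[x];
    }
  return x;
}

/* Both forms must compute the same result.  */
void
check_equivalence (void)
{
  assert (run_indirect (9) == run_speculated (9));
}
```

With a single speculated target only one of the two direct branches would
exist, so in a loop like this one, which alternates between the two callees,
every other call would still go through the pointer.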
Details are:
  1.  When doing PGO with indir-call-topn-profile, the gcda data format is not
  supported in the ipa-profile pass, so add variables to pass the information
  between passes, and postpone gimple_ic to ipa-profile as in the default
  case, since the inline pass will decide whether it is beneficial to
  transform the indirect call.
  2.  Enable analysis of multiple indirect call targets in the LTO WPA/LTRANS
  stages, for full profile support in the ipa passes and cgraph_edge functions.
  3.  Fix various hidden speculative call ICEs exposed after enabling this
  feature when running SPEC2017.
  4.  Add 1 in-module testcase and 2 cross-module testcases.
  5.  TODOs:
    5.1.  Some reference info is dropped between WPA and LTRANS, which makes
    the reference check difficult in LTRANS; the strstr needs to be replaced
    with a reference compare.
    5.2.  Some duplicate code needs to be removed, as top1 and topn share the
    same logic.  Actually the top1-related logic could be eliminated entirely,
    as topn includes it.
    5.3.  The patch may need to be split as it is too big, but I am not sure
    how many pieces would be reasonable.
  6.  Performance result for ppc64le:
    6.1.  Representative test: indir-call-prof-topn.c runtime improved from
    1.7s to 0.4s.
    6.2.  SPEC2017 peakrate:
        523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
        525.x264_r (-5.29%).
        No big changes in other benchmarks.
        Option: -Ofast -mcpu=power8
        PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
        PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
        -fprofile-correction
    6.3.  No performance change on PHP benchmark.
  7.  Bootstrap and regression test passed on Power8-LE.
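
The profile handling described in item 1 can be sketched roughly as follows.
This is only an illustration under stated assumptions: PROB_BASE, MAX_TARGETS,
struct target_info, scale_probability and collect_targets are hypothetical
names, the base of 10000 and the capacity of 4 are assumed values, and unlike
the patch (which stores entries at index j / 2) the sketch packs the non-zero
targets densely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are assumptions for this sketch.  */
#define PROB_BASE 10000		/* plays the role of REG_BR_PROB_BASE */
#define MAX_TARGETS 4		/* plays the role of GCOV_ICALL_TOPN_NCOUNTS / 2 */

struct target_info
{
  int64_t ids[MAX_TARGETS];	/* profile ids of the recorded targets */
  int probs[MAX_TARGETS];	/* probability of each, scaled to PROB_BASE */
  unsigned n;			/* number of valid entries */
};

/* Scale COUNT / TOTAL into [0, PROB_BASE], rounding to nearest and
   capping values that exceed the base.  */
static int
scale_probability (int64_t count, int64_t total)
{
  if (total <= 0)
    return PROB_BASE;
  int64_t p = (count * PROB_BASE + total / 2) / total;
  return p > PROB_BASE ? PROB_BASE : (int) p;
}

/* COUNTERS is read as { slot 0, id, count, id, count, ... }; slot 0 is
   not used by this sketch.  Entries with a zero id are skipped.  */
static void
collect_targets (const int64_t *counters, size_t n_counters,
		 int64_t bb_count, struct target_info *out)
{
  out->n = 0;
  for (size_t j = 1; j + 1 < n_counters && out->n < MAX_TARGETS; j += 2)
    {
      if (counters[j] == 0)
	continue;
      out->ids[out->n] = counters[j];
      out->probs[out->n] = scale_probability (counters[j + 1], bb_count);
      out->n++;
    }
  assert (out->n <= MAX_TARGETS);
}
```

For a call site executed 100 times that hits two targets 50 times each, both
entries get probability PROB_BASE / 2, which is exactly at the cutoff the patch
checks in ipa_profile, so both targets remain candidates for speculation.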

gcc/ChangeLog

	2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR ipa/69678
	* cgraph.c (cgraph_node::get_create): Copy profile_id.
	(cgraph_edge::speculative_call_info): Find real
	reference for indirect targets.
	(cgraph_edge::resolve_speculation): Add speculative code process
	for indirect targets.
	(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
	(cgraph_node::verify_node): Likewise.
	* cgraph.h (common_target_ids): New variable.
	(common_target_probabilities): Likewise.
	(num_of_ics): Likewise.
	* cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
	* ipa-inline.c (inline_small_functions): Add iterator update.
	* ipa-profile.c (ipa_profile_generate_summary): Add indirect
	multiple targets logic.
	(ipa_profile): Likewise.
	* ipa-utils.c (ipa_merge_profiles): Clone speculative src's
	referrings to dst.
	* ipa.c (process_references): Fix typo.
	* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
	logic.
	(input_edge): Likewise.
	* predict.c (dump_prediction): Remove the assert that the edge count
	is precise.
	* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
	__gcov_indirect_call.counters and __gcov_indirect_call.callee.
	(gimple_gen_ic_func_profiler): Likewise.
	(pass_ipa_tree_profile::gate): Fix comment typos.
	* tree-inline.c (copy_bb): Duplicate all the speculative edges
	if indirect call contains multiple speculative targets.
	* value-prof.c (check_counter): Proportion the counter for
	multiple targets.
	(ic_transform_topn): New function.
	(gimple_ic_transform): Handle topn case, fix comment typos.

gcc/testsuite/ChangeLog

	2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR ipa/69678
	* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
---
 gcc/cgraph.c                                  |  38 +++-
 gcc/cgraph.h                                  |   9 +-
 gcc/cgraphclones.c                            |   1 +
 gcc/ipa-inline.c                              |   3 +
 gcc/ipa-profile.c                             | 185 +++++++++++++++++-
 gcc/ipa-utils.c                               |   5 +
 gcc/ipa.c                                     |   2 +-
 gcc/lto-cgraph.c                              |  38 ++++
 gcc/predict.c                                 |   1 -
 .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
 .../crossmodule-indir-call-topn-1a.c          |  22 +++
 .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
 .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
 gcc/tree-inline.c                             |  97 +++++----
 gcc/tree-profile.c                            |  12 +-
 gcc/value-prof.c                              | 146 +++++++++++++-
 16 files changed, 606 insertions(+), 68 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index de82316d4b1..0d373a67d1b 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
 	fprintf (dump_file, "Introduced new external node "
 		 "(%s) and turned into root of the clone tree.\n",
 		 node->dump_name ());
+      node->profile_id = first_clone->profile_id;
     }
   else if (dump_file)
     fprintf (dump_file, "Introduced new external node "
@@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
   int i;
   cgraph_edge *e2;
   cgraph_edge *e = this;
+  cgraph_node *referred_node;
 
   if (!e->indirect_unknown_callee)
     for (e2 = e->caller->indirect_calls;
@@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
 	&& ((ref->stmt && ref->stmt == e->call_stmt)
 	    || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
       {
-	reference = ref;
-	break;
+	if (e2->indirect_info && e2->indirect_info->num_of_ics)
+	  {
+	    referred_node = dyn_cast<cgraph_node *> (ref->referred);
+	    if (strstr (e->callee->name (), referred_node->name ()))
+	      {
+		reference = ref;
+		break;
+	      }
+	  }
+	else
+	  {
+	    reference = ref;
+	    break;
+	  }
       }
 
   /* Speculative edge always consist of all three components - direct edge,
@@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
          in the functions inlined through it.  */
     }
   edge->count += e2->count;
-  edge->speculative = false;
+  if (edge->indirect_info && edge->indirect_info->num_of_ics)
+    {
+      edge->indirect_info->num_of_ics--;
+      if (edge->indirect_info->num_of_ics == 0)
+	edge->speculative = false;
+    }
+  else
+    edge->speculative = false;
   e2->speculative = false;
   ref->remove_reference ();
   if (e2->indirect_unknown_callee || e2->inline_failed)
@@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
 	  e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
 						     false);
 	  e->count = gimple_bb (e->call_stmt)->count;
-	  e2->speculative = false;
+	  if (e2->indirect_info && e2->indirect_info->num_of_ics)
+	    {
+	      e2->indirect_info->num_of_ics--;
+	      if (e2->indirect_info->num_of_ics == 0)
+		e2->speculative = false;
+	    }
+	  else
+	    e2->speculative = false;
 	  e2->count = gimple_bb (e2->call_stmt)->count;
 	  ref->speculative = false;
 	  ref->stmt = NULL;
@@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
 
       for (e = callees; e; e = e->next_callee)
 	{
-	  if (!e->aux)
+	  if (!e->aux && !e->speculative)
 	    {
 	      error ("edge %s->%s has no corresponding call_stmt",
 		     identifier_to_locale (e->caller->name ()),
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index c294602d762..ed0fbc60432 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "profile-count.h"
 #include "ipa-ref.h"
 #include "plugin-api.h"
+#include "gcov-io.h"
 
 extern void debuginfo_early_init (void);
 extern void debuginfo_init (void);
@@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
   int param_index;
   /* ECF flags determined from the caller.  */
   int ecf_flags;
-  /* Profile_id of common target obtrained from profile.  */
+  /* Profile_id of common target obtained from profile.  */
   int common_target_id;
   /* Probability that call will land in function with COMMON_TARGET_ID.  */
   int common_target_probability;
 
+  /* Profile_id of common target obtained from profile.  */
+  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
+  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
+  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
+  unsigned num_of_ics;
+
   /* Set when the call is a virtual call with the parameter being the
      associated object pointer rather than a simple direct call.  */
   unsigned polymorphic : 1;
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 15f7e119d18..94f424bc10c 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   new_node->icf_merged = icf_merged;
   new_node->merged_comdat = merged_comdat;
   new_node->thunk = thunk;
+  new_node->profile_id = profile_id;
 
   new_node->clone.tree_map = NULL;
   new_node->clone.args_to_skip = args_to_skip;
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 360c3de3289..ef2b217b3f9 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1866,12 +1866,15 @@ inline_small_functions (void)
 	}
       if (has_speculative)
 	for (edge = node->callees; edge; edge = next)
+	{
+	  next = edge->next_callee;
 	  if (edge->speculative && !speculation_useful_p (edge,
 							  edge->aux != NULL))
 	    {
 	      edge->resolve_speculation ();
 	      update = true;
 	    }
+	}
       if (update)
 	{
 	  struct cgraph_node *where = node->global.inlined_to
diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index de9563d808c..d04476295a0 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
   struct cgraph_node *node;
   gimple_stmt_iterator gsi;
   basic_block bb;
+  enum hist_type type;
+
+  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
+						     : HIST_TYPE_INDIR_CALL;
 
   hash_table<histogram_hash> hashtable (10);
   
@@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
 		  histogram_value h;
 		  h = gimple_histogram_value_of_type
 			(DECL_STRUCT_FUNCTION (node->decl),
-			 stmt, HIST_TYPE_INDIR_CALL);
+			 stmt, type);
 		  /* No need to do sanity check: gimple_ic_transform already
 		     takes away bad histograms.  */
-		  if (h)
+		  if (h && type == HIST_TYPE_INDIR_CALL)
 		    {
 		      /* counter 0 is target, counter 1 is number of execution we called target,
 			 counter 2 is total number of executions.  */
@@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
 		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
 						      stmt, h);
 		    }
+		  else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
+		    {
+		      unsigned j;
+		      struct cgraph_edge *e = node->get_edge (stmt);
+		      if (e && !e->indirect_unknown_callee)
+			continue;
+
+		      e->indirect_info->num_of_ics = 0;
+		      for (j = 1; j < h->n_counters; j += 2)
+			{
+			  if (h->hvalue.counters[j] == 0)
+			    continue;
+
+			  e->indirect_info->common_target_ids[j / 2]
+			    = h->hvalue.counters[j];
+			  e->indirect_info->common_target_probabilities[j / 2]
+			    = GCOV_COMPUTE_SCALE (
+			      h->hvalue.counters[j + 1],
+			      gimple_bb (stmt)->count.ipa ().to_gcov_type ());
+			  if (e->indirect_info
+				->common_target_probabilities[j / 2]
+			      > REG_BR_PROB_BASE)
+			    {
+			      if (dump_file)
+				fprintf (dump_file,
+					 "Probability capped to 1\n");
+			      e->indirect_info
+				->common_target_probabilities[j / 2]
+				= REG_BR_PROB_BASE;
+			    }
+			  e->indirect_info->num_of_ics++;
+			}
+
+		      gcc_assert (e->indirect_info->num_of_ics
+				  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
+
+		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
+						       node->decl),
+						     stmt, h);
+		    }
 		}
 	      time += estimate_num_insns (stmt, &eni_time_weights);
 	      size += estimate_num_insns (stmt, &eni_size_weights);
@@ -492,6 +536,7 @@ ipa_profile (void)
   int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
   int nmismatch = 0, nimpossible = 0;
   bool node_map_initialized = false;
+  gcov_type threshold;
 
   if (dump_file)
     dump_histogram (dump_file, histogram);
@@ -500,14 +545,12 @@ ipa_profile (void)
       overall_time += histogram[i]->count * histogram[i]->time;
       overall_size += histogram[i]->size;
     }
+  threshold = 0;
   if (overall_time)
     {
-      gcov_type threshold;
-
       gcc_assert (overall_size);
 
       cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) / 1000;
-      threshold = 0;
       for (i = 0; cumulated < cutoff; i++)
 	{
 	  cumulated += histogram[i]->count * histogram[i]->time;
@@ -543,7 +586,7 @@ ipa_profile (void)
   histogram.release ();
   histogram_pool.release ();
 
-  /* Produce speculative calls: we saved common traget from porfiling into
+  /* Produce speculative calls: we saved common target from profiling into
      e->common_target_id.  Now, at link time, we can look up corresponding
      function node and produce speculative call.  */
 
@@ -558,7 +601,8 @@ ipa_profile (void)
 	{
 	  if (n->count.initialized_p ())
 	    nindirect++;
-	  if (e->indirect_info->common_target_id)
+	  if (e->indirect_info->common_target_id
+	      || (e->indirect_info && e->indirect_info->num_of_ics == 1))
 	    {
 	      if (!node_map_initialized)
 	        init_node_map (false);
@@ -613,7 +657,7 @@ ipa_profile (void)
 		      if (dump_file)
 			fprintf (dump_file,
 				 "Not speculating: "
-				 "parameter count mistmatch\n");
+				 "parameter count mismatch\n");
 		    }
 		  else if (e->indirect_info->polymorphic
 			   && !opt_for_fn (n->decl, flag_devirtualize)
@@ -655,7 +699,130 @@ ipa_profile (void)
 		  nunknown++;
 		}
 	    }
-	 }
+	  if (e->indirect_info && e->indirect_info->num_of_ics > 1)
+	    {
+	      if (in_lto_p)
+		{
+		  if (dump_file)
+		    {
+		      fprintf (dump_file,
+			       "Updating hotness threshold in LTO mode.\n");
+		      fprintf (dump_file, "Updated min count: %" PRId64 "\n",
+			       (int64_t) threshold);
+		    }
+		  set_hot_bb_threshold (threshold
+					/ e->indirect_info->num_of_ics);
+		}
+	      if (!node_map_initialized)
+		init_node_map (false);
+	      node_map_initialized = true;
+	      ncommon++;
+	      unsigned speculative = 0;
+	      for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
+		{
+		  n2 = find_func_by_profile_id (
+		    e->indirect_info->common_target_ids[i]);
+		  if (n2)
+		    {
+		      if (dump_file)
+			{
+			  fprintf (
+			    dump_file,
+			    "Indirect call -> direct call from"
+			    " other module %s => %s, prob %3.2f\n",
+			    n->dump_name (), n2->dump_name (),
+			    e->indirect_info->common_target_probabilities[i]
+			      / (float) REG_BR_PROB_BASE);
+			}
+		      if (e->indirect_info->common_target_probabilities[i]
+			  < REG_BR_PROB_BASE / 2)
+			{
+			  nuseless++;
+			  if (dump_file)
+			    fprintf (
+			      dump_file,
+			      "Not speculating: probability is too low.\n");
+			}
+		      else if (!e->maybe_hot_p ())
+			{
+			  nuseless++;
+			  if (dump_file)
+			    fprintf (dump_file,
+				     "Not speculating: call is cold.\n");
+			}
+		      else if (n2->get_availability () <= AVAIL_INTERPOSABLE
+			       && n2->can_be_discarded_p ())
+			{
+			  nuseless++;
+			  if (dump_file)
+			    fprintf (dump_file,
+				     "Not speculating: target is overwritable "
+				     "and can be discarded.\n");
+			}
+		      else if (ipa_node_params_sum && ipa_edge_args_sum
+			       && (!vec_safe_is_empty (
+				 IPA_NODE_REF (n2)->descriptors))
+			       && ipa_get_param_count (IPA_NODE_REF (n2))
+				    != ipa_get_cs_argument_count (
+				      IPA_EDGE_REF (e))
+			       && (ipa_get_param_count (IPA_NODE_REF (n2))
+				     >= ipa_get_cs_argument_count (
+				       IPA_EDGE_REF (e))
+				   || !stdarg_p (TREE_TYPE (n2->decl))))
+			{
+			  nmismatch++;
+			  if (dump_file)
+			    fprintf (dump_file, "Not speculating: "
+						"parameter count mismatch\n");
+			}
+		      else if (e->indirect_info->polymorphic
+			       && !opt_for_fn (n->decl, flag_devirtualize)
+			       && !possible_polymorphic_call_target_p (e, n2))
+			{
+			  nimpossible++;
+			  if (dump_file)
+			    fprintf (dump_file,
+				     "Not speculating: "
+				     "function is not in the polymorphic "
+				     "call target list\n");
+			}
+		      else
+			{
+			  /* Target may be overwritable, but profile says that
+			     control flow goes to this particular implementation
+			     of N2.  Speculate on the local alias to allow
+			     inlining.
+			     */
+			  if (!n2->can_be_discarded_p ())
+			    {
+			      cgraph_node *alias;
+			      alias = dyn_cast<cgraph_node *> (
+				n2->noninterposable_alias ());
+			      if (alias)
+				n2 = alias;
+			    }
+			  nconverted++;
+			  e->make_speculative (
+			    n2, e->count.apply_probability (
+				  e->indirect_info
+				    ->common_target_probabilities[i]));
+			  update = true;
+			  speculative++;
+			}
+		    }
+		  else
+		    {
+		      if (dump_file)
+			fprintf (dump_file,
+				 "Function with profile-id %i not found.\n",
+				 e->indirect_info->common_target_ids[i]);
+		      nunknown++;
+		    }
+		}
+	      if (speculative < e->indirect_info->num_of_ics)
+		e->indirect_info->num_of_ics = speculative;
+	    }
+	}
        if (update)
 	 ipa_update_overall_fn_summary (n);
      }
diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
index 79b250c3943..30347691029 100644
--- a/gcc/ipa-utils.c
+++ b/gcc/ipa-utils.c
@@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
       update_max_bb_count ();
       compute_function_frequency ();
       pop_cfun ();
+      /* When src is speculative, clone the referrings.  */
+      if (src->indirect_call_target)
+	for (e = src->callers; e; e = e->next_caller)
+	  if (e->callee == src && e->speculative)
+	    dst->clone_referring (src);
       for (e = dst->callees; e; e = e->next_callee)
 	{
 	  if (e->speculative)
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 2496694124c..c1fe081a72d 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -166,7 +166,7 @@ process_references (symtab_node *snode,
    devirtualization happens.  After inlining still keep their declarations
    around, so we can devirtualize to a direct call.
 
-   Also try to make trivial devirutalization when no or only one target is
+   Also try to make trivial devirtualization when no or only one target is
    possible.  */
 
 static void
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 4dfa2862be3..0c8f547d44e 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
   unsigned int uid;
   intptr_t ref;
   struct bitpack_d bp;
+  unsigned i;
 
   if (edge->indirect_unknown_callee)
     streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
@@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
       if (edge->indirect_info->common_target_id)
 	streamer_write_hwi_stream
 	   (ob->main_stream, edge->indirect_info->common_target_probability);
+
+      gcc_assert (edge->indirect_info->num_of_ics
+		  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
+
+      streamer_write_hwi_stream (ob->main_stream,
+				 edge->indirect_info->num_of_ics);
+
+      if (edge->indirect_info->num_of_ics)
+	{
+	  for (i = 0; i < edge->indirect_info->num_of_ics; i++)
+	    {
+	      streamer_write_hwi_stream (
+		ob->main_stream, edge->indirect_info->common_target_ids[i]);
+	      if (edge->indirect_info->common_target_ids[i])
+		streamer_write_hwi_stream (
+		  ob->main_stream,
+		  edge->indirect_info->common_target_probabilities[i]);
+	    }
+	}
     }
 }
 
@@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
   cgraph_inline_failed_t inline_failed;
   struct bitpack_d bp;
   int ecf_flags = 0;
+  unsigned i;
 
   caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
   if (caller == NULL || caller->decl == NULL_TREE)
@@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
       edge->indirect_info->common_target_id = streamer_read_hwi (ib);
       if (edge->indirect_info->common_target_id)
         edge->indirect_info->common_target_probability = streamer_read_hwi (ib);
+
+      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
+
+      gcc_assert (edge->indirect_info->num_of_ics
+		  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
+
+      if (edge->indirect_info->num_of_ics)
+	{
+	  for (i = 0; i < edge->indirect_info->num_of_ics; i++)
+	    {
+	      edge->indirect_info->common_target_ids[i]
+		= streamer_read_hwi (ib);
+	      if (edge->indirect_info->common_target_ids[i])
+		edge->indirect_info->common_target_probabilities[i]
+		  = streamer_read_hwi (ib);
+	    }
+	}
     }
 }
 
diff --git a/gcc/predict.c b/gcc/predict.c
index 43ee91a5b13..b7f38891c72 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
       && bb->count.precise_p ()
       && reason == REASON_NONE)
     {
-      gcc_assert (e->count ().precise_p ());
       fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
 	       predictor_info[predictor].name,
 	       bb->count.to_gcov_type (), e->count ().to_gcov_type (),
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
new file mode 100644
index 00000000000..e0a83c2e067
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target lto } */
+/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
+/* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
+
+#include <stdio.h>
+
+typedef int (*fptr) (int);
+int
+one (int a);
+
+int
+two (int a);
+
+fptr table[] = {&one, &two};
+
+int
+main()
+{
+  int i, x;
+  fptr p = &one;
+
+  x = one (3);
+
+  for (i = 0; i < 350000000; i++)
+    {
+      x = (*p) (3);
+      p = table[x];
+    }
+  printf ("done:%d\n", x);
+}
+
+/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
+/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
new file mode 100644
index 00000000000..a8c6e365fb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
@@ -0,0 +1,22 @@
+/* It seems there is no way to avoid the other source of multiple
+   source testcase from being compiled independently.  Just avoid
+   error.  */
+#ifdef DOJOB
+int
+one (int a)
+{
+  return 1;
+}
+
+int
+two (int a)
+{
+  return 0;
+}
+#else
+int
+main()
+{
+  return 0;
+}
+#endif
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
new file mode 100644
index 00000000000..aa3887fde83
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target lto } */
+/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
+/* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
+
+#include <stdio.h>
+
+typedef int (*fptr) (int);
+int
+one (int a);
+
+int
+two (int a);
+
+fptr table[] = {&one, &two};
+
+int foo ()
+{
+  int i, x;
+  fptr p = &one;
+
+  x = one (3);
+
+  for (i = 0; i < 350000000; i++)
+    {
+      x = (*p) (3);
+      p = table[x];
+    }
+  return x;
+}
+
+int
+main()
+{
+  int x = foo ();
+  printf ("done:%d\n", x);
+}
+
+/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
+/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
new file mode 100644
index 00000000000..951bc7ddd19
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
@@ -0,0 +1,38 @@
+/* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */
+
+#include <stdio.h>
+
+typedef int (*fptr) (int);
+int
+one (int a)
+{
+  return 1;
+}
+
+int
+two (int a)
+{
+  return 0;
+}
+
+fptr table[] = {&one, &two};
+
+int
+main()
+{
+  int i, x;
+  fptr p = &one;
+
+  one (3);
+
+  for (i = 0; i < 350000000; i++)
+    {
+      x = (*p) (3);
+      p = table[x];
+    }
+  printf ("done:%d\n", x);
+}
+
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 9017da878b1..f69b31b197e 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
 	      switch (id->transform_call_graph_edges)
 		{
 		case CB_CGE_DUPLICATE:
-		  edge = id->src_node->get_edge (orig_stmt);
-		  if (edge)
-		    {
-		      struct cgraph_edge *old_edge = edge;
-		      profile_count old_cnt = edge->count;
-		      edge = edge->clone (id->dst_node, call_stmt,
-					  gimple_uid (stmt),
-					  num, den,
-					  true);
-
-		      /* Speculative calls consist of two edges - direct and
-			 indirect.  Duplicate the whole thing and distribute
-			 frequencies accordingly.  */
-		      if (edge->speculative)
-			{
-			  struct cgraph_edge *direct, *indirect;
-			  struct ipa_ref *ref;
-
-			  gcc_assert (!edge->indirect_unknown_callee);
-			  old_edge->speculative_call_info (direct, indirect, ref);
-
-			  profile_count indir_cnt = indirect->count;
-			  indirect = indirect->clone (id->dst_node, call_stmt,
-						      gimple_uid (stmt),
-						      num, den,
-						      true);
-
-			  profile_probability prob
-			     = indir_cnt.probability_in (old_cnt + indir_cnt);
-			  indirect->count
-			     = copy_basic_block->count.apply_probability (prob);
-			  edge->count = copy_basic_block->count - indirect->count;
-			  id->dst_node->clone_reference (ref, stmt);
-			}
-		      else
-			edge->count = copy_basic_block->count;
-		    }
+		  {
+		    edge = id->src_node->get_edge (orig_stmt);
+		    struct cgraph_edge *old_edge = edge;
+		    struct cgraph_edge *direct, *indirect;
+		    bool next_speculative;
+		    do
+		      {
+			next_speculative = false;
+			if (edge)
+			  {
+			    profile_count old_cnt = edge->count;
+			    edge
+			      = edge->clone (id->dst_node, call_stmt,
+					     gimple_uid (stmt), num, den, true);
+
+			    /* Speculative calls consist of two edges - direct
+			       and indirect.  Duplicate the whole thing and
+			       distribute frequencies accordingly.  */
+			    if (edge->speculative)
+			      {
+				struct ipa_ref *ref;
+
+				gcc_assert (!edge->indirect_unknown_callee);
+				old_edge->speculative_call_info (direct,
+								 indirect, ref);
+
+				profile_count indir_cnt = indirect->count;
+				indirect
+				  = indirect->clone (id->dst_node, call_stmt,
+						     gimple_uid (stmt), num,
+						     den, true);
+
+				profile_probability prob
+				  = indir_cnt.probability_in (old_cnt
+							      + indir_cnt);
+				indirect->count
+				  = copy_basic_block->count.apply_probability (
+				    prob);
+				edge->count
+				  = copy_basic_block->count - indirect->count;
+				id->dst_node->clone_reference (ref, stmt);
+			      }
+			    else
+			      edge->count = copy_basic_block->count;
+			  }
+			/* If the indirect call has more than one indirect
+			   target, all speculative edges need cloning here.  */
+			if (old_edge && old_edge->next_callee
+			    && old_edge->speculative && indirect
+			    && indirect->indirect_info
+			    && indirect->indirect_info->num_of_ics > 1)
+			  {
+			    edge = old_edge->next_callee;
+			    old_edge = old_edge->next_callee;
+			    if (edge->speculative)
+			      next_speculative = true;
+			  }
+		      }
+		    while (next_speculative);
+		  }
 		  break;
 
 		case CB_CGE_MOVE_CLONES:
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 1c3034aac10..4964dbdebb5 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
 /* Do initialization work for the edge profiler.  */
 
 /* Add code:
-   __thread gcov*	__gcov_indirect_call_counters; // pointer to actual counter
-   __thread void*	__gcov_indirect_call_callee; // actual callee address
+   __thread gcov*	__gcov_indirect_call.counters; // pointer to actual counter
+   __thread void*	__gcov_indirect_call.callee; // actual callee address
    __thread int __gcov_function_counter; // time profiler function counter
 */
 static void
@@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base)
       f_1 = foo;
       __gcov_indirect_call.counters = &__gcov4.main[0];
       PROF_9 = f_1;
-      __gcov_indirect_call_callee = PROF_9;
+      __gcov_indirect_call.callee = PROF_9;
       _4 = f_1 ();
    */
 
@@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
 
   /* Insert code:
 
-     if (__gcov_indirect_call_callee != NULL)
+     if (__gcov_indirect_call.callee != NULL)
        __gcov_indirect_call_profiler_v3 (profile_id, &current_function_decl);
 
      The function __gcov_indirect_call_profiler_v3 is responsible for
-     resetting __gcov_indirect_call_callee to NULL.  */
+     resetting __gcov_indirect_call.callee to NULL.  */
 
   gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
   void0 = build_int_cst (ptr_type_node, 0);
@@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
 {
   /* When profile instrumentation, use or test coverage shall be performed.
      But for AutoFDO, this there is no instrumentation, thus this pass is
-     diabled.  */
+     disabled.  */
   return (!in_lto_p && !flag_auto_profile
 	  && (flag_branch_probabilities || flag_test_coverage
 	      || profile_arc_flag));
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 5013956cf86..4869ab8ccd6 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -579,8 +579,8 @@ free_histograms (struct function *fn)
    somehow.  */
 
 static bool
-check_counter (gimple *stmt, const char * name,
-	       gcov_type *count, gcov_type *all, profile_count bb_count_d)
+check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all,
+	       profile_count bb_count_d, float ratio = 1.0f)
 {
   gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
   if (*all != bb_count || *count > *all)
@@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
                              "count (%d)\n", name, (int)*all, (int)bb_count);
 	  *all = bb_count;
 	  if (*count > *all)
-            *count = *all;
+	    *count = *all * ratio;
 	  return false;
 	}
       else
@@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
   return dcall_stmt;
 }
 
+/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe
+   multiple indirect targets in histogram.  Check every indirect/virtual call
+   if callee function exists, if not exit, leave it to LTO stage for later
+   process.  Modify code of this indirect call to an if-else structure in
+   ipa-profile finally.  */
+static bool
+ic_transform_topn (gimple_stmt_iterator *gsi)
+{
+  unsigned j;
+  gcall *stmt;
+  histogram_value histogram;
+  gcov_type val, count, count_all, all, bb_all;
+  struct cgraph_node *d_call;
+  profile_count bb_count;
+
+  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
+  if (!stmt)
+    return false;
+
+  if (gimple_call_fndecl (stmt) != NULL_TREE)
+    return false;
+
+  if (gimple_call_internal_p (stmt))
+    return false;
+
+  histogram
+    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
+  if (!histogram)
+    return false;
+
+  count = 0;
+  all = 0;
+  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
+  bb_count = gimple_bb (stmt)->count;
+
+  /* n_counters need be odd to avoid access violation.  */
+  gcc_assert (histogram->n_counters % 2 == 1);
+
+  /* For indirect call topn, accumulate all the counts first.  */
+  for (j = 1; j < histogram->n_counters; j += 2)
+    {
+      val = histogram->hvalue.counters[j];
+      count = histogram->hvalue.counters[j + 1];
+      if (val)
+	all += count;
+    }
+
+  count_all = all;
+  /* Do the indirect call conversion if function body exists, or else leave it
+     to LTO stage.  */
+  for (j = 1; j < histogram->n_counters; j += 2)
+    {
+      val = histogram->hvalue.counters[j];
+      count = histogram->hvalue.counters[j + 1];
+      if (val)
+	{
+	  /* The order of CHECK_COUNTER calls is important
+	     since check_counter can correct the third parameter
+	     and we want to make count <= all <= bb_count.  */
+	  if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
+	      || check_counter (stmt, "ic", &count, &all,
+				profile_count::from_gcov_type (all),
+				(float) count / count_all))
+	    {
+	      gimple_remove_histogram_value (cfun, stmt, histogram);
+	      return false;
+	    }
+
+	  d_call = find_func_by_profile_id ((int) val);
+
+	  if (d_call == NULL)
+	    {
+	      if (val)
+		{
+		  if (dump_file)
+		    {
+		      fprintf (
+			dump_file,
+			"Indirect call -> direct call from other module");
+		      print_generic_expr (dump_file, gimple_call_fn (stmt),
+					  TDF_SLIM);
+		      fprintf (dump_file,
+			       "=> %i (will resolve only with LTO)\n",
+			       (int) val);
+		    }
+		}
+	      return false;
+	    }
+
+	  if (!check_ic_target (stmt, d_call))
+	    {
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "Indirect call -> direct call ");
+		  print_generic_expr (dump_file, gimple_call_fn (stmt),
+				      TDF_SLIM);
+		  fprintf (dump_file, "=> ");
+		  print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
+		  fprintf (dump_file,
+			   " transformation skipped because of type mismatch");
+		  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+		}
+	      gimple_remove_histogram_value (cfun, stmt, histogram);
+	      return false;
+	    }
+
+	  if (dump_file)
+	  {
+	    fprintf (dump_file, "Indirect call -> direct call ");
+	    print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
+	    fprintf (dump_file, "=> ");
+	    print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
+	    fprintf (dump_file,
+		     " transformation on insn postponed to ipa-profile");
+	    print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	    fprintf (dump_file, "hist->count %" PRId64
+		" hist->all %" PRId64"\n", count, all);
+	  }
+	}
+    }
+
+  return true;
+}
 /*
   For every checked indirect/virtual call determine if most common pid of
-  function/class method has probability more than 50%. If yes modify code of
+  function/class method has probability more than 50%.  If yes modify code of
   this call to:
  */
 
@@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
   histogram_value histogram;
   gcov_type val, count, all, bb_all;
   struct cgraph_node *direct_call;
+  enum hist_type type;
 
   stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
   if (!stmt)
@@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
   if (gimple_call_internal_p (stmt))
     return false;
 
-  histogram = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL);
+  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
+						     : HIST_TYPE_INDIR_CALL;
+
+  histogram = gimple_histogram_value_of_type (cfun, stmt, type);
   if (!histogram)
     return false;
 
+  if (type == HIST_TYPE_INDIR_CALL_TOPN)
+      return ic_transform_topn (gsi);
+
   val = histogram->hvalue.counters [0];
   count = histogram->hvalue.counters [1];
   all = histogram->hvalue.counters [2];
 
   bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
-  /* The order of CHECK_COUNTER calls is important -
+  /* The order of CHECK_COUNTER calls is important
      since check_counter can correct the third parameter
-     and we want to make count <= all <= bb_all. */
+     and we want to make count <= all <= bb_all.  */
   if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
       || check_counter (stmt, "ic", &count, &all,
 		        profile_count::from_gcov_type (all)))
@@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
       print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
       fprintf (dump_file, "=> ");
       print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
-      fprintf (dump_file, " transformation on insn postponned to ipa-profile");
+      fprintf (dump_file, " transformation on insn postponed to ipa-profile");
       print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
       fprintf (dump_file, "hist->count %" PRId64
 	       " hist->all %" PRId64"\n", count, all);
-- 
2.21.0.777.g83232e3864


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18  1:46 [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization Xiong Hu Luo
@ 2019-06-18  5:51 ` Martin Liška
  2019-06-18  9:03   ` luoxhu
  2019-06-18 10:21 ` Martin Liška
  2019-06-20 13:47 ` Jan Hubicka
  2 siblings, 1 reply; 25+ messages in thread
From: Martin Liška @ 2019-06-18  5:51 UTC (permalink / raw)
  To: Xiong Hu Luo, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

Hello.

Thank you for the interest in the area.

> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
> Currently the default instrument function can only find the indirect function
> that called more than 50% with an incorrect count number returned.

Can you please explain what you mean by 'an incorrect count number returned'?

>  This patch
> leverages the "--param indir-call-topn-profile=1" and enables multiple indirect

Note that I removed indir-call-topn-profile last week, so the patch will not
apply on current trunk. However, I can help you adapt the single-value counters
to support tracking of multiple values.
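
For reference, here is a minimal standalone sketch (plain C, outside of GCC) of
how the patch's ic_transform_topn walks a top-N histogram: slot 0 is skipped and
the remaining slots are read as (profile id, count) pairs starting at index 1.
The names gcov_type_sketch, topn_total and topn_share are hypothetical and exist
only for this illustration. Only the pair layout, the odd-length assertion and
the count / count_all ratio are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GCC's gcov_type, for illustration only.  */
typedef int64_t gcov_type_sketch;

/* Sum the counts of all non-empty (id, count) pairs.  counters[0] is
   skipped, mirroring the patch's loop that starts at j = 1.  The length
   must be odd, which the patch checks with gcc_assert.  */
static gcov_type_sketch
topn_total (const gcov_type_sketch *counters, size_t n_counters)
{
  gcov_type_sketch all = 0;

  assert (n_counters % 2 == 1);
  for (size_t j = 1; j + 1 < n_counters; j += 2)
    if (counters[j] != 0)	/* An id of 0 marks an unused slot.  */
      all += counters[j + 1];
  return all;
}

/* Share of one target in the summed total, as a value in [0, 1].  This
   mirrors the (float) count / count_all ratio that the patch passes to
   check_counter.  Returns 0 when nothing was counted.  */
static float
topn_share (gcov_type_sketch count, gcov_type_sketch total)
{
  return total ? (float) count / (float) total : 0.0f;
}
```

With two recorded targets counted 300 and 100 times, topn_total gives 400 and
the first target's share is 0.75.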

> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function
> specialization, profiling, partial devirtualization, inlining and cloning could
> be done successfully based on it.

This decision is definitely a big question for Honza.

> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
> Details are:
>   1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>   supported in ipa-profile pass,

If you take a look at gcc/ipa-profile.c:195 you can see how the probability
is propagated to IPA passes. Why is that not sufficient?
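
As a rough illustration of that propagation (a sketch under stated assumptions,
not the GCC source): in the single-value path quoted further down, counter 0 is
the target, counter 1 the number of calls that hit it and counter 2 the total.
The probability is the hit count scaled against the total and capped at the
base. The sketch assumes a base of 10000 for REG_BR_PROB_BASE and the rounding
division used by GCOV_COMPUTE_SCALE. The names PROB_BASE_SKETCH,
scale_probability and worth_speculating are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of REG_BR_PROB_BASE, for this sketch only.  */
#define PROB_BASE_SKETCH 10000

/* Scale COUNT against ALL in units of PROB_BASE_SKETCH, rounding to the
   nearest integer and capping at the base.  A zero total yields the
   base, which is how GCOV_COMPUTE_SCALE is assumed to behave here.  */
static int
scale_probability (int64_t count, int64_t all)
{
  int64_t prob;

  assert (count >= 0 && all >= 0);
  if (all == 0)
    return PROB_BASE_SKETCH;
  prob = (count * PROB_BASE_SKETCH + all / 2) / all;
  if (prob > PROB_BASE_SKETCH)
    prob = PROB_BASE_SKETCH;	/* "Probability capped to 1".  */
  return (int) prob;
}

/* The quoted ipa_profile code refuses to speculate when the probability
   is below half of the base.  */
static int
worth_speculating (int prob)
{
  return prob >= PROB_BASE_SKETCH / 2;
}
```

A target hit 300 times out of 400 scales to 7500 and passes the 50% check. One
hit 1 time out of 3 scales to 3333 and is rejected as too unlikely.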

Martin

> so add variables to pass the information
>   through passes, and postpone gimple_ic to ipa-profile like default as inline
>   pass will decide whether it is benefit to transform indirect call.
>   2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
>   profile full support in ipa passes and cgraph_edge functions.
>   3.  Fix various hidden speculative call ICEs exposed after enabling this
>   feature when running SPEC2017.
>   4.  Add 1 in module testcase and 2 cross module testcases.
>   5.  TODOs:
>     5.1.  Some reference info will be dropped from WPA to LTRANS, so
>     reference check will be difficult in LTRANS, need replace the strstr
>     with reference compare.
>     5.2.  Some duplicate code need be removed as top1 and topn share same logic.
>     Actually top1 related logic could be eliminated totally as topn includes it.
>     5.3.  Split patch maybe needed as too big but not sure how many would be
>     reasonable.
>   6.  Performance result for ppc64le:
>     6.1.  Representative test: indir-call-prof-topn.c runtime improved from
>     1.7s to 0.4s.
>     6.2.  SPEC2017 peakrate:
>         523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>         525.x264_r (-5.29%).
>         No big changes of other benchmarks.
>         Option: -Ofast -mcpu=power8
>         PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
>         PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
>         -fprofile-correction
>     6.3.  No performance change on PHP benchmark.
>   7.  Bootstrap and regression test passed on Power8-LE.
> 
> gcc/ChangeLog
> 
> 	2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	PR ipa/69678
> 	* cgraph.c (cgraph_node::get_create): Copy profile_id.
> 	(cgraph_edge::speculative_call_info): Find real
> 	reference for indirect targets.
> 	(cgraph_edge::resolve_speculation): Add speculative code process
> 	for indirect targets.
> 	(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> 	(cgraph_node::verify_node): Likewise.
> 	* cgraph.h (common_target_ids): New variable.
> 	(common_target_probabilities): Likewise.
> 	(num_of_ics): Likewise.
> 	* cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
> 	* ipa-inline.c (inline_small_functions): Add iterator update.
> 	* ipa-profile.c (ipa_profile_generate_summary): Add indirect
> 	multiple targets logic.
> 	(ipa_profile): Likewise.
> 	* ipa-utils.c (ipa_merge_profiles): Clone speculative src's
> 	referrings to dst.
> 	* ipa.c (process_references): Fix typo.
> 	* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
> 	logic.
> 	(input_edge): Likewise.
> 	* predict.c (dump_prediction): Revome edges count assert to be
> 	precise.
> 	* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
> 	__gcov_indirect_call.counters and __gcov_indirect_call.callee.
> 	(gimple_gen_ic_func_profiler): Likewise.
> 	(pass_ipa_tree_profile::gate): Fix comment typos.
> 	* tree-inline.c (copy_bb): Duplicate all the speculative edges
> 	if indirect call contains multiple speculative targets.
> 	* value-prof.c (check_counter): Proportion the counter for
> 	multiple targets.
> 	(ic_transform_topn): New function.
> 	(gimple_ic_transform): Handle topn case, fix comment typos.
> 
> gcc/testsuite/ChangeLog
> 
> 	2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	PR ipa/69678
> 	* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
> ---
>  gcc/cgraph.c                                  |  38 +++-
>  gcc/cgraph.h                                  |   9 +-
>  gcc/cgraphclones.c                            |   1 +
>  gcc/ipa-inline.c                              |   3 +
>  gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>  gcc/ipa-utils.c                               |   5 +
>  gcc/ipa.c                                     |   2 +-
>  gcc/lto-cgraph.c                              |  38 ++++
>  gcc/predict.c                                 |   1 -
>  .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>  .../crossmodule-indir-call-topn-1a.c          |  22 +++
>  .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>  .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>  gcc/tree-inline.c                             |  97 +++++----
>  gcc/tree-profile.c                            |  12 +-
>  gcc/value-prof.c                              | 146 +++++++++++++-
>  16 files changed, 606 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index de82316d4b1..0d373a67d1b 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>  	fprintf (dump_file, "Introduced new external node "
>  		 "(%s) and turned into root of the clone tree.\n",
>  		 node->dump_name ());
> +      node->profile_id = first_clone->profile_id;
>      }
>    else if (dump_file)
>      fprintf (dump_file, "Introduced new external node "
> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>    int i;
>    cgraph_edge *e2;
>    cgraph_edge *e = this;
> +  cgraph_node *referred_node;
>  
>    if (!e->indirect_unknown_callee)
>      for (e2 = e->caller->indirect_calls;
> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>  	&& ((ref->stmt && ref->stmt == e->call_stmt)
>  	    || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>        {
> -	reference = ref;
> -	break;
> +	if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +	  {
> +	    referred_node = dyn_cast<cgraph_node *> (ref->referred);
> +	    if (strstr (e->callee->name (), referred_node->name ()))
> +	      {
> +		reference = ref;
> +		break;
> +	      }
> +	  }
> +	else
> +	  {
> +	    reference = ref;
> +	    break;
> +	  }
>        }
>  
>    /* Speculative edge always consist of all three components - direct edge,
> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>           in the functions inlined through it.  */
>      }
>    edge->count += e2->count;
> -  edge->speculative = false;
> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
> +    {
> +      edge->indirect_info->num_of_ics--;
> +      if (edge->indirect_info->num_of_ics == 0)
> +	edge->speculative = false;
> +    }
> +  else
> +    edge->speculative = false;
>    e2->speculative = false;
>    ref->remove_reference ();
>    if (e2->indirect_unknown_callee || e2->inline_failed)
> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>  	  e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>  						     false);
>  	  e->count = gimple_bb (e->call_stmt)->count;
> -	  e2->speculative = false;
> +	  if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +	    {
> +	      e2->indirect_info->num_of_ics--;
> +	      if (e2->indirect_info->num_of_ics == 0)
> +		e2->speculative = false;
> +	    }
> +	  else
> +	    e2->speculative = false;
>  	  e2->count = gimple_bb (e2->call_stmt)->count;
>  	  ref->speculative = false;
>  	  ref->stmt = NULL;
> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>  
>        for (e = callees; e; e = e->next_callee)
>  	{
> -	  if (!e->aux)
> +	  if (!e->aux && !e->speculative)
>  	    {
>  	      error ("edge %s->%s has no corresponding call_stmt",
>  		     identifier_to_locale (e->caller->name ()),
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index c294602d762..ed0fbc60432 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "profile-count.h"
>  #include "ipa-ref.h"
>  #include "plugin-api.h"
> +#include "gcov-io.h"
>  
>  extern void debuginfo_early_init (void);
>  extern void debuginfo_init (void);
> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>    int param_index;
>    /* ECF flags determined from the caller.  */
>    int ecf_flags;
> -  /* Profile_id of common target obtrained from profile.  */
> +  /* Profile_id of common target obtained from profile.  */
>    int common_target_id;
>    /* Probability that call will land in function with COMMON_TARGET_ID.  */
>    int common_target_probability;
>  
> +  /* Profile_id of common target obtained from profile.  */
> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
> +  unsigned num_of_ics;
> +
>    /* Set when the call is a virtual call with the parameter being the
>       associated object pointer rather than a simple direct call.  */
>    unsigned polymorphic : 1;
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 15f7e119d18..94f424bc10c 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>    new_node->icf_merged = icf_merged;
>    new_node->merged_comdat = merged_comdat;
>    new_node->thunk = thunk;
> +  new_node->profile_id = profile_id;
>  
>    new_node->clone.tree_map = NULL;
>    new_node->clone.args_to_skip = args_to_skip;
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 360c3de3289..ef2b217b3f9 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>  	}
>        if (has_speculative)
>  	for (edge = node->callees; edge; edge = next)
> +	{
> +	  next = edge->next_callee;
>  	  if (edge->speculative && !speculation_useful_p (edge,
>  							  edge->aux != NULL))
>  	    {
>  	      edge->resolve_speculation ();
>  	      update = true;
>  	    }
> +	}
>        if (update)
>  	{
>  	  struct cgraph_node *where = node->global.inlined_to
> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
> index de9563d808c..d04476295a0 100644
> --- a/gcc/ipa-profile.c
> +++ b/gcc/ipa-profile.c
> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>    struct cgraph_node *node;
>    gimple_stmt_iterator gsi;
>    basic_block bb;
> +  enum hist_type type;
> +
> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
> +						     : HIST_TYPE_INDIR_CALL;
>  
>    hash_table<histogram_hash> hashtable (10);
>    
> @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>  		  histogram_value h;
>  		  h = gimple_histogram_value_of_type
>  			(DECL_STRUCT_FUNCTION (node->decl),
> -			 stmt, HIST_TYPE_INDIR_CALL);
> +			 stmt, type);
>  		  /* No need to do sanity check: gimple_ic_transform already
>  		     takes away bad histograms.  */
> -		  if (h)
> +		  if (h && type == HIST_TYPE_INDIR_CALL)
>  		    {
>  		      /* counter 0 is target, counter 1 is number of execution we called target,
>  			 counter 2 is total number of executions.  */
> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>  		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>  						      stmt, h);
>  		    }
> +		  else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
> +		    {
> +		      unsigned j;
> +		      struct cgraph_edge *e = node->get_edge (stmt);
> +		      if (e && !e->indirect_unknown_callee)
> +			continue;
> +
> +		      e->indirect_info->num_of_ics = 0;
> +		      for (j = 1; j < h->n_counters; j += 2)
> +			{
> +			  if (h->hvalue.counters[j] == 0)
> +			    continue;
> +
> +			  e->indirect_info->common_target_ids[j / 2]
> +			    = h->hvalue.counters[j];
> +			  e->indirect_info->common_target_probabilities[j / 2]
> +			    = GCOV_COMPUTE_SCALE (
> +			      h->hvalue.counters[j + 1],
> +			      gimple_bb (stmt)->count.ipa ().to_gcov_type ());
> +			  if (e->indirect_info
> +				->common_target_probabilities[j / 2]
> +			      > REG_BR_PROB_BASE)
> +			    {
> +			      if (dump_file)
> +				fprintf (dump_file,
> +					 "Probability capped to 1\n");
> +			      e->indirect_info
> +				->common_target_probabilities[j / 2]
> +				= REG_BR_PROB_BASE;
> +			    }
> +			  e->indirect_info->num_of_ics++;
> +			}
> +
> +		      gcc_assert (e->indirect_info->num_of_ics
> +				  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
> +						       node->decl),
> +						     stmt, h);
> +		    }
>  		}
>  	      time += estimate_num_insns (stmt, &eni_time_weights);
>  	      size += estimate_num_insns (stmt, &eni_size_weights);
> @@ -492,6 +536,7 @@ ipa_profile (void)
>    int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
>    int nmismatch = 0, nimpossible = 0;
>    bool node_map_initialized = false;
> +  gcov_type threshold;
>  
>    if (dump_file)
>      dump_histogram (dump_file, histogram);
> @@ -500,14 +545,12 @@ ipa_profile (void)
>        overall_time += histogram[i]->count * histogram[i]->time;
>        overall_size += histogram[i]->size;
>      }
> +  threshold = 0;
>    if (overall_time)
>      {
> -      gcov_type threshold;
> -
>        gcc_assert (overall_size);
>  
>        cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) / 1000;
> -      threshold = 0;
>        for (i = 0; cumulated < cutoff; i++)
>  	{
>  	  cumulated += histogram[i]->count * histogram[i]->time;
> @@ -543,7 +586,7 @@ ipa_profile (void)
>    histogram.release ();
>    histogram_pool.release ();
>  
> -  /* Produce speculative calls: we saved common traget from porfiling into
> +  /* Produce speculative calls: we saved common target from profiling into
>       e->common_target_id.  Now, at link time, we can look up corresponding
>       function node and produce speculative call.  */
>  
> @@ -558,7 +601,8 @@ ipa_profile (void)
>  	{
>  	  if (n->count.initialized_p ())
>  	    nindirect++;
> -	  if (e->indirect_info->common_target_id)
> +	  if (e->indirect_info->common_target_id
> +	      || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>  	    {
>  	      if (!node_map_initialized)
>  	        init_node_map (false);
> @@ -613,7 +657,7 @@ ipa_profile (void)
>  		      if (dump_file)
>  			fprintf (dump_file,
>  				 "Not speculating: "
> -				 "parameter count mistmatch\n");
> +				 "parameter count mismatch\n");
>  		    }
>  		  else if (e->indirect_info->polymorphic
>  			   && !opt_for_fn (n->decl, flag_devirtualize)
> @@ -655,7 +699,130 @@ ipa_profile (void)
>  		  nunknown++;
>  		}
>  	    }
> -	 }
> +	  if (e->indirect_info && e->indirect_info->num_of_ics > 1)
> +	    {
> +	      if (in_lto_p)
> +		{
> +		  if (dump_file)
> +		    {
> +		      fprintf (dump_file,
> +			       "Updating hotness threshold in LTO mode.\n");
> +		      fprintf (dump_file, "Updated min count: %" PRId64 "\n",
> +			       (int64_t) threshold);
> +		    }
> +		  set_hot_bb_threshold (threshold
> +					/ e->indirect_info->num_of_ics);
> +		}
> +	      if (!node_map_initialized)
> +		init_node_map (false);
> +	      node_map_initialized = true;
> +	      ncommon++;
> +	      unsigned speculative = 0;
> +	      for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
> +		{
> +		  n2 = find_func_by_profile_id (
> +		    e->indirect_info->common_target_ids[i]);
> +		  if (n2)
> +		    {
> +		      if (dump_file)
> +			{
> +			  fprintf (
> +			    dump_file,
> +			    "Indirect call -> direct call from"
> +			    " other module %s => %s, prob %3.2f\n",
> +			    n->dump_name (), n2->dump_name (),
> +			    e->indirect_info->common_target_probabilities[i]
> +			      / (float) REG_BR_PROB_BASE);
> +			}
> +		      if (e->indirect_info->common_target_probabilities[i]
> +			  < REG_BR_PROB_BASE / 2)
> +			{
> +			  nuseless++;
> +			  if (dump_file)
> +			    fprintf (
> +			      dump_file,
> +			      "Not speculating: probability is too low.\n");
> +			}
> +		      else if (!e->maybe_hot_p ())
> +			{
> +			  nuseless++;
> +			  if (dump_file)
> +			    fprintf (dump_file,
> +				     "Not speculating: call is cold.\n");
> +			}
> +		      else if (n2->get_availability () <= AVAIL_INTERPOSABLE
> +			       && n2->can_be_discarded_p ())
> +			{
> +			  nuseless++;
> +			  if (dump_file)
> +			    fprintf (dump_file,
> +				     "Not speculating: target is overwritable "
> +				     "and can be discarded.\n");
> +			}
> +		      else if (ipa_node_params_sum && ipa_edge_args_sum
> +			       && (!vec_safe_is_empty (
> +				 IPA_NODE_REF (n2)->descriptors))
> +			       && ipa_get_param_count (IPA_NODE_REF (n2))
> +				    != ipa_get_cs_argument_count (
> +				      IPA_EDGE_REF (e))
> +			       && (ipa_get_param_count (IPA_NODE_REF (n2))
> +				     >= ipa_get_cs_argument_count (
> +				       IPA_EDGE_REF (e))
> +				   || !stdarg_p (TREE_TYPE (n2->decl))))
> +			{
> +			  nmismatch++;
> +			  if (dump_file)
> +			    fprintf (dump_file, "Not speculating: "
> +						"parameter count mismatch\n");
> +			}
> +		      else if (e->indirect_info->polymorphic
> +			       && !opt_for_fn (n->decl, flag_devirtualize)
> +			       && !possible_polymorphic_call_target_p (e, n2))
> +			{
> +			  nimpossible++;
> +			  if (dump_file)
> +			    fprintf (dump_file,
> +				     "Not speculating: "
> +				     "function is not in the polymorphic "
> +				     "call target list\n");
> +			}
> +		      else
> +			{
> +			  /* Target may be overwritable, but profile says that
> +			     control flow goes to this particular implementation
> +			     of N2.  Speculate on the local alias to allow
> +			     inlining.
> +			     */
> +			  if (!n2->can_be_discarded_p ())
> +			    {
> +			      cgraph_node *alias;
> +			      alias = dyn_cast<cgraph_node *> (
> +				n2->noninterposable_alias ());
> +			      if (alias)
> +				n2 = alias;
> +			    }
> +			  nconverted++;
> +			  e->make_speculative (
> +			    n2, e->count.apply_probability (
> +				  e->indirect_info
> +				    ->common_target_probabilities[i]));
> +			  update = true;
> +			  speculative++;
> +			}
> +		    }
> +		  else
> +		    {
> +		      if (dump_file)
> +			fprintf (dump_file,
> +				 "Function with profile-id %i not found.\n",
> +				 e->indirect_info->common_target_ids[i]);
> +		      nunknown++;
> +		    }
> +		}
> +	      if (speculative < e->indirect_info->num_of_ics)
> +		e->indirect_info->num_of_ics = speculative;
> +	    }
> +	}
>         if (update)
>  	 ipa_update_overall_fn_summary (n);
>       }
> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
> index 79b250c3943..30347691029 100644
> --- a/gcc/ipa-utils.c
> +++ b/gcc/ipa-utils.c
> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>        update_max_bb_count ();
>        compute_function_frequency ();
>        pop_cfun ();
> +      /* When src is speculative, clone the referrings.  */
> +      if (src->indirect_call_target)
> +	for (e = src->callers; e; e = e->next_caller)
> +	  if (e->callee == src && e->speculative)
> +	    dst->clone_referring (src);
>        for (e = dst->callees; e; e = e->next_callee)
>  	{
>  	  if (e->speculative)
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 2496694124c..c1fe081a72d 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>     devirtualization happens.  After inlining still keep their declarations
>     around, so we can devirtualize to a direct call.
>  
> -   Also try to make trivial devirutalization when no or only one target is
> +   Also try to make trivial devirtualization when no or only one target is
>     possible.  */
>  
>  static void
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 4dfa2862be3..0c8f547d44e 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>    unsigned int uid;
>    intptr_t ref;
>    struct bitpack_d bp;
> +  unsigned i;
>  
>    if (edge->indirect_unknown_callee)
>      streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>        if (edge->indirect_info->common_target_id)
>  	streamer_write_hwi_stream
>  	   (ob->main_stream, edge->indirect_info->common_target_probability);
> +
> +      gcc_assert (edge->indirect_info->num_of_ics
> +		  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +      streamer_write_hwi_stream (ob->main_stream,
> +				 edge->indirect_info->num_of_ics);
> +
> +      if (edge->indirect_info->num_of_ics)
> +	{
> +	  for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> +	    {
> +	      streamer_write_hwi_stream (
> +		ob->main_stream, edge->indirect_info->common_target_ids[i]);
> +	      if (edge->indirect_info->common_target_ids[i])
> +		streamer_write_hwi_stream (
> +		  ob->main_stream,
> +		  edge->indirect_info->common_target_probabilities[i]);
> +	    }
> +	}
>      }
>  }
>  
> @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>    cgraph_inline_failed_t inline_failed;
>    struct bitpack_d bp;
>    int ecf_flags = 0;
> +  unsigned i;
>  
>    caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
>    if (caller == NULL || caller->decl == NULL_TREE)
> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>        edge->indirect_info->common_target_id = streamer_read_hwi (ib);
>        if (edge->indirect_info->common_target_id)
>          edge->indirect_info->common_target_probability = streamer_read_hwi (ib);
> +
> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
> +
> +      gcc_assert (edge->indirect_info->num_of_ics
> +		  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> +      if (edge->indirect_info->num_of_ics)
> +	{
> +	  for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> +	    {
> +	      edge->indirect_info->common_target_ids[i]
> +		= streamer_read_hwi (ib);
> +	      if (edge->indirect_info->common_target_ids[i])
> +		edge->indirect_info->common_target_probabilities[i]
> +		  = streamer_read_hwi (ib);
> +	    }
> +	}
>      }
>  }
>  
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 43ee91a5b13..b7f38891c72 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
>        && bb->count.precise_p ()
>        && reason == REASON_NONE)
>      {
> -      gcc_assert (e->count ().precise_p ());
>        fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
>  	       predictor_info[predictor].name,
>  	       bb->count.to_gcov_type (), e->count ().to_gcov_type (),
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> new file mode 100644
> index 00000000000..e0a83c2e067
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> @@ -0,0 +1,35 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  x = one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> new file mode 100644
> index 00000000000..a8c6e365fb9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> @@ -0,0 +1,22 @@
> +/* It seems there is no way to avoid the other source of multiple
> +   source testcase from being compiled independently.  Just avoid
> +   error.  */
> +#ifdef DOJOB
> +int
> +one (int a)
> +{
> +  return 1;
> +}
> +
> +int
> +two (int a)
> +{
> +  return 0;
> +}
> +#else
> +int
> +main()
> +{
> +  return 0;
> +}
> +#endif
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> new file mode 100644
> index 00000000000..aa3887fde83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> @@ -0,0 +1,42 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int foo ()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  x = one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  return x;
> +}
> +
> +int
> +main()
> +{
> +  int x = foo ();
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> new file mode 100644
> index 00000000000..951bc7ddd19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> @@ -0,0 +1,38 @@
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a)
> +{
> +  return 1;
> +}
> +
> +int
> +two (int a)
> +{
> +  return 0;
> +}
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> +  int i, x;
> +  fptr p = &one;
> +
> +  one (3);
> +
> +  for (i = 0; i < 350000000; i++)
> +    {
> +      x = (*p) (3);
> +      p = table[x];
> +    }
> +  printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 9017da878b1..f69b31b197e 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
>  	      switch (id->transform_call_graph_edges)
>  		{
>  		case CB_CGE_DUPLICATE:
> -		  edge = id->src_node->get_edge (orig_stmt);
> -		  if (edge)
> -		    {
> -		      struct cgraph_edge *old_edge = edge;
> -		      profile_count old_cnt = edge->count;
> -		      edge = edge->clone (id->dst_node, call_stmt,
> -					  gimple_uid (stmt),
> -					  num, den,
> -					  true);
> -
> -		      /* Speculative calls consist of two edges - direct and
> -			 indirect.  Duplicate the whole thing and distribute
> -			 frequencies accordingly.  */
> -		      if (edge->speculative)
> -			{
> -			  struct cgraph_edge *direct, *indirect;
> -			  struct ipa_ref *ref;
> -
> -			  gcc_assert (!edge->indirect_unknown_callee);
> -			  old_edge->speculative_call_info (direct, indirect, ref);
> -
> -			  profile_count indir_cnt = indirect->count;
> -			  indirect = indirect->clone (id->dst_node, call_stmt,
> -						      gimple_uid (stmt),
> -						      num, den,
> -						      true);
> -
> -			  profile_probability prob
> -			     = indir_cnt.probability_in (old_cnt + indir_cnt);
> -			  indirect->count
> -			     = copy_basic_block->count.apply_probability (prob);
> -			  edge->count = copy_basic_block->count - indirect->count;
> -			  id->dst_node->clone_reference (ref, stmt);
> -			}
> -		      else
> -			edge->count = copy_basic_block->count;
> -		    }
> +		  {
> +		    edge = id->src_node->get_edge (orig_stmt);
> +		    struct cgraph_edge *old_edge = edge;
> +		    struct cgraph_edge *direct, *indirect;
> +		    bool next_speculative;
> +		    do
> +		      {
> +			next_speculative = false;
> +			if (edge)
> +			  {
> +			    profile_count old_cnt = edge->count;
> +			    edge
> +			      = edge->clone (id->dst_node, call_stmt,
> +					     gimple_uid (stmt), num, den, true);
> +
> +			    /* Speculative calls consist of two edges - direct
> +			       and indirect.  Duplicate the whole thing and
> +			       distribute frequencies accordingly.  */
> +			    if (edge->speculative)
> +			      {
> +				struct ipa_ref *ref;
> +
> +				gcc_assert (!edge->indirect_unknown_callee);
> +				old_edge->speculative_call_info (direct,
> +								 indirect, ref);
> +
> +				profile_count indir_cnt = indirect->count;
> +				indirect
> +				  = indirect->clone (id->dst_node, call_stmt,
> +						     gimple_uid (stmt), num,
> +						     den, true);
> +
> +				profile_probability prob
> +				  = indir_cnt.probability_in (old_cnt
> +							      + indir_cnt);
> +				indirect->count
> +				  = copy_basic_block->count.apply_probability (
> +				    prob);
> +				edge->count
> +				  = copy_basic_block->count - indirect->count;
> +				id->dst_node->clone_reference (ref, stmt);
> +			      }
> +			    else
> +			      edge->count = copy_basic_block->count;
> +			  }
> +			/* If the indirect call contains more than one indirect
> +			   targets, need clone all speculative edges here.  */
> +			if (old_edge && old_edge->next_callee
> +			    && old_edge->speculative && indirect
> +			    && indirect->indirect_info
> +			    && indirect->indirect_info->num_of_ics > 1)
> +			  {
> +			    edge = old_edge->next_callee;
> +			    old_edge = old_edge->next_callee;
> +			    if (edge->speculative)
> +			      next_speculative = true;
> +			  }
> +		      }
> +		    while (next_speculative);
> +		  }
>  		  break;
>  
>  		case CB_CGE_MOVE_CLONES:
> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
> index 1c3034aac10..4964dbdebb5 100644
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
>  /* Do initialization work for the edge profiler.  */
>  
>  /* Add code:
> -   __thread gcov*	__gcov_indirect_call_counters; // pointer to actual counter
> -   __thread void*	__gcov_indirect_call_callee; // actual callee address
> +   __thread gcov*	__gcov_indirect_call.counters; // pointer to actual counter
> +   __thread void*	__gcov_indirect_call.callee; // actual callee address
>     __thread int __gcov_function_counter; // time profiler function counter
>  */
>  static void
> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base)
>        f_1 = foo;
>        __gcov_indirect_call.counters = &__gcov4.main[0];
>        PROF_9 = f_1;
> -      __gcov_indirect_call_callee = PROF_9;
> +      __gcov_indirect_call.callee = PROF_9;
>        _4 = f_1 ();
>     */
>  
> @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>  
>    /* Insert code:
>  
> -     if (__gcov_indirect_call_callee != NULL)
> +     if (__gcov_indirect_call.callee != NULL)
>         __gcov_indirect_call_profiler_v3 (profile_id, &current_function_decl);
>  
>       The function __gcov_indirect_call_profiler_v3 is responsible for
> -     resetting __gcov_indirect_call_callee to NULL.  */
> +     resetting __gcov_indirect_call.callee to NULL.  */
>  
>    gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
>    void0 = build_int_cst (ptr_type_node, 0);
> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
>  {
>    /* When profile instrumentation, use or test coverage shall be performed.
>       But for AutoFDO, this there is no instrumentation, thus this pass is
> -     diabled.  */
> +     disabled.  */
>    return (!in_lto_p && !flag_auto_profile
>  	  && (flag_branch_probabilities || flag_test_coverage
>  	      || profile_arc_flag));
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 5013956cf86..4869ab8ccd6 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
>     somehow.  */
>  
>  static bool
> -check_counter (gimple *stmt, const char * name,
> -	       gcov_type *count, gcov_type *all, profile_count bb_count_d)
> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all,
> +	       profile_count bb_count_d, float ratio = 1.0f)
>  {
>    gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
>    if (*all != bb_count || *count > *all)
> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
>                               "count (%d)\n", name, (int)*all, (int)bb_count);
>  	  *all = bb_count;
>  	  if (*count > *all)
> -            *count = *all;
> +	    *count = *all * ratio;
>  	  return false;
>  	}
>        else
> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
>    return dcall_stmt;
>  }
>  
> +/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe
> +   multiple indirect targets in histogram.  Check every indirect/virtual call
> +   if callee function exists, if not exit, leave it to LTO stage for later
> +   process.  Modify code of this indirect call to an if-else structure in
> +   ipa-profile finally.  */
> +static bool
> +ic_transform_topn (gimple_stmt_iterator *gsi)
> +{
> +  unsigned j;
> +  gcall *stmt;
> +  histogram_value histogram;
> +  gcov_type val, count, count_all, all, bb_all;
> +  struct cgraph_node *d_call;
> +  profile_count bb_count;
> +
> +  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +    return false;
> +
> +  if (gimple_call_fndecl (stmt) != NULL_TREE)
> +    return false;
> +
> +  if (gimple_call_internal_p (stmt))
> +    return false;
> +
> +  histogram
> +    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
> +  if (!histogram)
> +    return false;
> +
> +  count = 0;
> +  all = 0;
> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
> +  bb_count = gimple_bb (stmt)->count;
> +
> +  /* n_counters need be odd to avoid access violation.  */
> +  gcc_assert (histogram->n_counters % 2 == 1);
> +
> +  /* For indirect call topn, accumulate all the counts first.  */
> +  for (j = 1; j < histogram->n_counters; j += 2)
> +    {
> +      val = histogram->hvalue.counters[j];
> +      count = histogram->hvalue.counters[j + 1];
> +      if (val)
> +	all += count;
> +    }
> +
> +  count_all = all;
> +  /* Do the indirect call conversion if function body exists, or else leave it
> +     to LTO stage.  */
> +  for (j = 1; j < histogram->n_counters; j += 2)
> +    {
> +      val = histogram->hvalue.counters[j];
> +      count = histogram->hvalue.counters[j + 1];
> +      if (val)
> +	{
> +	  /* The order of CHECK_COUNTER calls is important
> +	     since check_counter can correct the third parameter
> +	     and we want to make count <= all <= bb_count.  */
> +	  if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
> +	      || check_counter (stmt, "ic", &count, &all,
> +				profile_count::from_gcov_type (all),
> +				(float) count / count_all))
> +	    {
> +	      gimple_remove_histogram_value (cfun, stmt, histogram);
> +	      return false;
> +	    }
> +
> +	  d_call = find_func_by_profile_id ((int) val);
> +
> +	  if (d_call == NULL)
> +	    {
> +	      if (val)
> +		{
> +		  if (dump_file)
> +		    {
> +		      fprintf (
> +			dump_file,
> +			"Indirect call -> direct call from other module");
> +		      print_generic_expr (dump_file, gimple_call_fn (stmt),
> +					  TDF_SLIM);
> +		      fprintf (dump_file,
> +			       "=> %i (will resolve only with LTO)\n",
> +			       (int) val);
> +		    }
> +		}
> +	      return false;
> +	    }
> +
> +	  if (!check_ic_target (stmt, d_call))
> +	    {
> +	      if (dump_file)
> +		{
> +		  fprintf (dump_file, "Indirect call -> direct call ");
> +		  print_generic_expr (dump_file, gimple_call_fn (stmt),
> +				      TDF_SLIM);
> +		  fprintf (dump_file, "=> ");
> +		  print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
> +		  fprintf (dump_file,
> +			   " transformation skipped because of type mismatch");
> +		  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> +		}
> +	      gimple_remove_histogram_value (cfun, stmt, histogram);
> +	      return false;
> +	    }
> +
> +	  if (dump_file)
> +	  {
> +	    fprintf (dump_file, "Indirect call -> direct call ");
> +	    print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
> +	    fprintf (dump_file, "=> ");
> +	    print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
> +	    fprintf (dump_file,
> +		     " transformation on insn postponed to ipa-profile");
> +	    print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> +	    fprintf (dump_file, "hist->count %" PRId64
> +		" hist->all %" PRId64"\n", count, all);
> +	  }
> +	}
> +    }
> +
> +  return true;
> +}
>  /*
>    For every checked indirect/virtual call determine if most common pid of
> -  function/class method has probability more than 50%. If yes modify code of
> +  function/class method has probability more than 50%.  If yes modify code of
>    this call to:
>   */
>  
> @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>    histogram_value histogram;
>    gcov_type val, count, all, bb_all;
>    struct cgraph_node *direct_call;
> +  enum hist_type type;
>  
>    stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
>    if (!stmt)
> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>    if (gimple_call_internal_p (stmt))
>      return false;
>  
> -  histogram = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL);
> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
> +						     : HIST_TYPE_INDIR_CALL;
> +
> +  histogram = gimple_histogram_value_of_type (cfun, stmt, type);
>    if (!histogram)
>      return false;
>  
> +  if (type == HIST_TYPE_INDIR_CALL_TOPN)
> +      return ic_transform_topn (gsi);
> +
>    val = histogram->hvalue.counters [0];
>    count = histogram->hvalue.counters [1];
>    all = histogram->hvalue.counters [2];
>  
>    bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
> -  /* The order of CHECK_COUNTER calls is important -
> +  /* The order of CHECK_COUNTER calls is important
>       since check_counter can correct the third parameter
> -     and we want to make count <= all <= bb_all. */
> +     and we want to make count <= all <= bb_all.  */
>    if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
>        || check_counter (stmt, "ic", &count, &all,
>  		        profile_count::from_gcov_type (all)))
> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>        print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>        fprintf (dump_file, "=> ");
>        print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
> -      fprintf (dump_file, " transformation on insn postponned to ipa-profile");
> +      fprintf (dump_file, " transformation on insn postponed to ipa-profile");
>        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>        fprintf (dump_file, "hist->count %" PRId64
>  	       " hist->all %" PRId64"\n", count, all);
> 


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18  5:51 ` Martin Liška
@ 2019-06-18  9:03   ` luoxhu
  2019-06-18  9:34     ` Martin Liška
  0 siblings, 1 reply; 25+ messages in thread
From: luoxhu @ 2019-06-18  9:03 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

Hi,

On 2019/6/18 13:51, Martin Liška wrote:
> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>
> Hello.
>
> Thank you for the interest in the area.
>
>> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
>> Currently the default instrument function can only find the indirect function
>> that called more than 50% with an incorrect count number returned.
> Can you please explain what you mean by 'an incorrect count number returned'?

Take the test case indir-call-topn.c (included below), which makes
indirect calls to two targets, "one" and "two".  With trunk code,
including your patch, the profile data is as shown below.  Your patch
swaps count[0] and count[2]; count[0] is used in ipa-profile, but
ipa-profile only supports the top1 format, and my patch adds support
for the topn format.  WITHOUT your patch count[0] was 0, which is
incorrect.  With your fix count[0] is 350000000, which is better but
still not correct: in fact "one" runs 175000000 times and "two" runs
the other 175000000 times:

indir-call-topn.gcda:   22:    01a90000:  18:COUNTERS indirect_call 9 counts
indir-call-topn.gcda:   24:                   0: *350000000 1868707024 0* 0 0 0 0 0

Running with "--param indir-call-topn-profile=1" gives the profile data
below.  My patch is based on this profile result and does the
optimization for multiple indirect targets.  Performance improves a lot
on this testcase and on some SPEC2017 benchmarks (LLVM has supported
this for several years...).

indir-call-topn.gcda:   26:    01b10000:  18:COUNTERS indirect_call_topn 9 counts
indir-call-topn.gcda:   28:                   0: *0 969338501 175000000 1868707024 175000000* 0 0 0


test case indir-call-topn.c:

#include <stdio.h>


typedef int (*fptr) (int);
int
one (int a)
{
   return 1;
}

int
two (int a)
{
   return 0;
}

fptr table[] = {&one, &two};

int
main()
{
   int i, x;
   fptr p = &one;

   one (3);

   for (i = 0; i < 350000000; i++)
     {
       x = (*p) (3);
       p = table[x];
     }
   printf ("done:%d\n", x);
}

>
>>   This patch
>> leverages the "--param indir-call-topn-profile=1" and enables multiple indirect
> Note that I've remove indir-call-topn-profile last week, the patch will not apply
> on current trunk. However, I can help you how to adapt single-value counters
> to support tracking of multiple values.

It would be very useful if you could help me track multiple values in a
similar way on trunk.  I will rebase onto your code once topn is ready
again.  Actually topn is more general and includes top1, so I thought
top1 should be removed instead of topn, though topn takes longer than
top1 in profile-generate.

>
>> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function
>> specialization, profiling, partial devirtualization, inlining and cloning could
>> be done successfully based on it.
> This decision is definitely big question for Honza?
>
>> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
>> Details are:
>>    1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>>    supported in ipa-profile pass,
> If you take a look at gcc/ipa-profile.c:195 you can see how the probability
> is propagated to IPA passes. Why is that not sufficient?

The current code only supports a single indirect target.  I need to
track multiple indirect targets and create multiple speculative edges
on a single indirect call statement.

What's more, many ICEs happen in later stages because of the
single-speculative-target design; part of this patch fixes those ICEs
in the handling of multiple speculative target edges.


Thanks

Xionghu

>
> Martin
>
>> so add variables to pass the information
>>    through passes, and postpone gimple_ic to ipa-profile like default as inline
>>    pass will decide whether it is benefit to transform indirect call.
>>    2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
>>    profile full support in ipa passes and cgraph_edge functions.
>>    3.  Fix various hidden speculative call ICEs exposed after enabling this
>>    feature when running SPEC2017.
>>    4.  Add 1 in module testcase and 2 cross module testcases.
>>    5.  TODOs:
>>      5.1.  Some reference info will be dropped from WPA to LTRANS, so
>>      reference check will be difficult in LTRANS, need replace the strstr
>>      with reference compare.
>>      5.2.  Some duplicate code need be removed as top1 and topn share same logic.
>>      Actually top1 related logic could be eliminated totally as topn includes it.
>>      5.3.  Split patch maybe needed as too big but not sure how many would be
>>      reasonable.
>>    6.  Performance result for ppc64le:
>>      6.1.  Representative test: indir-call-prof-topn.c runtime improved from
>>      1.7s to 0.4s.
>>      6.2.  SPEC2017 peakrate:
>>          523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>>          525.x264_r (-5.29%).
>>          No big changes of other benchmarks.
>>          Option: -Ofast -mcpu=power8
>>          PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
>>          PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
>>          -fprofile-correction
>>      6.3.  No performance change on PHP benchmark.
>>    7.  Bootstrap and regression test passed on Power8-LE.
>>
>> gcc/ChangeLog
>>
>> 	2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>>
>> 	PR ipa/69678
>> 	* cgraph.c (cgraph_node::get_create): Copy profile_id.
>> 	(cgraph_edge::speculative_call_info): Find real
>> 	reference for indirect targets.
>> 	(cgraph_edge::resolve_speculation): Add speculative code process
>> 	for indirect targets.
>> 	(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>> 	(cgraph_node::verify_node): Likewise.
>> 	* cgraph.h (common_target_ids): New variable.
>> 	(common_target_probabilities): Likewise.
>> 	(num_of_ics): Likewise.
>> 	* cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
>> 	* ipa-inline.c (inline_small_functions): Add iterator update.
>> 	* ipa-profile.c (ipa_profile_generate_summary): Add indirect
>> 	multiple targets logic.
>> 	(ipa_profile): Likewise.
>> 	* ipa-utils.c (ipa_merge_profiles): Clone speculative src's
>> 	referrings to dst.
>> 	* ipa.c (process_references): Fix typo.
>> 	* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
>> 	logic.
>> 	(input_edge): Likewise.
>> 	* predict.c (dump_prediction): Remove edges count assert to be
>> 	precise.
>> 	* tree-profile.c (gimple_gen_ic_profiler): Use the new variable
>> 	__gcov_indirect_call.counters and __gcov_indirect_call.callee.
>> 	(gimple_gen_ic_func_profiler): Likewise.
>> 	(pass_ipa_tree_profile::gate): Fix comment typos.
>> 	* tree-inline.c (copy_bb): Duplicate all the speculative edges
>> 	if indirect call contains multiple speculative targets.
>> 	* value-prof.c (check_counter): Proportion the counter for
>> 	multiple targets.
>> 	(ic_transform_topn): New function.
>> 	(gimple_ic_transform): Handle topn case, fix comment typos.
>>
>> gcc/testsuite/ChangeLog
>>
>> 	2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>>
>> 	PR ipa/69678
>> 	* gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
>> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
>> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
>> 	* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
>> ---
>>   gcc/cgraph.c                                  |  38 +++-
>>   gcc/cgraph.h                                  |   9 +-
>>   gcc/cgraphclones.c                            |   1 +
>>   gcc/ipa-inline.c                              |   3 +
>>   gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>>   gcc/ipa-utils.c                               |   5 +
>>   gcc/ipa.c                                     |   2 +-
>>   gcc/lto-cgraph.c                              |  38 ++++
>>   gcc/predict.c                                 |   1 -
>>   .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>>   .../crossmodule-indir-call-topn-1a.c          |  22 +++
>>   .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>>   .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>>   gcc/tree-inline.c                             |  97 +++++----
>>   gcc/tree-profile.c                            |  12 +-
>>   gcc/value-prof.c                              | 146 +++++++++++++-
>>   16 files changed, 606 insertions(+), 68 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>
>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>> index de82316d4b1..0d373a67d1b 100644
>> --- a/gcc/cgraph.c
>> +++ b/gcc/cgraph.c
>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>   	fprintf (dump_file, "Introduced new external node "
>>   		 "(%s) and turned into root of the clone tree.\n",
>>   		 node->dump_name ());
>> +      node->profile_id = first_clone->profile_id;
>>       }
>>     else if (dump_file)
>>       fprintf (dump_file, "Introduced new external node "
>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>     int i;
>>     cgraph_edge *e2;
>>     cgraph_edge *e = this;
>> +  cgraph_node *referred_node;
>>   
>>     if (!e->indirect_unknown_callee)
>>       for (e2 = e->caller->indirect_calls;
>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>   	&& ((ref->stmt && ref->stmt == e->call_stmt)
>>   	    || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>         {
>> -	reference = ref;
>> -	break;
>> +	if (e2->indirect_info && e2->indirect_info->num_of_ics)
>> +	  {
>> +	    referred_node = dyn_cast<cgraph_node *> (ref->referred);
>> +	    if (strstr (e->callee->name (), referred_node->name ()))
>> +	      {
>> +		reference = ref;
>> +		break;
>> +	      }
>> +	  }
>> +	else
>> +	  {
>> +	    reference = ref;
>> +	    break;
>> +	  }
>>         }
>>   
>>     /* Speculative edge always consist of all three components - direct edge,
>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>            in the functions inlined through it.  */
>>       }
>>     edge->count += e2->count;
>> -  edge->speculative = false;
>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>> +    {
>> +      edge->indirect_info->num_of_ics--;
>> +      if (edge->indirect_info->num_of_ics == 0)
>> +	edge->speculative = false;
>> +    }
>> +  else
>> +    edge->speculative = false;
>>     e2->speculative = false;
>>     ref->remove_reference ();
>>     if (e2->indirect_unknown_callee || e2->inline_failed)
>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>   	  e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>   						     false);
>>   	  e->count = gimple_bb (e->call_stmt)->count;
>> -	  e2->speculative = false;
>> +	  if (e2->indirect_info && e2->indirect_info->num_of_ics)
>> +	    {
>> +	      e2->indirect_info->num_of_ics--;
>> +	      if (e2->indirect_info->num_of_ics == 0)
>> +		e2->speculative = false;
>> +	    }
>> +	  else
>> +	    e2->speculative = false;
>>   	  e2->count = gimple_bb (e2->call_stmt)->count;
>>   	  ref->speculative = false;
>>   	  ref->stmt = NULL;
>> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>>   
>>         for (e = callees; e; e = e->next_callee)
>>   	{
>> -	  if (!e->aux)
>> +	  if (!e->aux && !e->speculative)
>>   	    {
>>   	      error ("edge %s->%s has no corresponding call_stmt",
>>   		     identifier_to_locale (e->caller->name ()),
>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>> index c294602d762..ed0fbc60432 100644
>> --- a/gcc/cgraph.h
>> +++ b/gcc/cgraph.h
>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "profile-count.h"
>>   #include "ipa-ref.h"
>>   #include "plugin-api.h"
>> +#include "gcov-io.h"
>>   
>>   extern void debuginfo_early_init (void);
>>   extern void debuginfo_init (void);
>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>     int param_index;
>>     /* ECF flags determined from the caller.  */
>>     int ecf_flags;
>> -  /* Profile_id of common target obtrained from profile.  */
>> +  /* Profile_id of common target obtained from profile.  */
>>     int common_target_id;
>>     /* Probability that call will land in function with COMMON_TARGET_ID.  */
>>     int common_target_probability;
>>   
>> +  /* Profile_id of common target obtained from profile.  */
>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
>> +  unsigned num_of_ics;
>> +
>>     /* Set when the call is a virtual call with the parameter being the
>>        associated object pointer rather than a simple direct call.  */
>>     unsigned polymorphic : 1;
>> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
>> index 15f7e119d18..94f424bc10c 100644
>> --- a/gcc/cgraphclones.c
>> +++ b/gcc/cgraphclones.c
>> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>>     new_node->icf_merged = icf_merged;
>>     new_node->merged_comdat = merged_comdat;
>>     new_node->thunk = thunk;
>> +  new_node->profile_id = profile_id;
>>   
>>     new_node->clone.tree_map = NULL;
>>     new_node->clone.args_to_skip = args_to_skip;
>> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
>> index 360c3de3289..ef2b217b3f9 100644
>> --- a/gcc/ipa-inline.c
>> +++ b/gcc/ipa-inline.c
>> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>>   	}
>>         if (has_speculative)
>>   	for (edge = node->callees; edge; edge = next)
>> +	{
>> +	  next = edge->next_callee;
>>   	  if (edge->speculative && !speculation_useful_p (edge,
>>   							  edge->aux != NULL))
>>   	    {
>>   	      edge->resolve_speculation ();
>>   	      update = true;
>>   	    }
>> +	}
>>         if (update)
>>   	{
>>   	  struct cgraph_node *where = node->global.inlined_to
>> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
>> index de9563d808c..d04476295a0 100644
>> --- a/gcc/ipa-profile.c
>> +++ b/gcc/ipa-profile.c
>> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>>     struct cgraph_node *node;
>>     gimple_stmt_iterator gsi;
>>     basic_block bb;
>> +  enum hist_type type;
>> +
>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
>> +						     : HIST_TYPE_INDIR_CALL;
>>   
>>     hash_table<histogram_hash> hashtable (10);
>>     
>> @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>>   		  histogram_value h;
>>   		  h = gimple_histogram_value_of_type
>>   			(DECL_STRUCT_FUNCTION (node->decl),
>> -			 stmt, HIST_TYPE_INDIR_CALL);
>> +			 stmt, type);
>>   		  /* No need to do sanity check: gimple_ic_transform already
>>   		     takes away bad histograms.  */
>> -		  if (h)
>> +		  if (h && type == HIST_TYPE_INDIR_CALL)
>>   		    {
>>   		      /* counter 0 is target, counter 1 is number of execution we called target,
>>   			 counter 2 is total number of executions.  */
>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>   		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>>   						      stmt, h);
>>   		    }
>> +		  else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>> +		    {
>> +		      unsigned j;
>> +		      struct cgraph_edge *e = node->get_edge (stmt);
>> +		      if (e && !e->indirect_unknown_callee)
>> +			continue;
>> +
>> +		      e->indirect_info->num_of_ics = 0;
>> +		      for (j = 1; j < h->n_counters; j += 2)
>> +			{
>> +			  if (h->hvalue.counters[j] == 0)
>> +			    continue;
>> +
>> +			  e->indirect_info->common_target_ids[j / 2]
>> +			    = h->hvalue.counters[j];
>> +			  e->indirect_info->common_target_probabilities[j / 2]
>> +			    = GCOV_COMPUTE_SCALE (
>> +			      h->hvalue.counters[j + 1],
>> +			      gimple_bb (stmt)->count.ipa ().to_gcov_type ());
>> +			  if (e->indirect_info
>> +				->common_target_probabilities[j / 2]
>> +			      > REG_BR_PROB_BASE)
>> +			    {
>> +			      if (dump_file)
>> +				fprintf (dump_file,
>> +					 "Probability capped to 1\n");
>> +			      e->indirect_info
>> +				->common_target_probabilities[j / 2]
>> +				= REG_BR_PROB_BASE;
>> +			    }
>> +			  e->indirect_info->num_of_ics++;
>> +			}
>> +
>> +		      gcc_assert (e->indirect_info->num_of_ics
>> +				  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>> +
>> +		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
>> +						       node->decl),
>> +						     stmt, h);
>> +		    }
>>   		}
>>   	      time += estimate_num_insns (stmt, &eni_time_weights);
>>   	      size += estimate_num_insns (stmt, &eni_size_weights);
>> @@ -492,6 +536,7 @@ ipa_profile (void)
>>     int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
>>     int nmismatch = 0, nimpossible = 0;
>>     bool node_map_initialized = false;
>> +  gcov_type threshold;
>>   
>>     if (dump_file)
>>       dump_histogram (dump_file, histogram);
>> @@ -500,14 +545,12 @@ ipa_profile (void)
>>         overall_time += histogram[i]->count * histogram[i]->time;
>>         overall_size += histogram[i]->size;
>>       }
>> +  threshold = 0;
>>     if (overall_time)
>>       {
>> -      gcov_type threshold;
>> -
>>         gcc_assert (overall_size);
>>   
>>         cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) / 1000;
>> -      threshold = 0;
>>         for (i = 0; cumulated < cutoff; i++)
>>   	{
>>   	  cumulated += histogram[i]->count * histogram[i]->time;
>> @@ -543,7 +586,7 @@ ipa_profile (void)
>>     histogram.release ();
>>     histogram_pool.release ();
>>   
>> -  /* Produce speculative calls: we saved common traget from porfiling into
>> +  /* Produce speculative calls: we saved common target from profiling into
>>        e->common_target_id.  Now, at link time, we can look up corresponding
>>        function node and produce speculative call.  */
>>   
>> @@ -558,7 +601,8 @@ ipa_profile (void)
>>   	{
>>   	  if (n->count.initialized_p ())
>>   	    nindirect++;
>> -	  if (e->indirect_info->common_target_id)
>> +	  if (e->indirect_info->common_target_id
>> +	      || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>   	    {
>>   	      if (!node_map_initialized)
>>   	        init_node_map (false);
>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>   		      if (dump_file)
>>   			fprintf (dump_file,
>>   				 "Not speculating: "
>> -				 "parameter count mistmatch\n");
>> +				 "parameter count mismatch\n");
>>   		    }
>>   		  else if (e->indirect_info->polymorphic
>>   			   && !opt_for_fn (n->decl, flag_devirtualize)
>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>   		  nunknown++;
>>   		}
>>   	    }
>> -	 }
>> +	  if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>> +	    {
>> +	      if (in_lto_p)
>> +		{
>> +		  if (dump_file)
>> +		    {
>> +		      fprintf (dump_file,
>> +			       "Updating hotness threshold in LTO mode.\n");
>> +		      fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>> +			       (int64_t) threshold);
>> +		    }
>> +		  set_hot_bb_threshold (threshold
>> +					/ e->indirect_info->num_of_ics);
>> +		}
>> +	      if (!node_map_initialized)
>> +		init_node_map (false);
>> +	      node_map_initialized = true;
>> +	      ncommon++;
>> +	      unsigned speculative = 0;
>> +	      for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>> +		{
>> +		  n2 = find_func_by_profile_id (
>> +		    e->indirect_info->common_target_ids[i]);
>> +		  if (n2)
>> +		    {
>> +		      if (dump_file)
>> +			{
>> +			  fprintf (
>> +			    dump_file,
>> +			    "Indirect call -> direct call from"
>> +			    " other module %s => %s, prob %3.2f\n",
>> +			    n->dump_name (), n2->dump_name (),
>> +			    e->indirect_info->common_target_probabilities[i]
>> +			      / (float) REG_BR_PROB_BASE);
>> +			}
>> +		      if (e->indirect_info->common_target_probabilities[i]
>> +			  < REG_BR_PROB_BASE / 2)
>> +			{
>> +			  nuseless++;
>> +			  if (dump_file)
>> +			    fprintf (
>> +			      dump_file,
>> +			      "Not speculating: probability is too low.\n");
>> +			}
>> +		      else if (!e->maybe_hot_p ())
>> +			{
>> +			  nuseless++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file,
>> +				     "Not speculating: call is cold.\n");
>> +			}
>> +		      else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>> +			       && n2->can_be_discarded_p ())
>> +			{
>> +			  nuseless++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file,
>> +				     "Not speculating: target is overwritable "
>> +				     "and can be discarded.\n");
>> +			}
>> +		      else if (ipa_node_params_sum && ipa_edge_args_sum
>> +			       && (!vec_safe_is_empty (
>> +				 IPA_NODE_REF (n2)->descriptors))
>> +			       && ipa_get_param_count (IPA_NODE_REF (n2))
>> +				    != ipa_get_cs_argument_count (
>> +				      IPA_EDGE_REF (e))
>> +			       && (ipa_get_param_count (IPA_NODE_REF (n2))
>> +				     >= ipa_get_cs_argument_count (
>> +				       IPA_EDGE_REF (e))
>> +				   || !stdarg_p (TREE_TYPE (n2->decl))))
>> +			{
>> +			  nmismatch++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file, "Not speculating: "
>> +						"parameter count mismatch\n");
>> +			}
>> +		      else if (e->indirect_info->polymorphic
>> +			       && !opt_for_fn (n->decl, flag_devirtualize)
>> +			       && !possible_polymorphic_call_target_p (e, n2))
>> +			{
>> +			  nimpossible++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file,
>> +				     "Not speculating: "
>> +				     "function is not in the polymorphic "
>> +				     "call target list\n");
>> +			}
>> +		      else
>> +			{
>> +			  /* Target may be overwritable, but profile says that
>> +			     control flow goes to this particular implementation
>> +			     of N2.  Speculate on the local alias to allow
>> +			     inlining.
>> +			     */
>> +			  if (!n2->can_be_discarded_p ())
>> +			    {
>> +			      cgraph_node *alias;
>> +			      alias = dyn_cast<cgraph_node *> (
>> +				n2->noninterposable_alias ());
>> +			      if (alias)
>> +				n2 = alias;
>> +			    }
>> +			  nconverted++;
>> +			  e->make_speculative (
>> +			    n2, e->count.apply_probability (
>> +				  e->indirect_info
>> +				    ->common_target_probabilities[i]));
>> +			  update = true;
>> +			  speculative++;
>> +			}
>> +		    }
>> +		  else
>> +		    {
>> +		      if (dump_file)
>> +			fprintf (dump_file,
>> +				 "Function with profile-id %i not found.\n",
>> +				 e->indirect_info->common_target_ids[i]);
>> +		      nunknown++;
>> +		    }
>> +		}
>> +	      if (speculative < e->indirect_info->num_of_ics)
>> +		e->indirect_info->num_of_ics = speculative;
>> +	    }
>> +	}
>>          if (update)
>>   	 ipa_update_overall_fn_summary (n);
>>        }
>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>> index 79b250c3943..30347691029 100644
>> --- a/gcc/ipa-utils.c
>> +++ b/gcc/ipa-utils.c
>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>         update_max_bb_count ();
>>         compute_function_frequency ();
>>         pop_cfun ();
>> +      /* When src is speculative, clone the referrings.  */
>> +      if (src->indirect_call_target)
>> +	for (e = src->callers; e; e = e->next_caller)
>> +	  if (e->callee == src && e->speculative)
>> +	    dst->clone_referring (src);
>>         for (e = dst->callees; e; e = e->next_callee)
>>   	{
>>   	  if (e->speculative)
>> diff --git a/gcc/ipa.c b/gcc/ipa.c
>> index 2496694124c..c1fe081a72d 100644
>> --- a/gcc/ipa.c
>> +++ b/gcc/ipa.c
>> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>>      devirtualization happens.  After inlining still keep their declarations
>>      around, so we can devirtualize to a direct call.
>>   
>> -   Also try to make trivial devirutalization when no or only one target is
>> +   Also try to make trivial devirtualization when no or only one target is
>>      possible.  */
>>   
>>   static void
>> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
>> index 4dfa2862be3..0c8f547d44e 100644
>> --- a/gcc/lto-cgraph.c
>> +++ b/gcc/lto-cgraph.c
>> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>     unsigned int uid;
>>     intptr_t ref;
>>     struct bitpack_d bp;
>> +  unsigned i;
>>   
>>     if (edge->indirect_unknown_callee)
>>       streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
>> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>         if (edge->indirect_info->common_target_id)
>>   	streamer_write_hwi_stream
>>   	   (ob->main_stream, edge->indirect_info->common_target_probability);
>> +
>> +      gcc_assert (edge->indirect_info->num_of_ics
>> +		  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>> +
>> +      streamer_write_hwi_stream (ob->main_stream,
>> +				 edge->indirect_info->num_of_ics);
>> +
>> +      if (edge->indirect_info->num_of_ics)
>> +	{
>> +	  for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>> +	    {
>> +	      streamer_write_hwi_stream (
>> +		ob->main_stream, edge->indirect_info->common_target_ids[i]);
>> +	      if (edge->indirect_info->common_target_ids[i])
>> +		streamer_write_hwi_stream (
>> +		  ob->main_stream,
>> +		  edge->indirect_info->common_target_probabilities[i]);
>> +	    }
>> +	}
>>       }
>>   }
>>   
>> @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>>     cgraph_inline_failed_t inline_failed;
>>     struct bitpack_d bp;
>>     int ecf_flags = 0;
>> +  unsigned i;
>>   
>>     caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
>>     if (caller == NULL || caller->decl == NULL_TREE)
>> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>>         edge->indirect_info->common_target_id = streamer_read_hwi (ib);
>>         if (edge->indirect_info->common_target_id)
>>           edge->indirect_info->common_target_probability = streamer_read_hwi (ib);
>> +
>> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
>> +
>> +      gcc_assert (edge->indirect_info->num_of_ics
>> +		  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>> +
>> +      if (edge->indirect_info->num_of_ics)
>> +	{
>> +	  for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>> +	    {
>> +	      edge->indirect_info->common_target_ids[i]
>> +		= streamer_read_hwi (ib);
>> +	      if (edge->indirect_info->common_target_ids[i])
>> +		edge->indirect_info->common_target_probabilities[i]
>> +		  = streamer_read_hwi (ib);
>> +	    }
>> +	}
>>       }
>>   }
>>   
>> diff --git a/gcc/predict.c b/gcc/predict.c
>> index 43ee91a5b13..b7f38891c72 100644
>> --- a/gcc/predict.c
>> +++ b/gcc/predict.c
>> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
>>         && bb->count.precise_p ()
>>         && reason == REASON_NONE)
>>       {
>> -      gcc_assert (e->count ().precise_p ());
>>         fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
>>   	       predictor_info[predictor].name,
>>   	       bb->count.to_gcov_type (), e->count ().to_gcov_type (),
>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>> new file mode 100644
>> index 00000000000..e0a83c2e067
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>> @@ -0,0 +1,35 @@
>> +/* { dg-require-effective-target lto } */
>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>> +/* { dg-require-profiling "-fprofile-generate" } */
>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
>> +
>> +#include <stdio.h>
>> +
>> +typedef int (*fptr) (int);
>> +int
>> +one (int a);
>> +
>> +int
>> +two (int a);
>> +
>> +fptr table[] = {&one, &two};
>> +
>> +int
>> +main()
>> +{
>> +  int i, x;
>> +  fptr p = &one;
>> +
>> +  x = one (3);
>> +
>> +  for (i = 0; i < 350000000; i++)
>> +    {
>> +      x = (*p) (3);
>> +      p = table[x];
>> +    }
>> +  printf ("done:%d\n", x);
>> +}
>> +
>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
>> +
>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>> new file mode 100644
>> index 00000000000..a8c6e365fb9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>> @@ -0,0 +1,22 @@
>> +/* It seems there is no way to avoid the other source of mulitple
>> +   source testcase from being compiled independently.  Just avoid
>> +   error.  */
>> +#ifdef DOJOB
>> +int
>> +one (int a)
>> +{
>> +  return 1;
>> +}
>> +
>> +int
>> +two (int a)
>> +{
>> +  return 0;
>> +}
>> +#else
>> +int
>> +main()
>> +{
>> +  return 0;
>> +}
>> +#endif
>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>> new file mode 100644
>> index 00000000000..aa3887fde83
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>> @@ -0,0 +1,42 @@
>> +/* { dg-require-effective-target lto } */
>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>> +/* { dg-require-profiling "-fprofile-generate" } */
>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
>> +
>> +#include <stdio.h>
>> +
>> +typedef int (*fptr) (int);
>> +int
>> +one (int a);
>> +
>> +int
>> +two (int a);
>> +
>> +fptr table[] = {&one, &two};
>> +
>> +int foo ()
>> +{
>> +  int i, x;
>> +  fptr p = &one;
>> +
>> +  x = one (3);
>> +
>> +  for (i = 0; i < 350000000; i++)
>> +    {
>> +      x = (*p) (3);
>> +      p = table[x];
>> +    }
>> +  return x;
>> +}
>> +
>> +int
>> +main()
>> +{
>> +  int x = foo ();
>> +  printf ("done:%d\n", x);
>> +}
>> +
>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
>> +
>> +
>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>> new file mode 100644
>> index 00000000000..951bc7ddd19
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>> @@ -0,0 +1,38 @@
>> +/* { dg-require-profiling "-fprofile-generate" } */
>> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */
>> +
>> +#include <stdio.h>
>> +
>> +typedef int (*fptr) (int);
>> +int
>> +one (int a)
>> +{
>> +  return 1;
>> +}
>> +
>> +int
>> +two (int a)
>> +{
>> +  return 0;
>> +}
>> +
>> +fptr table[] = {&one, &two};
>> +
>> +int
>> +main()
>> +{
>> +  int i, x;
>> +  fptr p = &one;
>> +
>> +  one (3);
>> +
>> +  for (i = 0; i < 350000000; i++)
>> +    {
>> +      x = (*p) (3);
>> +      p = table[x];
>> +    }
>> +  printf ("done:%d\n", x);
>> +}
>> +
>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */
>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */
>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>> index 9017da878b1..f69b31b197e 100644
>> --- a/gcc/tree-inline.c
>> +++ b/gcc/tree-inline.c
>> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
>>   	      switch (id->transform_call_graph_edges)
>>   		{
>>   		case CB_CGE_DUPLICATE:
>> -		  edge = id->src_node->get_edge (orig_stmt);
>> -		  if (edge)
>> -		    {
>> -		      struct cgraph_edge *old_edge = edge;
>> -		      profile_count old_cnt = edge->count;
>> -		      edge = edge->clone (id->dst_node, call_stmt,
>> -					  gimple_uid (stmt),
>> -					  num, den,
>> -					  true);
>> -
>> -		      /* Speculative calls consist of two edges - direct and
>> -			 indirect.  Duplicate the whole thing and distribute
>> -			 frequencies accordingly.  */
>> -		      if (edge->speculative)
>> -			{
>> -			  struct cgraph_edge *direct, *indirect;
>> -			  struct ipa_ref *ref;
>> -
>> -			  gcc_assert (!edge->indirect_unknown_callee);
>> -			  old_edge->speculative_call_info (direct, indirect, ref);
>> -
>> -			  profile_count indir_cnt = indirect->count;
>> -			  indirect = indirect->clone (id->dst_node, call_stmt,
>> -						      gimple_uid (stmt),
>> -						      num, den,
>> -						      true);
>> -
>> -			  profile_probability prob
>> -			     = indir_cnt.probability_in (old_cnt + indir_cnt);
>> -			  indirect->count
>> -			     = copy_basic_block->count.apply_probability (prob);
>> -			  edge->count = copy_basic_block->count - indirect->count;
>> -			  id->dst_node->clone_reference (ref, stmt);
>> -			}
>> -		      else
>> -			edge->count = copy_basic_block->count;
>> -		    }
>> +		  {
>> +		    edge = id->src_node->get_edge (orig_stmt);
>> +		    struct cgraph_edge *old_edge = edge;
>> +		    struct cgraph_edge *direct, *indirect;
>> +		    bool next_speculative;
>> +		    do
>> +		      {
>> +			next_speculative = false;
>> +			if (edge)
>> +			  {
>> +			    profile_count old_cnt = edge->count;
>> +			    edge
>> +			      = edge->clone (id->dst_node, call_stmt,
>> +					     gimple_uid (stmt), num, den, true);
>> +
>> +			    /* Speculative calls consist of two edges - direct
>> +			       and indirect.  Duplicate the whole thing and
>> +			       distribute frequencies accordingly.  */
>> +			    if (edge->speculative)
>> +			      {
>> +				struct ipa_ref *ref;
>> +
>> +				gcc_assert (!edge->indirect_unknown_callee);
>> +				old_edge->speculative_call_info (direct,
>> +								 indirect, ref);
>> +
>> +				profile_count indir_cnt = indirect->count;
>> +				indirect
>> +				  = indirect->clone (id->dst_node, call_stmt,
>> +						     gimple_uid (stmt), num,
>> +						     den, true);
>> +
>> +				profile_probability prob
>> +				  = indir_cnt.probability_in (old_cnt
>> +							      + indir_cnt);
>> +				indirect->count
>> +				  = copy_basic_block->count.apply_probability (
>> +				    prob);
>> +				edge->count
>> +				  = copy_basic_block->count - indirect->count;
>> +				id->dst_node->clone_reference (ref, stmt);
>> +			      }
>> +			    else
>> +			      edge->count = copy_basic_block->count;
>> +			  }
>> +			/* If the indirect call contains more than one indirect
>> +			   targets, need clone all speculative edges here.  */
>> +			if (old_edge && old_edge->next_callee
>> +			    && old_edge->speculative && indirect
>> +			    && indirect->indirect_info
>> +			    && indirect->indirect_info->num_of_ics > 1)
>> +			  {
>> +			    edge = old_edge->next_callee;
>> +			    old_edge = old_edge->next_callee;
>> +			    if (edge->speculative)
>> +			      next_speculative = true;
>> +			  }
>> +		      }
>> +		    while (next_speculative);
>> +		  }
>>   		  break;
>>   
>>   		case CB_CGE_MOVE_CLONES:
>> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
>> index 1c3034aac10..4964dbdebb5 100644
>> --- a/gcc/tree-profile.c
>> +++ b/gcc/tree-profile.c
>> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
>>   /* Do initialization work for the edge profiler.  */
>>   
>>   /* Add code:
>> -   __thread gcov*	__gcov_indirect_call_counters; // pointer to actual counter
>> -   __thread void*	__gcov_indirect_call_callee; // actual callee address
>> +   __thread gcov*	__gcov_indirect_call.counters; // pointer to actual counter
>> +   __thread void*	__gcov_indirect_call.callee; // actual callee address
>>      __thread int __gcov_function_counter; // time profiler function counter
>>   */
>>   static void
>> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base)
>>         f_1 = foo;
>>         __gcov_indirect_call.counters = &__gcov4.main[0];
>>         PROF_9 = f_1;
>> -      __gcov_indirect_call_callee = PROF_9;
>> +      __gcov_indirect_call.callee = PROF_9;
>>         _4 = f_1 ();
>>      */
>>   
>> @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>>   
>>     /* Insert code:
>>   
>> -     if (__gcov_indirect_call_callee != NULL)
>> +     if (__gcov_indirect_call.callee != NULL)
>>          __gcov_indirect_call_profiler_v3 (profile_id, &current_function_decl);
>>   
>>        The function __gcov_indirect_call_profiler_v3 is responsible for
>> -     resetting __gcov_indirect_call_callee to NULL.  */
>> +     resetting __gcov_indirect_call.callee to NULL.  */
>>   
>>     gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
>>     void0 = build_int_cst (ptr_type_node, 0);
>> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
>>   {
>>     /* When profile instrumentation, use or test coverage shall be performed.
>>        But for AutoFDO, this there is no instrumentation, thus this pass is
>> -     diabled.  */
>> +     disabled.  */
>>     return (!in_lto_p && !flag_auto_profile
>>   	  && (flag_branch_probabilities || flag_test_coverage
>>   	      || profile_arc_flag));
>> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
>> index 5013956cf86..4869ab8ccd6 100644
>> --- a/gcc/value-prof.c
>> +++ b/gcc/value-prof.c
>> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
>>      somehow.  */
>>   
>>   static bool
>> -check_counter (gimple *stmt, const char * name,
>> -	       gcov_type *count, gcov_type *all, profile_count bb_count_d)
>> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all,
>> +	       profile_count bb_count_d, float ratio = 1.0f)
>>   {
>>     gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
>>     if (*all != bb_count || *count > *all)
>> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
>>                                "count (%d)\n", name, (int)*all, (int)bb_count);
>>   	  *all = bb_count;
>>   	  if (*count > *all)
>> -            *count = *all;
>> +	    *count = *all * ratio;
>>   	  return false;
>>   	}
>>         else
>> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
>>     return dcall_stmt;
>>   }
>>   
>> +/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe
>> +   multiple indirect targets in histogram.  Check every indirect/virtual call
>> +   if callee function exists, if not exit, leave it to LTO stage for later
>> +   process.  Modify code of this indirect call to an if-else structure in
>> +   ipa-profile finally.  */
>> +static bool
>> +ic_transform_topn (gimple_stmt_iterator *gsi)
>> +{
>> +  unsigned j;
>> +  gcall *stmt;
>> +  histogram_value histogram;
>> +  gcov_type val, count, count_all, all, bb_all;
>> +  struct cgraph_node *d_call;
>> +  profile_count bb_count;
>> +
>> +  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
>> +  if (!stmt)
>> +    return false;
>> +
>> +  if (gimple_call_fndecl (stmt) != NULL_TREE)
>> +    return false;
>> +
>> +  if (gimple_call_internal_p (stmt))
>> +    return false;
>> +
>> +  histogram
>> +    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
>> +  if (!histogram)
>> +    return false;
>> +
>> +  count = 0;
>> +  all = 0;
>> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>> +  bb_count = gimple_bb (stmt)->count;
>> +
>> +  /* n_counters need be odd to avoid access violation.  */
>> +  gcc_assert (histogram->n_counters % 2 == 1);
>> +
>> +  /* For indirect call topn, accumulate all the counts first.  */
>> +  for (j = 1; j < histogram->n_counters; j += 2)
>> +    {
>> +      val = histogram->hvalue.counters[j];
>> +      count = histogram->hvalue.counters[j + 1];
>> +      if (val)
>> +	all += count;
>> +    }
>> +
>> +  count_all = all;
>> +  /* Do the indirect call conversion if function body exists, or else leave it
>> +     to LTO stage.  */
>> +  for (j = 1; j < histogram->n_counters; j += 2)
>> +    {
>> +      val = histogram->hvalue.counters[j];
>> +      count = histogram->hvalue.counters[j + 1];
>> +      if (val)
>> +	{
>> +	  /* The order of CHECK_COUNTER calls is important
>> +	     since check_counter can correct the third parameter
>> +	     and we want to make count <= all <= bb_count.  */
>> +	  if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
>> +	      || check_counter (stmt, "ic", &count, &all,
>> +				profile_count::from_gcov_type (all),
>> +				(float) count / count_all))
>> +	    {
>> +	      gimple_remove_histogram_value (cfun, stmt, histogram);
>> +	      return false;
>> +	    }
>> +
>> +	  d_call = find_func_by_profile_id ((int) val);
>> +
>> +	  if (d_call == NULL)
>> +	    {
>> +	      if (val)
>> +		{
>> +		  if (dump_file)
>> +		    {
>> +		      fprintf (
>> +			dump_file,
>> +			"Indirect call -> direct call from other module");
>> +		      print_generic_expr (dump_file, gimple_call_fn (stmt),
>> +					  TDF_SLIM);
>> +		      fprintf (dump_file,
>> +			       "=> %i (will resolve only with LTO)\n",
>> +			       (int) val);
>> +		    }
>> +		}
>> +	      return false;
>> +	    }
>> +
>> +	  if (!check_ic_target (stmt, d_call))
>> +	    {
>> +	      if (dump_file)
>> +		{
>> +		  fprintf (dump_file, "Indirect call -> direct call ");
>> +		  print_generic_expr (dump_file, gimple_call_fn (stmt),
>> +				      TDF_SLIM);
>> +		  fprintf (dump_file, "=> ");
>> +		  print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>> +		  fprintf (dump_file,
>> +			   " transformation skipped because of type mismatch");
>> +		  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>> +		}
>> +	      gimple_remove_histogram_value (cfun, stmt, histogram);
>> +	      return false;
>> +	    }
>> +
>> +	  if (dump_file)
>> +	  {
>> +	    fprintf (dump_file, "Indirect call -> direct call ");
>> +	    print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>> +	    fprintf (dump_file, "=> ");
>> +	    print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>> +	    fprintf (dump_file,
>> +		     " transformation on insn postponed to ipa-profile");
>> +	    print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>> +	    fprintf (dump_file, "hist->count %" PRId64
>> +		" hist->all %" PRId64"\n", count, all);
>> +	  }
>> +	}
>> +    }
>> +
>> +  return true;
>> +}
>>   /*
>>     For every checked indirect/virtual call determine if most common pid of
>> -  function/class method has probability more than 50%. If yes modify code of
>> +  function/class method has probability more than 50%.  If yes modify code of
>>     this call to:
>>    */
>>   
>> @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>     histogram_value histogram;
>>     gcov_type val, count, all, bb_all;
>>     struct cgraph_node *direct_call;
>> +  enum hist_type type;
>>   
>>     stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
>>     if (!stmt)
>> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>     if (gimple_call_internal_p (stmt))
>>       return false;
>>   
>> -  histogram = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL);
>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
>> +						     : HIST_TYPE_INDIR_CALL;
>> +
>> +  histogram = gimple_histogram_value_of_type (cfun, stmt, type);
>>     if (!histogram)
>>       return false;
>>   
>> +  if (type == HIST_TYPE_INDIR_CALL_TOPN)
>> +      return ic_transform_topn (gsi);
>> +
>>     val = histogram->hvalue.counters [0];
>>     count = histogram->hvalue.counters [1];
>>     all = histogram->hvalue.counters [2];
>>   
>>     bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>> -  /* The order of CHECK_COUNTER calls is important -
>> +  /* The order of CHECK_COUNTER calls is important
>>        since check_counter can correct the third parameter
>> -     and we want to make count <= all <= bb_all. */
>> +     and we want to make count <= all <= bb_all.  */
>>     if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
>>         || check_counter (stmt, "ic", &count, &all,
>>   		        profile_count::from_gcov_type (all)))
>> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>         print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>>         fprintf (dump_file, "=> ");
>>         print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
>> -      fprintf (dump_file, " transformation on insn postponned to ipa-profile");
>> +      fprintf (dump_file, " transformation on insn postponed to ipa-profile");
>>         print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>         fprintf (dump_file, "hist->count %" PRId64
>>   	       " hist->all %" PRId64"\n", count, all);
>>
>

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18  9:03   ` luoxhu
@ 2019-06-18  9:34     ` Martin Liška
  2019-06-18 10:07       ` Segher Boessenkool
  2019-06-19  5:38       ` luoxhu
  0 siblings, 2 replies; 25+ messages in thread
From: Martin Liška @ 2019-06-18  9:34 UTC (permalink / raw)
  To: luoxhu, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

On 6/18/19 11:02 AM, luoxhu wrote:
> Hi,
> 
> On 2019/6/18 13:51, Martin Liška wrote:
>> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>
>> Hello.
>>
>> Thank you for the interest in the area.
>>
>>> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
>>> Currently the default instrument function can only find the indirect function
>>> that called more than 50% with an incorrect count number returned.
>> Can you please explain what you mean by 'an incorrect count number returned'?
> 
> The test case indir-call-topn.c makes indirect calls to two targets, "one" and "two". The profiling data with trunk code is shown below. Trunk includes your patch, which switches count[0] and count[2]; count[0] is used in ipa-profile, but only the top1 format is supported there, and my patch adds support for the topn format. Without your patch count[0] was 0, which is incorrect. With your fix count[0] is 350000000, which is better but still not correct: in fact "one" runs 175000000 times and "two" runs the other 175000000 times:
> 
> indir-call-topn.gcda:   22:    01a90000:  18:COUNTERS indirect_call 9 counts
> indir-call-topn.gcda:   24:                   0: *350000000 1868707024 0* 0 0 0 0 0
> 
> Running with "--param indir-call-topn-profile=1" gives the profile data below. My patch is based on this profile result and does the optimization for multiple indirect targets; performance improves a lot on this testcase and on some SPEC2017 benchmarks (LLVM has supported this for several years...).
> 
> indir-call-topn.gcda:   26:    01b10000:  18:COUNTERS indirect_call_topn 9 counts
> indir-call-topn.gcda:   28:                   0: *0 969338501 175000000 1868707024 175000000* 0 0 0
> 
> 
> test case indir-call-topn.c:
> 
> #include <stdio.h>
> 
> 
> typedef int (*fptr) (int);
> int
> one (int a)
> {
>   return 1;
> }
> 
> int
> two (int a)
> {
>   return 0;
> }
> 
> fptr table[] = {&one, &two};
> 
> int
> main()
> {
>   int i, x;
>   fptr p = &one;
> 
>   one (3);
> 
>   for (i = 0; i < 350000000; i++)
>     {
>       x = (*p) (3);
>       p = table[x];
>     }
>   printf ("done:%d\n", x);
> }

I've got it. So it's a situation where the distribution is 50% and 50%. Note that it's
the only valid situation where both edges will be >= 50%. That's the threshold at which
we speculatively devirtualize edges. That said, you don't need a generic topn counter, but
probably only a top2 counter, which can be generalized from the single-value counter type. I'm saying that
because I removed the TOPN, mainly due to:
https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179

which is an over-complicated profiling function. And the changes that I've made recently are motivated
by preserving stable builds. That's achieved by noticing that a single-value counter can't handle all
seen values.

> 
>>
>>>   This patch
>>> leverages the "--param indir-call-topn-profile=1" and enables multiple indirect
>> Note that I removed indir-call-topn-profile last week, so the patch will not apply
>> on current trunk. However, I can help you adapt single-value counters
>> to support tracking of multiple values.
> 
> It would be very useful if you could help me track multiple values similarly on trunk code. I will rebase onto your code once topn is ready again. Actually, topn is more general and includes top1, so I thought that top1 should be removed instead of topn, though topn takes longer than top1 in profile-generate.

As mentioned earlier, I really don't want to put TOPN back. I can help you once Honza agrees with the general IPA changes.

> 
>>
>>> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function
>>> specialization, profiling, partial devirtualization, inlining and cloning could
>>> be done successfully based on it.
>> This decision is definitely a big question for Honza.
>>
>>> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
>>> Details are:
>>>    1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>>>    supported in ipa-profile pass,
>> If you take a look at gcc/ipa-profile.c:195 you can see how the probability
>> is propagated to IPA passes. Why is that not sufficient?
> 
> Current code only supports a single indirect target. I need to track multiple indirect targets and create multiple speculative edges on a single indirect call statement.
> 
> What's more, many ICEs happened in later stages due to the single-speculative-target design; part of this patch solves the ICEs in the handling of multiple speculative target edges.

Well, to be honest I don't like the patch much. It brings another level of complexity for a quite rare situation where one
calls 2 functions via an indirect call. And as mentioned, current IPA optimizations are not happy about multiple indirect branches.

Martin

> 
> 
> Thanks
> 
> Xionghu
> 
>>
>> Martin
>>
>>> so add variables to pass the information
>>>    through passes, and postpone gimple_ic to ipa-profile like default as inline
>>>    pass will decide whether it is benefit to transform indirect call.
>>>    2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
>>>    profile full support in ipa passes and cgraph_edge functions.
>>>    3.  Fix various hidden speculative call ICEs exposed after enabling this
>>>    feature when running SPEC2017.
>>>    4.  Add 1 in module testcase and 2 cross module testcases.
>>>    5.  TODOs:
>>>      5.1.  Some reference info will be dropped from WPA to LTRANS, so
>>>      reference check will be difficult in LTRANS, need replace the strstr
>>>      with reference compare.
>>>      5.2.  Some duplicate code need be removed as top1 and topn share same logic.
>>>      Actually top1 related logic could be eliminated totally as topn includes it.
>>>      5.3.  Split patch maybe needed as too big but not sure how many would be
>>>      reasonable.
>>>    6.  Performance result for ppc64le:
>>>      6.1.  Representative test: indir-call-prof-topn.c runtime improved from
>>>      1.7s to 0.4s.
>>>      6.2.  SPEC2017 peakrate:
>>>          523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>>>          525.x264_r (-5.29%).
>>>          No big changes of other benchmarks.
>>>          Option: -Ofast -mcpu=power8
>>>          PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
>>>          PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
>>>          -fprofile-correction
>>>      6.3.  No performance change on PHP benchmark.
>>>    7.  Bootstrap and regression test passed on Power8-LE.
>>>
>>> gcc/ChangeLog
>>>
>>>     2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>>>
>>>     PR ipa/69678
>>>     * cgraph.c (cgraph_node::get_create): Copy profile_id.
>>>     (cgraph_edge::speculative_call_info): Find real
>>>     reference for indirect targets.
>>>     (cgraph_edge::resolve_speculation): Add speculative code process
>>>     for indirect targets.
>>>     (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>>>     (cgraph_node::verify_node): Likewise.
>>>     * cgraph.h (common_target_ids): New variable.
>>>     (common_target_probabilities): Likewise.
>>>     (num_of_ics): Likewise.
>>>     * cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
>>>     * ipa-inline.c (inline_small_functions): Add iterator update.
>>>     * ipa-profile.c (ipa_profile_generate_summary): Add indirect
>>>     multiple targets logic.
>>>     (ipa_profile): Likewise.
>>>     * ipa-utils.c (ipa_merge_profiles): Clone speculative src's
>>>     referrings to dst.
>>>     * ipa.c (process_references): Fix typo.
>>>     * lto-cgraph.c (lto_output_edge): Add indirect multiple targets
>>>     logic.
>>>     (input_edge): Likewise.
>>>     * predict.c (dump_prediction): Remove edges count assert to be
>>>     precise.
>>>     * tree-profile.c (gimple_gen_ic_profiler): Use the new variable
>>>     __gcov_indirect_call.counters and __gcov_indirect_call.callee.
>>>     (gimple_gen_ic_func_profiler): Likewise.
>>>     (pass_ipa_tree_profile::gate): Fix comment typos.
>>>     * tree-inline.c (copy_bb): Duplicate all the speculative edges
>>>     if indirect call contains multiple speculative targets.
>>>     * value-prof.c (check_counter): Proportion the counter for
>>>     multiple targets.
>>>     (ic_transform_topn): New function.
>>>     (gimple_ic_transform): Handle topn case, fix comment typos.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>>     2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>>>
>>>     PR ipa/69678
>>>     * gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
>>> ---
>>>   gcc/cgraph.c                                  |  38 +++-
>>>   gcc/cgraph.h                                  |   9 +-
>>>   gcc/cgraphclones.c                            |   1 +
>>>   gcc/ipa-inline.c                              |   3 +
>>>   gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>>>   gcc/ipa-utils.c                               |   5 +
>>>   gcc/ipa.c                                     |   2 +-
>>>   gcc/lto-cgraph.c                              |  38 ++++
>>>   gcc/predict.c                                 |   1 -
>>>   .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>>>   .../crossmodule-indir-call-topn-1a.c          |  22 +++
>>>   .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>>>   .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>>>   gcc/tree-inline.c                             |  97 +++++----
>>>   gcc/tree-profile.c                            |  12 +-
>>>   gcc/value-prof.c                              | 146 +++++++++++++-
>>>   16 files changed, 606 insertions(+), 68 deletions(-)
>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>>
>>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>>> index de82316d4b1..0d373a67d1b 100644
>>> --- a/gcc/cgraph.c
>>> +++ b/gcc/cgraph.c
>>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>>       fprintf (dump_file, "Introduced new external node "
>>>            "(%s) and turned into root of the clone tree.\n",
>>>            node->dump_name ());
>>> +      node->profile_id = first_clone->profile_id;
>>>       }
>>>     else if (dump_file)
>>>       fprintf (dump_file, "Introduced new external node "
>>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>>     int i;
>>>     cgraph_edge *e2;
>>>     cgraph_edge *e = this;
>>> +  cgraph_node *referred_node;
>>>       if (!e->indirect_unknown_callee)
>>>       for (e2 = e->caller->indirect_calls;
>>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>>       && ((ref->stmt && ref->stmt == e->call_stmt)
>>>           || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>>         {
>>> -    reference = ref;
>>> -    break;
>>> +    if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +      {
>>> +        referred_node = dyn_cast<cgraph_node *> (ref->referred);
>>> +        if (strstr (e->callee->name (), referred_node->name ()))
>>> +          {
>>> +        reference = ref;
>>> +        break;
>>> +          }
>>> +      }
>>> +    else
>>> +      {
>>> +        reference = ref;
>>> +        break;
>>> +      }
>>>         }
>>>       /* Speculative edge always consist of all three components - direct edge,
>>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>>            in the functions inlined through it.  */
>>>       }
>>>     edge->count += e2->count;
>>> -  edge->speculative = false;
>>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>>> +    {
>>> +      edge->indirect_info->num_of_ics--;
>>> +      if (edge->indirect_info->num_of_ics == 0)
>>> +    edge->speculative = false;
>>> +    }
>>> +  else
>>> +    edge->speculative = false;
>>>     e2->speculative = false;
>>>     ref->remove_reference ();
>>>     if (e2->indirect_unknown_callee || e2->inline_failed)
>>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>>         e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>>                                false);
>>>         e->count = gimple_bb (e->call_stmt)->count;
>>> -      e2->speculative = false;
>>> +      if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +        {
>>> +          e2->indirect_info->num_of_ics--;
>>> +          if (e2->indirect_info->num_of_ics == 0)
>>> +        e2->speculative = false;
>>> +        }
>>> +      else
>>> +        e2->speculative = false;
>>>         e2->count = gimple_bb (e2->call_stmt)->count;
>>>         ref->speculative = false;
>>>         ref->stmt = NULL;
>>> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>>>           for (e = callees; e; e = e->next_callee)
>>>       {
>>> -      if (!e->aux)
>>> +      if (!e->aux && !e->speculative)
>>>           {
>>>             error ("edge %s->%s has no corresponding call_stmt",
>>>                identifier_to_locale (e->caller->name ()),
>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>> index c294602d762..ed0fbc60432 100644
>>> --- a/gcc/cgraph.h
>>> +++ b/gcc/cgraph.h
>>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>>   #include "profile-count.h"
>>>   #include "ipa-ref.h"
>>>   #include "plugin-api.h"
>>> +#include "gcov-io.h"
>>>     extern void debuginfo_early_init (void);
>>>   extern void debuginfo_init (void);
>>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>>     int param_index;
>>>     /* ECF flags determined from the caller.  */
>>>     int ecf_flags;
>>> -  /* Profile_id of common target obtrained from profile.  */
>>> +  /* Profile_id of common target obtained from profile.  */
>>>     int common_target_id;
>>>     /* Probability that call will land in function with COMMON_TARGET_ID.  */
>>>     int common_target_probability;
>>>   +  /* Profile_id of common target obtained from profile.  */
>>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
>>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>> +  unsigned num_of_ics;
>>> +
>>>     /* Set when the call is a virtual call with the parameter being the
>>>        associated object pointer rather than a simple direct call.  */
>>>     unsigned polymorphic : 1;
>>> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
>>> index 15f7e119d18..94f424bc10c 100644
>>> --- a/gcc/cgraphclones.c
>>> +++ b/gcc/cgraphclones.c
>>> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>>>     new_node->icf_merged = icf_merged;
>>>     new_node->merged_comdat = merged_comdat;
>>>     new_node->thunk = thunk;
>>> +  new_node->profile_id = profile_id;
>>>       new_node->clone.tree_map = NULL;
>>>     new_node->clone.args_to_skip = args_to_skip;
>>> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
>>> index 360c3de3289..ef2b217b3f9 100644
>>> --- a/gcc/ipa-inline.c
>>> +++ b/gcc/ipa-inline.c
>>> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>>>       }
>>>         if (has_speculative)
>>>       for (edge = node->callees; edge; edge = next)
>>> +    {
>>> +      next = edge->next_callee;
>>>         if (edge->speculative && !speculation_useful_p (edge,
>>>                                 edge->aux != NULL))
>>>           {
>>>             edge->resolve_speculation ();
>>>             update = true;
>>>           }
>>> +    }
>>>         if (update)
>>>       {
>>>         struct cgraph_node *where = node->global.inlined_to
>>> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
>>> index de9563d808c..d04476295a0 100644
>>> --- a/gcc/ipa-profile.c
>>> +++ b/gcc/ipa-profile.c
>>> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>>>     struct cgraph_node *node;
>>>     gimple_stmt_iterator gsi;
>>>     basic_block bb;
>>> +  enum hist_type type;
>>> +
>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
>>> +                             : HIST_TYPE_INDIR_CALL;
>>>       hash_table<histogram_hash> hashtable (10);
>>>     @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>>>             histogram_value h;
>>>             h = gimple_histogram_value_of_type
>>>               (DECL_STRUCT_FUNCTION (node->decl),
>>> -             stmt, HIST_TYPE_INDIR_CALL);
>>> +             stmt, type);
>>>             /* No need to do sanity check: gimple_ic_transform already
>>>                takes away bad histograms.  */
>>> -          if (h)
>>> +          if (h && type == HIST_TYPE_INDIR_CALL)
>>>               {
>>>                 /* counter 0 is target, counter 1 is number of execution we called target,
>>>                counter 2 is total number of executions.  */
>>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>>                 gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>>>                                 stmt, h);
>>>               }
>>> +          else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>>> +            {
>>> +              unsigned j;
>>> +              struct cgraph_edge *e = node->get_edge (stmt);
>>> +              if (e && !e->indirect_unknown_callee)
>>> +            continue;
>>> +
>>> +              e->indirect_info->num_of_ics = 0;
>>> +              for (j = 1; j < h->n_counters; j += 2)
>>> +            {
>>> +              if (h->hvalue.counters[j] == 0)
>>> +                continue;
>>> +
>>> +              e->indirect_info->common_target_ids[j / 2]
>>> +                = h->hvalue.counters[j];
>>> +              e->indirect_info->common_target_probabilities[j / 2]
>>> +                = GCOV_COMPUTE_SCALE (
>>> +                  h->hvalue.counters[j + 1],
>>> +                  gimple_bb (stmt)->count.ipa ().to_gcov_type ());
>>> +              if (e->indirect_info
>>> +                ->common_target_probabilities[j / 2]
>>> +                  > REG_BR_PROB_BASE)
>>> +                {
>>> +                  if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Probability capped to 1\n");
>>> +                  e->indirect_info
>>> +                ->common_target_probabilities[j / 2]
>>> +                = REG_BR_PROB_BASE;
>>> +                }
>>> +              e->indirect_info->num_of_ics++;
>>> +            }
>>> +
>>> +              gcc_assert (e->indirect_info->num_of_ics
>>> +                  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>> +
>>> +              gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
>>> +                               node->decl),
>>> +                             stmt, h);
>>> +            }
>>>           }
>>>             time += estimate_num_insns (stmt, &eni_time_weights);
>>>             size += estimate_num_insns (stmt, &eni_size_weights);
>>> @@ -492,6 +536,7 @@ ipa_profile (void)
>>>     int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
>>>     int nmismatch = 0, nimpossible = 0;
>>>     bool node_map_initialized = false;
>>> +  gcov_type threshold;
>>>       if (dump_file)
>>>       dump_histogram (dump_file, histogram);
>>> @@ -500,14 +545,12 @@ ipa_profile (void)
>>>         overall_time += histogram[i]->count * histogram[i]->time;
>>>         overall_size += histogram[i]->size;
>>>       }
>>> +  threshold = 0;
>>>     if (overall_time)
>>>       {
>>> -      gcov_type threshold;
>>> -
>>>         gcc_assert (overall_size);
>>>           cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) / 1000;
>>> -      threshold = 0;
>>>         for (i = 0; cumulated < cutoff; i++)
>>>       {
>>>         cumulated += histogram[i]->count * histogram[i]->time;
>>> @@ -543,7 +586,7 @@ ipa_profile (void)
>>>     histogram.release ();
>>>     histogram_pool.release ();
>>>   -  /* Produce speculative calls: we saved common traget from porfiling into
>>> +  /* Produce speculative calls: we saved common target from profiling into
>>>        e->common_target_id.  Now, at link time, we can look up corresponding
>>>        function node and produce speculative call.  */
>>>   @@ -558,7 +601,8 @@ ipa_profile (void)
>>>       {
>>>         if (n->count.initialized_p ())
>>>           nindirect++;
>>> -      if (e->indirect_info->common_target_id)
>>> +      if (e->indirect_info->common_target_id
>>> +          || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>>           {
>>>             if (!node_map_initialized)
>>>               init_node_map (false);
>>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>>                 if (dump_file)
>>>               fprintf (dump_file,
>>>                    "Not speculating: "
>>> -                 "parameter count mistmatch\n");
>>> +                 "parameter count mismatch\n");
>>>               }
>>>             else if (e->indirect_info->polymorphic
>>>                  && !opt_for_fn (n->decl, flag_devirtualize)
>>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>>             nunknown++;
>>>           }
>>>           }
>>> -     }
>>> +      if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>>> +        {
>>> +          if (in_lto_p)
>>> +        {
>>> +          if (dump_file)
>>> +            {
>>> +              fprintf (dump_file,
>>> +                   "Updating hotness threshold in LTO mode.\n");
>>> +              fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>>> +                   (int64_t) threshold);
>>> +            }
>>> +          set_hot_bb_threshold (threshold
>>> +                    / e->indirect_info->num_of_ics);
>>> +        }
>>> +          if (!node_map_initialized)
>>> +        init_node_map (false);
>>> +          node_map_initialized = true;
>>> +          ncommon++;
>>> +          unsigned speculative = 0;
>>> +          for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          n2 = find_func_by_profile_id (
>>> +            e->indirect_info->common_target_ids[i]);
>>> +          if (n2)
>>> +            {
>>> +              if (dump_file)
>>> +            {
>>> +              fprintf (
>>> +                dump_file,
>>> +                "Indirect call -> direct call from"
>>> +                " other module %s => %s, prob %3.2f\n",
>>> +                n->dump_name (), n2->dump_name (),
>>> +                e->indirect_info->common_target_probabilities[i]
>>> +                  / (float) REG_BR_PROB_BASE);
>>> +            }
>>> +              if (e->indirect_info->common_target_probabilities[i]
>>> +              < REG_BR_PROB_BASE / 2)
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (
>>> +                  dump_file,
>>> +                  "Not speculating: probability is too low.\n");
>>> +            }
>>> +              else if (!e->maybe_hot_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: call is cold.\n");
>>> +            }
>>> +              else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>>> +                   && n2->can_be_discarded_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: target is overwritable "
>>> +                     "and can be discarded.\n");
>>> +            }
>>> +              else if (ipa_node_params_sum && ipa_edge_args_sum
>>> +                   && (!vec_safe_is_empty (
>>> +                 IPA_NODE_REF (n2)->descriptors))
>>> +                   && ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                    != ipa_get_cs_argument_count (
>>> +                      IPA_EDGE_REF (e))
>>> +                   && (ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                     >= ipa_get_cs_argument_count (
>>> +                       IPA_EDGE_REF (e))
>>> +                   || !stdarg_p (TREE_TYPE (n2->decl))))
>>> +            {
>>> +              nmismatch++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file, "Not speculating: "
>>> +                        "parameter count mismatch\n");
>>> +            }
>>> +              else if (e->indirect_info->polymorphic
>>> +                   && !opt_for_fn (n->decl, flag_devirtualize)
>>> +                   && !possible_polymorphic_call_target_p (e, n2))
>>> +            {
>>> +              nimpossible++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: "
>>> +                     "function is not in the polymorphic "
>>> +                     "call target list\n");
>>> +            }
>>> +              else
>>> +            {
>>> +              /* Target may be overwritable, but profile says that
>>> +                 control flow goes to this particular implementation
>>> +                 of N2.  Speculate on the local alias to allow
>>> +                 inlining.
>>> +                 */
>>> +              if (!n2->can_be_discarded_p ())
>>> +                {
>>> +                  cgraph_node *alias;
>>> +                  alias = dyn_cast<cgraph_node *> (
>>> +                n2->noninterposable_alias ());
>>> +                  if (alias)
>>> +                n2 = alias;
>>> +                }
>>> +              nconverted++;
>>> +              e->make_speculative (
>>> +                n2, e->count.apply_probability (
>>> +                  e->indirect_info
>>> +                    ->common_target_probabilities[i]));
>>> +              update = true;
>>> +              speculative++;
>>> +            }
>>> +            }
>>> +          else
>>> +            {
>>> +              if (dump_file)
>>> +            fprintf (dump_file,
>>> +                 "Function with profile-id %i not found.\n",
>>> +                 e->indirect_info->common_target_ids[i]);
>>> +              nunknown++;
>>> +            }
>>> +        }
>>> +          if (speculative < e->indirect_info->num_of_ics)
>>> +        e->indirect_info->num_of_ics = speculative;
>>> +        }
>>> +    }
>>>          if (update)
>>>        ipa_update_overall_fn_summary (n);
>>>        }
>>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>>> index 79b250c3943..30347691029 100644
>>> --- a/gcc/ipa-utils.c
>>> +++ b/gcc/ipa-utils.c
>>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>>         update_max_bb_count ();
>>>         compute_function_frequency ();
>>>         pop_cfun ();
>>> +      /* When src is speculative, clone the referrings.  */
>>> +      if (src->indirect_call_target)
>>> +    for (e = src->callers; e; e = e->next_caller)
>>> +      if (e->callee == src && e->speculative)
>>> +        dst->clone_referring (src);
>>>         for (e = dst->callees; e; e = e->next_callee)
>>>       {
>>>         if (e->speculative)
>>> diff --git a/gcc/ipa.c b/gcc/ipa.c
>>> index 2496694124c..c1fe081a72d 100644
>>> --- a/gcc/ipa.c
>>> +++ b/gcc/ipa.c
>>> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>>>      devirtualization happens.  After inlining still keep their declarations
>>>      around, so we can devirtualize to a direct call.
>>>   -   Also try to make trivial devirutalization when no or only one target is
>>> +   Also try to make trivial devirtualization when no or only one target is
>>>      possible.  */
>>>     static void
>>> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
>>> index 4dfa2862be3..0c8f547d44e 100644
>>> --- a/gcc/lto-cgraph.c
>>> +++ b/gcc/lto-cgraph.c
>>> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>>     unsigned int uid;
>>>     intptr_t ref;
>>>     struct bitpack_d bp;
>>> +  unsigned i;
>>>       if (edge->indirect_unknown_callee)
>>>       streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
>>> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>>         if (edge->indirect_info->common_target_id)
>>>       streamer_write_hwi_stream
>>>          (ob->main_stream, edge->indirect_info->common_target_probability);
>>> +
>>> +      gcc_assert (edge->indirect_info->num_of_ics
>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>> +
>>> +      streamer_write_hwi_stream (ob->main_stream,
>>> +                 edge->indirect_info->num_of_ics);
>>> +
>>> +      if (edge->indirect_info->num_of_ics)
>>> +    {
>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          streamer_write_hwi_stream (
>>> +        ob->main_stream, edge->indirect_info->common_target_ids[i]);
>>> +          if (edge->indirect_info->common_target_ids[i])
>>> +        streamer_write_hwi_stream (
>>> +          ob->main_stream,
>>> +          edge->indirect_info->common_target_probabilities[i]);
>>> +        }
>>> +    }
>>>       }
>>>   }
>>>   @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>>>     cgraph_inline_failed_t inline_failed;
>>>     struct bitpack_d bp;
>>>     int ecf_flags = 0;
>>> +  unsigned i;
>>>       caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
>>>     if (caller == NULL || caller->decl == NULL_TREE)
>>> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>>>         edge->indirect_info->common_target_id = streamer_read_hwi (ib);
>>>         if (edge->indirect_info->common_target_id)
>>>           edge->indirect_info->common_target_probability = streamer_read_hwi (ib);
>>> +
>>> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
>>> +
>>> +      gcc_assert (edge->indirect_info->num_of_ics
>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>> +
>>> +      if (edge->indirect_info->num_of_ics)
>>> +    {
>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          edge->indirect_info->common_target_ids[i]
>>> +        = streamer_read_hwi (ib);
>>> +          if (edge->indirect_info->common_target_ids[i])
>>> +        edge->indirect_info->common_target_probabilities[i]
>>> +          = streamer_read_hwi (ib);
>>> +        }
>>> +    }
>>>       }
>>>   }
>>>   diff --git a/gcc/predict.c b/gcc/predict.c
>>> index 43ee91a5b13..b7f38891c72 100644
>>> --- a/gcc/predict.c
>>> +++ b/gcc/predict.c
>>> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
>>>         && bb->count.precise_p ()
>>>         && reason == REASON_NONE)
>>>       {
>>> -      gcc_assert (e->count ().precise_p ());
>>>         fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
>>>              predictor_info[predictor].name,
>>>              bb->count.to_gcov_type (), e->count ().to_gcov_type (),
>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>> new file mode 100644
>>> index 00000000000..e0a83c2e067
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>> @@ -0,0 +1,35 @@
>>> +/* { dg-require-effective-target lto } */
>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
>>> +
>>> +#include <stdio.h>
>>> +
>>> +typedef int (*fptr) (int);
>>> +int
>>> +one (int a);
>>> +
>>> +int
>>> +two (int a);
>>> +
>>> +fptr table[] = {&one, &two};
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  int i, x;
>>> +  fptr p = &one;
>>> +
>>> +  x = one (3);
>>> +
>>> +  for (i = 0; i < 350000000; i++)
>>> +    {
>>> +      x = (*p) (3);
>>> +      p = table[x];
>>> +    }
>>> +  printf ("done:%d\n", x);
>>> +}
>>> +
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
>>> +
>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>> new file mode 100644
>>> index 00000000000..a8c6e365fb9
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>> @@ -0,0 +1,22 @@
>>> +/* It seems there is no way to avoid the other source of multiple
>>> +   source testcase from being compiled independently.  Just avoid
>>> +   error.  */
>>> +#ifdef DOJOB
>>> +int
>>> +one (int a)
>>> +{
>>> +  return 1;
>>> +}
>>> +
>>> +int
>>> +two (int a)
>>> +{
>>> +  return 0;
>>> +}
>>> +#else
>>> +int
>>> +main()
>>> +{
>>> +  return 0;
>>> +}
>>> +#endif
>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>> new file mode 100644
>>> index 00000000000..aa3887fde83
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>> @@ -0,0 +1,42 @@
>>> +/* { dg-require-effective-target lto } */
>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
>>> +
>>> +#include <stdio.h>
>>> +
>>> +typedef int (*fptr) (int);
>>> +int
>>> +one (int a);
>>> +
>>> +int
>>> +two (int a);
>>> +
>>> +fptr table[] = {&one, &two};
>>> +
>>> +int foo ()
>>> +{
>>> +  int i, x;
>>> +  fptr p = &one;
>>> +
>>> +  x = one (3);
>>> +
>>> +  for (i = 0; i < 350000000; i++)
>>> +    {
>>> +      x = (*p) (3);
>>> +      p = table[x];
>>> +    }
>>> +  return x;
>>> +}
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  int x = foo ();
>>> +  printf ("done:%d\n", x);
>>> +}
>>> +
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
>>> +
>>> +
>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>> new file mode 100644
>>> index 00000000000..951bc7ddd19
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>> @@ -0,0 +1,38 @@
>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */
>>> +
>>> +#include <stdio.h>
>>> +
>>> +typedef int (*fptr) (int);
>>> +int
>>> +one (int a)
>>> +{
>>> +  return 1;
>>> +}
>>> +
>>> +int
>>> +two (int a)
>>> +{
>>> +  return 0;
>>> +}
>>> +
>>> +fptr table[] = {&one, &two};
>>> +
>>> +int
>>> +main()
>>> +{
>>> +  int i, x;
>>> +  fptr p = &one;
>>> +
>>> +  one (3);
>>> +
>>> +  for (i = 0; i < 350000000; i++)
>>> +    {
>>> +      x = (*p) (3);
>>> +      p = table[x];
>>> +    }
>>> +  printf ("done:%d\n", x);
>>> +}
>>> +
>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */
>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */
>>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>>> index 9017da878b1..f69b31b197e 100644
>>> --- a/gcc/tree-inline.c
>>> +++ b/gcc/tree-inline.c
>>> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
>>>             switch (id->transform_call_graph_edges)
>>>           {
>>>           case CB_CGE_DUPLICATE:
>>> -          edge = id->src_node->get_edge (orig_stmt);
>>> -          if (edge)
>>> -            {
>>> -              struct cgraph_edge *old_edge = edge;
>>> -              profile_count old_cnt = edge->count;
>>> -              edge = edge->clone (id->dst_node, call_stmt,
>>> -                      gimple_uid (stmt),
>>> -                      num, den,
>>> -                      true);
>>> -
>>> -              /* Speculative calls consist of two edges - direct and
>>> -             indirect.  Duplicate the whole thing and distribute
>>> -             frequencies accordingly.  */
>>> -              if (edge->speculative)
>>> -            {
>>> -              struct cgraph_edge *direct, *indirect;
>>> -              struct ipa_ref *ref;
>>> -
>>> -              gcc_assert (!edge->indirect_unknown_callee);
>>> -              old_edge->speculative_call_info (direct, indirect, ref);
>>> -
>>> -              profile_count indir_cnt = indirect->count;
>>> -              indirect = indirect->clone (id->dst_node, call_stmt,
>>> -                              gimple_uid (stmt),
>>> -                              num, den,
>>> -                              true);
>>> -
>>> -              profile_probability prob
>>> -                 = indir_cnt.probability_in (old_cnt + indir_cnt);
>>> -              indirect->count
>>> -                 = copy_basic_block->count.apply_probability (prob);
>>> -              edge->count = copy_basic_block->count - indirect->count;
>>> -              id->dst_node->clone_reference (ref, stmt);
>>> -            }
>>> -              else
>>> -            edge->count = copy_basic_block->count;
>>> -            }
>>> +          {
>>> +            edge = id->src_node->get_edge (orig_stmt);
>>> +            struct cgraph_edge *old_edge = edge;
>>> +            struct cgraph_edge *direct, *indirect;
>>> +            bool next_speculative;
>>> +            do
>>> +              {
>>> +            next_speculative = false;
>>> +            if (edge)
>>> +              {
>>> +                profile_count old_cnt = edge->count;
>>> +                edge
>>> +                  = edge->clone (id->dst_node, call_stmt,
>>> +                         gimple_uid (stmt), num, den, true);
>>> +
>>> +                /* Speculative calls consist of two edges - direct
>>> +                   and indirect.  Duplicate the whole thing and
>>> +                   distribute frequencies accordingly.  */
>>> +                if (edge->speculative)
>>> +                  {
>>> +                struct ipa_ref *ref;
>>> +
>>> +                gcc_assert (!edge->indirect_unknown_callee);
>>> +                old_edge->speculative_call_info (direct,
>>> +                                 indirect, ref);
>>> +
>>> +                profile_count indir_cnt = indirect->count;
>>> +                indirect
>>> +                  = indirect->clone (id->dst_node, call_stmt,
>>> +                             gimple_uid (stmt), num,
>>> +                             den, true);
>>> +
>>> +                profile_probability prob
>>> +                  = indir_cnt.probability_in (old_cnt
>>> +                                  + indir_cnt);
>>> +                indirect->count
>>> +                  = copy_basic_block->count.apply_probability (
>>> +                    prob);
>>> +                edge->count
>>> +                  = copy_basic_block->count - indirect->count;
>>> +                id->dst_node->clone_reference (ref, stmt);
>>> +                  }
>>> +                else
>>> +                  edge->count = copy_basic_block->count;
>>> +              }
>>> +            /* If the indirect call contains more than one indirect
>>> +               targets, need clone all speculative edges here.  */
>>> +            if (old_edge && old_edge->next_callee
>>> +                && old_edge->speculative && indirect
>>> +                && indirect->indirect_info
>>> +                && indirect->indirect_info->num_of_ics > 1)
>>> +              {
>>> +                edge = old_edge->next_callee;
>>> +                old_edge = old_edge->next_callee;
>>> +                if (edge->speculative)
>>> +                  next_speculative = true;
>>> +              }
>>> +              }
>>> +            while (next_speculative);
>>> +          }
>>>             break;
>>>             case CB_CGE_MOVE_CLONES:
>>> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
>>> index 1c3034aac10..4964dbdebb5 100644
>>> --- a/gcc/tree-profile.c
>>> +++ b/gcc/tree-profile.c
>>> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
>>>   /* Do initialization work for the edge profiler.  */
>>>     /* Add code:
>>> -   __thread gcov*    __gcov_indirect_call_counters; // pointer to actual counter
>>> -   __thread void*    __gcov_indirect_call_callee; // actual callee address
>>> +   __thread gcov*    __gcov_indirect_call.counters; // pointer to actual counter
>>> +   __thread void*    __gcov_indirect_call.callee; // actual callee address
>>>      __thread int __gcov_function_counter; // time profiler function counter
>>>   */
>>>   static void
>>> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base)
>>>         f_1 = foo;
>>>         __gcov_indirect_call.counters = &__gcov4.main[0];
>>>         PROF_9 = f_1;
>>> -      __gcov_indirect_call_callee = PROF_9;
>>> +      __gcov_indirect_call.callee = PROF_9;
>>>         _4 = f_1 ();
>>>      */
>>>   @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>>>       /* Insert code:
>>>   -     if (__gcov_indirect_call_callee != NULL)
>>> +     if (__gcov_indirect_call.callee != NULL)
>>>          __gcov_indirect_call_profiler_v3 (profile_id, &current_function_decl);
>>>          The function __gcov_indirect_call_profiler_v3 is responsible for
>>> -     resetting __gcov_indirect_call_callee to NULL.  */
>>> +     resetting __gcov_indirect_call.callee to NULL.  */
>>>       gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
>>>     void0 = build_int_cst (ptr_type_node, 0);
>>> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
>>>   {
>>>     /* When profile instrumentation, use or test coverage shall be performed.
>>>        But for AutoFDO, this there is no instrumentation, thus this pass is
>>> -     diabled.  */
>>> +     disabled.  */
>>>     return (!in_lto_p && !flag_auto_profile
>>>         && (flag_branch_probabilities || flag_test_coverage
>>>             || profile_arc_flag));
>>> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
>>> index 5013956cf86..4869ab8ccd6 100644
>>> --- a/gcc/value-prof.c
>>> +++ b/gcc/value-prof.c
>>> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
>>>      somehow.  */
>>>     static bool
>>> -check_counter (gimple *stmt, const char * name,
>>> -           gcov_type *count, gcov_type *all, profile_count bb_count_d)
>>> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all,
>>> +           profile_count bb_count_d, float ratio = 1.0f)
>>>   {
>>>     gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
>>>     if (*all != bb_count || *count > *all)
>>> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
>>>                                "count (%d)\n", name, (int)*all, (int)bb_count);
>>>         *all = bb_count;
>>>         if (*count > *all)
>>> -            *count = *all;
>>> +        *count = *all * ratio;
>>>         return false;
>>>       }
>>>         else
>>> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
>>>     return dcall_stmt;
>>>   }
>>>   +/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe
>>> +   multiple indirect targets in histogram.  Check every indirect/virtual call
>>> +   if callee function exists, if not exit, leave it to LTO stage for later
>>> +   process.  Modify code of this indirect call to an if-else structure in
>>> +   ipa-profile finally.  */
>>> +static bool
>>> +ic_transform_topn (gimple_stmt_iterator *gsi)
>>> +{
>>> +  unsigned j;
>>> +  gcall *stmt;
>>> +  histogram_value histogram;
>>> +  gcov_type val, count, count_all, all, bb_all;
>>> +  struct cgraph_node *d_call;
>>> +  profile_count bb_count;
>>> +
>>> +  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
>>> +  if (!stmt)
>>> +    return false;
>>> +
>>> +  if (gimple_call_fndecl (stmt) != NULL_TREE)
>>> +    return false;
>>> +
>>> +  if (gimple_call_internal_p (stmt))
>>> +    return false;
>>> +
>>> +  histogram
>>> +    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
>>> +  if (!histogram)
>>> +    return false;
>>> +
>>> +  count = 0;
>>> +  all = 0;
>>> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>>> +  bb_count = gimple_bb (stmt)->count;
>>> +
>>> +  /* n_counters need be odd to avoid access violation.  */
>>> +  gcc_assert (histogram->n_counters % 2 == 1);
>>> +
>>> +  /* For indirect call topn, accumulate all the counts first.  */
>>> +  for (j = 1; j < histogram->n_counters; j += 2)
>>> +    {
>>> +      val = histogram->hvalue.counters[j];
>>> +      count = histogram->hvalue.counters[j + 1];
>>> +      if (val)
>>> +    all += count;
>>> +    }
>>> +
>>> +  count_all = all;
>>> +  /* Do the indirect call conversion if function body exists, or else leave it
>>> +     to LTO stage.  */
>>> +  for (j = 1; j < histogram->n_counters; j += 2)
>>> +    {
>>> +      val = histogram->hvalue.counters[j];
>>> +      count = histogram->hvalue.counters[j + 1];
>>> +      if (val)
>>> +    {
>>> +      /* The order of CHECK_COUNTER calls is important
>>> +         since check_counter can correct the third parameter
>>> +         and we want to make count <= all <= bb_count.  */
>>> +      if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
>>> +          || check_counter (stmt, "ic", &count, &all,
>>> +                profile_count::from_gcov_type (all),
>>> +                (float) count / count_all))
>>> +        {
>>> +          gimple_remove_histogram_value (cfun, stmt, histogram);
>>> +          return false;
>>> +        }
>>> +
>>> +      d_call = find_func_by_profile_id ((int) val);
>>> +
>>> +      if (d_call == NULL)
>>> +        {
>>> +          if (val)
>>> +        {
>>> +          if (dump_file)
>>> +            {
>>> +              fprintf (
>>> +            dump_file,
>>> +            "Indirect call -> direct call from other module");
>>> +              print_generic_expr (dump_file, gimple_call_fn (stmt),
>>> +                      TDF_SLIM);
>>> +              fprintf (dump_file,
>>> +                   "=> %i (will resolve only with LTO)\n",
>>> +                   (int) val);
>>> +            }
>>> +        }
>>> +          return false;
>>> +        }
>>> +
>>> +      if (!check_ic_target (stmt, d_call))
>>> +        {
>>> +          if (dump_file)
>>> +        {
>>> +          fprintf (dump_file, "Indirect call -> direct call ");
>>> +          print_generic_expr (dump_file, gimple_call_fn (stmt),
>>> +                      TDF_SLIM);
>>> +          fprintf (dump_file, "=> ");
>>> +          print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>>> +          fprintf (dump_file,
>>> +               " transformation skipped because of type mismatch");
>>> +          print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>> +        }
>>> +          gimple_remove_histogram_value (cfun, stmt, histogram);
>>> +          return false;
>>> +        }
>>> +
>>> +      if (dump_file)
>>> +      {
>>> +        fprintf (dump_file, "Indirect call -> direct call ");
>>> +        print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>>> +        fprintf (dump_file, "=> ");
>>> +        print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>>> +        fprintf (dump_file,
>>> +             " transformation on insn postponed to ipa-profile");
>>> +        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>> +        fprintf (dump_file, "hist->count %" PRId64
>>> +        " hist->all %" PRId64"\n", count, all);
>>> +      }
>>> +    }
>>> +    }
>>> +
>>> +  return true;
>>> +}
>>>   /*
>>>     For every checked indirect/virtual call determine if most common pid of
>>> -  function/class method has probability more than 50%. If yes modify code of
>>> +  function/class method has probability more than 50%.  If yes modify code of
>>>     this call to:
>>>    */
>>>   @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>     histogram_value histogram;
>>>     gcov_type val, count, all, bb_all;
>>>     struct cgraph_node *direct_call;
>>> +  enum hist_type type;
>>>       stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
>>>     if (!stmt)
>>> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>     if (gimple_call_internal_p (stmt))
>>>       return false;
>>>   -  histogram = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL);
>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
>>> +                             : HIST_TYPE_INDIR_CALL;
>>> +
>>> +  histogram = gimple_histogram_value_of_type (cfun, stmt, type);
>>>     if (!histogram)
>>>       return false;
>>>   +  if (type == HIST_TYPE_INDIR_CALL_TOPN)
>>> +      return ic_transform_topn (gsi);
>>> +
>>>     val = histogram->hvalue.counters [0];
>>>     count = histogram->hvalue.counters [1];
>>>     all = histogram->hvalue.counters [2];
>>>       bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>>> -  /* The order of CHECK_COUNTER calls is important -
>>> +  /* The order of CHECK_COUNTER calls is important
>>>        since check_counter can correct the third parameter
>>> -     and we want to make count <= all <= bb_all. */
>>> +     and we want to make count <= all <= bb_all.  */
>>>     if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
>>>         || check_counter (stmt, "ic", &count, &all,
>>>                   profile_count::from_gcov_type (all)))
>>> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>         print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>>>         fprintf (dump_file, "=> ");
>>>         print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
>>> -      fprintf (dump_file, " transformation on insn postponned to ipa-profile");
>>> +      fprintf (dump_file, " transformation on insn postponed to ipa-profile");
>>>         print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>>         fprintf (dump_file, "hist->count %" PRId64
>>>              " hist->all %" PRId64"\n", count, all);
>>>
>>
> 


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18  9:34     ` Martin Liška
@ 2019-06-18 10:07       ` Segher Boessenkool
  2019-06-18 10:20         ` Martin Liška
  2019-06-19  5:38       ` luoxhu
  1 sibling, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2019-06-18 10:07 UTC (permalink / raw)
  To: Martin Liška; +Cc: luoxhu, gcc-patches, hubicka, wschmidt, luoxhu

On Tue, Jun 18, 2019 at 11:34:03AM +0200, Martin Liška wrote:
> I've got it. So it's situation where you have distribution equal to 50% and 50%. Note that it's
> the only valid situation where both edges with be >= 50%. That's the threshold for which
> we speculatively devirtualize edges.

But that 50% is a magic number, isn't it?  Maybe 20% works better, and
then you need a top5 (in the worst case).


Segher


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18 10:07       ` Segher Boessenkool
@ 2019-06-18 10:20         ` Martin Liška
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Liška @ 2019-06-18 10:20 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: luoxhu, gcc-patches, hubicka, wschmidt, luoxhu

On 6/18/19 12:07 PM, Segher Boessenkool wrote:
> On Tue, Jun 18, 2019 at 11:34:03AM +0200, Martin Liška wrote:
>> I've got it. So it's situation where you have distribution equal to 50% and 50%. Note that it's
>> the only valid situation where both edges with be >= 50%. That's the threshold for which
>> we speculatively devirtualize edges.
> 
> But that 50% is a magic number, isn't it?

Yes :) Apparently LLVM does that for probability >= 30%:
https://code.woboq.org/llvm/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp.html#36

>  Maybe 20% works better, and
> then you need a top5 (in the worst case).

I would then generalize to N, for now I'm waiting for Honza.
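
The relation between the threshold and the number of counters can be made
explicit with a little arithmetic.  A minimal sketch (max_targets is a
hypothetical helper, not a GCC function): since the shares of all targets of
one call site sum to at most 100%, at most floor(100 / threshold) targets can
each reach the threshold.

```c
/* Hypothetical helper: upper bound on how many call targets can each
   account for at least THRESHOLD_PERCENT of the calls at one call site.  */
static int
max_targets (int threshold_percent)
{
  if (threshold_percent <= 0 || threshold_percent > 100)
    return -1;	/* Not a meaningful threshold.  */
  return 100 / threshold_percent;
}
```

With the numbers from this thread, a 50% threshold allows at most 2 targets,
30% allows 3 and 20% allows 5.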

Martin

> 
> 
> Segher
> 


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18  1:46 [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization Xiong Hu Luo
  2019-06-18  5:51 ` Martin Liška
@ 2019-06-18 10:21 ` Martin Liška
  2019-06-19  8:50   ` luoxhu
  2019-06-20 13:47 ` Jan Hubicka
  2 siblings, 1 reply; 25+ messages in thread
From: Martin Liška @ 2019-06-18 10:21 UTC (permalink / raw)
  To: Xiong Hu Luo, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>     6.2.  SPEC2017 peakrate:
>         523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>         525.x264_r (-5.29%).

Can you please elaborate on which key indirect call promotions are needed
to achieve such a significant speed-up? Are we talking about calls to virtual functions
or C-style indirect calls?

Thanks,
Martin


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18  9:34     ` Martin Liška
  2019-06-18 10:07       ` Segher Boessenkool
@ 2019-06-19  5:38       ` luoxhu
  2019-06-19  6:57         ` Martin Liška
  1 sibling, 1 reply; 25+ messages in thread
From: luoxhu @ 2019-06-19  5:38 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

Hi Martin,

On 2019/6/18 17:34, Martin Liška wrote:
> On 6/18/19 11:02 AM, luoxhu wrote:
>> Hi,
>>
>> On 2019/6/18 13:51, Martin Liška wrote:
>>> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>>
>>> Hello.
>>>
>>> Thank you for the interest in the area.
>>>
>>>> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
>>>> Currently the default instrument function can only find the indirect function
>>>> that called more than 50% with an incorrect count number returned.
>>> Can you please explain what you mean by 'an incorrect count number returned'?
>>
>> For a test case indir-call-topn.c, it include 2 indirect calls "one" and "two". the profiling data is as below with trunk code (including your patch, count[0] and count[2] is switched by your code, the count[0] is used in ipa-profile but only support the top1 format, my patch adds the support for the topn format. count[0] was incorrect as WITHOUT your patch it is 0,  things getting better with your fix as the count[0] is 350000000, but still not correct, in fact, "one" is running 175000000 times, and "two" is running the other 175000000 times):
>>
>> indir-call-topn.gcda:   22:    01a90000:  18:COUNTERS indirect_call 9 counts
>> indir-call-topn.gcda:   24:                   0: *350000000 1868707024 0* 0 0 0 0 0
>>
>> Running with the "--param indir-call-topn-profile=1" will give below profile data, My patch is based on this profile result and do the optimization for multiple indirect targets, performance can get much improve on this testcase and SPEC2017 for some benchmarks(LLVM already support this several years ago...).
>>
>> indir-call-topn.gcda:   26:    01b10000:  18:COUNTERS indirect_call_topn 9 counts
>> indir-call-topn.gcda:   28:                   0: *0 969338501 175000000 1868707024 175000000* 0 0 0
>>
>>
>> test case indir-call-topn.c:
>>
>> #include <stdio.h>
>>
>>
>> typedef int (*fptr) (int);
>> int
>> one (int a)
>> {
>>    return 1;
>> }
>>
>> int
>> two (int a)
>> {
>>    return 0;
>> }
>>
>> fptr table[] = {&one, &two};
>>
>> int
>> main()
>> {
>>    int i, x;
>>    fptr p = &one;
>>
>>    one (3);
>>
>>    for (i = 0; i < 350000000; i++)
>>      {
>>        x = (*p) (3);
>>        p = table[x];
>>      }
>>    printf ("done:%d\n", x);
>> }
> 
> I've got it. So it's situation where you have distribution equal to 50% and 50%. Note that it's
> the only valid situation where both edges with be >= 50%. That's the threshold for which
> we speculatively devirtualize edges. That said, you don't need generic topn counter, but a probably
> only a top2 counter which can be generalized from single-value counter type. I'm saying that
> because I removed the TOPN, mainly due to:
> https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179
> 
> which is over-complicated profiling function. And the changes that I've done recently are motivated
> to preserve a stable builds. That's achieved by noticing that a single-value counter can't handle all
> seen values.

Actually, the algorithm in __gcov_one_value_profiler_body
(libgcc/libgcov-profiler.c) has a functional problem when profiling the
testcase I provided:

__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
                                int use_atomic)
{
  if (value == counters[1])
    counters[2]++;
  else if (counters[2] == 0)
    {
      counters[2] = 1;
      counters[1] = value;
    }
  else
    counters[2]--;

  if (use_atomic)
    __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
  else
    counters[0]++;
}

The profile id of function "one" is 1868707024 and that of function "two" is
969338501. The loop runs from 0 to (350000000-1):

     value    counters[0]    counters[1]    counters[2]
1868707024              1     1868707024              1
 969338501              2     1868707024              0
1868707024              3     1868707024              1
 969338501              4     1868707024              0
1868707024              5     1868707024              1
       ...
 969338501      350000000     1868707024              0

Finally, the resulting counters[] value is [350000000, 1868707024, 0].
In ipa-profile.c and value-prof.c, counters[0] is the total number of times the
statement was executed, and counters[2] is the number of times the target in
counters[1] was called, which is 0 here.
counters[2] shouldn't be 0 in fact; it means prob is 0 (it was expected to be
50%, right?). This prob makes ipa-profile fail to create a speculative edge
and to promote the indirect call later. I think this is the reason why topn
was introduced by Rong Xu in 2014 (8ceaa1e) and later reimplemented in LLVM.
There was definitely a bug here before re-enabling topn.
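
The walk-through above can be reproduced outside of libgcov.  The following is
a minimal sketch, not the libgcov code itself: simulate_one_value is a
hypothetical driver that applies the non-atomic path of the update rule listed
above to the alternating call sequence of the testcase, and gcov_type is
replaced by a plain int64_t typedef.

```c
#include <stdint.h>

typedef int64_t gcov_type;	/* Stand-in for libgcov's gcov_type.  */

/* Same update rule as the listing above, non-atomic path only.  */
static void
one_value_update (gcov_type *counters, gcov_type value)
{
  if (value == counters[1])
    counters[2]++;
  else if (counters[2] == 0)
    {
      counters[2] = 1;
      counters[1] = value;
    }
  else
    counters[2]--;

  counters[0]++;
}

/* Hypothetical driver: reset the three counters, then feed ITERATIONS
   calls that alternate between VALUE_ONE and VALUE_TWO, starting with
   VALUE_ONE, as the loop in the testcase does.  */
static void
simulate_one_value (gcov_type *counters, long iterations,
		    gcov_type value_one, gcov_type value_two)
{
  counters[0] = counters[1] = counters[2] = 0;
  for (long i = 0; i < iterations; i++)
    one_value_update (counters, (i % 2 == 0) ? value_one : value_two);
}
```

For any even number of iterations the last update is a decrement, so
counters[2] ends at 0 while counters[0] holds the full iteration count; the
ratio counters[2] / counters[0] is then 0 even though each target takes half
of the calls.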

dump-profile: indir-call-topn.fb.gcc.wpa.069i.profile_estimate
Histogram:5
  350000001: time:2 (8.70) size:2 (8.00)
  350000000: time:19 (91.30) size:7 (36.00)
  175000000: time:4 (100.00) size:2 (44.00)
  1: time:0 (100.00) size:0 (44.00)
  0: time:37 (100.00) size:14 (100.00)
Determined min count: 175000000 Time:100.00% Size:44.00%
Setting hotness threshold in LTO mode.
Indirect call -> direct call from other module main/15 => one/11, prob 0.00
Not speculating: probability is too low.
1 indirect calls trained.
1 (100.00%) have common target.
0 (0.00%) targets was not found.
0 (0.00%) targets had parameter count mismatch.
0 (0.00%) targets was not in polymorphic call target list.
1 (100.00%) speculations seems useless.
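
For comparison, the indirect_call_topn counters quoted earlier in this message
give the expected 50% for each target.  A minimal sketch, assuming the layout
that the loop in ic_transform_topn walks (counters[0] is skipped, and
(value, count) pairs start at index 1); topn_total and topn_percent are
hypothetical helpers, not functions from the patch:

```c
#include <stdint.h>

typedef int64_t gcov_type;	/* Stand-in for GCC's gcov_type.  */

/* Sum the counts of all non-empty (value, count) pairs, starting at
   index 1 as the loop in ic_transform_topn does.  */
static gcov_type
topn_total (const gcov_type *counters, unsigned n_counters)
{
  gcov_type all = 0;
  for (unsigned j = 1; j + 1 < n_counters; j += 2)
    if (counters[j])
      all += counters[j + 1];
  return all;
}

/* Share of the calls taken by pair PAIR (0-based), in percent.  Returns 0
   when the slot is out of range, empty, or nothing was counted.  */
static int
topn_percent (const gcov_type *counters, unsigned n_counters, unsigned pair)
{
  unsigned j = 1 + 2 * pair;
  gcov_type all = topn_total (counters, n_counters);
  if (j + 1 >= n_counters || counters[j] == 0 || all == 0)
    return 0;
  return (int) (counters[j + 1] * 100 / all);
}
```

Fed with the nine counters {0, 969338501, 175000000, 1868707024, 175000000,
0, 0, 0, 0} (the ninth value is assumed to be 0, as the dump shows only eight),
the total is 350000000 and both targets come out at 50%.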


Thanks
Xionghu

> 
>>
>>>
>>>>    This patch
>>>> leverages the "--param indir-call-topn-profile=1" and enables multiple indirect
>>> Note that I've remove indir-call-topn-profile last week, the patch will not apply
>>> on current trunk. However, I can help you how to adapt single-value counters
>>> to support tracking of multiple values.
>>
>> It will be very useful if you help me to track multiple values similarly on trunk code. I will rebase to your code once topn is ready again. Actually topn is more general and top1 is included in, I thought that top1 should be removed instead of topn, though topn will consume longer time than top1 in profile-generate.
> 
> As mentioned earlier, I really don't want to put TOPN back. I can help you once Honza will agree with the general IPA changes.
> 
>>
>>>
>>>> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function
>>>> specialization, profiling, partial devirtualization, inlining and cloning could
>>>> be done successfully based on it.
>>> This decision is definitely big question for Honza?
>>>
>>>> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
>>>> Details are:
>>>>     1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>>>>     supported in ipa-profile pass,
>>> If you take a look at gcc/ipa-profile.c:195 you can see how the probability
>>> is propagated to IPA passes. Why is that not sufficient?
>>
>> Current code only support single indirect target, I need track multiple indirect targets and create multiple speculative edges on single indirect call statement.
>>
>> What's more, many ICEs happened in later stage due to single speculative target design, part of this patch is to solve the ICEs of multiple speculative target edges handling.
> 
> Well, to be honest I don't like the patch much. It brings another level of complexity for a quite rare situation where one
> calls 2 functions via an indirect call. And as mentioned, current IPA optimization are not happy about multiple indirect branches.
> 
> Martin
> 
>>
>>
>> Thanks
>>
>> Xionghu
>>
>>>
>>> Martin
>>>
>>>> so add variables to pass the information
>>>>     through passes, and postpone gimple_ic to ipa-profile like default as inline
>>>>     pass will decide whether it is benefit to transform indirect call.
>>>>     2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
>>>>     profile full support in ipa passes and cgraph_edge functions.
>>>>     3.  Fix various hidden speculative call ICEs exposed after enabling this
>>>>     feature when running SPEC2017.
>>>>     4.  Add 1 in module testcase and 2 cross module testcases.
>>>>     5.  TODOs:
>>>>       5.1.  Some reference info will be dropped from WPA to LTRANS, so
>>>>       reference check will be difficult in LTRANS, need replace the strstr
>>>>       with reference compare.
>>>>       5.2.  Some duplicate code need be removed as top1 and topn share same logic.
>>>>       Actually top1 related logic could be eliminated totally as topn includes it.
>>>>       5.3.  Split patch maybe needed as too big but not sure how many would be
>>>>       reasonable.
>>>>     6.  Performance result for ppc64le:
>>>>       6.1.  Representative test: indir-call-prof-topn.c runtime improved from
>>>>       1.7s to 0.4s.
>>>>       6.2.  SPEC2017 peakrate:
>>>>           523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>>>>           525.x264_r (-5.29%).
>>>>           No big changes of other benchmarks.
>>>>           Option: -Ofast -mcpu=power8
>>>>           PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
>>>>           PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
>>>>           -fprofile-correction
>>>>       6.3.  No performance change on PHP benchmark.
>>>>     7.  Bootstrap and regression test passed on Power8-LE.
>>>>
>>>> gcc/ChangeLog
>>>>
>>>>      2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>>>>
>>>>      PR ipa/69678
>>>>      * cgraph.c (cgraph_node::get_create): Copy profile_id.
>>>>      (cgraph_edge::speculative_call_info): Find real
>>>>      reference for indirect targets.
>>>>      (cgraph_edge::resolve_speculation): Add speculative code process
>>>>      for indirect targets.
>>>>      (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>>>>      (cgraph_node::verify_node): Likewise.
>>>>      * cgraph.h (common_target_ids): New variable.
>>>>      (common_target_probabilities): Likewise.
>>>>      (num_of_ics): Likewise.
>>>>      * cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
>>>>      * ipa-inline.c (inline_small_functions): Add iterator update.
>>>>      * ipa-profile.c (ipa_profile_generate_summary): Add indirect
>>>>      multiple targets logic.
>>>>      (ipa_profile): Likewise.
>>>>      * ipa-utils.c (ipa_merge_profiles): Clone speculative src's
>>>>      referrings to dst.
>>>>      * ipa.c (process_references): Fix typo.
>>>>      * lto-cgraph.c (lto_output_edge): Add indirect multiple targets
>>>>      logic.
>>>>      (input_edge): Likewise.
>>>>      * predict.c (dump_prediction): Remove the assert that the edge
>>>>      count is precise.
>>>>      * tree-profile.c (gimple_gen_ic_profiler): Use the new variable
>>>>      __gcov_indirect_call.counters and __gcov_indirect_call.callee.
>>>>      (gimple_gen_ic_func_profiler): Likewise.
>>>>      (pass_ipa_tree_profile::gate): Fix comment typos.
>>>>      * tree-inline.c (copy_bb): Duplicate all the speculative edges
>>>>      if indirect call contains multiple speculative targets.
>>>>      * value-prof.c (check_counter): Proportion the counter for
>>>>      multiple targets.
>>>>      (ic_transform_topn): New function.
>>>>      (gimple_ic_transform): Handle topn case, fix comment typos.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>>
>>>>      2019-06-17  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>>>>
>>>>      PR ipa/69678
>>>>      * gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
>>>>      * gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
>>>>      * gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
>>>>      * gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
>>>> ---
>>>>    gcc/cgraph.c                                  |  38 +++-
>>>>    gcc/cgraph.h                                  |   9 +-
>>>>    gcc/cgraphclones.c                            |   1 +
>>>>    gcc/ipa-inline.c                              |   3 +
>>>>    gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>>>>    gcc/ipa-utils.c                               |   5 +
>>>>    gcc/ipa.c                                     |   2 +-
>>>>    gcc/lto-cgraph.c                              |  38 ++++
>>>>    gcc/predict.c                                 |   1 -
>>>>    .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>>>>    .../crossmodule-indir-call-topn-1a.c          |  22 +++
>>>>    .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>>>>    .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>>>>    gcc/tree-inline.c                             |  97 +++++----
>>>>    gcc/tree-profile.c                            |  12 +-
>>>>    gcc/value-prof.c                              | 146 +++++++++++++-
>>>>    16 files changed, 606 insertions(+), 68 deletions(-)
>>>>    create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>>>    create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>>>    create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>>>    create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>>>
>>>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>>>> index de82316d4b1..0d373a67d1b 100644
>>>> --- a/gcc/cgraph.c
>>>> +++ b/gcc/cgraph.c
>>>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>>>        fprintf (dump_file, "Introduced new external node "
>>>>             "(%s) and turned into root of the clone tree.\n",
>>>>             node->dump_name ());
>>>> +      node->profile_id = first_clone->profile_id;
>>>>        }
>>>>      else if (dump_file)
>>>>        fprintf (dump_file, "Introduced new external node "
>>>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>>>      int i;
>>>>      cgraph_edge *e2;
>>>>      cgraph_edge *e = this;
>>>> +  cgraph_node *referred_node;
>>>>        if (!e->indirect_unknown_callee)
>>>>        for (e2 = e->caller->indirect_calls;
>>>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>>>        && ((ref->stmt && ref->stmt == e->call_stmt)
>>>>            || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>>>          {
>>>> -    reference = ref;
>>>> -    break;
>>>> +    if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>>> +      {
>>>> +        referred_node = dyn_cast<cgraph_node *> (ref->referred);
>>>> +        if (strstr (e->callee->name (), referred_node->name ()))
>>>> +          {
>>>> +        reference = ref;
>>>> +        break;
>>>> +          }
>>>> +      }
>>>> +    else
>>>> +      {
>>>> +        reference = ref;
>>>> +        break;
>>>> +      }
>>>>          }
>>>>        /* Speculative edge always consist of all three components - direct edge,
>>>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>>>             in the functions inlined through it.  */
>>>>        }
>>>>      edge->count += e2->count;
>>>> -  edge->speculative = false;
>>>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>>>> +    {
>>>> +      edge->indirect_info->num_of_ics--;
>>>> +      if (edge->indirect_info->num_of_ics == 0)
>>>> +    edge->speculative = false;
>>>> +    }
>>>> +  else
>>>> +    edge->speculative = false;
>>>>      e2->speculative = false;
>>>>      ref->remove_reference ();
>>>>      if (e2->indirect_unknown_callee || e2->inline_failed)
>>>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>>>          e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>>>                                 false);
>>>>          e->count = gimple_bb (e->call_stmt)->count;
>>>> -      e2->speculative = false;
>>>> +      if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>>> +        {
>>>> +          e2->indirect_info->num_of_ics--;
>>>> +          if (e2->indirect_info->num_of_ics == 0)
>>>> +        e2->speculative = false;
>>>> +        }
>>>> +      else
>>>> +        e2->speculative = false;
>>>>          e2->count = gimple_bb (e2->call_stmt)->count;
>>>>          ref->speculative = false;
>>>>          ref->stmt = NULL;
>>>> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>>>>            for (e = callees; e; e = e->next_callee)
>>>>        {
>>>> -      if (!e->aux)
>>>> +      if (!e->aux && !e->speculative)
>>>>            {
>>>>              error ("edge %s->%s has no corresponding call_stmt",
>>>>                 identifier_to_locale (e->caller->name ()),
>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>>> index c294602d762..ed0fbc60432 100644
>>>> --- a/gcc/cgraph.h
>>>> +++ b/gcc/cgraph.h
>>>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>    #include "profile-count.h"
>>>>    #include "ipa-ref.h"
>>>>    #include "plugin-api.h"
>>>> +#include "gcov-io.h"
>>>>      extern void debuginfo_early_init (void);
>>>>    extern void debuginfo_init (void);
>>>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>>>      int param_index;
>>>>      /* ECF flags determined from the caller.  */
>>>>      int ecf_flags;
>>>> -  /* Profile_id of common target obtrained from profile.  */
>>>> +  /* Profile_id of common target obtained from profile.  */
>>>>      int common_target_id;
>>>>      /* Probability that call will land in function with COMMON_TARGET_ID.  */
>>>>      int common_target_probability;
>>>>
>>>> +  /* Profile_ids of common targets obtained from profile.  */
>>>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>>> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
>>>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>>> +  unsigned num_of_ics;
>>>> +
>>>>      /* Set when the call is a virtual call with the parameter being the
>>>>         associated object pointer rather than a simple direct call.  */
>>>>      unsigned polymorphic : 1;
>>>> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
>>>> index 15f7e119d18..94f424bc10c 100644
>>>> --- a/gcc/cgraphclones.c
>>>> +++ b/gcc/cgraphclones.c
>>>> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>>>>      new_node->icf_merged = icf_merged;
>>>>      new_node->merged_comdat = merged_comdat;
>>>>      new_node->thunk = thunk;
>>>> +  new_node->profile_id = profile_id;
>>>>        new_node->clone.tree_map = NULL;
>>>>      new_node->clone.args_to_skip = args_to_skip;
>>>> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
>>>> index 360c3de3289..ef2b217b3f9 100644
>>>> --- a/gcc/ipa-inline.c
>>>> +++ b/gcc/ipa-inline.c
>>>> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>>>>        }
>>>>          if (has_speculative)
>>>>        for (edge = node->callees; edge; edge = next)
>>>> +    {
>>>> +      next = edge->next_callee;
>>>>          if (edge->speculative && !speculation_useful_p (edge,
>>>>                                  edge->aux != NULL))
>>>>            {
>>>>              edge->resolve_speculation ();
>>>>              update = true;
>>>>            }
>>>> +    }
>>>>          if (update)
>>>>        {
>>>>          struct cgraph_node *where = node->global.inlined_to
>>>> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
>>>> index de9563d808c..d04476295a0 100644
>>>> --- a/gcc/ipa-profile.c
>>>> +++ b/gcc/ipa-profile.c
>>>> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>>>>      struct cgraph_node *node;
>>>>      gimple_stmt_iterator gsi;
>>>>      basic_block bb;
>>>> +  enum hist_type type;
>>>> +
>>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
>>>> +                             : HIST_TYPE_INDIR_CALL;
>>>>        hash_table<histogram_hash> hashtable (10);
>>>>
>>>> @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>>>>              histogram_value h;
>>>>              h = gimple_histogram_value_of_type
>>>>                (DECL_STRUCT_FUNCTION (node->decl),
>>>> -             stmt, HIST_TYPE_INDIR_CALL);
>>>> +             stmt, type);
>>>>              /* No need to do sanity check: gimple_ic_transform already
>>>>                 takes away bad histograms.  */
>>>> -          if (h)
>>>> +          if (h && type == HIST_TYPE_INDIR_CALL)
>>>>                {
>>>>                  /* counter 0 is target, counter 1 is number of execution we called target,
>>>>                 counter 2 is total number of executions.  */
>>>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>>>                  gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>>>>                                  stmt, h);
>>>>                }
>>>> +          else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>>>> +            {
>>>> +              unsigned j;
>>>> +              struct cgraph_edge *e = node->get_edge (stmt);
>>>> +              if (e && !e->indirect_unknown_callee)
>>>> +            continue;
>>>> +
>>>> +              e->indirect_info->num_of_ics = 0;
>>>> +              for (j = 1; j < h->n_counters; j += 2)
>>>> +            {
>>>> +              if (h->hvalue.counters[j] == 0)
>>>> +                continue;
>>>> +
>>>> +              e->indirect_info->common_target_ids[j / 2]
>>>> +                = h->hvalue.counters[j];
>>>> +              e->indirect_info->common_target_probabilities[j / 2]
>>>> +                = GCOV_COMPUTE_SCALE (
>>>> +                  h->hvalue.counters[j + 1],
>>>> +                  gimple_bb (stmt)->count.ipa ().to_gcov_type ());
>>>> +              if (e->indirect_info
>>>> +                ->common_target_probabilities[j / 2]
>>>> +                  > REG_BR_PROB_BASE)
>>>> +                {
>>>> +                  if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Probability capped to 1\n");
>>>> +                  e->indirect_info
>>>> +                ->common_target_probabilities[j / 2]
>>>> +                = REG_BR_PROB_BASE;
>>>> +                }
>>>> +              e->indirect_info->num_of_ics++;
>>>> +            }
>>>> +
>>>> +              gcc_assert (e->indirect_info->num_of_ics
>>>> +                  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>>> +
>>>> +              gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
>>>> +                               node->decl),
>>>> +                             stmt, h);
>>>> +            }
>>>>            }
>>>>              time += estimate_num_insns (stmt, &eni_time_weights);
>>>>              size += estimate_num_insns (stmt, &eni_size_weights);
>>>> @@ -492,6 +536,7 @@ ipa_profile (void)
>>>>      int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
>>>>      int nmismatch = 0, nimpossible = 0;
>>>>      bool node_map_initialized = false;
>>>> +  gcov_type threshold;
>>>>        if (dump_file)
>>>>        dump_histogram (dump_file, histogram);
>>>> @@ -500,14 +545,12 @@ ipa_profile (void)
>>>>          overall_time += histogram[i]->count * histogram[i]->time;
>>>>          overall_size += histogram[i]->size;
>>>>        }
>>>> +  threshold = 0;
>>>>      if (overall_time)
>>>>        {
>>>> -      gcov_type threshold;
>>>> -
>>>>          gcc_assert (overall_size);
>>>>            cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) / 1000;
>>>> -      threshold = 0;
>>>>          for (i = 0; cumulated < cutoff; i++)
>>>>        {
>>>>          cumulated += histogram[i]->count * histogram[i]->time;
>>>> @@ -543,7 +586,7 @@ ipa_profile (void)
>>>>      histogram.release ();
>>>>      histogram_pool.release ();
>>>>
>>>> -  /* Produce speculative calls: we saved common traget from porfiling into
>>>> +  /* Produce speculative calls: we saved common target from profiling into
>>>>         e->common_target_id.  Now, at link time, we can look up corresponding
>>>>         function node and produce speculative call.  */
>>>>
>>>> @@ -558,7 +601,8 @@ ipa_profile (void)
>>>>        {
>>>>          if (n->count.initialized_p ())
>>>>            nindirect++;
>>>> -      if (e->indirect_info->common_target_id)
>>>> +      if (e->indirect_info->common_target_id
>>>> +          || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>>>            {
>>>>              if (!node_map_initialized)
>>>>                init_node_map (false);
>>>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>>>                  if (dump_file)
>>>>                fprintf (dump_file,
>>>>                     "Not speculating: "
>>>> -                 "parameter count mistmatch\n");
>>>> +                 "parameter count mismatch\n");
>>>>                }
>>>>              else if (e->indirect_info->polymorphic
>>>>                   && !opt_for_fn (n->decl, flag_devirtualize)
>>>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>>>              nunknown++;
>>>>            }
>>>>            }
>>>> -     }
>>>> +      if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>>>> +        {
>>>> +          if (in_lto_p)
>>>> +        {
>>>> +          if (dump_file)
>>>> +            {
>>>> +              fprintf (dump_file,
>>>> +                   "Updating hotness threshold in LTO mode.\n");
>>>> +              fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>>>> +                   (int64_t) threshold);
>>>> +            }
>>>> +          set_hot_bb_threshold (threshold
>>>> +                    / e->indirect_info->num_of_ics);
>>>> +        }
>>>> +          if (!node_map_initialized)
>>>> +        init_node_map (false);
>>>> +          node_map_initialized = true;
>>>> +          ncommon++;
>>>> +          unsigned speculative = 0;
>>>> +          for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>>>> +        {
>>>> +          n2 = find_func_by_profile_id (
>>>> +            e->indirect_info->common_target_ids[i]);
>>>> +          if (n2)
>>>> +            {
>>>> +              if (dump_file)
>>>> +            {
>>>> +              fprintf (
>>>> +                dump_file,
>>>> +                "Indirect call -> direct call from"
>>>> +                " other module %s => %s, prob %3.2f\n",
>>>> +                n->dump_name (), n2->dump_name (),
>>>> +                e->indirect_info->common_target_probabilities[i]
>>>> +                  / (float) REG_BR_PROB_BASE);
>>>> +            }
>>>> +              if (e->indirect_info->common_target_probabilities[i]
>>>> +              < REG_BR_PROB_BASE / 2)
>>>> +            {
>>>> +              nuseless++;
>>>> +              if (dump_file)
>>>> +                fprintf (
>>>> +                  dump_file,
>>>> +                  "Not speculating: probability is too low.\n");
>>>> +            }
>>>> +              else if (!e->maybe_hot_p ())
>>>> +            {
>>>> +              nuseless++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Not speculating: call is cold.\n");
>>>> +            }
>>>> +              else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>>>> +                   && n2->can_be_discarded_p ())
>>>> +            {
>>>> +              nuseless++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Not speculating: target is overwritable "
>>>> +                     "and can be discarded.\n");
>>>> +            }
>>>> +              else if (ipa_node_params_sum && ipa_edge_args_sum
>>>> +                   && (!vec_safe_is_empty (
>>>> +                 IPA_NODE_REF (n2)->descriptors))
>>>> +                   && ipa_get_param_count (IPA_NODE_REF (n2))
>>>> +                    != ipa_get_cs_argument_count (
>>>> +                      IPA_EDGE_REF (e))
>>>> +                   && (ipa_get_param_count (IPA_NODE_REF (n2))
>>>> +                     >= ipa_get_cs_argument_count (
>>>> +                       IPA_EDGE_REF (e))
>>>> +                   || !stdarg_p (TREE_TYPE (n2->decl))))
>>>> +            {
>>>> +              nmismatch++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file, "Not speculating: "
>>>> +                        "parameter count mismatch\n");
>>>> +            }
>>>> +              else if (e->indirect_info->polymorphic
>>>> +                   && !opt_for_fn (n->decl, flag_devirtualize)
>>>> +                   && !possible_polymorphic_call_target_p (e, n2))
>>>> +            {
>>>> +              nimpossible++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Not speculating: "
>>>> +                     "function is not in the polymorphic "
>>>> +                     "call target list\n");
>>>> +            }
>>>> +              else
>>>> +            {
>>>> +              /* Target may be overwritable, but profile says that
>>>> +                 control flow goes to this particular implementation
>>>> +                 of N2.  Speculate on the local alias to allow
>>>> +                 inlining.
>>>> +                 */
>>>> +              if (!n2->can_be_discarded_p ())
>>>> +                {
>>>> +                  cgraph_node *alias;
>>>> +                  alias = dyn_cast<cgraph_node *> (
>>>> +                n2->noninterposable_alias ());
>>>> +                  if (alias)
>>>> +                n2 = alias;
>>>> +                }
>>>> +              nconverted++;
>>>> +              e->make_speculative (
>>>> +                n2, e->count.apply_probability (
>>>> +                  e->indirect_info
>>>> +                    ->common_target_probabilities[i]));
>>>> +              update = true;
>>>> +              speculative++;
>>>> +            }
>>>> +            }
>>>> +          else
>>>> +            {
>>>> +              if (dump_file)
>>>> +            fprintf (dump_file,
>>>> +                 "Function with profile-id %i not found.\n",
>>>> +                 e->indirect_info->common_target_ids[i]);
>>>> +              nunknown++;
>>>> +            }
>>>> +        }
>>>> +          if (speculative < e->indirect_info->num_of_ics)
>>>> +        e->indirect_info->num_of_ics = speculative;
>>>> +        }
>>>> +    }
>>>>           if (update)
>>>>         ipa_update_overall_fn_summary (n);
>>>>         }
>>>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>>>> index 79b250c3943..30347691029 100644
>>>> --- a/gcc/ipa-utils.c
>>>> +++ b/gcc/ipa-utils.c
>>>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>>>          update_max_bb_count ();
>>>>          compute_function_frequency ();
>>>>          pop_cfun ();
>>>> +      /* When src is speculative, clone the referrings.  */
>>>> +      if (src->indirect_call_target)
>>>> +    for (e = src->callers; e; e = e->next_caller)
>>>> +      if (e->callee == src && e->speculative)
>>>> +        dst->clone_referring (src);
>>>>          for (e = dst->callees; e; e = e->next_callee)
>>>>        {
>>>>          if (e->speculative)
>>>> diff --git a/gcc/ipa.c b/gcc/ipa.c
>>>> index 2496694124c..c1fe081a72d 100644
>>>> --- a/gcc/ipa.c
>>>> +++ b/gcc/ipa.c
>>>> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>>>>       devirtualization happens.  After inlining still keep their declarations
>>>>       around, so we can devirtualize to a direct call.
>>>>
>>>> -   Also try to make trivial devirutalization when no or only one target is
>>>> +   Also try to make trivial devirtualization when no or only one target is
>>>>       possible.  */
>>>>      static void
>>>> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
>>>> index 4dfa2862be3..0c8f547d44e 100644
>>>> --- a/gcc/lto-cgraph.c
>>>> +++ b/gcc/lto-cgraph.c
>>>> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>>>      unsigned int uid;
>>>>      intptr_t ref;
>>>>      struct bitpack_d bp;
>>>> +  unsigned i;
>>>>        if (edge->indirect_unknown_callee)
>>>>        streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
>>>> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>>>          if (edge->indirect_info->common_target_id)
>>>>        streamer_write_hwi_stream
>>>>           (ob->main_stream, edge->indirect_info->common_target_probability);
>>>> +
>>>> +      gcc_assert (edge->indirect_info->num_of_ics
>>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>>> +
>>>> +      streamer_write_hwi_stream (ob->main_stream,
>>>> +                 edge->indirect_info->num_of_ics);
>>>> +
>>>> +      if (edge->indirect_info->num_of_ics)
>>>> +    {
>>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>>>> +        {
>>>> +          streamer_write_hwi_stream (
>>>> +        ob->main_stream, edge->indirect_info->common_target_ids[i]);
>>>> +          if (edge->indirect_info->common_target_ids[i])
>>>> +        streamer_write_hwi_stream (
>>>> +          ob->main_stream,
>>>> +          edge->indirect_info->common_target_probabilities[i]);
>>>> +        }
>>>> +    }
>>>>        }
>>>>    }
>>>>
>>>> @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>>>>      cgraph_inline_failed_t inline_failed;
>>>>      struct bitpack_d bp;
>>>>      int ecf_flags = 0;
>>>> +  unsigned i;
>>>>        caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
>>>>      if (caller == NULL || caller->decl == NULL_TREE)
>>>> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
>>>>          edge->indirect_info->common_target_id = streamer_read_hwi (ib);
>>>>          if (edge->indirect_info->common_target_id)
>>>>            edge->indirect_info->common_target_probability = streamer_read_hwi (ib);
>>>> +
>>>> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
>>>> +
>>>> +      gcc_assert (edge->indirect_info->num_of_ics
>>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>>> +
>>>> +      if (edge->indirect_info->num_of_ics)
>>>> +    {
>>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>>>> +        {
>>>> +          edge->indirect_info->common_target_ids[i]
>>>> +        = streamer_read_hwi (ib);
>>>> +          if (edge->indirect_info->common_target_ids[i])
>>>> +        edge->indirect_info->common_target_probabilities[i]
>>>> +          = streamer_read_hwi (ib);
>>>> +        }
>>>> +    }
>>>>        }
>>>>    }
>>>>
>>>> diff --git a/gcc/predict.c b/gcc/predict.c
>>>> index 43ee91a5b13..b7f38891c72 100644
>>>> --- a/gcc/predict.c
>>>> +++ b/gcc/predict.c
>>>> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
>>>>          && bb->count.precise_p ()
>>>>          && reason == REASON_NONE)
>>>>        {
>>>> -      gcc_assert (e->count ().precise_p ());
>>>>          fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
>>>>               predictor_info[predictor].name,
>>>>               bb->count.to_gcov_type (), e->count ().to_gcov_type (),
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>>> new file mode 100644
>>>> index 00000000000..e0a83c2e067
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>>> @@ -0,0 +1,35 @@
>>>> +/* { dg-require-effective-target lto } */
>>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
>>>> +
>>>> +#include <stdio.h>
>>>> +
>>>> +typedef int (*fptr) (int);
>>>> +int
>>>> +one (int a);
>>>> +
>>>> +int
>>>> +two (int a);
>>>> +
>>>> +fptr table[] = {&one, &two};
>>>> +
>>>> +int
>>>> +main()
>>>> +{
>>>> +  int i, x;
>>>> +  fptr p = &one;
>>>> +
>>>> +  x = one (3);
>>>> +
>>>> +  for (i = 0; i < 350000000; i++)
>>>> +    {
>>>> +      x = (*p) (3);
>>>> +      p = table[x];
>>>> +    }
>>>> +  printf ("done:%d\n", x);
>>>> +}
>>>> +
>>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
>>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
>>>> +
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>>> new file mode 100644
>>>> index 00000000000..a8c6e365fb9
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>>> @@ -0,0 +1,22 @@
>>>> +/* It seems there is no way to avoid the other source of multiple
>>>> +   source testcase from being compiled independently.  Just avoid
>>>> +   error.  */
>>>> +#ifdef DOJOB
>>>> +int
>>>> +one (int a)
>>>> +{
>>>> +  return 1;
>>>> +}
>>>> +
>>>> +int
>>>> +two (int a)
>>>> +{
>>>> +  return 0;
>>>> +}
>>>> +#else
>>>> +int
>>>> +main()
>>>> +{
>>>> +  return 0;
>>>> +}
>>>> +#endif
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>>> new file mode 100644
>>>> index 00000000000..aa3887fde83
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>>> @@ -0,0 +1,42 @@
>>>> +/* { dg-require-effective-target lto } */
>>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
>>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
>>>> +
>>>> +#include <stdio.h>
>>>> +
>>>> +typedef int (*fptr) (int);
>>>> +int
>>>> +one (int a);
>>>> +
>>>> +int
>>>> +two (int a);
>>>> +
>>>> +fptr table[] = {&one, &two};
>>>> +
>>>> +int foo ()
>>>> +{
>>>> +  int i, x;
>>>> +  fptr p = &one;
>>>> +
>>>> +  x = one (3);
>>>> +
>>>> +  for (i = 0; i < 350000000; i++)
>>>> +    {
>>>> +      x = (*p) (3);
>>>> +      p = table[x];
>>>> +    }
>>>> +  return x;
>>>> +}
>>>> +
>>>> +int
>>>> +main()
>>>> +{
>>>> +  int x = foo ();
>>>> +  printf ("done:%d\n", x);
>>>> +}
>>>> +
>>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
>>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
>>>> +
>>>> +
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>>> new file mode 100644
>>>> index 00000000000..951bc7ddd19
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>>> @@ -0,0 +1,38 @@
>>>> +/* { dg-require-profiling "-fprofile-generate" } */
>>>> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */
>>>> +
>>>> +#include <stdio.h>
>>>> +
>>>> +typedef int (*fptr) (int);
>>>> +int
>>>> +one (int a)
>>>> +{
>>>> +  return 1;
>>>> +}
>>>> +
>>>> +int
>>>> +two (int a)
>>>> +{
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +fptr table[] = {&one, &two};
>>>> +
>>>> +int
>>>> +main()
>>>> +{
>>>> +  int i, x;
>>>> +  fptr p = &one;
>>>> +
>>>> +  one (3);
>>>> +
>>>> +  for (i = 0; i < 350000000; i++)
>>>> +    {
>>>> +      x = (*p) (3);
>>>> +      p = table[x];
>>>> +    }
>>>> +  printf ("done:%d\n", x);
>>>> +}
>>>> +
>>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */
>>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */
>>>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>>>> index 9017da878b1..f69b31b197e 100644
>>>> --- a/gcc/tree-inline.c
>>>> +++ b/gcc/tree-inline.c
>>>> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
>>>>              switch (id->transform_call_graph_edges)
>>>>            {
>>>>            case CB_CGE_DUPLICATE:
>>>> -          edge = id->src_node->get_edge (orig_stmt);
>>>> -          if (edge)
>>>> -            {
>>>> -              struct cgraph_edge *old_edge = edge;
>>>> -              profile_count old_cnt = edge->count;
>>>> -              edge = edge->clone (id->dst_node, call_stmt,
>>>> -                      gimple_uid (stmt),
>>>> -                      num, den,
>>>> -                      true);
>>>> -
>>>> -              /* Speculative calls consist of two edges - direct and
>>>> -             indirect.  Duplicate the whole thing and distribute
>>>> -             frequencies accordingly.  */
>>>> -              if (edge->speculative)
>>>> -            {
>>>> -              struct cgraph_edge *direct, *indirect;
>>>> -              struct ipa_ref *ref;
>>>> -
>>>> -              gcc_assert (!edge->indirect_unknown_callee);
>>>> -              old_edge->speculative_call_info (direct, indirect, ref);
>>>> -
>>>> -              profile_count indir_cnt = indirect->count;
>>>> -              indirect = indirect->clone (id->dst_node, call_stmt,
>>>> -                              gimple_uid (stmt),
>>>> -                              num, den,
>>>> -                              true);
>>>> -
>>>> -              profile_probability prob
>>>> -                 = indir_cnt.probability_in (old_cnt + indir_cnt);
>>>> -              indirect->count
>>>> -                 = copy_basic_block->count.apply_probability (prob);
>>>> -              edge->count = copy_basic_block->count - indirect->count;
>>>> -              id->dst_node->clone_reference (ref, stmt);
>>>> -            }
>>>> -              else
>>>> -            edge->count = copy_basic_block->count;
>>>> -            }
>>>> +          {
>>>> +            edge = id->src_node->get_edge (orig_stmt);
>>>> +            struct cgraph_edge *old_edge = edge;
>>>> +            struct cgraph_edge *direct, *indirect;
>>>> +            bool next_speculative;
>>>> +            do
>>>> +              {
>>>> +            next_speculative = false;
>>>> +            if (edge)
>>>> +              {
>>>> +                profile_count old_cnt = edge->count;
>>>> +                edge
>>>> +                  = edge->clone (id->dst_node, call_stmt,
>>>> +                         gimple_uid (stmt), num, den, true);
>>>> +
>>>> +                /* Speculative calls consist of two edges - direct
>>>> +                   and indirect.  Duplicate the whole thing and
>>>> +                   distribute frequencies accordingly.  */
>>>> +                if (edge->speculative)
>>>> +                  {
>>>> +                struct ipa_ref *ref;
>>>> +
>>>> +                gcc_assert (!edge->indirect_unknown_callee);
>>>> +                old_edge->speculative_call_info (direct,
>>>> +                                 indirect, ref);
>>>> +
>>>> +                profile_count indir_cnt = indirect->count;
>>>> +                indirect
>>>> +                  = indirect->clone (id->dst_node, call_stmt,
>>>> +                             gimple_uid (stmt), num,
>>>> +                             den, true);
>>>> +
>>>> +                profile_probability prob
>>>> +                  = indir_cnt.probability_in (old_cnt
>>>> +                                  + indir_cnt);
>>>> +                indirect->count
>>>> +                  = copy_basic_block->count.apply_probability (
>>>> +                    prob);
>>>> +                edge->count
>>>> +                  = copy_basic_block->count - indirect->count;
>>>> +                id->dst_node->clone_reference (ref, stmt);
>>>> +                  }
>>>> +                else
>>>> +                  edge->count = copy_basic_block->count;
>>>> +              }
>>>> +            /* If the indirect call contains more than one indirect
>>>> +               targets, need clone all speculative edges here.  */
>>>> +            if (old_edge && old_edge->next_callee
>>>> +                && old_edge->speculative && indirect
>>>> +                && indirect->indirect_info
>>>> +                && indirect->indirect_info->num_of_ics > 1)
>>>> +              {
>>>> +                edge = old_edge->next_callee;
>>>> +                old_edge = old_edge->next_callee;
>>>> +                if (edge->speculative)
>>>> +                  next_speculative = true;
>>>> +              }
>>>> +              }
>>>> +            while (next_speculative);
>>>> +          }
>>>>              break;
>>>>              case CB_CGE_MOVE_CLONES:
>>>> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
>>>> index 1c3034aac10..4964dbdebb5 100644
>>>> --- a/gcc/tree-profile.c
>>>> +++ b/gcc/tree-profile.c
>>>> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
>>>>    /* Do initialization work for the edge profiler.  */
>>>>      /* Add code:
>>>> -   __thread gcov*    __gcov_indirect_call_counters; // pointer to actual counter
>>>> -   __thread void*    __gcov_indirect_call_callee; // actual callee address
>>>> +   __thread gcov*    __gcov_indirect_call.counters; // pointer to actual counter
>>>> +   __thread void*    __gcov_indirect_call.callee; // actual callee address
>>>>       __thread int __gcov_function_counter; // time profiler function counter
>>>>    */
>>>>    static void
>>>> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base)
>>>>          f_1 = foo;
>>>>          __gcov_indirect_call.counters = &__gcov4.main[0];
>>>>          PROF_9 = f_1;
>>>> -      __gcov_indirect_call_callee = PROF_9;
>>>> +      __gcov_indirect_call.callee = PROF_9;
>>>>          _4 = f_1 ();
>>>>       */
>>>>    @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>>>>        /* Insert code:
>>>>    -     if (__gcov_indirect_call_callee != NULL)
>>>> +     if (__gcov_indirect_call.callee != NULL)
>>>>           __gcov_indirect_call_profiler_v3 (profile_id, &current_function_decl);
>>>>           The function __gcov_indirect_call_profiler_v3 is responsible for
>>>> -     resetting __gcov_indirect_call_callee to NULL.  */
>>>> +     resetting __gcov_indirect_call.callee to NULL.  */
>>>>        gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
>>>>      void0 = build_int_cst (ptr_type_node, 0);
>>>> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
>>>>    {
>>>>      /* When profile instrumentation, use or test coverage shall be performed.
>>>>         But for AutoFDO, this there is no instrumentation, thus this pass is
>>>> -     diabled.  */
>>>> +     disabled.  */
>>>>      return (!in_lto_p && !flag_auto_profile
>>>>          && (flag_branch_probabilities || flag_test_coverage
>>>>              || profile_arc_flag));
>>>> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
>>>> index 5013956cf86..4869ab8ccd6 100644
>>>> --- a/gcc/value-prof.c
>>>> +++ b/gcc/value-prof.c
>>>> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
>>>>       somehow.  */
>>>>      static bool
>>>> -check_counter (gimple *stmt, const char * name,
>>>> -           gcov_type *count, gcov_type *all, profile_count bb_count_d)
>>>> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all,
>>>> +           profile_count bb_count_d, float ratio = 1.0f)
>>>>    {
>>>>      gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
>>>>      if (*all != bb_count || *count > *all)
>>>> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
>>>>                                 "count (%d)\n", name, (int)*all, (int)bb_count);
>>>>          *all = bb_count;
>>>>          if (*count > *all)
>>>> -            *count = *all;
>>>> +        *count = *all * ratio;
>>>>          return false;
>>>>        }
>>>>          else
>>>> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
>>>>      return dcall_stmt;
>>>>    }
>>>>    +/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe
>>>> +   multiple indirect targets in histogram.  Check every indirect/virtual call
>>>> +   if callee function exists, if not exit, leave it to LTO stage for later
>>>> +   process.  Modify code of this indirect call to an if-else structure in
>>>> +   ipa-profile finally.  */
>>>> +static bool
>>>> +ic_transform_topn (gimple_stmt_iterator *gsi)
>>>> +{
>>>> +  unsigned j;
>>>> +  gcall *stmt;
>>>> +  histogram_value histogram;
>>>> +  gcov_type val, count, count_all, all, bb_all;
>>>> +  struct cgraph_node *d_call;
>>>> +  profile_count bb_count;
>>>> +
>>>> +  stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
>>>> +  if (!stmt)
>>>> +    return false;
>>>> +
>>>> +  if (gimple_call_fndecl (stmt) != NULL_TREE)
>>>> +    return false;
>>>> +
>>>> +  if (gimple_call_internal_p (stmt))
>>>> +    return false;
>>>> +
>>>> +  histogram
>>>> +    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
>>>> +  if (!histogram)
>>>> +    return false;
>>>> +
>>>> +  count = 0;
>>>> +  all = 0;
>>>> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>>>> +  bb_count = gimple_bb (stmt)->count;
>>>> +
>>>> +  /* n_counters need be odd to avoid access violation.  */
>>>> +  gcc_assert (histogram->n_counters % 2 == 1);
>>>> +
>>>> +  /* For indirect call topn, accumulate all the counts first.  */
>>>> +  for (j = 1; j < histogram->n_counters; j += 2)
>>>> +    {
>>>> +      val = histogram->hvalue.counters[j];
>>>> +      count = histogram->hvalue.counters[j + 1];
>>>> +      if (val)
>>>> +    all += count;
>>>> +    }
>>>> +
>>>> +  count_all = all;
>>>> +  /* Do the indirect call conversion if function body exists, or else leave it
>>>> +     to LTO stage.  */
>>>> +  for (j = 1; j < histogram->n_counters; j += 2)
>>>> +    {
>>>> +      val = histogram->hvalue.counters[j];
>>>> +      count = histogram->hvalue.counters[j + 1];
>>>> +      if (val)
>>>> +    {
>>>> +      /* The order of CHECK_COUNTER calls is important
>>>> +         since check_counter can correct the third parameter
>>>> +         and we want to make count <= all <= bb_count.  */
>>>> +      if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
>>>> +          || check_counter (stmt, "ic", &count, &all,
>>>> +                profile_count::from_gcov_type (all),
>>>> +                (float) count / count_all))
>>>> +        {
>>>> +          gimple_remove_histogram_value (cfun, stmt, histogram);
>>>> +          return false;
>>>> +        }
>>>> +
>>>> +      d_call = find_func_by_profile_id ((int) val);
>>>> +
>>>> +      if (d_call == NULL)
>>>> +        {
>>>> +          if (val)
>>>> +        {
>>>> +          if (dump_file)
>>>> +            {
>>>> +              fprintf (
>>>> +            dump_file,
>>>> +            "Indirect call -> direct call from other module");
>>>> +              print_generic_expr (dump_file, gimple_call_fn (stmt),
>>>> +                      TDF_SLIM);
>>>> +              fprintf (dump_file,
>>>> +                   "=> %i (will resolve only with LTO)\n",
>>>> +                   (int) val);
>>>> +            }
>>>> +        }
>>>> +          return false;
>>>> +        }
>>>> +
>>>> +      if (!check_ic_target (stmt, d_call))
>>>> +        {
>>>> +          if (dump_file)
>>>> +        {
>>>> +          fprintf (dump_file, "Indirect call -> direct call ");
>>>> +          print_generic_expr (dump_file, gimple_call_fn (stmt),
>>>> +                      TDF_SLIM);
>>>> +          fprintf (dump_file, "=> ");
>>>> +          print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>>>> +          fprintf (dump_file,
>>>> +               " transformation skipped because of type mismatch");
>>>> +          print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>>> +        }
>>>> +          gimple_remove_histogram_value (cfun, stmt, histogram);
>>>> +          return false;
>>>> +        }
>>>> +
>>>> +      if (dump_file)
>>>> +      {
>>>> +        fprintf (dump_file, "Indirect call -> direct call ");
>>>> +        print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>>>> +        fprintf (dump_file, "=> ");
>>>> +        print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
>>>> +        fprintf (dump_file,
>>>> +             " transformation on insn postponed to ipa-profile");
>>>> +        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>>> +        fprintf (dump_file, "hist->count %" PRId64
>>>> +        " hist->all %" PRId64"\n", count, all);
>>>> +      }
>>>> +    }
>>>> +    }
>>>> +
>>>> +  return true;
>>>> +}
>>>>    /*
>>>>      For every checked indirect/virtual call determine if most common pid of
>>>> -  function/class method has probability more than 50%. If yes modify code of
>>>> +  function/class method has probability more than 50%.  If yes modify code of
>>>>      this call to:
>>>>     */
>>>>    @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>>      histogram_value histogram;
>>>>      gcov_type val, count, all, bb_all;
>>>>      struct cgraph_node *direct_call;
>>>> +  enum hist_type type;
>>>>        stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
>>>>      if (!stmt)
>>>> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>>      if (gimple_call_internal_p (stmt))
>>>>        return false;
>>>>    -  histogram = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL);
>>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
>>>> +                             : HIST_TYPE_INDIR_CALL;
>>>> +
>>>> +  histogram = gimple_histogram_value_of_type (cfun, stmt, type);
>>>>      if (!histogram)
>>>>        return false;
>>>>    +  if (type == HIST_TYPE_INDIR_CALL_TOPN)
>>>> +      return ic_transform_topn (gsi);
>>>> +
>>>>      val = histogram->hvalue.counters [0];
>>>>      count = histogram->hvalue.counters [1];
>>>>      all = histogram->hvalue.counters [2];
>>>>        bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
>>>> -  /* The order of CHECK_COUNTER calls is important -
>>>> +  /* The order of CHECK_COUNTER calls is important
>>>>         since check_counter can correct the third parameter
>>>> -     and we want to make count <= all <= bb_all. */
>>>> +     and we want to make count <= all <= bb_all.  */
>>>>      if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
>>>>          || check_counter (stmt, "ic", &count, &all,
>>>>                    profile_count::from_gcov_type (all)))
>>>> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
>>>>          print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
>>>>          fprintf (dump_file, "=> ");
>>>>          print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
>>>> -      fprintf (dump_file, " transformation on insn postponned to ipa-profile");
>>>> +      fprintf (dump_file, " transformation on insn postponed to ipa-profile");
>>>>          print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>>>>          fprintf (dump_file, "hist->count %" PRId64
>>>>               " hist->all %" PRId64"\n", count, all);
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-19  5:38       ` luoxhu
@ 2019-06-19  6:57         ` Martin Liška
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Liška @ 2019-06-19  6:57 UTC (permalink / raw)
  To: luoxhu, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

On 6/19/19 7:38 AM, luoxhu wrote:
> Actually, the algorithm of function __gcov_one_value_profiler_body in libgcc/libgcov-profiler.c has a functional issue when profiling the testcase I provided.


It's designed to track the most common value and uses only one slot for storage.
As mentioned, I can easily prepare a patch that will store up to N values next
to each other. But first, Honza will have to agree in general to the
suggested IPA changes.

Thank you for understanding,
Martin

> 
> __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
>                                 int use_atomic)
> {
>   if (value == counters[1])
>     counters[2]++;
>   else if (counters[2] == 0)
>     {
>       counters[2] = 1;
>       counters[1] = value;
>     }
>   else
>     counters[2]--;
>
>   if (use_atomic)
>     __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
>   else
>     counters[0]++;
> }
> 
> function "one" is 1868707024, function "two" is 969338501. Loop running from 0->(350000000-1):
> 
>   value      counters[0]    counters[1]   counters[2]
> 1868707024            1     1868707024             1
>  969338501            2     1868707024             0
> 1868707024            3     1868707024             1
>  969338501            4     1868707024             0
> 1868707024            5     1868707024             1
>                     ...
>  969338501     350000000    1868707024             0
> 
> Finally, the counters[] array ends up as [350000000, 1868707024, 0].
> In ipa-profile.c and value-prof.c, counters[0] is the total number of times the statement executed, and counters[2] is the number of times the target in counters[1] was called, which is 0 here.
> counters[2] shouldn't be 0, because that makes prob 0 (it was expected to be 50%, right?). This prob will cause ipa-profile to fail to create a speculative edge and to promote the indirect call later. I think this is the reason why topn was introduced by Rong Xu in 2014 (8ceaa1e) and later reimplemented in LLVM. There was definitely a bug here before re-enabling topn.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18 10:21 ` Martin Liška
@ 2019-06-19  8:50   ` luoxhu
  2019-06-19  8:56     ` Martin Liška
  0 siblings, 1 reply; 25+ messages in thread
From: luoxhu @ 2019-06-19  8:50 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

Hi Martin,

On 2019/6/18 18:21, Martin Liška wrote:
> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>      6.2.  SPEC2017 peakrate:
>>          523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>>          525.x264_r (-5.29%).
> 
> Can you please elaborate what are the key indirect call promotions that are needed
> to achieve such a significant speed up? Are we talking about calls to virtual functions
> or C-style indirect calls?

For benchmark 511.povray_r, no speculation or indirect call promotion
happened, according to povray_r.wpa.069i.profile_estimate:

     994 171 indirect calls trained.
     995 0 (0.00%) have common target.
     996 0 (0.00%) targets was not found.
     997 0 (0.00%) targets had parameter count mismatch.
     998 0 (0.00%) targets was not in polymorphic call target list.
     999 0 (0.00%) speculations seems useless.
    1000 0 (0.00%) speculations produced.


After applying my patch:

    1259 171 indirect calls trained.
    1260 60 (35.09%) have common target.
    1261 41 (23.98%) targets was not found.
    1262 0 (0.00%) targets had parameter count mismatch.
    1263 0 (0.00%) targets was not in polymorphic call target list.
    1264 57 (33.33%) speculations seems useless.
    1265 5 (2.92%) speculations produced.

The indirect call conversions below take effect.  As all of these callees
are hot functions, performance improves a lot through the combination of
later-stage IPA optimizations (inlining and cloning).

ls *.*i.* | xargs grep "Expanding speculative call" 

povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_CSG_Intersection/76219 count: 291083 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_Plane/76221 count: 387811 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_CSG_Intersection/75997 count: 3784081 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_Plane/76062 count: 5041557 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_CSG_Intersect_Intersections/76183 count: 8983544 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_Sphere_Intersections/76184 count: 31488162 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> Inside_Plane/76197 count: 19044626 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> All_Sphere_Intersections/76011 count: 22068935 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> Inside_Plane/76031 count: 13347702 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_CSG_Intersect_Intersections/76130 count: 5434215 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_Sphere_Intersections/76139 count: 19047432 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> Inside_Plane/76134 count: 11520241 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of Inside_CSG_Union/9845 -> Inside_Plane/76081 count: 830538 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of All_CSG_Union_Intersections/9842 -> All_Plane_Intersections/76049 count: 1636158 (adjusted)

> 
> Thanks,
> Martin
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-19  8:50   ` luoxhu
@ 2019-06-19  8:56     ` Martin Liška
  2019-06-19 12:18       ` Martin Liška
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Liška @ 2019-06-19  8:56 UTC (permalink / raw)
  To: luoxhu, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

On 6/19/19 10:50 AM, luoxhu wrote:
> Hi Martin,
> 
> On 2019/6/18 18:21, Martin Liška wrote:
>> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>>      6.2.  SPEC2017 peakrate:
>>>          523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>>>          525.x264_r (-5.29%).
>>
>> Can you please elaborate what are the key indirect call promotions that are needed
>> to achieve such a significant speed up? Are we talking about calls to virtual functions
>> or C-style indirect calls?
> 
> For benchmark 511.povray_r, no speculations and indirect call promotion
> happened from povray_r.wpa.069i.profile_estimate:
> 
>     994 171 indirect calls trained.
>     995 0 (0.00%) have common target.
>     996 0 (0.00%) targets was not found.
>     997 0 (0.00%) targets had parameter count mismatch.
>     998 0 (0.00%) targets was not in polymorphic call target list.
>     999 0 (0.00%) speculations seems useless.
>    1000 0 (0.00%) speculations produced.
> 
> 
> After applying my patch:
> 
>    1259 171 indirect calls trained.
>    1260 60 (35.09%) have common target.
>    1261 41 (23.98%) targets was not found.
>    1262 0 (0.00%) targets had parameter count mismatch.
>    1263 0 (0.00%) targets was not in polymorphic call target list.
>    1264 57 (33.33%) speculations seems useless.
>    1265 5 (2.92%) speculations produced.
> 
> Below indirect calls conversion will take effect, as all of these calls
> are hot functions, performance boosts a lot by the combination optimization
> of later stage ipa/inline/clone.
> 
> ls *.*i.* | xargs grep "Expanding speculative call"
> povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_CSG_Intersection/76219 count: 291083 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of create_ray.constprop/75445 -> Inside_Plane/76221 count: 387811 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_CSG_Intersection/75997 count: 3784081 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of initialize_ray_container_state_tree/54575 -> Inside_Plane/76062 count: 5041557 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_CSG_Intersect_Intersections/76183 count: 8983544 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> All_Sphere_Intersections/76184 count: 31488162 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> Inside_Plane/76197 count: 19044626 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> All_Sphere_Intersections/76011 count: 22068935 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of All_CSG_Intersect_Intersections/9843 -> Inside_Plane/76031 count: 13347702 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_CSG_Intersect_Intersections/76130 count: 5434215 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> All_Sphere_Intersections/76139 count: 19047432 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of block_light_source/26304 -> Inside_Plane/76134 count: 11520241 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of Inside_CSG_Union/9845 -> Inside_Plane/76081 count: 830538 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of All_CSG_Union_Intersections/9842 -> All_Plane_Intersections/76049 count: 1636158 (adjusted)
> 
>>
>> Thanks,
>> Martin
>>
> 

Thank you very much for the numbers. Today, I'm going to prepare a generalization of the single-value counter to track N values.

Martin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-19  8:56     ` Martin Liška
@ 2019-06-19 12:18       ` Martin Liška
  2019-06-20  1:59         ` luoxhu
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Liška @ 2019-06-19 12:18 UTC (permalink / raw)
  To: luoxhu, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

On 6/19/19 10:56 AM, Martin Liška wrote:
> Thank you very much for the numbers. Today, I'm going to prepare the generalization of single-value counter to track N values.

OK, here's a patch candidate that tracks the N most common values. For your test case I can see:

pr69678.gcda:    01a90000:  18:COUNTERS indirect_call 9 counts
pr69678.gcda:                   0: 350000000 1868707024 175000000 969338501 175000000 0 0 0 
pr69678.gcda:                   8: 0 

So for now, you'll need to generalize get_most_common_single_value to return
N most common values.

Eventually we'll need to rename the counter as it won't be tracking just a single value
any longer. I can take care of it.

Can you please verify that the patch candidate works for you?
Thanks,
Martin

[-- Attachment #2: 0001-Support-N-values-in-libgcov-for-single-value-counter.patch --]
[-- Type: text/x-patch, Size: 3851 bytes --]

From 93175b20aa794baf1795ff1ccb3ac0391c326ada Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Wed, 19 Jun 2019 14:15:14 +0200
Subject: [PATCH] Support N values in libgcov for single value counter type.

---
 libgcc/libgcov-merge.c    | 48 +++++++++++++++++++++------------------
 libgcc/libgcov-profiler.c | 38 +++++++++++++++++++++++--------
 2 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index f778cc4b6b7..84367005663 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -89,49 +89,53 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
 static void
 merge_single_value_set (gcov_type *counters)
 {
-  unsigned j;
-  gcov_type value, counter;
-
   /* First value is number of total executions of the profiler.  */
   gcov_type all = gcov_get_counter_ignore_scaling (-1);
   counters[0] += all;
   ++counters;
 
+  /* Read all part values.  */
+  gcov_type read_counters[2 * GCOV_DISK_SINGLE_VALUES];
+
   for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
     {
-      value = gcov_get_counter_target ();
-      counter = gcov_get_counter_ignore_scaling (-1);
+      read_counters[2 * i] = gcov_get_counter_target ();
+      read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
+    }
 
-      if (counter == -1)
-	{
-	  counters[1] = -1;
-	  /* We can't return as we need to read all counters.  */
-	  continue;
-	}
-      else if (counter == 0 || counters[1] == -1)
-	{
-	  /* We can't return as we need to read all counters.  */
-	  continue;
-	}
+  if (read_counters[1] == -1)
+    {
+      counters[1] = -1;
+      return;
+    }
+
+  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+    {
+      if (read_counters[2 * i + 1] == 0)
+	return;
 
+      unsigned j;
       for (j = 0; j < GCOV_DISK_SINGLE_VALUES; j++)
 	{
-	  if (counters[2 * j] == value)
+	  if (counters[2 * j] == read_counters[2 * i])
 	    {
-	      counters[2 * j + 1] += counter;
+	      counters[2 * j + 1] += read_counters[2 * i + 1];
 	      break;
 	    }
 	  else if (counters[2 * j + 1] == 0)
 	    {
-	      counters[2 * j] = value;
-	      counters[2 * j + 1] = counter;
+	      counters[2 * j] += read_counters[2 * i];
+	      counters[2 * j + 1] += read_counters[2 * i + 1];
 	      break;
 	    }
 	}
 
-      /* We haven't found a free slot for the value, mark overflow.  */
+      /* We haven't found a slot, bail out.  */
       if (j == GCOV_DISK_SINGLE_VALUES)
-	counters[1] = -1;
+	{
+	  counters[1] = -1;
+	  return;
+	}
     }
 }
 
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 9ba65b90df3..0ef400bdda7 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -118,20 +118,38 @@ static inline void
 __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
 				int use_atomic)
 {
-  if (value == counters[1])
-    counters[2]++;
-  else if (counters[2] == 0)
-    {
-      counters[2] = 1;
-      counters[1] = value;
-    }
-  else
-    counters[2]--;
-
   if (use_atomic)
     __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
   else
     counters[0]++;
+
+  ++counters;
+
+  /* We have GCOV_DISK_SINGLE_VALUES as we can keep multiple values
+     next to each other.  */
+  unsigned sindex = 0;
+
+  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+    {
+      if (value == counters[2 * i])
+	{
+	  counters[2 * i + 1]++;
+	  return;
+	}
+      else if (counters[2 * i + 1] == 0)
+	{
+	  /* We found an empty slot.  */
+	  counters[2 * i] = value;
+	  counters[2 * i + 1] = 1;
+	  return;
+	}
+
+      if (counters[2 * i + 1] < counters[2 * sindex + 1])
+	sindex = i;
+    }
+
+  /* We haven't found an empty slot, then decrement the smallest.  */
+  counters[2 * sindex + 1]--;
 }
 
 #ifdef L_gcov_one_value_profiler_v2
-- 
2.21.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-19 12:18       ` Martin Liška
@ 2019-06-20  1:59         ` luoxhu
  2019-06-20  6:15           ` luoxhu
  0 siblings, 1 reply; 25+ messages in thread
From: luoxhu @ 2019-06-20  1:59 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu



On 2019/6/19 20:18, Martin Liška wrote:
> On 6/19/19 10:56 AM, Martin Liška wrote:
>> Thank you very much for the numbers. Today, I'm going to prepare the generalization of single-value counter to track N values.
> 
> Ok, here's a patch candidate that does tracking of most common N values. For your test-case I can see:
> 
> pr69678.gcda:    01a90000:  18:COUNTERS indirect_call 9 counts
> pr69678.gcda:                   0: 350000000 1868707024 175000000 969338501 175000000 0 0 0
> pr69678.gcda:                   8: 0
> 
> So for now, you'll need to generalize get_most_common_single_value to return
> N most common values.
> 
> Eventually we'll need to renamed the counter as it won't be tracking just a single value
> any longer. I can take care of it.
> 
> Can you please verify that the patch candidate works for you?
Thanks, the profile data seems good, I will try it.  I need to rebase my patch
onto trunk first, as there are many conflicts with your previous patch.


> Thanks,
> Martin
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-20  1:59         ` luoxhu
@ 2019-06-20  6:15           ` luoxhu
  2019-06-20 12:57             ` Martin Liška
  0 siblings, 1 reply; 25+ messages in thread
From: luoxhu @ 2019-06-20  6:15 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

Hi Martin,

On 2019/6/20 09:59, luoxhu wrote:
> 
> 
> On 2019/6/19 20:18, Martin Liška wrote:
>> On 6/19/19 10:56 AM, Martin Liška wrote:
>>> Thank you very much for the numbers. Today, I'm going to prepare the 
>>> generalization of single-value counter to track N values.
>>
>> Ok, here's a patch candidate that does tracking of most common N values. 
>> For your test-case I can see:
>>
>> pr69678.gcda:    01a90000:  18:COUNTERS indirect_call 9 counts
>> pr69678.gcda:                   0: 350000000 1868707024 175000000 
>> 969338501 175000000 0 0 0
>> pr69678.gcda:                   8: 0
>>
>> So for now, you'll need to generalize get_most_common_single_value to return
>> N most common values.
>>
>> Eventually we'll need to renamed the counter as it won't be tracking just 
>> a single value
>> any longer. I can take care of it.
>>
>> Can you please verify that the patch candidate works for you?
> Thanks, the profile data seems good, I will try it.  I need rebase my patch
> to trunk first, as there are many conflicts with your previous patch.

The patch works perfectly for me; lots of duplicate code can be removed based
on that.  Hope you can upstream it soon.  :)
BTW, I don't need to call the get_most_common_single_value function to access
the histogram values & counters; I will loop over them directly one by one.

Thanks
Xionghu

> 
> 
>> Thanks,
>> Martin
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-20  6:15           ` luoxhu
@ 2019-06-20 12:57             ` Martin Liška
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Liška @ 2019-06-20 12:57 UTC (permalink / raw)
  To: luoxhu, gcc-patches; +Cc: hubicka, segher, wschmidt, luoxhu

On 6/20/19 8:15 AM, luoxhu wrote:
> Hi Martin,
> 
> On 2019/6/20 09:59, luoxhu wrote:
>>
>>
>> On 2019/6/19 20:18, Martin Liška wrote:
>>> On 6/19/19 10:56 AM, Martin Liška wrote:
>>>> Thank you very much for the numbers. Today, I'm going to prepare the generalization of single-value counter to track N values.
>>>
>>> Ok, here's a patch candidate that does tracking of most common N values. For your test-case I can see:
>>>
>>> pr69678.gcda:    01a90000:  18:COUNTERS indirect_call 9 counts
>>> pr69678.gcda:                   0: 350000000 1868707024 175000000 969338501 175000000 0 0 0
>>> pr69678.gcda:                   8: 0
>>>
>>> So for now, you'll need to generalize get_most_common_single_value to return
>>> N most common values.
>>>
>>> Eventually we'll need to rename the counter as it won't be tracking just a single value
>>> any longer. I can take care of it.
>>>
>>> Can you please verify that the patch candidate works for you?
>> Thanks, the profile data seems good, I will try it.  I need rebase my patch
>> to trunk first, as there are many conflicts with your previous patch.
> 
> The patch works perfectly for me; a lot of duplicate code can be removed based
> on it.  Hope you can upstream it soon.  :)

Yep, I'll send it in a couple of hours.

> BTW, I don't need to call the get_most_common_single_value function to access
> the histogram values & counters; I will loop over them directly one by one.

No, please do not do that. I would like to see get_most_common_single_value being used
for your purpose. You'll have to generalize it, but please no direct accesses
to the histogram values.
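
To illustrate the kind of generalization meant here, below is a minimal
sketch of an accessor that returns the N-th most common value.  It only
assumes the layout visible in the dump above: counters[0] holds the total
and is followed by (value, count) pairs.  The name
get_nth_most_common_value, the TOPN_VALUES constant and the "negative
first count means unreliable" convention are assumptions for
illustration, not the real interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: four (value, count) pairs after the total, matching the
   9-counter indirect_call record in the dump.  */
#define TOPN_VALUES 4

/* Hypothetical accessor, not the real GCC function.  Returns true and
   fills VALUE, COUNT and ALL with the N-th most common tracked value
   (N == 0 is the most common one); returns false if there is none.  */
bool
get_nth_most_common_value (const int64_t *counters, unsigned n,
                           int64_t *value, int64_t *count, int64_t *all)
{
  bool used[TOPN_VALUES] = { false };
  unsigned best = 0;

  /* Assumed convention: a negative first count marks an unreliable set.  */
  if (n >= TOPN_VALUES || counters[2] < 0)
    return false;

  /* Select the largest remaining count N + 1 times.  */
  for (unsigned rank = 0; rank <= n; rank++)
    {
      bool found = false;
      for (unsigned i = 0; i < TOPN_VALUES; i++)
        {
          int64_t c = counters[2 * i + 2];
          if (used[i] || c <= 0)
            continue;
          if (!found || c > counters[2 * best + 2])
            {
              best = i;
              found = true;
            }
        }
      if (!found)
        return false;
      used[best] = true;
    }

  *value = counters[2 * best + 1];
  *count = counters[2 * best + 2];
  *all = counters[0];
  return true;
}
```

A caller would then loop with n = 0, 1, ... until the function returns
false, instead of indexing the histogram counters by hand.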

Thanks,
Martin

> 
> Thanks
> Xionghu
> 
>>
>>
>>> Thanks,
>>> Martin
>>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-18  1:46 [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization Xiong Hu Luo
  2019-06-18  5:51 ` Martin Liška
  2019-06-18 10:21 ` Martin Liška
@ 2019-06-20 13:47 ` Jan Hubicka
  2019-06-20 14:45   ` Martin Liška
                     ` (2 more replies)
  2 siblings, 3 replies; 25+ messages in thread
From: Jan Hubicka @ 2019-06-20 13:47 UTC (permalink / raw)
  To: Xiong Hu Luo; +Cc: gcc-patches, mliska, segher, wschmidt, luoxhu

Hi,
some comments on the ipa part of the patch
(and thanks for working on it - this was on my TODO list for years)

> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index de82316d4b1..0d373a67d1b 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>  	fprintf (dump_file, "Introduced new external node "
>  		 "(%s) and turned into root of the clone tree.\n",
>  		 node->dump_name ());
> +      node->profile_id = first_clone->profile_id;
>      }
>    else if (dump_file)
>      fprintf (dump_file, "Introduced new external node "

This is independent of the rest of the changes.  Do you have an example where
this matters? The inline clones are created in ipa-inline while
ipa-profile is run before it, so I cannot think of such a scenario.
I see you also copy profile_id from function to clone.  I would like to
know why you needed that.

Also you mention that you hit some ICEs. If the fixes are independent of
the rest of your changes, send them separately.

> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>    int i;
>    cgraph_edge *e2;
>    cgraph_edge *e = this;
> +  cgraph_node *referred_node;
>  
>    if (!e->indirect_unknown_callee)
>      for (e2 = e->caller->indirect_calls;
> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>  	&& ((ref->stmt && ref->stmt == e->call_stmt)
>  	    || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>        {
> -	reference = ref;
> -	break;
> +	if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +	  {
> +	    referred_node = dyn_cast<cgraph_node *> (ref->referred);
> +	    if (strstr (e->callee->name (), referred_node->name ()))
> +	      {
> +		reference = ref;
> +		break;
> +	      }
> +	  }
> +	else
> +	  {
> +	    reference = ref;
> +	    break;
> +	  }
>        }

This function is intended to return everything related to the
speculative call, so if you add multiple direct targets, I would expect
it to take an auto_vec of cgraph_nodes for direct and an auto_vec of
references.
>  
>    /* Speculative edge always consist of all three components - direct edge,
> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>           in the functions inlined through it.  */
>      }
>    edge->count += e2->count;
> -  edge->speculative = false;
> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
> +    {
> +      edge->indirect_info->num_of_ics--;
> +      if (edge->indirect_info->num_of_ics == 0)
> +	edge->speculative = false;
> +    }
> +  else
> +    edge->speculative = false;
>    e2->speculative = false;
>    ref->remove_reference ();
>    if (e2->indirect_unknown_callee || e2->inline_failed)

This function should turn speculative call into direct call to DECL, so
I think it should remove all the other direct calls associated with stmt
and the indirect one.

There are now two cases - in the first case you want to turn the speculative
call into a direct call or give up on speculation completely, while in
the other case you want to remove only one of the speculations.

I guess we want to have resolve_speculation(decl) for first and 
remove_one_speculation(edge) for the second case?
The second case would be useful for the code below handling type
mismatches and also for inline when one of speculative targets seems not
useful to bother with.
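
As a rough illustration of the split between the two operations, here is
a toy model in plain C.  It is not cgraph code: struct toy_call_site,
toy_remove_one_speculation and toy_resolve_speculation are hypothetical
names, targets are plain strings, and the fixed-size array is only there
to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_TARGETS 4   /* toy limit, for this sketch only */

/* Toy stand-in for one indirect call statement with several
   speculative direct targets.  */
struct toy_call_site
{
  const char *targets[TOY_MAX_TARGETS];
  unsigned num_targets;
  bool speculative;
  const char *direct_callee;   /* non-NULL once resolved to a direct call */
};

/* Second case: drop a single speculation and keep the others.  The call
   stops being speculative only when the last target is gone.  */
bool
toy_remove_one_speculation (struct toy_call_site *cs, const char *target)
{
  for (unsigned i = 0; i < cs->num_targets; i++)
    if (strcmp (cs->targets[i], target) == 0)
      {
        for (unsigned j = i + 1; j < cs->num_targets; j++)
          cs->targets[j - 1] = cs->targets[j];
        cs->num_targets--;
        if (cs->num_targets == 0)
          cs->speculative = false;
        return true;
      }
  return false;
}

/* First case: turn the call into a direct call to CALLEE if it is one of
   the speculated targets, otherwise give up on speculation completely.
   Either way all speculative targets are removed.  */
void
toy_resolve_speculation (struct toy_call_site *cs, const char *callee)
{
  cs->direct_callee = NULL;
  for (unsigned i = 0; callee && i < cs->num_targets; i++)
    if (strcmp (cs->targets[i], callee) == 0)
      cs->direct_callee = cs->targets[i];
  cs->num_targets = 0;
  cs->speculative = false;
}
```

The point of the model is only that removing one target keeps the call
speculative, while resolving always ends speculation for the statement.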
> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>  	  e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>  						     false);
>  	  e->count = gimple_bb (e->call_stmt)->count;
> -	  e2->speculative = false;
> +	  if (e2->indirect_info && e2->indirect_info->num_of_ics)
> +	    {
> +	      e2->indirect_info->num_of_ics--;
> +	      if (e2->indirect_info->num_of_ics == 0)
> +		e2->speculative = false;
> +	    }
> +	  else
> +	    e2->speculative = false;
>  	  e2->count = gimple_bb (e2->call_stmt)->count;
>  	  ref->speculative = false;
>  	  ref->stmt = NULL;

>  extern void debuginfo_early_init (void);
>  extern void debuginfo_init (void);
> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>    int param_index;
>    /* ECF flags determined from the caller.  */
>    int ecf_flags;
> -  /* Profile_id of common target obtrained from profile.  */
> +  /* Profile_id of common target obtained from profile.  */
>    int common_target_id;
>    /* Probability that call will land in function with COMMON_TARGET_ID.  */
>    int common_target_probability;
>  
> +  /* Profile_id of common target obtained from profile.  */
> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];

I would use a vec of pairs (profile_id,probability) to hold this and
not wire in GCOV_ICALL_TOPN_NCOUNTS.  Most of the time this vec will be just
a NULL pointer, so it will result in less memory overhead and will avoid
a hard limit on the number of speculations we want to do.

Note that the speculative edges may end up being redirected during IPA
optimization, for example when their target is cloned for particular
call context or when the function is detected identical to other
function.  So one can not preserve the mapping between targets and
profile ids.

Also this infrastructure is useful even w/o profile because we could use
ipa-devirt to devirtualize even when multiple polymorphic targets are
found. So I would not wire in the limit GCOV_ICALL_TOPN_NCOUNTS and
just use dynamically allocated vectors instead.
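
A minimal sketch of the data layout I have in mind, written in plain C
rather than with GCC's vec so that it stands alone.  All names here
(toy_speculative_target, toy_indirect_info, toy_add_target,
toy_release_targets) are hypothetical; the only point is that the common
case costs one NULL pointer and that there is no compile-time limit.

```c
#include <stdlib.h>

/* One speculated target: profile id plus the probability of hitting it.  */
struct toy_speculative_target
{
  int profile_id;
  int probability;
};

/* Toy stand-in for the per-call info.  TARGETS stays NULL when nothing
   is speculated.  */
struct toy_indirect_info
{
  struct toy_speculative_target *targets;
  unsigned num;
  unsigned cap;
};

/* Append one (profile_id, probability) pair, growing the array on demand.
   Returns 0 on success and -1 if memory could not be allocated.  */
int
toy_add_target (struct toy_indirect_info *info, int profile_id,
                int probability)
{
  if (info->num == info->cap)
    {
      unsigned new_cap = info->cap ? 2 * info->cap : 2;
      struct toy_speculative_target *p
        = realloc (info->targets, new_cap * sizeof *p);
      if (!p)
        return -1;
      info->targets = p;
      info->cap = new_cap;
    }
  info->targets[info->num].profile_id = profile_id;
  info->targets[info->num].probability = probability;
  info->num++;
  return 0;
}

/* Release the array and return to the empty (NULL) state.  */
void
toy_release_targets (struct toy_indirect_info *info)
{
  free (info->targets);
  info->targets = NULL;
  info->num = 0;
  info->cap = 0;
}
```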

With ipa-devirt it is possible that we know that there are precisely 2
possible polymorphic targets.  The other case I was considering was to
speculatively inline with -fPIC.  I.e. when one has an interposable call
for foo() one can create foo.localalias() for the definition visible to
the compiler and then speculate

if (foo == foo.localalias())
  inline_path
else
  foo();

In these cases we may end up with an indirect
call that has no associated indirect edge (which we do not support
currently), so it may be interesting to move speculative call info away
from indirect call info to the cgraph edge structure (but that can be done
incrementally based on what you do now - this code is not completely
easy so let's do it step by step).
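
To make the -fPIC idea concrete, here is a small C sketch of the guarded
call written by hand.  It is only a model: foo_entry is a hypothetical
function pointer standing in for whatever address the dynamic linker
would bind foo to, foo_localalias stands in for the non-interposable
alias, and interposed_foo plays the role of a replacement definition.

```c
/* Stand-in for foo.localalias: the definition visible to the compiler.  */
static int
foo_localalias (int x)
{
  return 2 * x;
}

/* Stand-in for a definition that interposes foo at run time.  */
int
interposed_foo (int x)
{
  return x + 1;
}

/* Hypothetical model of the address foo resolves to at run time.  */
int (*foo_entry) (int) = foo_localalias;

/* The speculated call: if foo still resolves to the local definition,
   take the inlined path, otherwise fall back to the real call.  */
int
call_foo (int x)
{
  if (foo_entry == foo_localalias)
    return 2 * x;          /* inlined body of foo_localalias */
  return foo_entry (x);    /* interposed definition */
}
```

With foo_entry left at its default, call_foo never performs a call;
after it is pointed at interposed_foo the fallback path is taken.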

> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>  		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>  						      stmt, h);
>  		    }
> +		  else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
> +		    {
> +		      unsigned j;
> +		      struct cgraph_edge *e = node->get_edge (stmt);
> +		      if (e && !e->indirect_unknown_callee)
> +			continue;

I suppose you are going to change this for Martin's implementation, so I
have skipped it for now.
> @@ -558,7 +601,8 @@ ipa_profile (void)
>  	{
>  	  if (n->count.initialized_p ())
>  	    nindirect++;
> -	  if (e->indirect_info->common_target_id)
> +	  if (e->indirect_info->common_target_id
> +	      || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>  	    {
>  	      if (!node_map_initialized)
>  	        init_node_map (false);
> @@ -613,7 +657,7 @@ ipa_profile (void)
>  		      if (dump_file)
>  			fprintf (dump_file,
>  				 "Not speculating: "
> -				 "parameter count mistmatch\n");
> +				 "parameter count mismatch\n");
>  		    }
>  		  else if (e->indirect_info->polymorphic
>  			   && !opt_for_fn (n->decl, flag_devirtualize)
> @@ -655,7 +699,130 @@ ipa_profile (void)
>  		  nunknown++;
>  		}
>  	    }
> -	 }
> +	  if (e->indirect_info && e->indirect_info->num_of_ics > 1)
> +	    {
> +	      if (in_lto_p)
> +		{
> +		  if (dump_file)
> +		    {
> +		      fprintf (dump_file,
> +			       "Updating hotness threshold in LTO mode.\n");
> +		      fprintf (dump_file, "Updated min count: %" PRId64 "\n",
> +			       (int64_t) threshold);
> +		    }
> +		  set_hot_bb_threshold (threshold
> +					/ e->indirect_info->num_of_ics);
> +		}

Why do you need different paths for one target and for multiple targets?
Also, why is LTO different here?
> +	      if (!node_map_initialized)
> +		init_node_map (false);
> +	      node_map_initialized = true;
> +	      ncommon++;
> +	      unsigned speculative = 0;
> +	      for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
> +		{
> +		  n2 = find_func_by_profile_id (
> +		    e->indirect_info->common_target_ids[i]);
> +		  if (n2)
> +		    {
> +		      if (dump_file)
> +			{
> +			  fprintf (
> +			    dump_file,
> +			    "Indirect call -> direct call from"
> +			    " other module %s => %s, prob %3.2f\n",
> +			    n->dump_name (), n2->dump_name (),
> +			    e->indirect_info->common_target_probabilities[i]
> +			      / (float) REG_BR_PROB_BASE);
> +			}
> +		      if (e->indirect_info->common_target_probabilities[i]
> +			  < REG_BR_PROB_BASE / 2)
> +			{
> +			  nuseless++;
> +			  if (dump_file)
> +			    fprintf (
> +			      dump_file,
> +			      "Not speculating: probability is too low.\n");
> +			}
> +		      else if (!e->maybe_hot_p ())
> +			{
> +			  nuseless++;
> +			  if (dump_file)
> +			    fprintf (dump_file,
> +				     "Not speculating: call is cold.\n");
> +			}
> +		      else if (n2->get_availability () <= AVAIL_INTERPOSABLE
> +			       && n2->can_be_discarded_p ())
> +			{
> +			  nuseless++;
> +			  if (dump_file)
> +			    fprintf (dump_file,
> +				     "Not speculating: target is overwritable "
> +				     "and can be discarded.\n");
> +			}
> +		      else if (ipa_node_params_sum && ipa_edge_args_sum
> +			       && (!vec_safe_is_empty (
> +				 IPA_NODE_REF (n2)->descriptors))
> +			       && ipa_get_param_count (IPA_NODE_REF (n2))
> +				    != ipa_get_cs_argument_count (
> +				      IPA_EDGE_REF (e))
> +			       && (ipa_get_param_count (IPA_NODE_REF (n2))
> +				     >= ipa_get_cs_argument_count (
> +				       IPA_EDGE_REF (e))
> +				   || !stdarg_p (TREE_TYPE (n2->decl))))
> +			{
> +			  nmismatch++;
> +			  if (dump_file)
> +			    fprintf (dump_file, "Not speculating: "
> +						"parameter count mismatch\n");
> +			}
> +		      else if (e->indirect_info->polymorphic
> +			       && !opt_for_fn (n->decl, flag_devirtualize)
> +			       && !possible_polymorphic_call_target_p (e, n2))
> +			{
> +			  nimpossible++;
> +			  if (dump_file)
> +			    fprintf (dump_file,
> +				     "Not speculating: "
> +				     "function is not in the polymorphic "
> +				     "call target list\n");
> +			}
> +		      else
> +			{
> +			  /* Target may be overwritable, but profile says that
> +			     control flow goes to this particular implementation
> +			     of N2.  Speculate on the local alias to allow
> +			     inlining.
> +			     */
> +			  if (!n2->can_be_discarded_p ())
> +			    {
> +			      cgraph_node *alias;
> +			      alias = dyn_cast<cgraph_node *> (
> +				n2->noninterposable_alias ());
> +			      if (alias)
> +				n2 = alias;
> +			    }
> +			  nconverted++;
> +			  e->make_speculative (
> +			    n2, e->count.apply_probability (
> +				  e->indirect_info
> +				    ->common_target_probabilities[i]));
> +			  update = true;
> +			  speculative++;
> +			}
> +		    }
> +		  else
> +		    {
> +		      if (dump_file)
> +			fprintf (dump_file,
> +				 "Function with profile-id %i not found.\n",
> +				 e->indirect_info->common_target_ids[i]);
> +		      nunknown++;
> +		    }
> +		}
> +	      if (speculative < e->indirect_info->num_of_ics)
> +		e->indirect_info->num_of_ics = speculative;
> +	    }
> +	}
>         if (update)
>  	 ipa_update_overall_fn_summary (n);
>       }
> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
> index 79b250c3943..30347691029 100644
> --- a/gcc/ipa-utils.c
> +++ b/gcc/ipa-utils.c
> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>        update_max_bb_count ();
>        compute_function_frequency ();
>        pop_cfun ();
> +      /* When src is speculative, clone the referrings.  */
> +      if (src->indirect_call_target)
> +	for (e = src->callers; e; e = e->next_caller)
> +	  if (e->callee == src && e->speculative)
> +	    dst->clone_referring (src);

This looks wrong. Why do you need to copy all references from src
to target?
> +			    /* Speculative calls consist of two edges - direct
> +			       and indirect.  Duplicate the whole thing and
> +			       distribute frequencies accordingly.  */
> +			    if (edge->speculative)
I think here you want to handle the whole group with multiple targets.
> +			      {
> +				struct ipa_ref *ref;
> +
> +				gcc_assert (!edge->indirect_unknown_callee);
> +				old_edge->speculative_call_info (direct,
> +								 indirect, ref);
> +
> +				profile_count indir_cnt = indirect->count;
> +				indirect
> +				  = indirect->clone (id->dst_node, call_stmt,
> +						     gimple_uid (stmt), num,
> +						     den, true);
> +
> +				profile_probability prob
> +				  = indir_cnt.probability_in (old_cnt
> +							      + indir_cnt);
> +				indirect->count
> +				  = copy_basic_block->count.apply_probability (
> +				    prob);
> +				edge->count
> +				  = copy_basic_block->count - indirect->count;
> +				id->dst_node->clone_reference (ref, stmt);
> +			      }
> +			    else
> +			      edge->count = copy_basic_block->count;
> +			  }
> +			/* If the indirect call contains more than one indirect
> +			   targets, need clone all speculative edges here.  */
> +			if (old_edge && old_edge->next_callee
> +			    && old_edge->speculative && indirect
> +			    && indirect->indirect_info
> +			    && indirect->indirect_info->num_of_ics > 1)
> +			  {
> +			    edge = old_edge->next_callee;
> +			    old_edge = old_edge->next_callee;
> +			    if (edge->speculative)
> +			      next_speculative = true;
> +			  }
> +		      }
> +		    while (next_speculative);
> +		  }
>  		  break;
>  
>  		case CB_CGE_MOVE_CLONES:

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-20 13:47 ` Jan Hubicka
@ 2019-06-20 14:45   ` Martin Liška
  2019-07-01 11:20     ` Martin Liška
  2019-07-03  9:08     ` Jan Hubicka
  2019-06-20 14:46   ` [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters Martin Liška
  2019-06-24  2:34   ` [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization luoxhu
  2 siblings, 2 replies; 25+ messages in thread
From: Martin Liška @ 2019-06-20 14:45 UTC (permalink / raw)
  To: Jan Hubicka, Xiong Hu Luo; +Cc: gcc-patches, segher, wschmidt, luoxhu

[-- Attachment #1: Type: text/plain, Size: 188 bytes --]

Hi.

So the first part adds support for N tracked values.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
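
For readers who do not want to apply the diff first, the counter update
in the attached patch can be sketched as the following standalone C
function.  It mirrors the non-atomic path only; topn_update, counter_t
and TOPN are names made up for this sketch, with TOPN assumed to be 4.

```c
#define TOPN 4                 /* assumed number of tracked values */
typedef long long counter_t;   /* stand-in for gcov_type */

/* COUNTERS has 2 * TOPN + 1 entries: the total number of executions,
   followed by TOPN (value, count) pairs.  */
void
topn_update (counter_t *counters, counter_t value)
{
  counter_t *slots = counters + 1;
  unsigned smallest = 0;

  counters[0]++;

  for (unsigned i = 0; i < TOPN; i++)
    {
      if (slots[2 * i] == value)
        {
          /* Value is already tracked.  */
          slots[2 * i + 1]++;
          return;
        }
      else if (slots[2 * i + 1] == 0)
        {
          /* Empty slot: start tracking the value here.  */
          slots[2 * i] = value;
          slots[2 * i + 1] = 1;
          return;
        }

      if (slots[2 * i + 1] < slots[2 * smallest + 1])
        smallest = i;
    }

  /* No match and no empty slot: decrement the smallest count, so that a
     rarely seen value eventually frees its slot.  */
  slots[2 * smallest + 1]--;
}
```

Once a count drops to zero, the next untracked value that reaches that
slot takes it over.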

Ready to be installed?
Thanks,
Martin

[-- Attachment #2: 0001-Support-N-values-in-libgcov-for-single-value-counter.patch --]
[-- Type: text/x-patch, Size: 5382 bytes --]

From f3e361fb6d799acf538bc76a91bfcc8e265b7cbe Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Wed, 19 Jun 2019 14:15:14 +0200
Subject: [PATCH 1/2] Support N values in libgcov for single value counter
 type.

gcc/testsuite/ChangeLog:

2019-06-20  Martin Liska  <mliska@suse.cz>

	* gcc.dg/tree-prof/val-prof-2.c: Update scanned pattern
	as we do now better.

libgcc/ChangeLog:

2019-06-20  Martin Liska  <mliska@suse.cz>

	* libgcov-merge.c (merge_single_value_set): Support N values.
	* libgcov-profiler.c (__gcov_one_value_profiler_body): Likewise.
---
 gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c |  5 +--
 libgcc/libgcov-merge.c                      | 48 +++++++++++----------
 libgcc/libgcov-profiler.c                   | 42 ++++++++++++++----
 3 files changed, 60 insertions(+), 35 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c b/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c
index 8cb3c64fd17..b3bbadfeb40 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-2.c
@@ -25,8 +25,5 @@ main ()
   return 0;
 }
 /* autofdo does not do value profiling so far */
-/* { dg-final-use-not-autofdo { scan-ipa-dump "Transformation done: mod power of 2" "profile" } } */
-/* This is part of code checking that n is power of 2, so we are sure that the transformation
-   didn't get optimized out.  */
-/* { dg-final-use-not-autofdo { scan-tree-dump "n_\[0-9\]* \\+ (4294967295|0x0*ffffffff)" "optimized"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Transformation done: div/mod by constant 256" "profile" } } */
 /* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized"} } */
diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index f778cc4b6b7..84367005663 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -89,49 +89,53 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
 static void
 merge_single_value_set (gcov_type *counters)
 {
-  unsigned j;
-  gcov_type value, counter;
-
   /* First value is number of total executions of the profiler.  */
   gcov_type all = gcov_get_counter_ignore_scaling (-1);
   counters[0] += all;
   ++counters;
 
+  /* Read all part values.  */
+  gcov_type read_counters[2 * GCOV_DISK_SINGLE_VALUES];
+
   for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
     {
-      value = gcov_get_counter_target ();
-      counter = gcov_get_counter_ignore_scaling (-1);
+      read_counters[2 * i] = gcov_get_counter_target ();
+      read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
+    }
 
-      if (counter == -1)
-	{
-	  counters[1] = -1;
-	  /* We can't return as we need to read all counters.  */
-	  continue;
-	}
-      else if (counter == 0 || counters[1] == -1)
-	{
-	  /* We can't return as we need to read all counters.  */
-	  continue;
-	}
+  if (read_counters[1] == -1)
+    {
+      counters[1] = -1;
+      return;
+    }
+
+  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+    {
+      if (read_counters[2 * i + 1] == 0)
+	return;
 
+      unsigned j;
       for (j = 0; j < GCOV_DISK_SINGLE_VALUES; j++)
 	{
-	  if (counters[2 * j] == value)
+	  if (counters[2 * j] == read_counters[2 * i])
 	    {
-	      counters[2 * j + 1] += counter;
+	      counters[2 * j + 1] += read_counters[2 * i + 1];
 	      break;
 	    }
 	  else if (counters[2 * j + 1] == 0)
 	    {
-	      counters[2 * j] = value;
-	      counters[2 * j + 1] = counter;
+	      counters[2 * j] += read_counters[2 * i];
+	      counters[2 * j + 1] += read_counters[2 * i + 1];
 	      break;
 	    }
 	}
 
-      /* We haven't found a free slot for the value, mark overflow.  */
+      /* We haven't found a slot, bail out.  */
       if (j == GCOV_DISK_SINGLE_VALUES)
-	counters[1] = -1;
+	{
+	  counters[1] = -1;
+	  return;
+	}
     }
 }
 
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 9ba65b90df3..04d6f9c0e40 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -118,20 +118,44 @@ static inline void
 __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
 				int use_atomic)
 {
-  if (value == counters[1])
-    counters[2]++;
-  else if (counters[2] == 0)
+  if (use_atomic)
+    __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
+  else
+    counters[0]++;
+
+  ++counters;
+
+  /* We have GCOV_DISK_SINGLE_VALUES as we can keep multiple values
+     next to each other.  */
+  unsigned sindex = 0;
+
+  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
     {
-      counters[2] = 1;
-      counters[1] = value;
+      if (value == counters[2 * i])
+	{
+	  if (use_atomic)
+	    __atomic_fetch_add (&counters[2 * i + 1], 1, __ATOMIC_RELAXED);
+	  else
+	    counters[2 * i + 1]++;
+	  return;
+	}
+      else if (counters[2 * i + 1] == 0)
+	{
+	  /* We found an empty slot.  */
+	  counters[2 * i] = value;
+	  counters[2 * i + 1] = 1;
+	  return;
+	}
+
+      if (counters[2 * i + 1] < counters[2 * sindex + 1])
+	sindex = i;
     }
-  else
-    counters[2]--;
 
+  /* We haven't found an empty slot, then decrement the smallest.  */
   if (use_atomic)
-    __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
+    __atomic_fetch_sub (&counters[2 * sindex + 1], 1, __ATOMIC_RELAXED);
   else
-    counters[0]++;
+    counters[2 * sindex + 1]--;
 }
 
 #ifdef L_gcov_one_value_profiler_v2
-- 
2.21.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters.
  2019-06-20 13:47 ` Jan Hubicka
  2019-06-20 14:45   ` Martin Liška
@ 2019-06-20 14:46   ` Martin Liška
  2019-07-01 11:21     ` Martin Liška
  2019-07-03  9:09     ` Jan Hubicka
  2019-06-24  2:34   ` [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization luoxhu
  2 siblings, 2 replies; 25+ messages in thread
From: Martin Liška @ 2019-06-20 14:46 UTC (permalink / raw)
  To: Jan Hubicka, Xiong Hu Luo; +Cc: gcc-patches, segher, wschmidt, luoxhu

[-- Attachment #1: Type: text/plain, Size: 222 bytes --]

And the second part is a rename so that it reflects the reality
that the single-value counter can actually track multiple values.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

[-- Attachment #2: 0002-Rename-SINGE_VALUE-to-TOPN_VALUES-counters.patch --]
[-- Type: text/x-patch, Size: 23806 bytes --]

From cc9e93d43941176e92b5821e5a8134a5319a10b4 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Thu, 20 Jun 2019 14:50:23 +0200
Subject: [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters.

gcc/ChangeLog:

2019-06-20  Martin Liska  <mliska@suse.cz>

	* gcov-counter.def (GCOV_COUNTER_V_SINGLE): Remove.
	(GCOV_COUNTER_V_TOPN): New.
	(GCOV_COUNTER_V_INDIR): Use _topn.
	* gcov-io.h (GCOV_DISK_SINGLE_VALUES): Remove.
	(GCOV_TOPN_VALUES): New.
	(GCOV_SINGLE_VALUE_COUNTERS): Remove.
	(GCOV_TOPN_VALUES_COUNTERS): New.
	* profile.c (instrument_values): Use HIST_TYPE_TOPN_VALUES.
	* tree-profile.c:
	(gimple_init_gcov_profiler): Rename variables from one_value
	to topn_values.
	(gimple_gen_one_value_profiler): Remove.
	(gimple_gen_topn_values_profiler): New function.
	* value-prof.c (dump_histogram_value): Use TOPN_VALUES
	names instead of SINGLE_VALUE.
	(stream_out_histogram_value): Likewise.
	(stream_in_histogram_value): Likewise.
	(get_most_common_single_value): Likewise.
	(gimple_divmod_fixed_value_transform): Likewise.
	(gimple_stringops_transform): Likewise.
	(gimple_divmod_values_to_profile): Likewise.
	(gimple_stringops_values_to_profile): Likewise.
	(gimple_find_values_to_profile): Likewise.
	* value-prof.h (enum hist_type): Rename to TOPN.
	(gimple_gen_one_value_profiler): Remove.
	(gimple_gen_topn_values_profiler): New.

libgcc/ChangeLog:

2019-06-20  Martin Liska  <mliska@suse.cz>

	* Makefile.in: Use topn_values instead of one_value names.
	* libgcov-merge.c (__gcov_merge_single): Move to ...
	(__gcov_merge_topn): ... this.
	(merge_single_value_set): Move to ...
	(merge_topn_values_set): ... this.
	* libgcov-profiler.c (__gcov_one_value_profiler_body): Move to
	...
	(__gcov_topn_values_profiler_body): ... this.
	(__gcov_one_value_profiler_v2): Move to ...
	(__gcov_topn_values_profiler): ... this.
	(__gcov_one_value_profiler_v2_atomic): Move to ...
	(__gcov_topn_values_profiler_atomic): ... this.
	(__gcov_indirect_call_profiler_v4): Remove.
	* libgcov-util.c (__gcov_single_counter_op): Move to ...
	(__gcov_topn_counter_op): ... this.
	* libgcov.h (L_gcov_merge_single): Remove.
	(L_gcov_merge_topn): New.
	(__gcov_merge_single): Remove.
	(__gcov_merge_topn): New.
	(__gcov_one_value_profiler_v2): Move to ..
	(__gcov_topn_values_profiler): ... this.
	(__gcov_one_value_profiler_v2_atomic): Move to ...
	(__gcov_topn_values_profiler_atomic): ... this.
---
 gcc/gcov-counter.def      |  4 ++--
 gcc/gcov-io.h             |  7 +++----
 gcc/profile.c             |  4 ++--
 gcc/tree-profile.c        | 31 ++++++++++++++++---------------
 gcc/value-prof.c          | 35 ++++++++++++++++-------------------
 gcc/value-prof.h          |  6 +++---
 libgcc/Makefile.in        |  6 +++---
 libgcc/libgcov-merge.c    | 30 +++++++++++++++---------------
 libgcc/libgcov-profiler.c | 30 ++++++++++++------------------
 libgcc/libgcov-util.c     |  6 +++---
 libgcc/libgcov.h          | 10 +++++-----
 11 files changed, 80 insertions(+), 89 deletions(-)

diff --git a/gcc/gcov-counter.def b/gcc/gcov-counter.def
index b0596c8dc6b..1a2cbb27b31 100644
--- a/gcc/gcov-counter.def
+++ b/gcc/gcov-counter.def
@@ -36,10 +36,10 @@ DEF_GCOV_COUNTER(GCOV_COUNTER_V_INTERVAL, "interval", _add)
 DEF_GCOV_COUNTER(GCOV_COUNTER_V_POW2, "pow2", _add)
 
 /* The most common value of expression.  */
-DEF_GCOV_COUNTER(GCOV_COUNTER_V_SINGLE, "single", _single)
+DEF_GCOV_COUNTER(GCOV_COUNTER_V_TOPN, "topn", _topn)
 
 /* The most common indirect address.  */
-DEF_GCOV_COUNTER(GCOV_COUNTER_V_INDIR, "indirect_call", _single)
+DEF_GCOV_COUNTER(GCOV_COUNTER_V_INDIR, "indirect_call", _topn)
 
 /* Compute average value passed to the counter.  */
 DEF_GCOV_COUNTER(GCOV_COUNTER_AVERAGE, "average", _add)
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 0f2905c17ec..7df578f8538 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -266,12 +266,11 @@ GCOV_COUNTERS
 #define GCOV_N_VALUE_COUNTERS \
   (GCOV_LAST_VALUE_COUNTER - GCOV_FIRST_VALUE_COUNTER + 1)
 
-/* Number of single value histogram values that live
-   on disk representation.  */
-#define GCOV_DISK_SINGLE_VALUES 4
+/* Number of top N value histogram.  */
+#define GCOV_TOPN_VALUES 4
 
 /* Total number of single value counters.  */
-#define GCOV_SINGLE_VALUE_COUNTERS (2 * GCOV_DISK_SINGLE_VALUES + 1)
+#define GCOV_TOPN_VALUES_COUNTERS (2 * GCOV_TOPN_VALUES + 1)
 
 /* Convert a counter index to a tag.  */
 #define GCOV_TAG_FOR_COUNTER(COUNT)				\
diff --git a/gcc/profile.c b/gcc/profile.c
index 9aff9ef2b21..e3f8c5542be 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -167,8 +167,8 @@ instrument_values (histogram_values values)
 	  gimple_gen_pow2_profiler (hist, t, 0);
 	  break;
 
-	case HIST_TYPE_SINGLE_VALUE:
-	  gimple_gen_one_value_profiler (hist, t, 0);
+	case HIST_TYPE_TOPN_VALUES:
+	  gimple_gen_topn_values_profiler (hist, t, 0);
 	  break;
 
  	case HIST_TYPE_INDIR_CALL:
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 5ca4c3e80b6..554a8c98419 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -60,7 +60,7 @@ along with GCC; see the file COPYING3.  If not see
 static GTY(()) tree gcov_type_node;
 static GTY(()) tree tree_interval_profiler_fn;
 static GTY(()) tree tree_pow2_profiler_fn;
-static GTY(()) tree tree_one_value_profiler_fn;
+static GTY(()) tree tree_topn_values_profiler_fn;
 static GTY(()) tree tree_indirect_call_profiler_fn;
 static GTY(()) tree tree_average_profiler_fn;
 static GTY(()) tree tree_ior_profiler_fn;
@@ -117,7 +117,7 @@ gimple_init_gcov_profiler (void)
 {
   tree interval_profiler_fn_type;
   tree pow2_profiler_fn_type;
-  tree one_value_profiler_fn_type;
+  tree topn_values_profiler_fn_type;
   tree gcov_type_ptr;
   tree ic_profiler_fn_type;
   tree average_profiler_fn_type;
@@ -161,18 +161,18 @@ gimple_init_gcov_profiler (void)
 		     DECL_ATTRIBUTES (tree_pow2_profiler_fn));
 
       /* void (*) (gcov_type *, gcov_type)  */
-      one_value_profiler_fn_type
+      topn_values_profiler_fn_type
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      fn_name = concat ("__gcov_one_value_profiler_v2", fn_suffix, NULL);
-      tree_one_value_profiler_fn = build_fn_decl (fn_name,
-						  one_value_profiler_fn_type);
+      fn_name = concat ("__gcov_topn_values_profiler", fn_suffix, NULL);
+      tree_topn_values_profiler_fn
+	= build_fn_decl (fn_name, topn_values_profiler_fn_type);
 
-      TREE_NOTHROW (tree_one_value_profiler_fn) = 1;
-      DECL_ATTRIBUTES (tree_one_value_profiler_fn)
+      TREE_NOTHROW (tree_topn_values_profiler_fn) = 1;
+      DECL_ATTRIBUTES (tree_topn_values_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
-		     DECL_ATTRIBUTES (tree_one_value_profiler_fn));
+		     DECL_ATTRIBUTES (tree_topn_values_profiler_fn));
 
       init_ic_make_global_vars ();
 
@@ -226,7 +226,7 @@ gimple_init_gcov_profiler (void)
          late, we need to initialize them by hand.  */
       DECL_ASSEMBLER_NAME (tree_interval_profiler_fn);
       DECL_ASSEMBLER_NAME (tree_pow2_profiler_fn);
-      DECL_ASSEMBLER_NAME (tree_one_value_profiler_fn);
+      DECL_ASSEMBLER_NAME (tree_topn_values_profiler_fn);
       DECL_ASSEMBLER_NAME (tree_indirect_call_profiler_fn);
       DECL_ASSEMBLER_NAME (tree_average_profiler_fn);
       DECL_ASSEMBLER_NAME (tree_ior_profiler_fn);
@@ -334,12 +334,13 @@ gimple_gen_pow2_profiler (histogram_value value, unsigned tag, unsigned base)
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
-/* Output instructions as GIMPLE trees for code to find the most common value.
-   VALUE is the expression whose value is profiled.  TAG is the tag of the
-   section for counters, BASE is offset of the counter position.  */
+/* Output instructions as GIMPLE trees for code to find the most N common
+   values.  VALUE is the expression whose value is profiled.  TAG is the tag
+   of the section for counters, BASE is offset of the counter position.  */
 
 void
-gimple_gen_one_value_profiler (histogram_value value, unsigned tag, unsigned base)
+gimple_gen_topn_values_profiler (histogram_value value, unsigned tag,
+				 unsigned base)
 {
   gimple *stmt = value->hvalue.stmt;
   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
@@ -350,7 +351,7 @@ gimple_gen_one_value_profiler (histogram_value value, unsigned tag, unsigned bas
   ref_ptr = force_gimple_operand_gsi (&gsi, ref_ptr,
 				      true, NULL_TREE, true, GSI_SAME_STMT);
   val = prepare_instrumented_value (&gsi, value);
-  call = gimple_build_call (tree_one_value_profiler_fn, 2, ref_ptr, val);
+  call = gimple_build_call (tree_topn_values_profiler_fn, 2, ref_ptr, val);
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 7289a698b71..66c4bbaad5c 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -257,23 +257,23 @@ dump_histogram_value (FILE *dump_file, histogram_value hist)
 		 (int64_t) hist->hvalue.counters[0]);
       break;
 
-    case HIST_TYPE_SINGLE_VALUE:
+    case HIST_TYPE_TOPN_VALUES:
     case HIST_TYPE_INDIR_CALL:
       if (hist->hvalue.counters)
 	{
 	  fprintf (dump_file,
-		   (hist->type == HIST_TYPE_SINGLE_VALUE
-		    ? "Single value counter " : "Indirect call counter"));
+		   (hist->type == HIST_TYPE_TOPN_VALUES
+		    ? "Top N value counter " : "Indirect call counter"));
 	  if (hist->hvalue.counters)
 	    {
 	      fprintf (dump_file, "all: %" PRId64 ", values: ",
 		       (int64_t) hist->hvalue.counters[0]);
-	      for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+	      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
 		{
 		  fprintf (dump_file, "[%" PRId64 ":%" PRId64 "]",
 			   (int64_t) hist->hvalue.counters[2 * i + 1],
 			   (int64_t) hist->hvalue.counters[2 * i + 2]);
-		  if (i != GCOV_DISK_SINGLE_VALUES - 1)
+		  if (i != GCOV_TOPN_VALUES - 1)
 		    fprintf (dump_file, ", ");
 		}
 	      fprintf (dump_file, ".\n");
@@ -331,7 +331,7 @@ stream_out_histogram_value (struct output_block *ob, histogram_value hist)
       /* When user uses an unsigned type with a big value, constant converted
 	 to gcov_type (a signed type) can be negative.  */
       gcov_type value = hist->hvalue.counters[i];
-      if (hist->type == HIST_TYPE_SINGLE_VALUE && i > 0)
+      if (hist->type == HIST_TYPE_TOPN_VALUES && i > 0)
 	;
       else
 	gcc_assert (value >= 0);
@@ -374,9 +374,9 @@ stream_in_histogram_value (struct lto_input_block *ib, gimple *stmt)
 	  ncounters = 2;
 	  break;
 
-	case HIST_TYPE_SINGLE_VALUE:
+	case HIST_TYPE_TOPN_VALUES:
 	case HIST_TYPE_INDIR_CALL:
-	  ncounters = GCOV_SINGLE_VALUE_COUNTERS;
+	  ncounters = GCOV_TOPN_VALUES_COUNTERS;
 	  break;
 
 	case HIST_TYPE_IOR:
@@ -713,7 +713,7 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
   return tmp2;
 }
 
-/* Return most common value of SINGLE_VALUE histogram.  If
+/* Return most common value of TOPN_VALUE histogram.  If
    there's a unique value, return true and set VALUE and COUNT
    arguments.  */
 
@@ -731,7 +731,7 @@ get_most_common_single_value (gimple *stmt, const char *counter_type,
 
   gcov_type read_all = hist->hvalue.counters[0];
 
-  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       gcov_type v = hist->hvalue.counters[2 * i + 1];
       gcov_type c = hist->hvalue.counters[2 * i + 2];
@@ -780,7 +780,7 @@ gimple_divmod_fixed_value_transform (gimple_stmt_iterator *si)
     return false;
 
   histogram = gimple_histogram_value_of_type (cfun, stmt,
-					      HIST_TYPE_SINGLE_VALUE);
+					      HIST_TYPE_TOPN_VALUES);
   if (!histogram)
     return false;
 
@@ -1654,7 +1654,7 @@ gimple_stringops_transform (gimple_stmt_iterator *gsi)
     return false;
 
   histogram = gimple_histogram_value_of_type (cfun, stmt,
-					      HIST_TYPE_SINGLE_VALUE);
+					      HIST_TYPE_TOPN_VALUES);
   if (!histogram)
     return false;
 
@@ -1808,7 +1808,7 @@ gimple_divmod_values_to_profile (gimple *stmt, histogram_values *values)
 	/* Check for the case where the divisor is the same value most
 	   of the time.  */
 	values->quick_push (gimple_alloc_histogram_value (cfun,
-						      HIST_TYPE_SINGLE_VALUE,
+						      HIST_TYPE_TOPN_VALUES,
 						      stmt, divisor));
 
       /* For mod, check whether it is not often a noop (or replaceable by
@@ -1887,7 +1887,7 @@ gimple_stringops_values_to_profile (gimple *gs, histogram_values *values)
   if (TREE_CODE (blck_size) != INTEGER_CST)
     {
       values->safe_push (gimple_alloc_histogram_value (cfun,
-						       HIST_TYPE_SINGLE_VALUE,
+						       HIST_TYPE_TOPN_VALUES,
 						       stmt, blck_size));
       values->safe_push (gimple_alloc_histogram_value (cfun, HIST_TYPE_AVERAGE,
 						       stmt, blck_size));
@@ -1936,12 +1936,9 @@ gimple_find_values_to_profile (histogram_values *values)
 	  hist->n_counters = 2;
 	  break;
 
-	case HIST_TYPE_SINGLE_VALUE:
-	  hist->n_counters = GCOV_SINGLE_VALUE_COUNTERS;
-	  break;
-
+	case HIST_TYPE_TOPN_VALUES:
 	case HIST_TYPE_INDIR_CALL:
-	  hist->n_counters = GCOV_SINGLE_VALUE_COUNTERS;
+	  hist->n_counters = GCOV_TOPN_VALUES_COUNTERS;
 	  break;
 
         case HIST_TYPE_TIME_PROFILE:
diff --git a/gcc/value-prof.h b/gcc/value-prof.h
index 25b03f7591a..9f69d7df6d1 100644
--- a/gcc/value-prof.h
+++ b/gcc/value-prof.h
@@ -26,8 +26,7 @@ enum hist_type
   HIST_TYPE_INTERVAL,	/* Measures histogram of values inside a specified
 			   interval.  */
   HIST_TYPE_POW2,	/* Histogram of power of 2 values.  */
-  HIST_TYPE_SINGLE_VALUE, /* Tries to identify the value that is (almost)
-			   always constant.  */
+  HIST_TYPE_TOPN_VALUES, /* Tries to identify the N most common values.  */
   HIST_TYPE_INDIR_CALL,   /* Tries to identify the function that is (almost)
 			    called in indirect call */
   HIST_TYPE_AVERAGE,	/* Compute average value (sum of all values).  */
@@ -101,7 +100,8 @@ extern void gimple_init_gcov_profiler (void);
 extern void gimple_gen_edge_profiler (int, edge);
 extern void gimple_gen_interval_profiler (histogram_value, unsigned, unsigned);
 extern void gimple_gen_pow2_profiler (histogram_value, unsigned, unsigned);
-extern void gimple_gen_one_value_profiler (histogram_value, unsigned, unsigned);
+extern void gimple_gen_topn_values_profiler (histogram_value, unsigned,
+					     unsigned);
 extern void gimple_gen_ic_profiler (histogram_value, unsigned, unsigned);
 extern void gimple_gen_ic_func_profiler (void);
 extern void gimple_gen_time_profiler (unsigned, unsigned);
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 33b83809cfc..f9e6e6c8812 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -887,14 +887,14 @@ include $(iterator)
 
 # Build libgcov components.
 
-LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single			\
+LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_topn			\
 	_gcov_merge_ior _gcov_merge_time_profile
 LIBGCOV_PROFILER = _gcov_interval_profiler				\
 	_gcov_interval_profiler_atomic					\
 	_gcov_pow2_profiler						\
 	_gcov_pow2_profiler_atomic					\
-	_gcov_one_value_profiler_v2					\
-	_gcov_one_value_profiler_v2_atomic					\
+	_gcov_topn_values_profiler					\
+	_gcov_topn_values_profiler_atomic					\
 	_gcov_average_profiler						\
 	_gcov_average_profiler_atomic					\
 	_gcov_ior_profiler						\
diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index 84367005663..15f27aedb55 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -33,9 +33,9 @@ void __gcov_merge_add (gcov_type *counters  __attribute__ ((unused)),
                        unsigned n_counters __attribute__ ((unused))) {}
 #endif
 
-#ifdef L_gcov_merge_single
-void __gcov_merge_single (gcov_type *counters  __attribute__ ((unused)),
-			  unsigned n_counters __attribute__ ((unused))) {}
+#ifdef L_gcov_merge_topn
+void __gcov_merge_topn (gcov_type *counters  __attribute__ ((unused)),
+			unsigned n_counters __attribute__ ((unused))) {}
 #endif
 
 #else
@@ -84,10 +84,10 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
 }
 #endif /* L_gcov_merge_time_profile */
 
-#ifdef L_gcov_merge_single
+#ifdef L_gcov_merge_topn
 
 static void
-merge_single_value_set (gcov_type *counters)
+merge_topn_values_set (gcov_type *counters)
 {
   /* First value is number of total executions of the profiler.  */
   gcov_type all = gcov_get_counter_ignore_scaling (-1);
@@ -95,9 +95,9 @@ merge_single_value_set (gcov_type *counters)
   ++counters;
 
   /* Read all part values.  */
-  gcov_type read_counters[2 * GCOV_DISK_SINGLE_VALUES];
+  gcov_type read_counters[2 * GCOV_TOPN_VALUES];
 
-  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       read_counters[2 * i] = gcov_get_counter_target ();
       read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
@@ -109,13 +109,13 @@ merge_single_value_set (gcov_type *counters)
       return;
     }
 
-  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       if (read_counters[2 * i + 1] == 0)
 	return;
 
       unsigned j;
-      for (j = 0; j < GCOV_DISK_SINGLE_VALUES; j++)
+      for (j = 0; j < GCOV_TOPN_VALUES; j++)
 	{
 	  if (counters[2 * j] == read_counters[2 * i])
 	    {
@@ -131,7 +131,7 @@ merge_single_value_set (gcov_type *counters)
 	}
 
       /* We haven't found a slot, bail out.  */
-      if (j == GCOV_DISK_SINGLE_VALUES)
+      if (j == GCOV_TOPN_VALUES)
 	{
 	  counters[1] = -1;
 	  return;
@@ -149,13 +149,13 @@ merge_single_value_set (gcov_type *counters)
    -- counter
    */
 void
-__gcov_merge_single (gcov_type *counters, unsigned n_counters)
+__gcov_merge_topn (gcov_type *counters, unsigned n_counters)
 {
-  gcc_assert (!(n_counters % GCOV_SINGLE_VALUE_COUNTERS));
+  gcc_assert (!(n_counters % GCOV_TOPN_VALUES_COUNTERS));
 
-  for (unsigned i = 0; i < (n_counters / GCOV_SINGLE_VALUE_COUNTERS); i++)
-    merge_single_value_set (counters + (i * GCOV_SINGLE_VALUE_COUNTERS));
+  for (unsigned i = 0; i < (n_counters / GCOV_TOPN_VALUES_COUNTERS); i++)
+    merge_topn_values_set (counters + (i * GCOV_TOPN_VALUES_COUNTERS));
 }
-#endif /* L_gcov_merge_single */
+#endif /* L_gcov_merge_topn */
 
 #endif /* inhibit_libc */
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 04d6f9c0e40..8f877a95980 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -106,17 +106,11 @@ __gcov_pow2_profiler_atomic (gcov_type *counters, gcov_type value)
 #endif
 
 
-/* Tries to determine the most common value among its inputs.  Checks if the
-   value stored in COUNTERS[0] matches VALUE.  If this is the case, COUNTERS[1]
-   is incremented.  If this is not the case and COUNTERS[1] is not zero,
-   COUNTERS[1] is decremented.  Otherwise COUNTERS[1] is set to one and
-   VALUE is stored to COUNTERS[0].  This algorithm guarantees that if this
-   function is called more than 50% of the time with one value, this value
-   will be in COUNTERS[0] in the end.  */
+/* Tries to determine the N most common values among its inputs.  */
 
 static inline void
-__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
-				int use_atomic)
+__gcov_topn_values_profiler_body (gcov_type *counters, gcov_type value,
+				  int use_atomic)
 {
   if (use_atomic)
     __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
@@ -125,11 +119,11 @@ __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
 
   ++counters;
 
-  /* We have GCOV_DISK_SINGLE_VALUES as we can keep multiple values
+  /* We have GCOV_TOPN_VALUES as we can keep multiple values
      next to each other.  */
   unsigned sindex = 0;
 
-  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       if (value == counters[2 * i])
 	{
@@ -158,15 +152,15 @@ __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
     counters[2 * sindex + 1]--;
 }
 
-#ifdef L_gcov_one_value_profiler_v2
+#ifdef L_gcov_topn_values_profiler
 void
-__gcov_one_value_profiler_v2 (gcov_type *counters, gcov_type value)
+__gcov_topn_values_profiler (gcov_type *counters, gcov_type value)
 {
-  __gcov_one_value_profiler_body (counters, value, 0);
+  __gcov_topn_values_profiler_body (counters, value, 0);
 }
 #endif
 
-#if defined(L_gcov_one_value_profiler_v2_atomic) && GCOV_SUPPORTS_ATOMIC
+#if defined(L_gcov_topn_values_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 
 /* Update one value profilers (COUNTERS) for a given VALUE.
 
@@ -178,9 +172,9 @@ __gcov_one_value_profiler_v2 (gcov_type *counters, gcov_type value)
    https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00024.html.  */
 
 void
-__gcov_one_value_profiler_v2_atomic (gcov_type *counters, gcov_type value)
+__gcov_topn_values_profiler_atomic (gcov_type *counters, gcov_type value)
 {
-  __gcov_one_value_profiler_body (counters, value, 1);
+  __gcov_topn_values_profiler_body (counters, value, 1);
 }
 #endif
 
@@ -214,7 +208,7 @@ __gcov_indirect_call_profiler_v4 (gcov_type value, void* cur_func)
   if (cur_func == __gcov_indirect_call.callee
       || (__LIBGCC_VTABLE_USES_DESCRIPTORS__
 	  && *(void **) cur_func == *(void **) __gcov_indirect_call.callee))
-    __gcov_one_value_profiler_body (__gcov_indirect_call.counters, value, 0);
+    __gcov_topn_values_profiler_body (__gcov_indirect_call.counters, value, 0);
 
   __gcov_indirect_call.callee = NULL;
 }
diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index c794132c172..7faa59a6ea4 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -723,11 +723,11 @@ __gcov_time_profile_counter_op (gcov_type *counters ATTRIBUTE_UNUSED,
   /* Do nothing.  */
 }
 
-/* Performing FN upon single counters.  */
+/* Performing FN upon TOP N counters.  */
 
 static void
-__gcov_single_counter_op (gcov_type *counters, unsigned n_counters,
-                          counter_op_fn fn, void *data1, void *data2)
+__gcov_topn_counter_op (gcov_type *counters, unsigned n_counters,
+			counter_op_fn fn, void *data1, void *data2)
 {
   unsigned i, n_measures;
 
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 7f316146d49..30a8a116fec 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -126,7 +126,7 @@ typedef unsigned gcov_position_t;
 
 #define L_gcov 1
 #define L_gcov_merge_add 1
-#define L_gcov_merge_single 1
+#define L_gcov_merge_topn 1
 #define L_gcov_merge_ior 1
 #define L_gcov_merge_time_profile 1
 
@@ -259,8 +259,8 @@ extern void __gcov_merge_add (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 /* The merge function to select the minimum valid counter value.  */
 extern void __gcov_merge_time_profile (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 
-/* The merge function to choose the most common value.  */
-extern void __gcov_merge_single (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
+/* The merge function to choose the most common N values.  */
+extern void __gcov_merge_topn (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 
 /* The merge function that just ors the counters together.  */
 extern void __gcov_merge_ior (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
@@ -271,8 +271,8 @@ extern void __gcov_interval_profiler_atomic (gcov_type *, gcov_type, int,
 					     unsigned);
 extern void __gcov_pow2_profiler (gcov_type *, gcov_type);
 extern void __gcov_pow2_profiler_atomic (gcov_type *, gcov_type);
-extern void __gcov_one_value_profiler_v2 (gcov_type *, gcov_type);
-extern void __gcov_one_value_profiler_v2_atomic (gcov_type *, gcov_type);
+extern void __gcov_topn_values_profiler (gcov_type *, gcov_type);
+extern void __gcov_topn_values_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_profiler_v4 (gcov_type, void *);
 extern void __gcov_time_profiler (gcov_type *);
 extern void __gcov_time_profiler_atomic (gcov_type *);
-- 
2.21.0
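
For readers following the rename, the counter layout that the profiler and
merge hunks above index into can be sketched in a few lines.  This is only a
standalone illustration: counter_t, kTopN and topn_update are hypothetical
stand-ins (for gcov_type, GCOV_TOPN_VALUES and the non-atomic path of
__gcov_topn_values_profiler_body), and the part of the loop body that the
hunk does not show is assumed to bump a matching slot, claim an empty one,
or else decrement the smallest counter.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for gcov_type and GCOV_TOPN_VALUES; the value 4
// is only picked for this example.
using counter_t = std::int64_t;
constexpr unsigned kTopN = 4;

// Layout read off the dump/merge code in the patch:
//   [0]          total number of profiled executions
//   [2 * i + 1]  i-th tracked value
//   [2 * i + 2]  hit counter of that value
using topn_counters = std::array<counter_t, 1 + 2 * kTopN>;

// Non-atomic sketch of one profiler call.  The slot handling inside the
// loop is an assumption; only its beginning and end are visible in the diff.
inline void
topn_update (topn_counters &all, counter_t value)
{
  ++all[0];
  counter_t *counters = all.data () + 1;  // mirrors "++counters"
  unsigned sindex = 0;                    // slot with the smallest counter

  for (unsigned i = 0; i < kTopN; i++)
    {
      if (value == counters[2 * i])
        {
          ++counters[2 * i + 1];          // value already tracked
          return;
        }
      if (counters[2 * i + 1] == 0)
        {
          counters[2 * i] = value;        // claim an empty slot
          counters[2 * i + 1] = 1;
          return;
        }
      if (counters[2 * i + 1] < counters[2 * sindex + 1])
        sindex = i;
    }

  // No match and no free slot: age out the weakest candidate.
  assert (sindex < kTopN);
  --counters[2 * sindex + 1];
}
```

Starting from zeroed counters, three calls with the value 7 followed by one
call with 9 leave the total at 4, with 7 (count 3) in the first slot and
9 (count 1) in the second.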


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-20 13:47 ` Jan Hubicka
  2019-06-20 14:45   ` Martin Liška
  2019-06-20 14:46   ` [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters Martin Liška
@ 2019-06-24  2:34   ` luoxhu
  2019-06-24  9:20     ` luoxhu
  2 siblings, 1 reply; 25+ messages in thread
From: luoxhu @ 2019-06-24  2:34 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches, mliska, segher, wschmidt, luoxhu

Hi Honza,
Thanks very much for the many useful comments.
As a newcomer to GCC, I am not sure whether my questions are described
clearly enough, so thanks in advance for your patience.  :)


On 2019/6/20 21:47, Jan Hubicka wrote:
> Hi,
> some comments on the ipa part of the patch
> (and thanks for working on it - this was on my TODO list for years)
> 
>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>> index de82316d4b1..0d373a67d1b 100644
>> --- a/gcc/cgraph.c
>> +++ b/gcc/cgraph.c
>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>   	fprintf (dump_file, "Introduced new external node "
>>   		 "(%s) and turned into root of the clone tree.\n",
>>   		 node->dump_name ());
>> +      node->profile_id = first_clone->profile_id;
>>       }
>>     else if (dump_file)
>>       fprintf (dump_file, "Introduced new external node "
> 
> This is independent of the rest of changes.  Do you have example where
> this matters? The inline clones are created in ipa-inline while
> ipa-profile is run before it, so I can not think of such a scenario.
> I see you also copy profile_id from function to clone.  I would like to
> know why you needed that.
> 
> Also you mention that you hit some ICEs. If fixes are independent of
> rest of your changes, send them separately.

I copy the profile_id to the cloned node because in LTO ltrans there is no
reference or referring information for the specialized/cloned node, so it
is difficult to track the node's reference in
cgraph_edge::speculative_call_info.  For now I use it mainly for debugging.
I will remove it and, in a later version, split the patch so that the ICE
fixes are separate.

> 
>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>     int i;
>>     cgraph_edge *e2;
>>     cgraph_edge *e = this;
>> +  cgraph_node *referred_node;
>>   
>>     if (!e->indirect_unknown_callee)
>>       for (e2 = e->caller->indirect_calls;
>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>   	&& ((ref->stmt && ref->stmt == e->call_stmt)
>>   	    || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>         {
>> -	reference = ref;
>> -	break;
>> +	if (e2->indirect_info && e2->indirect_info->num_of_ics)
>> +	  {
>> +	    referred_node = dyn_cast<cgraph_node *> (ref->referred);
>> +	    if (strstr (e->callee->name (), referred_node->name ()))
>> +	      {
>> +		reference = ref;
>> +		break;
>> +	      }
>> +	  }
>> +	else
>> +	  {
>> +	    reference = ref;
>> +	    break;
>> +	  }
>>         }
> 
> This function is intended to return everything related to the
> speculative call, so if you add multiple direct targets, I would expect
> it to take an auto_vec of cgraph_nodes for direct and an auto_vec of
> references.

So will the signature become
cgraph_edge::speculative_call_info (auto_vec<cgraph_edge *> *direct,
                                     cgraph_edge *&indirect,
                                     auto_vec<ipa_ref *> *reference)?

A lot of code depends on it, so maybe this should be split into another patch.
Will the direct edges and the references be kept in the same order in their
auto_vecs, so that both can be iterated in lock-step?
My second question: if "this" is a direct edge that gets pushed to the
auto_vec "direct", how can it find its next direct edge here?  From
e->caller->callees?

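To make the ordering question concrete, here is a minimal standalone sketch
of the "parallel vectors" idea.  It is not GCC code: toy_edge, toy_ref,
speculation_group and collect_speculation are hypothetical names, std::vector
stands in for auto_vec, and edges are matched to references by target
identity rather than by strstr on names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for cgraph_edge and ipa_ref; all names are hypothetical.
struct toy_edge { unsigned stmt_uid; int callee; bool speculative; };
struct toy_ref  { unsigned stmt_uid; int referred; bool speculative; };

struct speculation_group
{
  std::vector<const toy_edge *> direct;  // direct[i] pairs with refs[i]
  std::vector<const toy_ref *> refs;
};

// Collect every speculative direct edge attached to STMT_UID and, for each,
// the speculative reference to the same target, keeping both in lock-step.
inline speculation_group
collect_speculation (const std::vector<toy_edge> &callees,
                     const std::vector<toy_ref> &references,
                     unsigned stmt_uid)
{
  speculation_group g;
  for (const toy_edge &e : callees)
    {
      if (!e.speculative || e.stmt_uid != stmt_uid)
        continue;
      for (const toy_ref &r : references)
        if (r.speculative && r.stmt_uid == stmt_uid
            && r.referred == e.callee)
          {
            // Push the pair together so the indices always line up.
            g.direct.push_back (&e);
            g.refs.push_back (&r);
            break;
          }
    }
  assert (g.direct.size () == g.refs.size ());
  return g;
}
```

With this shape, walking the caller's callee list once answers the "next
direct edge" question as well: every speculative edge that shares the
statement uid lands in the same group, in callee-list order.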

>>   
>>     /* Speculative edge always consist of all three components - direct edge,
>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>            in the functions inlined through it.  */
>>       }
>>     edge->count += e2->count;
>> -  edge->speculative = false;
>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>> +    {
>> +      edge->indirect_info->num_of_ics--;
>> +      if (edge->indirect_info->num_of_ics == 0)
>> +	edge->speculative = false;
>> +    }
>> +  else
>> +    edge->speculative = false;
>>     e2->speculative = false;
>>     ref->remove_reference ();
>>     if (e2->indirect_unknown_callee || e2->inline_failed)
> 
> This function should turn speculative call into direct call to DECL, so
> I think it should remove all the other direct calls associated with stmt
> and the indirect one.
> 
> There are now two cases - in the first case you want to turn the
> speculative call into a direct call or give up on speculation completely,
> while in the other case you want to remove only one of the speculations.
> 
> I guess we want to have resolve_speculation(decl) for first and
> remove_one_speculation(edge) for the second case?
> The second case would be useful for the code below handling type
> mismatches and also for inline when one of speculative targets seems not
> useful to bother with.

So the logic will be:

if (edge->indirect_info->num_of_ics > 1)
	cgraph_edge::resolve_speculation (tree callee_decl);
else
	remove_one_speculation(edge);

cgraph_edge::resolve_speculation calls edge->speculative_call_info (e2,
edge, ref) internally; at that point, will e2 and ref contain only one
direct target?
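
A toy model may help pin down the difference between the two operations
discussed above.  It is only a sketch of the intended semantics as I read the
review comment, not the cgraph implementation: toy_call_site and its fields
are hypothetical, and targets are plain integer ids.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of one call statement; every name here is hypothetical.
struct toy_call_site
{
  std::vector<int> speculative_targets;  // ids of speculated direct callees
  bool has_indirect_edge = true;         // indirect fallback still present
  int resolved_target = -1;              // set once the call becomes direct

  bool speculative () const { return !speculative_targets.empty (); }
};

// Drop a single speculation; the others and the indirect fallback stay.
inline void
remove_one_speculation (toy_call_site &cs, int target)
{
  std::vector<int> &t = cs.speculative_targets;
  t.erase (std::remove (t.begin (), t.end (), target), t.end ());
}

// Resolve the whole group: if TARGET was one of the speculated callees the
// call becomes a plain direct call to it, otherwise speculation is given up
// and only the indirect call remains.
inline void
resolve_speculation (toy_call_site &cs, int target)
{
  std::vector<int> &t = cs.speculative_targets;
  bool known = std::find (t.begin (), t.end (), target) != t.end ();
  t.clear ();
  assert (!cs.speculative ());
  if (known)
    {
      cs.resolved_target = target;
      cs.has_indirect_edge = false;
    }
}
```

In this model the call site stops being speculative only when its last
target is gone, which is the role num_of_ics plays in the patch.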


>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>   	  e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>   						     false);
>>   	  e->count = gimple_bb (e->call_stmt)->count;
>> -	  e2->speculative = false;
>> +	  if (e2->indirect_info && e2->indirect_info->num_of_ics)
>> +	    {
>> +	      e2->indirect_info->num_of_ics--;
>> +	      if (e2->indirect_info->num_of_ics == 0)
>> +		e2->speculative = false;
>> +	    }
>> +	  else
>> +	    e2->speculative = false;
>>   	  e2->count = gimple_bb (e2->call_stmt)->count;
>>   	  ref->speculative = false;
>>   	  ref->stmt = NULL;
> 
>>   extern void debuginfo_early_init (void);
>>   extern void debuginfo_init (void);
>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>     int param_index;
>>     /* ECF flags determined from the caller.  */
>>     int ecf_flags;
>> -  /* Profile_id of common target obtrained from profile.  */
>> +  /* Profile_id of common target obtained from profile.  */
>>     int common_target_id;
>>     /* Probability that call will land in function with COMMON_TARGET_ID.  */
>>     int common_target_probability;
>>   
>> +  /* Profile_id of common target obtained from profile.  */
>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
> 
> I would use a vec of pairs (profile_id, probability) to hold this and
> not wire in GCOV_ICALL_TOPN_NCOUNTS.  Most of the time this vec will be just
> a NULL pointer, so it will result in less memory overhead and will avoid a
> hard limit on the number of speculations we want to do.
> 
> Note that the speculative edges may end up being redirected during IPA
> optimization, for example when their target is cloned for particular
> call context or when the function is detected identical to other
> function.  So one can not preserve the mapping between targets and
> profile ids.
> 
> Also this infrastructure is useful even w/o profile because we could use
> ipa-devirt to devirtualize even when multiple polymorphic targets are
> found. So I would not wire in the limit GCOV_ICALL_TOPN_NCOUNTS and
> just use dynamically allocated vectors instead.
> 
> With ipa-devirt it is possible that we know that there are precisely 2
> possible polymorphic targets.  Another case I was considering was to
> speculatively inline with -fPIC.  I.e. when one has an interposable call
> for foo() one can create foo.localalias() for the definition visible to
> compiler and then speculate
> 
> if (foo == foo.localalias())
>    inline_path
> else
>    foo();
> 
> In these cases we may end up with indirect
> call that has no associated indirect edge (which we do not support
> currently), so it may be interesting to move speculative call info away
> from indirect call info to cgraph edge structure (but that can be done
> incrementally based on what you do now - this code is not completely
> easy so lets do it step by step).

+
+struct GTY (()) indirect_target_info
+{
+  /* Profile_id of common target obtained from profile.  */
+  int common_target_id;
+  /* Probability that call will land in function with COMMON_TARGET_ID.  */
+  int common_target_probability;
+};
+

I tested with "vec<indirect_target_info, va_gc> *indirect_call_targets;"
and it works well, really much better.
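
As a standalone illustration of why the dynamically allocated form is
cheaper, the sketch below keeps a null pointer until the first target is
recorded.  It does not use GCC's vec/va_gc: std::unique_ptr and std::vector
are stand-ins, toy_indirect_call_info, add_target and num_targets are
hypothetical names, and kProbBase stands in for REG_BR_PROB_BASE.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Mirrors the struct quoted above; the container choice is an assumption.
struct indirect_target_info
{
  int common_target_id;
  int common_target_probability;  // scaled by kProbBase
};

constexpr int kProbBase = 10000;  // stand-in for REG_BR_PROB_BASE

struct toy_indirect_call_info
{
  // Stays null for the common case of "no profiled targets".
  std::unique_ptr<std::vector<indirect_target_info>> targets;

  // Record one (profile id, probability) pair, allocating on first use.
  void add_target (int id, int probability)
  {
    assert (probability >= 0 && probability <= kProbBase);
    if (!targets)
      targets.reset (new std::vector<indirect_target_info> ());
    targets->push_back ({id, probability});
  }

  std::size_t num_targets () const
  {
    return targets ? targets->size () : 0;
  }
};
```

There is no fixed upper bound on the number of targets here, so a separate
counter such as num_of_ics could be replaced by the vector length.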

Is it correct that profile_id/common_target_id is only used to generate
speculative edges in ipa-profile, and that every new specialized/cloned
node takes over (rather than copies) the speculative property from the
original node, which is why the profile id does not need to be cloned?


> 
>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>   		      gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>>   						      stmt, h);
>>   		    }
>> +		  else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>> +		    {
>> +		      unsigned j;
>> +		      struct cgraph_edge *e = node->get_edge (stmt);
>> +		      if (e && !e->indirect_unknown_callee)
>> +			continue;
> 
> I suppose you are going to change this for Martin's implementation, so I
> have skipped it for now.
>> @@ -558,7 +601,8 @@ ipa_profile (void)
>>   	{
>>   	  if (n->count.initialized_p ())
>>   	    nindirect++;
>> -	  if (e->indirect_info->common_target_id)
>> +	  if (e->indirect_info->common_target_id
>> +	      || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>   	    {
>>   	      if (!node_map_initialized)
>>   	        init_node_map (false);
>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>   		      if (dump_file)
>>   			fprintf (dump_file,
>>   				 "Not speculating: "
>> -				 "parameter count mistmatch\n");
>> +				 "parameter count mismatch\n");
>>   		    }
>>   		  else if (e->indirect_info->polymorphic
>>   			   && !opt_for_fn (n->decl, flag_devirtualize)
>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>   		  nunknown++;
>>   		}
>>   	    }
>> -	 }
>> +	  if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>> +	    {
>> +	      if (in_lto_p)
>> +		{
>> +		  if (dump_file)
>> +		    {
>> +		      fprintf (dump_file,
>> +			       "Updating hotness threshold in LTO mode.\n");
>> +		      fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>> +			       (int64_t) threshold);
>> +		    }
>> +		  set_hot_bb_threshold (threshold
>> +					/ e->indirect_info->num_of_ics);
>> +		}
> 
> Why do you need different paths for one target and for multiple targets?
> Also why LTO is different here?
I will remove this logic and the above change, based on Martin's
implementation.

>> +	      if (!node_map_initialized)
>> +		init_node_map (false);
>> +	      node_map_initialized = true;
>> +	      ncommon++;
>> +	      unsigned speculative = 0;
>> +	      for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>> +		{
>> +		  n2 = find_func_by_profile_id (
>> +		    e->indirect_info->common_target_ids[i]);
>> +		  if (n2)
>> +		    {
>> +		      if (dump_file)
>> +			{
>> +			  fprintf (
>> +			    dump_file,
>> +			    "Indirect call -> direct call from"
>> +			    " other module %s => %s, prob %3.2f\n",
>> +			    n->dump_name (), n2->dump_name (),
>> +			    e->indirect_info->common_target_probabilities[i]
>> +			      / (float) REG_BR_PROB_BASE);
>> +			}
>> +		      if (e->indirect_info->common_target_probabilities[i]
>> +			  < REG_BR_PROB_BASE / 2)
>> +			{
>> +			  nuseless++;
>> +			  if (dump_file)
>> +			    fprintf (
>> +			      dump_file,
>> +			      "Not speculating: probability is too low.\n");
>> +			}
>> +		      else if (!e->maybe_hot_p ())
>> +			{
>> +			  nuseless++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file,
>> +				     "Not speculating: call is cold.\n");
>> +			}
>> +		      else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>> +			       && n2->can_be_discarded_p ())
>> +			{
>> +			  nuseless++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file,
>> +				     "Not speculating: target is overwritable "
>> +				     "and can be discarded.\n");
>> +			}
>> +		      else if (ipa_node_params_sum && ipa_edge_args_sum
>> +			       && (!vec_safe_is_empty (
>> +				 IPA_NODE_REF (n2)->descriptors))
>> +			       && ipa_get_param_count (IPA_NODE_REF (n2))
>> +				    != ipa_get_cs_argument_count (
>> +				      IPA_EDGE_REF (e))
>> +			       && (ipa_get_param_count (IPA_NODE_REF (n2))
>> +				     >= ipa_get_cs_argument_count (
>> +				       IPA_EDGE_REF (e))
>> +				   || !stdarg_p (TREE_TYPE (n2->decl))))
>> +			{
>> +			  nmismatch++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file, "Not speculating: "
>> +						"parameter count mismatch\n");
>> +			}
>> +		      else if (e->indirect_info->polymorphic
>> +			       && !opt_for_fn (n->decl, flag_devirtualize)
>> +			       && !possible_polymorphic_call_target_p (e, n2))
>> +			{
>> +			  nimpossible++;
>> +			  if (dump_file)
>> +			    fprintf (dump_file,
>> +				     "Not speculating: "
>> +				     "function is not in the polymorphic "
>> +				     "call target list\n");
>> +			}
>> +		      else
>> +			{
>> +			  /* Target may be overwritable, but profile says that
>> +			     control flow goes to this particular implementation
>> +			     of N2.  Speculate on the local alias to allow
>> +			     inlining.
>> +			     */
>> +			  if (!n2->can_be_discarded_p ())
>> +			    {
>> +			      cgraph_node *alias;
>> +			      alias = dyn_cast<cgraph_node *> (
>> +				n2->noninterposable_alias ());
>> +			      if (alias)
>> +				n2 = alias;
>> +			    }
>> +			  nconverted++;
>> +			  e->make_speculative (
>> +			    n2, e->count.apply_probability (
>> +				  e->indirect_info
>> +				    ->common_target_probabilities[i]));
>> +			  update = true;
>> +			  speculative++;
>> +			}
>> +		    }
>> +		  else
>> +		    {
>> +		      if (dump_file)
>> +			fprintf (dump_file,
>> +				 "Function with profile-id %i not found.\n",
>> +				 e->indirect_info->common_target_ids[i]);
>> +		      nunknown++;
>> +		    }
>> +		}
>> +	      if (speculative < e->indirect_info->num_of_ics)
>> +		e->indirect_info->num_of_ics = speculative;
>> +	    }
>> +	}
>>          if (update)
>>   	 ipa_update_overall_fn_summary (n);
>>        }
>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>> index 79b250c3943..30347691029 100644
>> --- a/gcc/ipa-utils.c
>> +++ b/gcc/ipa-utils.c
>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>         update_max_bb_count ();
>>         compute_function_frequency ();
>>         pop_cfun ();
>> +      /* When src is speculative, clone the referrings.  */
>> +      if (src->indirect_call_target)
>> +	for (e = src->callers; e; e = e->next_caller)
>> +	  if (e->callee == src && e->speculative)
>> +	    dst->clone_referring (src);
> 
> This looks wrong. Why do you need to copy all references from src
> to target?
This clones the referrings from the callee to the merged callee, not the
references.  The "caller to src" information is not passed on to "caller to
dst" when profiles are merged; this is a workaround for a SPEC ICE.  I will
root-cause it later.


>> +			    /* Speculative calls consist of two edges - direct
>> +			       and indirect.  Duplicate the whole thing and
>> +			       distribute frequencies accordingly.  */
>> +			    if (edge->speculative)
> I think here you want to handle the whole group with multiple targets.
OK.  This is also a fix for an ICE seen in SPEC.


Thanks
Xionghu

>> +			      {
>> +				struct ipa_ref *ref;
>> +
>> +				gcc_assert (!edge->indirect_unknown_callee);
>> +				old_edge->speculative_call_info (direct,
>> +								 indirect, ref);
>> +
>> +				profile_count indir_cnt = indirect->count;
>> +				indirect
>> +				  = indirect->clone (id->dst_node, call_stmt,
>> +						     gimple_uid (stmt), num,
>> +						     den, true);
>> +
>> +				profile_probability prob
>> +				  = indir_cnt.probability_in (old_cnt
>> +							      + indir_cnt);
>> +				indirect->count
>> +				  = copy_basic_block->count.apply_probability (
>> +				    prob);
>> +				edge->count
>> +				  = copy_basic_block->count - indirect->count;
>> +				id->dst_node->clone_reference (ref, stmt);
>> +			      }
>> +			    else
>> +			      edge->count = copy_basic_block->count;
>> +			  }
>> +			/* If the indirect call contains more than one indirect
>> +			   targets, need clone all speculative edges here.  */
>> +			if (old_edge && old_edge->next_callee
>> +			    && old_edge->speculative && indirect
>> +			    && indirect->indirect_info
>> +			    && indirect->indirect_info->num_of_ics > 1)
>> +			  {
>> +			    edge = old_edge->next_callee;
>> +			    old_edge = old_edge->next_callee;
>> +			    if (edge->speculative)
>> +			      next_speculative = true;
>> +			  }
>> +		      }
>> +		    while (next_speculative);
>> +		  }
>>   		  break;
>>   
>>   		case CB_CGE_MOVE_CLONES:
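
As a side note on the hunk above, the count split it performs can be written
as plain arithmetic.  This is only a sketch with hypothetical names
(toy_split, toy_split_counts) and plain integers; GCC's profile_count and
profile_probability types do the real work with their own scaling.

```cpp
// Toy model of the count distribution done when a speculative call is
// duplicated.  All names here are hypothetical, not GCC's.
struct toy_split
{
  long long direct;    // count given to the cloned direct edge
  long long indirect;  // count given to the cloned indirect edge
};

// old_cnt:   count of the original direct edge
// indir_cnt: count of the original indirect edge
// bb_cnt:    count of the copied basic block
inline toy_split
toy_split_counts (long long old_cnt, long long indir_cnt, long long bb_cnt)
{
  toy_split s;
  long long total = old_cnt + indir_cnt;
  // prob = indir_cnt / (old_cnt + indir_cnt), applied to the block count.
  // Integer division truncates; a zero total yields no indirect share.
  s.indirect = total != 0 ? bb_cnt * indir_cnt / total : 0;
  // The direct edge takes whatever remains of the block count.
  s.direct = bb_cnt - s.indirect;
  return s;
}
```

With old_cnt = 90, indir_cnt = 10 and a copied block count of 50, the
indirect edge gets 5 and the direct edge gets 45.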


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-24  2:34   ` [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization luoxhu
@ 2019-06-24  9:20     ` luoxhu
  0 siblings, 0 replies; 25+ messages in thread
From: luoxhu @ 2019-06-24  9:20 UTC (permalink / raw)
  To: gcc-patches



On 2019/6/24 10:34, luoxhu wrote:
> Hi Honza,
> Thanks very much for so many useful comments.
> As a newbie to GCC, I am not sure whether my questions are described clearly
> enough.  Thanks in advance for your patience.  :)
> 
> 
> On 2019/6/20 21:47, Jan Hubicka wrote:
>> Hi,
>> some comments on the ipa part of the patch
>> (and thanks for working on it - this was on my TODO list for years)
>>
>>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>>> index de82316d4b1..0d373a67d1b 100644
>>> --- a/gcc/cgraph.c
>>> +++ b/gcc/cgraph.c
>>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>>       fprintf (dump_file, "Introduced new external node "
>>>            "(%s) and turned into root of the clone tree.\n",
>>>            node->dump_name ());
>>> +      node->profile_id = first_clone->profile_id;
>>>       }
>>>     else if (dump_file)
>>>       fprintf (dump_file, "Introduced new external node "
>>
>> This is independent of the rest of changes.  Do you have example where
>> this matters? The inline clones are created in ipa-inline while
>> ipa-profile is run before it, so I can not think of such a scenario.
>> I see you also copy profile_id from function to clone.  I would like to
>> know why you needed that.
>>
>> Also you mention that you hit some ICEs. If fixes are independent of
>> rest of your changes, send them separately.
> 
> I copy the profile_id to the cloned node because in LTO ltrans there is no
> reference or referring info for the specialized/cloned node, so it is
> difficult to track the node's reference in
> cgraph_edge::speculative_call_info.  I use it mainly for debugging now.
> I will remove it and split the patches in a later version to include the
> ICE fixes.
> 
>>
>>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge 
>>> *&direct,
>>>     int i;
>>>     cgraph_edge *e2;
>>>     cgraph_edge *e = this;
>>> +  cgraph_node *referred_node;
>>>     if (!e->indirect_unknown_callee)
>>>       for (e2 = e->caller->indirect_calls;
>>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge 
>>> *&direct,
>>>       && ((ref->stmt && ref->stmt == e->call_stmt)
>>>           || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>>         {
>>> -    reference = ref;
>>> -    break;
>>> +    if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +      {
>>> +        referred_node = dyn_cast<cgraph_node *> (ref->referred);
>>> +        if (strstr (e->callee->name (), referred_node->name ()))
>>> +          {
>>> +        reference = ref;
>>> +        break;
>>> +          }
>>> +      }
>>> +    else
>>> +      {
>>> +        reference = ref;
>>> +        break;
>>> +      }
>>>         }
>>
>> This function is intended to return everything related to the
>> speculative call, so if you add multiple direct targets, I would expect
>> it to take auto_vec of cgraph_nodes for direct and auto_vec of
>> references.
> 
> So will the signature become
> cgraph_edge::speculative_call_info (auto_vec<cgraph_edge *> *direct,
>                                      cgraph_edge *&indirect,
>                                      auto_vec<ipa_ref *> *reference)
> 
> A lot of code depends on it, so maybe it should be split into another patch.
> Will the order of direct and reference in the two auto_vecs be strictly
> matched, for iteration convenience?
> The second question: if "this" is a direct edge, it will be pushed to the
> auto_vec "direct"; how can it get its next direct edge here?  From
> e->caller->callees?

There may be some misunderstanding here.  The direct edge should be a single
edge, but there can be multiple references.

For example, take two indirect targets on one single statement x = p(3);
the first speculative edge is main -> one;
the second speculative edge is main -> two.
direct->call_stmt is: x_10 = p_3 (3);

The calling code in ipa-inline-transform.c:
for (e = node->callees; e; e = next)
   {
      next = e->next_callee;
      e->redirect_call_stmt_to_callee ();
   }

redirect_call_stmt_to_callee will call
e->speculative_call_info(e, e2, ref).

When e is "main -> one" being redirected, the returned auto_vec of references
has length 2.
So the mapping should be 1:N instead of N:N (one direct edge will find N
references, but only one of them is correct, so we need to iterate to find
it).
e2 is the indirect call (from e->caller->indirect_calls); its speculative
flag can only be cleared once all indirect targets have been redirected by
the "next = e->next_callee" loop.  Otherwise the next speculative edge could
not finish its redirect, as e2 would no longer be speculative in the next
iteration.
As a result, we may still need similar logic that checks the length of the
returned references and only sets "e2->speculative = false;" when the length
is 1, which means all the other direct targets are already redirected.
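
A minimal sketch of that bookkeeping, using toy stand-ins rather than the
real cgraph types (toy_indirect_edge, toy_redirect and toy_redirect_all are
made-up names, and references are modelled as plain callee names):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the indirect edge e2 plus the references that share
// its call statement.  Not GCC's cgraph_edge or ipa_ref.
struct toy_indirect_edge
{
  bool speculative;
  std::vector<std::string> refs;  // one entry per remaining direct target
};

// Redirect the direct edge to CALLEE: find the one matching reference out
// of the N candidates, drop it, and clear the speculative flag of E2 only
// if it was the last remaining one.
inline bool
toy_redirect (const std::string &callee, toy_indirect_edge &e2)
{
  for (std::size_t i = 0; i < e2.refs.size (); i++)
    if (e2.refs[i] == callee)
      {
        bool last = e2.refs.size () == 1;
        e2.refs.erase (e2.refs.begin () + i);
        if (last)
          e2.speculative = false;
        return true;
      }
  return false;
}

// Walk the callees in order, as the loop in ipa-inline-transform.c does,
// and record the flag of e2 after each redirect.
inline std::vector<bool>
toy_redirect_all ()
{
  toy_indirect_edge e2;
  e2.speculative = true;
  e2.refs.push_back ("one");
  e2.refs.push_back ("two");

  std::vector<std::string> callees;
  callees.push_back ("one");
  callees.push_back ("two");

  std::vector<bool> flags;
  for (std::size_t i = 0; i < callees.size (); i++)
    {
      toy_redirect (callees[i], e2);
      flags.push_back (e2.speculative);
    }
  return flags;
}
```

In this model e2 stays speculative after "main -> one" is redirected and is
only cleared after "main -> two".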

> 
> 
>>>     /* Speculative edge always consist of all three components - direct 
>>> edge,
>>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>>            in the functions inlined through it.  */
>>>       }
>>>     edge->count += e2->count;
>>> -  edge->speculative = false;
>>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>>> +    {
>>> +      edge->indirect_info->num_of_ics--;
>>> +      if (edge->indirect_info->num_of_ics == 0)
>>> +    edge->speculative = false;
>>> +    }
>>> +  else
>>> +    edge->speculative = false;
>>>     e2->speculative = false;
>>>     ref->remove_reference ();
>>>     if (e2->indirect_unknown_callee || e2->inline_failed)
>>
>> This function should turn speculative call into direct call to DECL, so
>> I think it should remove all the other direct calls associated with stmt
>> and the indirect one.
>>
>> There are now two cases - in the first case you want to turn the
>> speculative call into a direct call or give up on speculation completely,
>> while in the other case you want to only remove one of the speculations.
>>
>> I guess we want to have resolve_speculation(decl) for first and
>> remove_one_speculation(edge) for the second case?
>> The second case would be useful for the code below handling type
>> mismatches and also for inline when one of speculative targets seems not
>> useful to bother with.
> 
> So the logic will be:
> 
> if (edge->indirect_info->num_of_ics > 1)
>      cgraph_edge::resolve_speculation (tree callee_decl);
> else
>      remove_one_speculation(edge);
> 
> cgraph_edge::resolve_speculation will call edge->speculative_call_info (e2,
> edge, ref) internally; at that time, will e2 and ref only contain one
> direct target?
> 
> 
>>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>>         e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>>                                false);
>>>         e->count = gimple_bb (e->call_stmt)->count;
>>> -      e2->speculative = false;
>>> +      if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>> +        {
>>> +          e2->indirect_info->num_of_ics--;
>>> +          if (e2->indirect_info->num_of_ics == 0)
>>> +        e2->speculative = false;
>>> +        }
>>> +      else
>>> +        e2->speculative = false;
>>>         e2->count = gimple_bb (e2->call_stmt)->count;
>>>         ref->speculative = false;
>>>         ref->stmt = NULL;
>>
>>>   extern void debuginfo_early_init (void);
>>>   extern void debuginfo_init (void);
>>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>>     int param_index;
>>>     /* ECF flags determined from the caller.  */
>>>     int ecf_flags;
>>> -  /* Profile_id of common target obtrained from profile.  */
>>> +  /* Profile_id of common target obtained from profile.  */
>>>     int common_target_id;
>>>     /* Probability that call will land in function with 
>>> COMMON_TARGET_ID.  */
>>>     int common_target_probability;
>>> +  /* Profile_id of common target obtained from profile.  */
>>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>> +  /* Probabilities that call will land in function with 
>>> COMMON_TARGET_IDS.  */
>>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>
>> I would use vec of pairs (profile_id,probability) to hold this and not
>> wire in GCOV_ICALL_TOPN_NCOUNTS.  Most of the time this vec will be just a
>> NULL pointer, so it will result in less memory overhead and will avoid a
>> hard limit on the number of speculations we want to do.
>>
>> Note that the speculative edges may end up being redirected during IPA
>> optimization, for example when their target is cloned for particular
>> call context or when the function is detected identical to other
>> function.  So one can not preserve the mapping between targets and
>> profile ids.
>>
>> Also this infrastructure is useful even w/o profile because we could use
>> ipa-devirt to devirtualize even when multiple polymorphic targets are
>> found. So I would not wire in the limit GCOV_ICALL_TOPN_NCOUNTS and
>> just use dynamically allocated vectors instead.
>>
>> With ipa-devirt it is possible that we know that there are precisely 2
>> possible polymorphic targets.  Other case I was considering was to
>> speculatively inline with -fPIC.  I.e. when one has interposable call
>> for foo() one can create foo.localalias() for the definition visible to
>> compiler and then speculate
>>
>> if (foo == foo.localalias())
>>    inline_path
>> else
>>    foo();
>>
>> In these cases we may end up with indirect
>> call that has no associated indirect edge (which we do not support
>> currently), so it may be interesting to move speculative call info away
>> from indirect call info to cgraph edge structure (but that can be done
>> incrementally based on what you do now - this code is not completely
>> easy so lets do it step by step).
> 
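For illustration, the guard-and-fallback shape discussed above, extended to
two speculated targets, could look like the sketch below.  A function
pointer stands in for the interposable symbol, and all names (toy_one,
toy_two, toy_three, toy_call_speculated) are hypothetical; GCC does this on
GIMPLE, not in source code.

```cpp
typedef int (*toy_callback) (int);

int toy_one (int x) { return x + 1; }
int toy_two (int x) { return x * 2; }
int toy_three (int x) { return x - 1; }

// Conceptual result of speculating on two targets for `x = p (arg)`.
int
toy_call_speculated (toy_callback p, int arg)
{
  if (p == toy_one)
    return arg + 1;   // body of toy_one inlined on the first guarded path
  if (p == toy_two)
    return arg * 2;   // body of toy_two inlined on the second guarded path
  return p (arg);     // remaining indirect call for every other target
}
```

Each guard corresponds to one speculative direct edge, and the final call is
the indirect edge that stays behind.
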
> +
> +struct GTY (()) indirect_target_info
> +{
> +  /* Profile_id of common target obtained from profile.  */
> +  int common_target_id;
> +  /* Probability that call will land in function with COMMON_TARGET_ID.  */
> +  int common_target_probability;
> +};
> +
> 
> Tested with "vec<indirect_target_info, va_gc> *indirect_call_targets;";
> it works quite well, really much better.
> 
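A rough standalone model of such a lazily allocated target vector, using
std::vector and std::unique_ptr in place of GCC's GC-allocated vec.  The
toy_* names and the id/probability values are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Mirrors the two fields of the proposed indirect_target_info.
struct toy_target_info
{
  int common_target_id;
  int common_target_probability;
};

struct toy_indirect_call_info
{
  // Stays null for the common case of a call with no profiled targets,
  // so such calls pay only for one pointer.
  std::unique_ptr<std::vector<toy_target_info> > targets;

  // Allocate the vector on first use, then append the pair.
  void add_target (int id, int prob)
  {
    if (!targets)
      targets.reset (new std::vector<toy_target_info> ());
    toy_target_info t = { id, prob };
    targets->push_back (t);
  }

  // Number of recorded targets; there is no compile-time upper bound.
  std::size_t num_targets () const
  {
    return targets ? targets->size () : 0;
  }
};

// Build an info object with N made-up targets.
inline toy_indirect_call_info
toy_make_info (int n)
{
  toy_indirect_call_info info;
  for (int i = 0; i < n; i++)
    info.add_target (100 + i, 1000);  // made-up id and probability
  return info;
}
```
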
> Is it correct that profile_id/common_target_id will only be used to
> generate speculative edges in ipa-profile, and that every new
> specialized/cloned node will take (not copy) the speculative property from
> the original node, which is why there is no need to clone the profile id?
> 
> 
>>
>>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>>                 gimple_remove_histogram_value (DECL_STRUCT_FUNCTION 
>>> (node->decl),
>>>                                 stmt, h);
>>>               }
>>> +          else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>>> +            {
>>> +              unsigned j;
>>> +              struct cgraph_edge *e = node->get_edge (stmt);
>>> +              if (e && !e->indirect_unknown_callee)
>>> +            continue;
>>
>> I suppose you are going to change this for Martin's implementation, so I
>> have skipped it for now.
>>> @@ -558,7 +601,8 @@ ipa_profile (void)
>>>       {
>>>         if (n->count.initialized_p ())
>>>           nindirect++;
>>> -      if (e->indirect_info->common_target_id)
>>> +      if (e->indirect_info->common_target_id
>>> +          || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>>           {
>>>             if (!node_map_initialized)
>>>               init_node_map (false);
>>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>>                 if (dump_file)
>>>               fprintf (dump_file,
>>>                    "Not speculating: "
>>> -                 "parameter count mistmatch\n");
>>> +                 "parameter count mismatch\n");
>>>               }
>>>             else if (e->indirect_info->polymorphic
>>>                  && !opt_for_fn (n->decl, flag_devirtualize)
>>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>>             nunknown++;
>>>           }
>>>           }
>>> -     }
>>> +      if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>>> +        {
>>> +          if (in_lto_p)
>>> +        {
>>> +          if (dump_file)
>>> +            {
>>> +              fprintf (dump_file,
>>> +                   "Updating hotness threshold in LTO mode.\n");
>>> +              fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>>> +                   (int64_t) threshold);
>>> +            }
>>> +          set_hot_bb_threshold (threshold
>>> +                    / e->indirect_info->num_of_ics);
>>> +        }
>>
>> Why do you need different paths for one target and for multiple targets?
>> Also why LTO is different here?
> I will remove this logic and the above change, based on Martin's
> implementation.
> 
>>> +          if (!node_map_initialized)
>>> +        init_node_map (false);
>>> +          node_map_initialized = true;
>>> +          ncommon++;
>>> +          unsigned speculative = 0;
>>> +          for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>>> +        {
>>> +          n2 = find_func_by_profile_id (
>>> +            e->indirect_info->common_target_ids[i]);
>>> +          if (n2)
>>> +            {
>>> +              if (dump_file)
>>> +            {
>>> +              fprintf (
>>> +                dump_file,
>>> +                "Indirect call -> direct call from"
>>> +                " other module %s => %s, prob %3.2f\n",
>>> +                n->dump_name (), n2->dump_name (),
>>> +                e->indirect_info->common_target_probabilities[i]
>>> +                  / (float) REG_BR_PROB_BASE);
>>> +            }
>>> +              if (e->indirect_info->common_target_probabilities[i]
>>> +              < REG_BR_PROB_BASE / 2)
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (
>>> +                  dump_file,
>>> +                  "Not speculating: probability is too low.\n");
>>> +            }
>>> +              else if (!e->maybe_hot_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: call is cold.\n");
>>> +            }
>>> +              else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>>> +                   && n2->can_be_discarded_p ())
>>> +            {
>>> +              nuseless++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: target is overwritable "
>>> +                     "and can be discarded.\n");
>>> +            }
>>> +              else if (ipa_node_params_sum && ipa_edge_args_sum
>>> +                   && (!vec_safe_is_empty (
>>> +                 IPA_NODE_REF (n2)->descriptors))
>>> +                   && ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                    != ipa_get_cs_argument_count (
>>> +                      IPA_EDGE_REF (e))
>>> +                   && (ipa_get_param_count (IPA_NODE_REF (n2))
>>> +                     >= ipa_get_cs_argument_count (
>>> +                       IPA_EDGE_REF (e))
>>> +                   || !stdarg_p (TREE_TYPE (n2->decl))))
>>> +            {
>>> +              nmismatch++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file, "Not speculating: "
>>> +                        "parameter count mismatch\n");
>>> +            }
>>> +              else if (e->indirect_info->polymorphic
>>> +                   && !opt_for_fn (n->decl, flag_devirtualize)
>>> +                   && !possible_polymorphic_call_target_p (e, n2))
>>> +            {
>>> +              nimpossible++;
>>> +              if (dump_file)
>>> +                fprintf (dump_file,
>>> +                     "Not speculating: "
>>> +                     "function is not in the polymorphic "
>>> +                     "call target list\n");
>>> +            }
>>> +              else
>>> +            {
>>> +              /* Target may be overwritable, but profile says that
>>> +                 control flow goes to this particular implementation
>>> +                 of N2.  Speculate on the local alias to allow
>>> +                 inlining.
>>> +                 */
>>> +              if (!n2->can_be_discarded_p ())
>>> +                {
>>> +                  cgraph_node *alias;
>>> +                  alias = dyn_cast<cgraph_node *> (
>>> +                n2->noninterposable_alias ());
>>> +                  if (alias)
>>> +                n2 = alias;
>>> +                }
>>> +              nconverted++;
>>> +              e->make_speculative (
>>> +                n2, e->count.apply_probability (
>>> +                  e->indirect_info
>>> +                    ->common_target_probabilities[i]));
>>> +              update = true;
>>> +              speculative++;
>>> +            }
>>> +            }
>>> +          else
>>> +            {
>>> +              if (dump_file)
>>> +            fprintf (dump_file,
>>> +                 "Function with profile-id %i not found.\n",
>>> +                 e->indirect_info->common_target_ids[i]);
>>> +              nunknown++;
>>> +            }
>>> +        }
>>> +          if (speculative < e->indirect_info->num_of_ics)
>>> +        e->indirect_info->num_of_ics = speculative;
>>> +        }
>>> +    }
>>>          if (update)
>>>        ipa_update_overall_fn_summary (n);
>>>        }
>>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>>> index 79b250c3943..30347691029 100644
>>> --- a/gcc/ipa-utils.c
>>> +++ b/gcc/ipa-utils.c
>>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>>         update_max_bb_count ();
>>>         compute_function_frequency ();
>>>         pop_cfun ();
>>> +      /* When src is speculative, clone the referrings.  */
>>> +      if (src->indirect_call_target)
>>> +    for (e = src->callers; e; e = e->next_caller)
>>> +      if (e->callee == src && e->speculative)
>>> +        dst->clone_referring (src);
>>
>> This looks wrong. Why do you need to copy all references from src
>> to target?
> This clones the referrings from the callee to the merged callee, not the
> references.  The information from "caller to src" is not passed to "caller
> to dst" when merging profiles; this is a workaround for a SPEC ICE.  I will
> root-cause it later.
> 
> 
>>> +                /* Speculative calls consist of two edges - direct
>>> +                   and indirect.  Duplicate the whole thing and
>>> +                   distribute frequencies accordingly.  */
>>> +                if (edge->speculative)
>> I think here you want to handle the whole group with multiple targets.
> OK.  Also an ICE fix in SPEC.
> 
> 
> Thanks
> Xionghu
> 
>>> +                  {
>>> +                struct ipa_ref *ref;
>>> +
>>> +                gcc_assert (!edge->indirect_unknown_callee);
>>> +                old_edge->speculative_call_info (direct,
>>> +                                 indirect, ref);
>>> +
>>> +                profile_count indir_cnt = indirect->count;
>>> +                indirect
>>> +                  = indirect->clone (id->dst_node, call_stmt,
>>> +                             gimple_uid (stmt), num,
>>> +                             den, true);
>>> +
>>> +                profile_probability prob
>>> +                  = indir_cnt.probability_in (old_cnt
>>> +                                  + indir_cnt);
>>> +                indirect->count
>>> +                  = copy_basic_block->count.apply_probability (
>>> +                    prob);
>>> +                edge->count
>>> +                  = copy_basic_block->count - indirect->count;
>>> +                id->dst_node->clone_reference (ref, stmt);
>>> +                  }
>>> +                else
>>> +                  edge->count = copy_basic_block->count;
>>> +              }
>>> +            /* If the indirect call contains more than one indirect
>>> +               targets, need clone all speculative edges here.  */
>>> +            if (old_edge && old_edge->next_callee
>>> +                && old_edge->speculative && indirect
>>> +                && indirect->indirect_info
>>> +                && indirect->indirect_info->num_of_ics > 1)
>>> +              {
>>> +                edge = old_edge->next_callee;
>>> +                old_edge = old_edge->next_callee;
>>> +                if (edge->speculative)
>>> +                  next_speculative = true;
>>> +              }
>>> +              }
>>> +            while (next_speculative);
>>> +          }
>>>             break;
>>>           case CB_CGE_MOVE_CLONES:
> 


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-20 14:45   ` Martin Liška
@ 2019-07-01 11:20     ` Martin Liška
  2019-07-03  9:08     ` Jan Hubicka
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Liška @ 2019-07-01 11:20 UTC (permalink / raw)
  To: Jan Hubicka, Xiong Hu Luo; +Cc: gcc-patches, segher, wschmidt, luoxhu

@Honza: PING^1

On 6/20/19 4:45 PM, Martin Liška wrote:
> Hi.
> 
> So the first part is about supporting N tracked values.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 


* Re: [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters.
  2019-06-20 14:46   ` [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters Martin Liška
@ 2019-07-01 11:21     ` Martin Liška
  2019-07-03  9:09     ` Jan Hubicka
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Liška @ 2019-07-01 11:21 UTC (permalink / raw)
  To: Jan Hubicka, Xiong Hu Luo; +Cc: gcc-patches, segher, wschmidt, luoxhu

@Honza: PING^1

On 6/20/19 4:46 PM, Martin Liška wrote:
> And the second part is a rename, so that it reflects the reality
> that the single value counter can actually track multiple values.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 


* Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
  2019-06-20 14:45   ` Martin Liška
  2019-07-01 11:20     ` Martin Liška
@ 2019-07-03  9:08     ` Jan Hubicka
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2019-07-03  9:08 UTC (permalink / raw)
  To: Martin Liška; +Cc: Xiong Hu Luo, gcc-patches, segher, wschmidt, luoxhu

> Hi.
> 
> So the first part is about supporting N tracked values.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin

> From f3e361fb6d799acf538bc76a91bfcc8e265b7cbe Mon Sep 17 00:00:00 2001
> From: Martin Liska <mliska@suse.cz>
> Date: Wed, 19 Jun 2019 14:15:14 +0200
> Subject: [PATCH 1/2] Support N values in libgcov for single value counter
>  type.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-06-20  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/tree-prof/val-prof-2.c: Update scanned pattern
> 	as we do now better.
> 
> libgcc/ChangeLog:
> 
> 2019-06-20  Martin Liska  <mliska@suse.cz>
> 
> 	* libgcov-merge.c (merge_single_value_set): Support N values.
> 	* libgcov-profiler.c (__gcov_one_value_profiler_body): Likewise.

OK,
Thanks.
Honza


* Re: [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters.
  2019-06-20 14:46   ` [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters Martin Liška
  2019-07-01 11:21     ` Martin Liška
@ 2019-07-03  9:09     ` Jan Hubicka
  2019-07-03 12:41       ` Martin Liška
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Hubicka @ 2019-07-03  9:09 UTC (permalink / raw)
  To: Martin Liška; +Cc: Xiong Hu Luo, gcc-patches, segher, wschmidt, luoxhu

> And the second part is a rename, so that it reflects the reality
> that the single value counter can actually track multiple values.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin

> From cc9e93d43941176e92b5821e5a8134a5319a10b4 Mon Sep 17 00:00:00 2001
> From: Martin Liska <mliska@suse.cz>
> Date: Thu, 20 Jun 2019 14:50:23 +0200
> Subject: [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters.
> 
> gcc/ChangeLog:
> 
> 2019-06-20  Martin Liska  <mliska@suse.cz>
> 
> 	* gcov-counter.def (GCOV_COUNTER_V_SINGLE): Remove.
> 	(GCOV_COUNTER_V_TOPN): New.
> 	(GCOV_COUNTER_V_INDIR): Use _topn.
> 	* gcov-io.h (GCOV_DISK_SINGLE_VALUES): Remove.
> 	(GCOV_TOPN_VALUES): New.
> 	(GCOV_SINGLE_VALUE_COUNTERS): Remove.
> 	(GCOV_TOPN_VALUES_COUNTERS): New.
> 	* profile.c (instrument_values): Use HIST_TYPE_TOPN_VALUES.
> 	* tree-profile.c:
> 	(gimple_init_gcov_profiler): Rename variables from one_value
> 	to topn_values.
> 	(gimple_gen_one_value_profiler): Remove.
> 	(gimple_gen_topn_values_profiler): New function.
> 	* value-prof.c (dump_histogram_value): Use TOPN_VALUES
> 	names instead of SINGLE_VALUE.
> 	(stream_out_histogram_value): Likewise.
> 	(stream_in_histogram_value): Likewise.
> 	(get_most_common_single_value): Likewise.
> 	(gimple_divmod_fixed_value_transform): Likewise.
> 	(gimple_stringops_transform): Likewise.
> 	(gimple_divmod_values_to_profile): Likewise.
> 	(gimple_stringops_values_to_profile): Likewise.
> 	(gimple_find_values_to_profile): Likewise.
> 	* value-prof.h (enum hist_type): Rename to TOPN.
> 	(gimple_gen_one_value_profiler): Remove.
> 	(gimple_gen_topn_values_profiler): New.
> 
> libgcc/ChangeLog:
> 
> 2019-06-20  Martin Liska  <mliska@suse.cz>
> 
> 	* Makefile.in: Use topn_values instead of one_value names.
> 	* libgcov-merge.c (__gcov_merge_single): Move to ...
> 	(__gcov_merge_topn): ... this.
> 	(merge_single_value_set): Move to ...
> 	(merge_topn_values_set): ... this.
> 	* libgcov-profiler.c (__gcov_one_value_profiler_body): Move to
> 	...
> 	(__gcov_topn_values_profiler_body): ... this.
> 	(__gcov_one_value_profiler_v2): Move to ...
> 	(__gcov_topn_values_profiler): ... this.
> 	(__gcov_one_value_profiler_v2_atomic): Move to ...
> 	(__gcov_topn_values_profiler_atomic): ... this.
> 	(__gcov_indirect_call_profiler_v4): Remove.
> 	* libgcov-util.c (__gcov_single_counter_op): Move to ...
> 	(__gcov_topn_counter_op): ... this.
> 	* libgcov.h (L_gcov_merge_single): Remove.
> 	(L_gcov_merge_topn): New.
> 	(__gcov_merge_single): Remove.
> 	(__gcov_merge_topn): New.
> 	(__gcov_one_value_profiler_v2): Move to ..
> 	(__gcov_topn_values_profiler): ... this.
> 	(__gcov_one_value_profiler_v2_atomic): Move to ...
> 	(__gcov_topn_values_profiler_atomic): ... this.

OK,
I would rename the __gcov_topn_values_profiler to _v2 since we had this
function before.

Honza


* Re: [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters.
  2019-07-03  9:09     ` Jan Hubicka
@ 2019-07-03 12:41       ` Martin Liška
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Liška @ 2019-07-03 12:41 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Xiong Hu Luo, gcc-patches, segher, wschmidt, luoxhu

On 7/3/19 11:09 AM, Jan Hubicka wrote:
> OK,
> I would rename the __gcov_topn_values_profiler to _v2 since we had this
> function before.

It's a bit tricky, but we hadn't, because I named it *_values_*.  We used to
have *_value_* :)

Martin


end of thread, other threads:[~2019-07-03 12:40 UTC | newest]

Thread overview: 25+ messages
2019-06-18  1:46 [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization Xiong Hu Luo
2019-06-18  5:51 ` Martin Liška
2019-06-18  9:03   ` luoxhu
2019-06-18  9:34     ` Martin Liška
2019-06-18 10:07       ` Segher Boessenkool
2019-06-18 10:20         ` Martin Liška
2019-06-19  5:38       ` luoxhu
2019-06-19  6:57         ` Martin Liška
2019-06-18 10:21 ` Martin Liška
2019-06-19  8:50   ` luoxhu
2019-06-19  8:56     ` Martin Liška
2019-06-19 12:18       ` Martin Liška
2019-06-20  1:59         ` luoxhu
2019-06-20  6:15           ` luoxhu
2019-06-20 12:57             ` Martin Liška
2019-06-20 13:47 ` Jan Hubicka
2019-06-20 14:45   ` Martin Liška
2019-07-01 11:20     ` Martin Liška
2019-07-03  9:08     ` Jan Hubicka
2019-06-20 14:46   ` [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters Martin Liška
2019-07-01 11:21     ` Martin Liška
2019-07-03  9:09     ` Jan Hubicka
2019-07-03 12:41       ` Martin Liška
2019-06-24  2:34   ` [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization luoxhu
2019-06-24  9:20     ` luoxhu
