From: "Martin Liška" <mliska@suse.cz>
To: Xiong Hu Luo <luoxhu@linux.ibm.com>, gcc-patches@gcc.gnu.org
Cc: hubicka@ucw.cz, segher@kernel.crashing.org,
wschmidt@linux.ibm.com, luoxhu@cn.ibm.com
Subject: Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
Date: Tue, 18 Jun 2019 05:51:00 -0000 [thread overview]
Message-ID: <adde5d29-d091-0768-32e3-50313d330dc2@suse.cz> (raw)
In-Reply-To: <20190618014521.67198-1-luoxhu@linux.ibm.com>
On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
Hello.
Thank you for the interest in the area.
> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
> Currently the default instrument function can only find the indirect function
> that called more than 50% with an incorrect count number returned.
Can you please explain what you mean by 'an incorrect count number returned'?
> This patch
> leverages the "--param indir-call-topn-profile=1" and enables multiple indirect
Note that I've remove indir-call-topn-profile last week, the patch will not apply
on current trunk. However, I can help you how to adapt single-value counters
to support tracking of multiple values.
> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function
> specialization, profiling, partial devirtualization, inlining and cloning could
> be done successfully based on it.
This decision is definitely big question for Honza?
> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
> Details are:
> 1. When do PGO with indir-call-topn-profile, the gcda data format is not
> supported in ipa-profile pass,
If you take a look at gcc/ipa-profile.c:195 you can see how the probability
is propagated to IPA passes. Why is that not sufficient?
Martin
> so add variables to pass the information
> through passes, and postpone gimple_ic to ipa-profile like default as inline
> pass will decide whether it is benefit to transform indirect call.
> 2. Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
> profile full support in ipa passes and cgraph_edge functions.
> 3. Fix various hidden speculative call ICEs exposed after enabling this
> feature when running SPEC2017.
> 4. Add 1 in module testcase and 2 cross module testcases.
> 5. TODOs:
> 5.1. Some reference info will be dropped from WPA to LTRANS, so
> reference check will be difficult in LTRANS, need replace the strstr
> with reference compare.
> 5.2. Some duplicate code need be removed as top1 and topn share same logic.
> Actually top1 related logic could be eliminated totally as topn includes it.
> 5.3. Split patch maybe needed as too big but not sure how many would be
> reasonable.
> 6. Performance result for ppc64le:
> 6.1. Representative test: indir-call-prof-topn.c runtime improved from
> 1.7s to 0.4s.
> 6.2. SPEC2017 peakrate:
> 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
> 525.x264_r (-5.29%).
> No big changes of other benchmarks.
> Option: -Ofast -mcpu=power8
> PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
> PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
> -fprofile-correction
> 6.3. No performance change on PHP benchmark.
> 7. Bootstrap and regression test passed on Power8-LE.
>
> gcc/ChangeLog
>
> 2019-06-17 Xiong Hu Luo <luoxhu@linux.ibm.com>
>
> PR ipa/69678
> * cgraph.c (cgraph_node::get_create): Copy profile_id.
> (cgraph_edge::speculative_call_info): Find real
> reference for indirect targets.
> (cgraph_edge::resolve_speculation): Add speculative code process
> for indirect targets.
> (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
> (cgraph_node::verify_node): Likewise.
> * cgraph.h (common_target_ids): New variable.
> (common_target_probabilities): Likewise.
> (num_of_ics): Likewise.
> * cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
> * ipa-inline.c (inline_small_functions): Add iterator update.
> * ipa-profile.c (ipa_profile_generate_summary): Add indirect
> multiple targets logic.
> (ipa_profile): Likewise.
> * ipa-utils.c (ipa_merge_profiles): Clone speculative src's
> referrings to dst.
> * ipa.c (process_references): Fix typo.
> * lto-cgraph.c (lto_output_edge): Add indirect multiple targets
> logic.
> (input_edge): Likewise.
> * predict.c (dump_prediction): Revome edges count assert to be
> precise.
> * tree-profile.c (gimple_gen_ic_profiler): Use the new variable
> __gcov_indirect_call.counters and __gcov_indirect_call.callee.
> (gimple_gen_ic_func_profiler): Likewise.
> (pass_ipa_tree_profile::gate): Fix comment typos.
> * tree-inline.c (copy_bb): Duplicate all the speculative edges
> if indirect call contains multiple speculative targets.
> * value-prof.c (check_counter): Proportion the counter for
> multiple targets.
> (ic_transform_topn): New function.
> (gimple_ic_transform): Handle topn case, fix comment typos.
>
> gcc/testsuite/ChangeLog
>
> 2019-06-17 Xiong Hu Luo <luoxhu@linux.ibm.com>
>
> PR ipa/69678
> * gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
> * gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
> * gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
> * gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
> ---
> gcc/cgraph.c | 38 +++-
> gcc/cgraph.h | 9 +-
> gcc/cgraphclones.c | 1 +
> gcc/ipa-inline.c | 3 +
> gcc/ipa-profile.c | 185 +++++++++++++++++-
> gcc/ipa-utils.c | 5 +
> gcc/ipa.c | 2 +-
> gcc/lto-cgraph.c | 38 ++++
> gcc/predict.c | 1 -
> .../tree-prof/crossmodule-indir-call-topn-1.c | 35 ++++
> .../crossmodule-indir-call-topn-1a.c | 22 +++
> .../tree-prof/crossmodule-indir-call-topn-2.c | 42 ++++
> .../gcc.dg/tree-prof/indir-call-prof-topn.c | 38 ++++
> gcc/tree-inline.c | 97 +++++----
> gcc/tree-profile.c | 12 +-
> gcc/value-prof.c | 146 +++++++++++++-
> 16 files changed, 606 insertions(+), 68 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index de82316d4b1..0d373a67d1b 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
> fprintf (dump_file, "Introduced new external node "
> "(%s) and turned into root of the clone tree.\n",
> node->dump_name ());
> + node->profile_id = first_clone->profile_id;
> }
> else if (dump_file)
> fprintf (dump_file, "Introduced new external node "
> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
> int i;
> cgraph_edge *e2;
> cgraph_edge *e = this;
> + cgraph_node *referred_node;
>
> if (!e->indirect_unknown_callee)
> for (e2 = e->caller->indirect_calls;
> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
> && ((ref->stmt && ref->stmt == e->call_stmt)
> || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
> {
> - reference = ref;
> - break;
> + if (e2->indirect_info && e2->indirect_info->num_of_ics)
> + {
> + referred_node = dyn_cast<cgraph_node *> (ref->referred);
> + if (strstr (e->callee->name (), referred_node->name ()))
> + {
> + reference = ref;
> + break;
> + }
> + }
> + else
> + {
> + reference = ref;
> + break;
> + }
> }
>
> /* Speculative edge always consist of all three components - direct edge,
> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
> in the functions inlined through it. */
> }
> edge->count += e2->count;
> - edge->speculative = false;
> + if (edge->indirect_info && edge->indirect_info->num_of_ics)
> + {
> + edge->indirect_info->num_of_ics--;
> + if (edge->indirect_info->num_of_ics == 0)
> + edge->speculative = false;
> + }
> + else
> + edge->speculative = false;
> e2->speculative = false;
> ref->remove_reference ();
> if (e2->indirect_unknown_callee || e2->inline_failed)
> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
> e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
> false);
> e->count = gimple_bb (e->call_stmt)->count;
> - e2->speculative = false;
> + if (e2->indirect_info && e2->indirect_info->num_of_ics)
> + {
> + e2->indirect_info->num_of_ics--;
> + if (e2->indirect_info->num_of_ics == 0)
> + e2->speculative = false;
> + }
> + else
> + e2->speculative = false;
> e2->count = gimple_bb (e2->call_stmt)->count;
> ref->speculative = false;
> ref->stmt = NULL;
> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>
> for (e = callees; e; e = e->next_callee)
> {
> - if (!e->aux)
> + if (!e->aux && !e->speculative)
> {
> error ("edge %s->%s has no corresponding call_stmt",
> identifier_to_locale (e->caller->name ()),
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index c294602d762..ed0fbc60432 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. If not see
> #include "profile-count.h"
> #include "ipa-ref.h"
> #include "plugin-api.h"
> +#include "gcov-io.h"
>
> extern void debuginfo_early_init (void);
> extern void debuginfo_init (void);
> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
> int param_index;
> /* ECF flags determined from the caller. */
> int ecf_flags;
> - /* Profile_id of common target obtrained from profile. */
> + /* Profile_id of common target obtained from profile. */
> int common_target_id;
> /* Probability that call will land in function with COMMON_TARGET_ID. */
> int common_target_probability;
>
> + /* Profile_id of common target obtained from profile. */
> + int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
> + /* Probabilities that call will land in function with COMMON_TARGET_IDS. */
> + int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
> + unsigned num_of_ics;
> +
> /* Set when the call is a virtual call with the parameter being the
> associated object pointer rather than a simple direct call. */
> unsigned polymorphic : 1;
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 15f7e119d18..94f424bc10c 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
> new_node->icf_merged = icf_merged;
> new_node->merged_comdat = merged_comdat;
> new_node->thunk = thunk;
> + new_node->profile_id = profile_id;
>
> new_node->clone.tree_map = NULL;
> new_node->clone.args_to_skip = args_to_skip;
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 360c3de3289..ef2b217b3f9 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
> }
> if (has_speculative)
> for (edge = node->callees; edge; edge = next)
> + {
> + next = edge->next_callee;
> if (edge->speculative && !speculation_useful_p (edge,
> edge->aux != NULL))
> {
> edge->resolve_speculation ();
> update = true;
> }
> + }
> if (update)
> {
> struct cgraph_node *where = node->global.inlined_to
> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
> index de9563d808c..d04476295a0 100644
> --- a/gcc/ipa-profile.c
> +++ b/gcc/ipa-profile.c
> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
> struct cgraph_node *node;
> gimple_stmt_iterator gsi;
> basic_block bb;
> + enum hist_type type;
> +
> + type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
> + : HIST_TYPE_INDIR_CALL;
>
> hash_table<histogram_hash> hashtable (10);
>
> @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
> histogram_value h;
> h = gimple_histogram_value_of_type
> (DECL_STRUCT_FUNCTION (node->decl),
> - stmt, HIST_TYPE_INDIR_CALL);
> + stmt, type);
> /* No need to do sanity check: gimple_ic_transform already
> takes away bad histograms. */
> - if (h)
> + if (h && type == HIST_TYPE_INDIR_CALL)
> {
> /* counter 0 is target, counter 1 is number of execution we called target,
> counter 2 is total number of executions. */
> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
> gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
> stmt, h);
> }
> + else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
> + {
> + unsigned j;
> + struct cgraph_edge *e = node->get_edge (stmt);
> + if (e && !e->indirect_unknown_callee)
> + continue;
> +
> + e->indirect_info->num_of_ics = 0;
> + for (j = 1; j < h->n_counters; j += 2)
> + {
> + if (h->hvalue.counters[j] == 0)
> + continue;
> +
> + e->indirect_info->common_target_ids[j / 2]
> + = h->hvalue.counters[j];
> + e->indirect_info->common_target_probabilities[j / 2]
> + = GCOV_COMPUTE_SCALE (
> + h->hvalue.counters[j + 1],
> + gimple_bb (stmt)->count.ipa ().to_gcov_type ());
> + if (e->indirect_info
> + ->common_target_probabilities[j / 2]
> + > REG_BR_PROB_BASE)
> + {
> + if (dump_file)
> + fprintf (dump_file,
> + "Probability capped to 1\n");
> + e->indirect_info
> + ->common_target_probabilities[j / 2]
> + = REG_BR_PROB_BASE;
> + }
> + e->indirect_info->num_of_ics++;
> + }
> +
> + gcc_assert (e->indirect_info->num_of_ics
> + <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> + gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
> + node->decl),
> + stmt, h);
> + }
> }
> time += estimate_num_insns (stmt, &eni_time_weights);
> size += estimate_num_insns (stmt, &eni_size_weights);
> @@ -492,6 +536,7 @@ ipa_profile (void)
> int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
> int nmismatch = 0, nimpossible = 0;
> bool node_map_initialized = false;
> + gcov_type threshold;
>
> if (dump_file)
> dump_histogram (dump_file, histogram);
> @@ -500,14 +545,12 @@ ipa_profile (void)
> overall_time += histogram[i]->count * histogram[i]->time;
> overall_size += histogram[i]->size;
> }
> + threshold = 0;
> if (overall_time)
> {
> - gcov_type threshold;
> -
> gcc_assert (overall_size);
>
> cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) / 1000;
> - threshold = 0;
> for (i = 0; cumulated < cutoff; i++)
> {
> cumulated += histogram[i]->count * histogram[i]->time;
> @@ -543,7 +586,7 @@ ipa_profile (void)
> histogram.release ();
> histogram_pool.release ();
>
> - /* Produce speculative calls: we saved common traget from porfiling into
> + /* Produce speculative calls: we saved common target from profiling into
> e->common_target_id. Now, at link time, we can look up corresponding
> function node and produce speculative call. */
>
> @@ -558,7 +601,8 @@ ipa_profile (void)
> {
> if (n->count.initialized_p ())
> nindirect++;
> - if (e->indirect_info->common_target_id)
> + if (e->indirect_info->common_target_id
> + || (e->indirect_info && e->indirect_info->num_of_ics == 1))
> {
> if (!node_map_initialized)
> init_node_map (false);
> @@ -613,7 +657,7 @@ ipa_profile (void)
> if (dump_file)
> fprintf (dump_file,
> "Not speculating: "
> - "parameter count mistmatch\n");
> + "parameter count mismatch\n");
> }
> else if (e->indirect_info->polymorphic
> && !opt_for_fn (n->decl, flag_devirtualize)
> @@ -655,7 +699,130 @@ ipa_profile (void)
> nunknown++;
> }
> }
> - }
> + if (e->indirect_info && e->indirect_info->num_of_ics > 1)
> + {
> + if (in_lto_p)
> + {
> + if (dump_file)
> + {
> + fprintf (dump_file,
> + "Updating hotness threshold in LTO mode.\n");
> + fprintf (dump_file, "Updated min count: %" PRId64 "\n",
> + (int64_t) threshold);
> + }
> + set_hot_bb_threshold (threshold
> + / e->indirect_info->num_of_ics);
> + }
> + if (!node_map_initialized)
> + init_node_map (false);
> + node_map_initialized = true;
> + ncommon++;
> + unsigned speculative = 0;
> + for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
> + {
> + n2 = find_func_by_profile_id (
> + e->indirect_info->common_target_ids[i]);
> + if (n2)
> + {
> + if (dump_file)
> + {
> + fprintf (
> + dump_file,
> + "Indirect call -> direct call from"
> + " other module %s => %s, prob %3.2f\n",
> + n->dump_name (), n2->dump_name (),
> + e->indirect_info->common_target_probabilities[i]
> + / (float) REG_BR_PROB_BASE);
> + }
> + if (e->indirect_info->common_target_probabilities[i]
> + < REG_BR_PROB_BASE / 2)
> + {
> + nuseless++;
> + if (dump_file)
> + fprintf (
> + dump_file,
> + "Not speculating: probability is too low.\n");
> + }
> + else if (!e->maybe_hot_p ())
> + {
> + nuseless++;
> + if (dump_file)
> + fprintf (dump_file,
> + "Not speculating: call is cold.\n");
> + }
> + else if (n2->get_availability () <= AVAIL_INTERPOSABLE
> + && n2->can_be_discarded_p ())
> + {
> + nuseless++;
> + if (dump_file)
> + fprintf (dump_file,
> + "Not speculating: target is overwritable "
> + "and can be discarded.\n");
> + }
> + else if (ipa_node_params_sum && ipa_edge_args_sum
> + && (!vec_safe_is_empty (
> + IPA_NODE_REF (n2)->descriptors))
> + && ipa_get_param_count (IPA_NODE_REF (n2))
> + != ipa_get_cs_argument_count (
> + IPA_EDGE_REF (e))
> + && (ipa_get_param_count (IPA_NODE_REF (n2))
> + >= ipa_get_cs_argument_count (
> + IPA_EDGE_REF (e))
> + || !stdarg_p (TREE_TYPE (n2->decl))))
> + {
> + nmismatch++;
> + if (dump_file)
> + fprintf (dump_file, "Not speculating: "
> + "parameter count mismatch\n");
> + }
> + else if (e->indirect_info->polymorphic
> + && !opt_for_fn (n->decl, flag_devirtualize)
> + && !possible_polymorphic_call_target_p (e, n2))
> + {
> + nimpossible++;
> + if (dump_file)
> + fprintf (dump_file,
> + "Not speculating: "
> + "function is not in the polymorphic "
> + "call target list\n");
> + }
> + else
> + {
> + /* Target may be overwritable, but profile says that
> + control flow goes to this particular implementation
> + of N2. Speculate on the local alias to allow
> + inlining.
> + */
> + if (!n2->can_be_discarded_p ())
> + {
> + cgraph_node *alias;
> + alias = dyn_cast<cgraph_node *> (
> + n2->noninterposable_alias ());
> + if (alias)
> + n2 = alias;
> + }
> + nconverted++;
> + e->make_speculative (
> + n2, e->count.apply_probability (
> + e->indirect_info
> + ->common_target_probabilities[i]));
> + update = true;
> + speculative++;
> + }
> + }
> + else
> + {
> + if (dump_file)
> + fprintf (dump_file,
> + "Function with profile-id %i not found.\n",
> + e->indirect_info->common_target_ids[i]);
> + nunknown++;
> + }
> + }
> + if (speculative < e->indirect_info->num_of_ics)
> + e->indirect_info->num_of_ics = speculative;
> + }
> + }
> if (update)
> ipa_update_overall_fn_summary (n);
> }
> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
> index 79b250c3943..30347691029 100644
> --- a/gcc/ipa-utils.c
> +++ b/gcc/ipa-utils.c
> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
> update_max_bb_count ();
> compute_function_frequency ();
> pop_cfun ();
> + /* When src is speculative, clone the referrings. */
> + if (src->indirect_call_target)
> + for (e = src->callers; e; e = e->next_caller)
> + if (e->callee == src && e->speculative)
> + dst->clone_referring (src);
> for (e = dst->callees; e; e = e->next_callee)
> {
> if (e->speculative)
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 2496694124c..c1fe081a72d 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
> devirtualization happens. After inlining still keep their declarations
> around, so we can devirtualize to a direct call.
>
> - Also try to make trivial devirutalization when no or only one target is
> + Also try to make trivial devirtualization when no or only one target is
> possible. */
>
> static void
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 4dfa2862be3..0c8f547d44e 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
> unsigned int uid;
> intptr_t ref;
> struct bitpack_d bp;
> + unsigned i;
>
> if (edge->indirect_unknown_callee)
> streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
> if (edge->indirect_info->common_target_id)
> streamer_write_hwi_stream
> (ob->main_stream, edge->indirect_info->common_target_probability);
> +
> + gcc_assert (edge->indirect_info->num_of_ics
> + <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> + streamer_write_hwi_stream (ob->main_stream,
> + edge->indirect_info->num_of_ics);
> +
> + if (edge->indirect_info->num_of_ics)
> + {
> + for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> + {
> + streamer_write_hwi_stream (
> + ob->main_stream, edge->indirect_info->common_target_ids[i]);
> + if (edge->indirect_info->common_target_ids[i])
> + streamer_write_hwi_stream (
> + ob->main_stream,
> + edge->indirect_info->common_target_probabilities[i]);
> + }
> + }
> }
> }
>
> @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
> cgraph_inline_failed_t inline_failed;
> struct bitpack_d bp;
> int ecf_flags = 0;
> + unsigned i;
>
> caller = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
> if (caller == NULL || caller->decl == NULL_TREE)
> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec<symtab_node *> nodes,
> edge->indirect_info->common_target_id = streamer_read_hwi (ib);
> if (edge->indirect_info->common_target_id)
> edge->indirect_info->common_target_probability = streamer_read_hwi (ib);
> +
> + edge->indirect_info->num_of_ics = streamer_read_hwi (ib);
> +
> + gcc_assert (edge->indirect_info->num_of_ics
> + <= GCOV_ICALL_TOPN_NCOUNTS / 2);
> +
> + if (edge->indirect_info->num_of_ics)
> + {
> + for (i = 0; i < edge->indirect_info->num_of_ics; i++)
> + {
> + edge->indirect_info->common_target_ids[i]
> + = streamer_read_hwi (ib);
> + if (edge->indirect_info->common_target_ids[i])
> + edge->indirect_info->common_target_probabilities[i]
> + = streamer_read_hwi (ib);
> + }
> + }
> }
> }
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 43ee91a5b13..b7f38891c72 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability,
> && bb->count.precise_p ()
> && reason == REASON_NONE)
> {
> - gcc_assert (e->count ().precise_p ());
> fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n",
> predictor_info[predictor].name,
> bb->count.to_gcov_type (), e->count ().to_gcov_type (),
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> new file mode 100644
> index 00000000000..e0a83c2e067
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
> @@ -0,0 +1,35 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> + int i, x;
> + fptr p = &one;
> +
> + x = one (3);
> +
> + for (i = 0; i < 350000000; i++)
> + {
> + x = (*p) (3);
> + p = table[x];
> + }
> + printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> new file mode 100644
> index 00000000000..a8c6e365fb9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
> @@ -0,0 +1,22 @@
> +/* It seems there is no way to avoid the other source of mulitple
> + source testcase from being compiled independently. Just avoid
> + error. */
> +#ifdef DOJOB
> +int
> +one (int a)
> +{
> + return 1;
> +}
> +
> +int
> +two (int a)
> +{
> + return 0;
> +}
> +#else
> +int
> +main()
> +{
> + return 0;
> +}
> +#endif
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> new file mode 100644
> index 00000000000..aa3887fde83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
> @@ -0,0 +1,42 @@
> +/* { dg-require-effective-target lto } */
> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a);
> +
> +int
> +two (int a);
> +
> +fptr table[] = {&one, &two};
> +
> +int foo ()
> +{
> + int i, x;
> + fptr p = &one;
> +
> + x = one (3);
> +
> + for (i = 0; i < 350000000; i++)
> + {
> + x = (*p) (3);
> + p = table[x];
> + }
> + return x;
> +}
> +
> +int
> +main()
> +{
> + int x = foo ();
> + printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */
> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> new file mode 100644
> index 00000000000..951bc7ddd19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
> @@ -0,0 +1,38 @@
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */
> +
> +#include <stdio.h>
> +
> +typedef int (*fptr) (int);
> +int
> +one (int a)
> +{
> + return 1;
> +}
> +
> +int
> +two (int a)
> +{
> + return 0;
> +}
> +
> +fptr table[] = {&one, &two};
> +
> +int
> +main()
> +{
> + int i, x;
> + fptr p = &one;
> +
> + one (3);
> +
> + for (i = 0; i < 350000000; i++)
> + {
> + x = (*p) (3);
> + p = table[x];
> + }
> + printf ("done:%d\n", x);
> +}
> +
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 9017da878b1..f69b31b197e 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb,
> switch (id->transform_call_graph_edges)
> {
> case CB_CGE_DUPLICATE:
> - edge = id->src_node->get_edge (orig_stmt);
> - if (edge)
> - {
> - struct cgraph_edge *old_edge = edge;
> - profile_count old_cnt = edge->count;
> - edge = edge->clone (id->dst_node, call_stmt,
> - gimple_uid (stmt),
> - num, den,
> - true);
> -
> - /* Speculative calls consist of two edges - direct and
> - indirect. Duplicate the whole thing and distribute
> - frequencies accordingly. */
> - if (edge->speculative)
> - {
> - struct cgraph_edge *direct, *indirect;
> - struct ipa_ref *ref;
> -
> - gcc_assert (!edge->indirect_unknown_callee);
> - old_edge->speculative_call_info (direct, indirect, ref);
> -
> - profile_count indir_cnt = indirect->count;
> - indirect = indirect->clone (id->dst_node, call_stmt,
> - gimple_uid (stmt),
> - num, den,
> - true);
> -
> - profile_probability prob
> - = indir_cnt.probability_in (old_cnt + indir_cnt);
> - indirect->count
> - = copy_basic_block->count.apply_probability (prob);
> - edge->count = copy_basic_block->count - indirect->count;
> - id->dst_node->clone_reference (ref, stmt);
> - }
> - else
> - edge->count = copy_basic_block->count;
> - }
> + {
> + edge = id->src_node->get_edge (orig_stmt);
> + struct cgraph_edge *old_edge = edge;
> + struct cgraph_edge *direct, *indirect;
> + bool next_speculative;
> + do
> + {
> + next_speculative = false;
> + if (edge)
> + {
> + profile_count old_cnt = edge->count;
> + edge
> + = edge->clone (id->dst_node, call_stmt,
> + gimple_uid (stmt), num, den, true);
> +
> + /* Speculative calls consist of two edges - direct
> + and indirect. Duplicate the whole thing and
> + distribute frequencies accordingly. */
> + if (edge->speculative)
> + {
> + struct ipa_ref *ref;
> +
> + gcc_assert (!edge->indirect_unknown_callee);
> + old_edge->speculative_call_info (direct,
> + indirect, ref);
> +
> + profile_count indir_cnt = indirect->count;
> + indirect
> + = indirect->clone (id->dst_node, call_stmt,
> + gimple_uid (stmt), num,
> + den, true);
> +
> + profile_probability prob
> + = indir_cnt.probability_in (old_cnt
> + + indir_cnt);
> + indirect->count
> + = copy_basic_block->count.apply_probability (
> + prob);
> + edge->count
> + = copy_basic_block->count - indirect->count;
> + id->dst_node->clone_reference (ref, stmt);
> + }
> + else
> + edge->count = copy_basic_block->count;
> + }
> + /* If the indirect call contains more than one indirect
> + targets, need clone all speculative edges here. */
> + if (old_edge && old_edge->next_callee
> + && old_edge->speculative && indirect
> + && indirect->indirect_info
> + && indirect->indirect_info->num_of_ics > 1)
> + {
> + edge = old_edge->next_callee;
> + old_edge = old_edge->next_callee;
> + if (edge->speculative)
> + next_speculative = true;
> + }
> + }
> + while (next_speculative);
> + }
> break;
>
> case CB_CGE_MOVE_CLONES:
> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
> index 1c3034aac10..4964dbdebb5 100644
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field;
> /* Do initialization work for the edge profiler. */
>
> /* Add code:
> - __thread gcov* __gcov_indirect_call_counters; // pointer to actual counter
> - __thread void* __gcov_indirect_call_callee; // actual callee address
> + __thread gcov* __gcov_indirect_call.counters; // pointer to actual counter
> + __thread void* __gcov_indirect_call.callee; // actual callee address
> __thread int __gcov_function_counter; // time profiler function counter
> */
> static void
> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base)
> f_1 = foo;
> __gcov_indirect_call.counters = &__gcov4.main[0];
> PROF_9 = f_1;
> - __gcov_indirect_call_callee = PROF_9;
> + __gcov_indirect_call.callee = PROF_9;
> _4 = f_1 ();
> */
>
> @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void)
>
> /* Insert code:
>
> - if (__gcov_indirect_call_callee != NULL)
> + if (__gcov_indirect_call.callee != NULL)
> __gcov_indirect_call_profiler_v3 (profile_id, ¤t_function_decl);
>
> The function __gcov_indirect_call_profiler_v3 is responsible for
> - resetting __gcov_indirect_call_callee to NULL. */
> + resetting __gcov_indirect_call.callee to NULL. */
>
> gimple_stmt_iterator gsi = gsi_start_bb (cond_bb);
> void0 = build_int_cst (ptr_type_node, 0);
> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *)
> {
> /* When profile instrumentation, use or test coverage shall be performed.
> But for AutoFDO, this there is no instrumentation, thus this pass is
> - diabled. */
> + disabled. */
> return (!in_lto_p && !flag_auto_profile
> && (flag_branch_probabilities || flag_test_coverage
> || profile_arc_flag));
> diff --git a/gcc/value-prof.c b/gcc/value-prof.c
> index 5013956cf86..4869ab8ccd6 100644
> --- a/gcc/value-prof.c
> +++ b/gcc/value-prof.c
> @@ -579,8 +579,8 @@ free_histograms (struct function *fn)
> somehow. */
>
> static bool
> -check_counter (gimple *stmt, const char * name,
> - gcov_type *count, gcov_type *all, profile_count bb_count_d)
> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all,
> + profile_count bb_count_d, float ratio = 1.0f)
> {
> gcov_type bb_count = bb_count_d.ipa ().to_gcov_type ();
> if (*all != bb_count || *count > *all)
> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name,
> "count (%d)\n", name, (int)*all, (int)bb_count);
> *all = bb_count;
> if (*count > *all)
> - *count = *all;
> + *count = *all * ratio;
> return false;
> }
> else
> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call,
> return dcall_stmt;
> }
>
> +/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe
> + multiple indirect targets in histogram. Check every indirect/virtual call
> + if callee function exists, if not exit, leave it to LTO stage for later
> + process. Modify code of this indirect call to an if-else structure in
> + ipa-profile finally. */
> +static bool
> +ic_transform_topn (gimple_stmt_iterator *gsi)
> +{
> + unsigned j;
> + gcall *stmt;
> + histogram_value histogram;
> + gcov_type val, count, count_all, all, bb_all;
> + struct cgraph_node *d_call;
> + profile_count bb_count;
> +
> + stmt = dyn_cast<gcall *> (gsi_stmt (*gsi));
> + if (!stmt)
> + return false;
> +
> + if (gimple_call_fndecl (stmt) != NULL_TREE)
> + return false;
> +
> + if (gimple_call_internal_p (stmt))
> + return false;
> +
> + histogram
> + = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN);
> + if (!histogram)
> + return false;
> +
> + count = 0;
> + all = 0;
> + bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
> + bb_count = gimple_bb (stmt)->count;
> +
> + /* n_counters need be odd to avoid access violation. */
> + gcc_assert (histogram->n_counters % 2 == 1);
> +
> + /* For indirect call topn, accumulate all the counts first. */
> + for (j = 1; j < histogram->n_counters; j += 2)
> + {
> + val = histogram->hvalue.counters[j];
> + count = histogram->hvalue.counters[j + 1];
> + if (val)
> + all += count;
> + }
> +
> + count_all = all;
> + /* Do the indirect call conversion if function body exists, or else leave it
> + to LTO stage. */
> + for (j = 1; j < histogram->n_counters; j += 2)
> + {
> + val = histogram->hvalue.counters[j];
> + count = histogram->hvalue.counters[j + 1];
> + if (val)
> + {
> + /* The order of CHECK_COUNTER calls is important
> + since check_counter can correct the third parameter
> + and we want to make count <= all <= bb_count. */
> + if (check_counter (stmt, "ic", &all, &bb_all, bb_count)
> + || check_counter (stmt, "ic", &count, &all,
> + profile_count::from_gcov_type (all),
> + (float) count / count_all))
> + {
> + gimple_remove_histogram_value (cfun, stmt, histogram);
> + return false;
> + }
> +
> + d_call = find_func_by_profile_id ((int) val);
> +
> + if (d_call == NULL)
> + {
> + if (val)
> + {
> + if (dump_file)
> + {
> + fprintf (
> + dump_file,
> + "Indirect call -> direct call from other module");
> + print_generic_expr (dump_file, gimple_call_fn (stmt),
> + TDF_SLIM);
> + fprintf (dump_file,
> + "=> %i (will resolve only with LTO)\n",
> + (int) val);
> + }
> + }
> + return false;
> + }
> +
> + if (!check_ic_target (stmt, d_call))
> + {
> + if (dump_file)
> + {
> + fprintf (dump_file, "Indirect call -> direct call ");
> + print_generic_expr (dump_file, gimple_call_fn (stmt),
> + TDF_SLIM);
> + fprintf (dump_file, "=> ");
> + print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
> + fprintf (dump_file,
> + " transformation skipped because of type mismatch");
> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> + }
> + gimple_remove_histogram_value (cfun, stmt, histogram);
> + return false;
> + }
> +
> + if (dump_file)
> + {
> + fprintf (dump_file, "Indirect call -> direct call ");
> + print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
> + fprintf (dump_file, "=> ");
> + print_generic_expr (dump_file, d_call->decl, TDF_SLIM);
> + fprintf (dump_file,
> + " transformation on insn postponed to ipa-profile");
> + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> + fprintf (dump_file, "hist->count %" PRId64
> + " hist->all %" PRId64"\n", count, all);
> + }
> + }
> + }
> +
> + return true;
> +}
> /*
> For every checked indirect/virtual call determine if most common pid of
> - function/class method has probability more than 50%. If yes modify code of
> + function/class method has probability more than 50%. If yes modify code of
> this call to:
> */
>
> @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
> histogram_value histogram;
> gcov_type val, count, all, bb_all;
> struct cgraph_node *direct_call;
> + enum hist_type type;
>
> stmt = dyn_cast <gcall *> (gsi_stmt (*gsi));
> if (!stmt)
> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
> if (gimple_call_internal_p (stmt))
> return false;
>
> - histogram = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL);
> + type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
> + : HIST_TYPE_INDIR_CALL;
> +
> + histogram = gimple_histogram_value_of_type (cfun, stmt, type);
> if (!histogram)
> return false;
>
> + if (type == HIST_TYPE_INDIR_CALL_TOPN)
> + return ic_transform_topn (gsi);
> +
> val = histogram->hvalue.counters [0];
> count = histogram->hvalue.counters [1];
> all = histogram->hvalue.counters [2];
>
> bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type ();
> - /* The order of CHECK_COUNTER calls is important -
> + /* The order of CHECK_COUNTER calls is important
> since check_counter can correct the third parameter
> - and we want to make count <= all <= bb_all. */
> + and we want to make count <= all <= bb_all. */
> if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count)
> || check_counter (stmt, "ic", &count, &all,
> profile_count::from_gcov_type (all)))
> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi)
> print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM);
> fprintf (dump_file, "=> ");
> print_generic_expr (dump_file, direct_call->decl, TDF_SLIM);
> - fprintf (dump_file, " transformation on insn postponned to ipa-profile");
> + fprintf (dump_file, " transformation on insn postponed to ipa-profile");
> print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> fprintf (dump_file, "hist->count %" PRId64
> " hist->all %" PRId64"\n", count, all);
>
next prev parent reply other threads:[~2019-06-18 5:51 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-18 1:46 Xiong Hu Luo
2019-06-18 5:51 ` Martin Liška [this message]
2019-06-18 9:03 ` luoxhu
2019-06-18 9:34 ` Martin Liška
2019-06-18 10:07 ` Segher Boessenkool
2019-06-18 10:20 ` Martin Liška
2019-06-19 5:38 ` luoxhu
2019-06-19 6:57 ` Martin Liška
2019-06-18 10:21 ` Martin Liška
2019-06-19 8:50 ` luoxhu
2019-06-19 8:56 ` Martin Liška
2019-06-19 12:18 ` Martin Liška
2019-06-20 1:59 ` luoxhu
2019-06-20 6:15 ` luoxhu
2019-06-20 12:57 ` Martin Liška
2019-06-20 13:47 ` Jan Hubicka
2019-06-20 14:45 ` Martin Liška
2019-07-01 11:20 ` Martin Liška
2019-07-03 9:08 ` Jan Hubicka
2019-06-20 14:46 ` [PATCH 2/2] Rename SINGE_VALUE to TOPN_VALUES counters Martin Liška
2019-07-01 11:21 ` Martin Liška
2019-07-03 9:09 ` Jan Hubicka
2019-07-03 12:41 ` Martin Liška
2019-06-24 2:34 ` [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization luoxhu
2019-06-24 9:20 ` luoxhu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=adde5d29-d091-0768-32e3-50313d330dc2@suse.cz \
--to=mliska@suse.cz \
--cc=gcc-patches@gcc.gnu.org \
--cc=hubicka@ucw.cz \
--cc=luoxhu@cn.ibm.com \
--cc=luoxhu@linux.ibm.com \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).