From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization
To: Martin Liška, gcc-patches@gcc.gnu.org
Cc: hubicka@ucw.cz, segher@kernel.crashing.org, wschmidt@linux.ibm.com, luoxhu@cn.ibm.com
References: <20190618014521.67198-1-luoxhu@linux.ibm.com> <124124c4-cf59-fbdd-198b-af85a1e64593@linux.ibm.com> <0367e49b-1c51-c2ee-aa0a-6ff4cc5d1dba@suse.cz>
From: luoxhu
Date: Wed, 19 Jun 2019 05:38:00 -0000
In-Reply-To: <0367e49b-1c51-c2ee-aa0a-6ff4cc5d1dba@suse.cz>
Message-Id: <7fda6a7e-1ded-54f8-f9b1-11fb47cef477@linux.ibm.com>

Hi Martin,

On 2019/6/18 17:34, Martin Liška wrote:
> On 6/18/19 11:02 AM, luoxhu wrote:
>> Hi,
>>
>> On 2019/6/18 13:51, Martin Liška wrote:
>>> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>>
>>> Hello.
>>>
>>> Thank you for the interest in the area.
>>>
>>>> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
>>>> Currently the default instrument function can only find the indirect function
>>>> that called more than 50% with an incorrect count number returned.
>>> Can you please explain what you mean by 'an incorrect count number returned'?
>>
>> For a test case indir-call-topn.c, it include 2 indirect calls "one" and "two". the profiling data is as below with trunk code (including your patch, count[0] and count[2] is switched by your code, the count[0] is used in ipa-profile but only support the top1 format, my patch adds the support for the topn format. count[0] was incorrect as WITHOUT your patch it is 0, things getting better with your fix as the count[0] is 350000000, but still not correct, in fact, "one" is running 175000000 times, and "two" is running the other 175000000 times):
>>
>> indir-call-topn.gcda:   22:    01a90000:  18:COUNTERS indirect_call 9 counts
>> indir-call-topn.gcda:   24:                   0: *350000000 1868707024 0* 0 0 0 0 0
>>
>> Running with the "--param indir-call-topn-profile=1" will give below profile data, My patch is based on this profile result and do the optimization for multiple indirect targets, performance can get much improve on this testcase and SPEC2017 for some benchmarks(LLVM already support this several years ago...).
>>
>> indir-call-topn.gcda:   26:    01b10000:  18:COUNTERS indirect_call_topn 9 counts
>> indir-call-topn.gcda:   28:                   0: *0 969338501 175000000 1868707024 175000000* 0 0 0
>>
>>
>> test case indir-call-topn.c:
>>
>> #include <stdio.h>
>>
>>
>> typedef int (*fptr) (int);
>> int
>> one (int a)
>> {
>>   return 1;
>> }
>>
>> int
>> two (int a)
>> {
>>   return 0;
>> }
>>
>> fptr table[] = {&one, &two};
>>
>> int
>> main()
>> {
>>   int i, x;
>>   fptr p = &one;
>>
>>   one (3);
>>
>>   for (i = 0; i < 350000000; i++)
>>     {
>>       x = (*p) (3);
>>       p = table[x];
>>     }
>>   printf ("done:%d\n", x);
>> }
>
> I've got it. So it's situation where you have distribution equal to 50% and 50%. Note that it's
> the only valid situation where both edges with be >= 50%. That's the threshold for which
> we speculatively devirtualize edges. That said, you don't need generic topn counter, but a probably
> only a top2 counter which can be generalized from single-value counter type. I'm saying that
> because I removed the TOPN, mainly due to:
> https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179
>
> which is over-complicated profiling function. And the changes that I've done recently are motivated
> to preserve a stable builds. That's achieved by noticing that a single-value counter can't handle all
> seen values.

Actually, the algorithm of function __gcov_one_value_profiler_body in libgcc/libgcov-profiler.c has a functional problem when profiling the testcase I provided.
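The problem can be reproduced outside GCC. The sketch below is a minimal standalone model, not the libgcov code itself: it assumes `long long` in place of gcov_type, keeps only the non-atomic path of the update rule listed next, and uses two hypothetical helpers (simulate_alternating and print_counters) to replay the strictly alternating one/two call sequence of the testcase.

```c
#include <assert.h>
#include <stdio.h>

typedef long long counter_t;	/* Stand-in for gcov_type (assumption).  */

/* Same update rule as the single-value profiler, non-atomic path only.
   counters[0]: total executions, counters[1]: current candidate value,
   counters[2]: running score of that candidate.  */
void
one_value_profiler (counter_t *counters, counter_t value)
{
  if (value == counters[1])
    counters[2]++;
  else if (counters[2] == 0)
    {
      counters[2] = 1;
      counters[1] = value;
    }
  else
    counters[2]--;

  counters[0]++;
}

/* Hypothetical helper: reset COUNTERS and feed ITERATIONS calls that
   alternate strictly between targets A and B, starting with A.  */
void
simulate_alternating (counter_t *counters, counter_t a, counter_t b,
		      long iterations)
{
  assert (iterations >= 0);
  counters[0] = counters[1] = counters[2] = 0;
  for (long i = 0; i < iterations; i++)
    one_value_profiler (counters, (i % 2 == 0) ? a : b);
}

/* Hypothetical helper: print the three counters in gcda order.  */
void
print_counters (const counter_t *counters)
{
  printf ("[%lld, %lld, %lld]\n", counters[0], counters[1], counters[2]);
}
```

Calling simulate_alternating (c, 1868707024, 969338501, 350000000) on a three-element array and then print_counters (c) should leave counters[2] at 0: every call to the second target cancels the increment made by the preceding call to the first, so any even number of alternating calls ends with a zero score.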
__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
                                int use_atomic)
{
  if (value == counters[1])
    counters[2]++;
  else if (counters[2] == 0)
    {
      counters[2] = 1;
      counters[1] = value;
    }
  else
    counters[2]--;

  if (use_atomic)
    __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
  else
    counters[0]++;
}

The profile id of function "one" is 1868707024 and that of function "two" is 969338501. With the loop running from 0 to (350000000-1):

value        counters[0]  counters[1]  counters[2]
1868707024   1            1868707024   1
969338501    2            1868707024   0
1868707024   3            1868707024   1
969338501    4            1868707024   0
1868707024   5            1868707024   1
...
969338501    350000000    1868707024   0

Finally, the returned counters[] is [350000000, 1868707024, 0]. In ipa-profile.c and value-prof.c, counters[0] is the total number of times the statement was executed, and counters[2] is the number of those executions that called the target in counters[1], which is 0 here. counters[2] should not be 0: it makes the probability 0 (it was expected to be 50%, right?). With this probability ipa-profile fails to create the speculative edge, and the call stays indirect. I think this is the reason why topn was introduced by Rong Xu in 2014 (8ceaa1e) and later reimplemented in LLVM. There is definitely a bug here as long as topn is not re-enabled.

dump-profile: indir-call-topn.fb.gcc.wpa.069i.profile_estimate

Histogram:5
350000001: time:2 (8.70) size:2 (8.00)
350000000: time:19 (91.30) size:7 (36.00)
175000000: time:4 (100.00) size:2 (44.00)
1: time:0 (100.00) size:0 (44.00)
0: time:37 (100.00) size:14 (100.00)
Determined min count: 175000000 Time:100.00% Size:44.00%
Setting hotness threshold in LTO mode.
Indirect call -> direct call from other module main/15 => one/11, prob 0.00
Not speculating: probability is too low.
1 indirect calls trained.
1 (100.00%) have common target.
0 (0.00%) targets was not found.
0 (0.00%) targets had parameter count mismatch.
0 (0.00%) targets was not in polymorphic call target list.
1 (100.00%) speculations seems useless.

Thanks
Xionghu

>
>>
>>>
>>>>   This patch
>>>> leverages the "--param indir-call-topn-profile=1" and enables multiple indirect
>>> Note that I've remove indir-call-topn-profile last week, the patch will not apply
>>> on current trunk. However, I can help you how to adapt single-value counters
>>> to support tracking of multiple values.
>>
>> It will be very useful if you help me to track multiple values similarly on trunk code. I will rebase to your code once topn is ready again. Actually topn is more general and top1 is included in, I thought that top1 should be removed instead of topn, though topn will consume longer time than top1 in profile-generate.
>
> As mentioned earlier, I really don't want to put TOPN back. I can help you once Honza will agree with the general IPA changes.
>
>>
>>>
>>>> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result, function
>>>> specialization, profiling, partial devirtualization, inlining and cloning could
>>>> be done successfully based on it.
>>> This decision is definitely big question for Honza?
>>>
>>>> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
>>>> Details are:
>>>>    1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>>>>    supported in ipa-profile pass,
>>> If you take a look at gcc/ipa-profile.c:195 you can see how the probability
>>> is propagated to IPA passes. Why is that not sufficient?
>>
>> Current code only support single indirect target, I need track multiple indirect targets and create multiple speculative edges on single indirect call statement.
>>
>> What's more, many ICEs happened in later stage due to single speculative target design, part of this patch is to solve the ICEs of multiple speculative target edges handling.
>
> Well, to be honest I don't like the patch much. It brings another level of complexity for a quite rare situation where one
> calls 2 functions via an indirect call.
> And as mentioned, current IPA optimization are not happy about multiple indirect branches.
>
> Martin
>
>>
>>
>> Thanks
>>
>> Xionghu
>>
>>>
>>> Martin
>>>
>>>> so add variables to pass the information
>>>>    through passes, and postpone gimple_ic to ipa-profile like default as inline
>>>>    pass will decide whether it is benefit to transform indirect call.
>>>>    2.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
>>>>    profile full support in ipa passes and cgraph_edge functions.
>>>>    3.  Fix various hidden speculative call ICEs exposed after enabling this
>>>>    feature when running SPEC2017.
>>>>    4.  Add 1 in module testcase and 2 cross module testcases.
>>>>    5.  TODOs:
>>>>      5.1.  Some reference info will be dropped from WPA to LTRANS, so
>>>>      reference check will be difficult in LTRANS, need replace the strstr
>>>>      with reference compare.
>>>>      5.2.  Some duplicate code need be removed as top1 and topn share same logic.
>>>>      Actually top1 related logic could be eliminated totally as topn includes it.
>>>>      5.3.  Split patch maybe needed as too big but not sure how many would be
>>>>      reasonable.
>>>>    6.  Performance result for ppc64le:
>>>>      6.1.  Representative test: indir-call-prof-topn.c runtime improved from
>>>>      1.7s to 0.4s.
>>>>      6.2.  SPEC2017 peakrate:
>>>>          523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
>>>>          525.x264_r (-5.29%).
>>>>          No big changes of other benchmarks.
>>>>          Option: -Ofast -mcpu=power8
>>>>          PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
>>>>          PASS2_OPTIMIZE: -fprofile-use --param indir-call-topn-profile=1 -flto
>>>>          -fprofile-correction
>>>>      6.3.  No performance change on PHP benchmark.
>>>>    7.  Bootstrap and regression test passed on Power8-LE.
>>>>
>>>> gcc/ChangeLog
>>>>
>>>>     2019-06-17  Xiong Hu Luo
>>>>
>>>>     PR ipa/69678
>>>>     * cgraph.c (cgraph_node::get_create): Copy profile_id.
>>>>     (cgraph_edge::speculative_call_info): Find real
>>>>     reference for indirect targets.
>>>>     (cgraph_edge::resolve_speculation): Add speculative code process
>>>>     for indirect targets.
>>>>     (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>>>>     (cgraph_node::verify_node): Likewise.
>>>>     * cgraph.h (common_target_ids): New variable.
>>>>     (common_target_probabilities): Likewise.
>>>>     (num_of_ics): Likewise.
>>>>     * cgraphclones.c (cgraph_node::create_clone): Copy profile_id.
>>>>     * ipa-inline.c (inline_small_functions): Add iterator update.
>>>>     * ipa-profile.c (ipa_profile_generate_summary): Add indirect
>>>>     multiple targets logic.
>>>>     (ipa_profile): Likewise.
>>>>     * ipa-utils.c (ipa_merge_profiles): Clone speculative src's
>>>>     referrings to dst.
>>>>     * ipa.c (process_references): Fix typo.
>>>>     * lto-cgraph.c (lto_output_edge): Add indirect multiple targets
>>>>     logic.
>>>>     (input_edge): Likewise.
>>>>     * predict.c (dump_prediction): Revome edges count assert to be
>>>>     precise.
>>>>     * tree-profile.c (gimple_gen_ic_profiler): Use the new variable
>>>>     __gcov_indirect_call.counters and __gcov_indirect_call.callee.
>>>>     (gimple_gen_ic_func_profiler): Likewise.
>>>>     (pass_ipa_tree_profile::gate): Fix comment typos.
>>>>     * tree-inline.c (copy_bb): Duplicate all the speculative edges
>>>>     if indirect call contains multiple speculative targets.
>>>>     * value-prof.c (check_counter): Proportion the counter for
>>>>     multiple targets.
>>>>     (ic_transform_topn): New function.
>>>>     (gimple_ic_transform): Handle topn case, fix comment typos.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>>
>>>>     2019-06-17  Xiong Hu Luo
>>>>
>>>>     PR ipa/69678
>>>>     * gcc.dg/tree-prof/indir-call-prof-topn.c: New testcase.
>>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: New testcase.
>>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c: New testcase.
>>>>     * gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: New testcase.
>>>> ---
>>>>   gcc/cgraph.c                                  |  38 +++-
>>>>   gcc/cgraph.h                                  |   9 +-
>>>>   gcc/cgraphclones.c                            |   1 +
>>>>   gcc/ipa-inline.c                              |   3 +
>>>>   gcc/ipa-profile.c                             | 185 +++++++++++++++++-
>>>>   gcc/ipa-utils.c                               |   5 +
>>>>   gcc/ipa.c                                     |   2 +-
>>>>   gcc/lto-cgraph.c                              |  38 ++++
>>>>   gcc/predict.c                                 |   1 -
>>>>   .../tree-prof/crossmodule-indir-call-topn-1.c |  35 ++++
>>>>   .../crossmodule-indir-call-topn-1a.c          |  22 +++
>>>>   .../tree-prof/crossmodule-indir-call-topn-2.c |  42 ++++
>>>>   .../gcc.dg/tree-prof/indir-call-prof-topn.c   |  38 ++++
>>>>   gcc/tree-inline.c                             |  97 +++++----
>>>>   gcc/tree-profile.c                            |  12 +-
>>>>   gcc/value-prof.c                              | 146 +++++++++++++-
>>>>   16 files changed, 606 insertions(+), 68 deletions(-)
>>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
>>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c
>>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
>>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c
>>>>
>>>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>>>> index de82316d4b1..0d373a67d1b 100644
>>>> --- a/gcc/cgraph.c
>>>> +++ b/gcc/cgraph.c
>>>> @@ -553,6 +553,7 @@ cgraph_node::get_create (tree decl)
>>>>       fprintf (dump_file, "Introduced new external node "
>>>>            "(%s) and turned into root of the clone tree.\n",
>>>>            node->dump_name ());
>>>> +      node->profile_id = first_clone->profile_id;
>>>>       }
>>>>     else if (dump_file)
>>>>       fprintf (dump_file, "Introduced new external node "
>>>> @@ -1110,6 +1111,7 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>>>     int i;
>>>>     cgraph_edge *e2;
>>>>     cgraph_edge *e = this;
>>>> +  cgraph_node *referred_node;
>>>>
>>>>     if (!e->indirect_unknown_callee)
>>>>       for (e2 = e->caller->indirect_calls;
>>>> @@ -1142,8 +1144,20 @@ cgraph_edge::speculative_call_info (cgraph_edge *&direct,
>>>>       && ((ref->stmt && ref->stmt == e->call_stmt)
>>>>           || (!ref->stmt && ref->lto_stmt_uid == e->lto_stmt_uid)))
>>>>         {
>>>> -    reference = ref;
>>>> -    break;
>>>> +    if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>>> +      {
>>>> +        referred_node = dyn_cast <cgraph_node *> (ref->referred);
>>>> +        if (strstr (e->callee->name (), referred_node->name ()))
>>>> +          {
>>>> +        reference = ref;
>>>> +        break;
>>>> +          }
>>>> +      }
>>>> +    else
>>>> +      {
>>>> +        reference = ref;
>>>> +        break;
>>>> +      }
>>>>         }
>>>>
>>>>     /* Speculative edge always consist of all three components - direct edge,
>>>> @@ -1199,7 +1213,14 @@ cgraph_edge::resolve_speculation (tree callee_decl)
>>>>            in the functions inlined through it.  */
>>>>       }
>>>>     edge->count += e2->count;
>>>> -  edge->speculative = false;
>>>> +  if (edge->indirect_info && edge->indirect_info->num_of_ics)
>>>> +    {
>>>> +      edge->indirect_info->num_of_ics--;
>>>> +      if (edge->indirect_info->num_of_ics == 0)
>>>> +    edge->speculative = false;
>>>> +    }
>>>> +  else
>>>> +    edge->speculative = false;
>>>>     e2->speculative = false;
>>>>     ref->remove_reference ();
>>>>     if (e2->indirect_unknown_callee || e2->inline_failed)
>>>> @@ -1333,7 +1354,14 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>>>         e->caller->set_call_stmt_including_clones (e->call_stmt, new_stmt,
>>>>                                false);
>>>>         e->count = gimple_bb (e->call_stmt)->count;
>>>> -      e2->speculative = false;
>>>> +      if (e2->indirect_info && e2->indirect_info->num_of_ics)
>>>> +        {
>>>> +          e2->indirect_info->num_of_ics--;
>>>> +          if (e2->indirect_info->num_of_ics == 0)
>>>> +        e2->speculative = false;
>>>> +        }
>>>> +      else
>>>> +        e2->speculative = false;
>>>>         e2->count = gimple_bb (e2->call_stmt)->count;
>>>>         ref->speculative = false;
>>>>         ref->stmt = NULL;
>>>> @@ -3407,7 +3435,7 @@ cgraph_node::verify_node (void)
>>>>
>>>>         for (e = callees; e; e = e->next_callee)
>>>>       {
>>>> -      if (!e->aux)
>>>> +      if (!e->aux && !e->speculative)
>>>>           {
>>>>             error ("edge %s->%s has no corresponding call_stmt",
>>>>                identifier_to_locale (e->caller->name ()),
>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>>> index c294602d762..ed0fbc60432 100644
>>>> --- a/gcc/cgraph.h
>>>> +++ b/gcc/cgraph.h
>>>> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>   #include "profile-count.h"
>>>>   #include "ipa-ref.h"
>>>>   #include "plugin-api.h"
>>>> +#include "gcov-io.h"
>>>>
>>>>   extern void debuginfo_early_init (void);
>>>>   extern void debuginfo_init (void);
>>>> @@ -1638,11 +1639,17 @@ struct GTY(()) cgraph_indirect_call_info
>>>>     int param_index;
>>>>     /* ECF flags determined from the caller.  */
>>>>     int ecf_flags;
>>>> -  /* Profile_id of common target obtrained from profile.  */
>>>> +  /* Profile_id of common target obtained from profile.  */
>>>>     int common_target_id;
>>>>     /* Probability that call will land in function with COMMON_TARGET_ID.  */
>>>>     int common_target_probability;
>>>>
>>>> +  /* Profile_id of common target obtained from profile.  */
>>>> +  int common_target_ids[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>>> +  /* Probabilities that call will land in function with COMMON_TARGET_IDS.  */
>>>> +  int common_target_probabilities[GCOV_ICALL_TOPN_NCOUNTS / 2];
>>>> +  unsigned num_of_ics;
>>>> +
>>>>     /* Set when the call is a virtual call with the parameter being the
>>>>        associated object pointer rather than a simple direct call.  */
>>>>     unsigned polymorphic : 1;
>>>> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
>>>> index 15f7e119d18..94f424bc10c 100644
>>>> --- a/gcc/cgraphclones.c
>>>> +++ b/gcc/cgraphclones.c
>>>> @@ -467,6 +467,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
>>>>     new_node->icf_merged = icf_merged;
>>>>     new_node->merged_comdat = merged_comdat;
>>>>     new_node->thunk = thunk;
>>>> +  new_node->profile_id = profile_id;
>>>>
>>>>     new_node->clone.tree_map = NULL;
>>>>     new_node->clone.args_to_skip = args_to_skip;
>>>> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
>>>> index 360c3de3289..ef2b217b3f9 100644
>>>> --- a/gcc/ipa-inline.c
>>>> +++ b/gcc/ipa-inline.c
>>>> @@ -1866,12 +1866,15 @@ inline_small_functions (void)
>>>>       }
>>>>         if (has_speculative)
>>>>       for (edge = node->callees; edge; edge = next)
>>>> +    {
>>>> +      next = edge->next_callee;
>>>>         if (edge->speculative && !speculation_useful_p (edge,
>>>>                                 edge->aux != NULL))
>>>>           {
>>>>             edge->resolve_speculation ();
>>>>             update = true;
>>>>           }
>>>> +    }
>>>>         if (update)
>>>>       {
>>>>         struct cgraph_node *where = node->global.inlined_to
>>>> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
>>>> index de9563d808c..d04476295a0 100644
>>>> --- a/gcc/ipa-profile.c
>>>> +++ b/gcc/ipa-profile.c
>>>> @@ -168,6 +168,10 @@ ipa_profile_generate_summary (void)
>>>>     struct cgraph_node *node;
>>>>     gimple_stmt_iterator gsi;
>>>>     basic_block bb;
>>>> +  enum hist_type type;
>>>> +
>>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? HIST_TYPE_INDIR_CALL_TOPN
>>>> +                             : HIST_TYPE_INDIR_CALL;
>>>>
>>>>     hash_table<histogram_hash> hashtable (10);
>>>>
>>>> @@ -186,10 +190,10 @@ ipa_profile_generate_summary (void)
>>>>             histogram_value h;
>>>>             h = gimple_histogram_value_of_type
>>>>               (DECL_STRUCT_FUNCTION (node->decl),
>>>> -             stmt, HIST_TYPE_INDIR_CALL);
>>>> +             stmt, type);
>>>>             /* No need to do sanity check: gimple_ic_transform already
>>>>                takes away bad histograms.  */
>>>> -          if (h)
>>>> +          if (h && type == HIST_TYPE_INDIR_CALL)
>>>>               {
>>>>                 /* counter 0 is target, counter 1 is number of execution we called target,
>>>>                counter 2 is total number of executions.  */
>>>> @@ -212,6 +216,46 @@ ipa_profile_generate_summary (void)
>>>>                 gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (node->decl),
>>>>                                 stmt, h);
>>>>               }
>>>> +          else if (h && type == HIST_TYPE_INDIR_CALL_TOPN)
>>>> +            {
>>>> +              unsigned j;
>>>> +              struct cgraph_edge *e = node->get_edge (stmt);
>>>> +              if (e && !e->indirect_unknown_callee)
>>>> +            continue;
>>>> +
>>>> +              e->indirect_info->num_of_ics = 0;
>>>> +              for (j = 1; j < h->n_counters; j += 2)
>>>> +            {
>>>> +              if (h->hvalue.counters[j] == 0)
>>>> +                continue;
>>>> +
>>>> +              e->indirect_info->common_target_ids[j / 2]
>>>> +                = h->hvalue.counters[j];
>>>> +              e->indirect_info->common_target_probabilities[j / 2]
>>>> +                = GCOV_COMPUTE_SCALE (
>>>> +                  h->hvalue.counters[j + 1],
>>>> +                  gimple_bb (stmt)->count.ipa ().to_gcov_type ());
>>>> +              if (e->indirect_info
>>>> +                ->common_target_probabilities[j / 2]
>>>> +                  > REG_BR_PROB_BASE)
>>>> +                {
>>>> +                  if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Probability capped to 1\n");
>>>> +                  e->indirect_info
>>>> +                ->common_target_probabilities[j / 2]
>>>> +                = REG_BR_PROB_BASE;
>>>> +                }
>>>> +              e->indirect_info->num_of_ics++;
>>>> +            }
>>>> +
>>>> +              gcc_assert (e->indirect_info->num_of_ics
>>>> +                  <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>>> +
>>>> +              gimple_remove_histogram_value (DECL_STRUCT_FUNCTION (
>>>> +                               node->decl),
>>>> +                             stmt, h);
>>>> +            }
>>>>           }
>>>>             time += estimate_num_insns (stmt, &eni_time_weights);
>>>>             size += estimate_num_insns (stmt, &eni_size_weights);
>>>> @@ -492,6 +536,7 @@ ipa_profile (void)
>>>>     int nindirect = 0, ncommon = 0, nunknown = 0, nuseless = 0, nconverted = 0;
>>>>     int nmismatch = 0, nimpossible = 0;
>>>>     bool node_map_initialized = false;
>>>> +  gcov_type threshold;
>>>>
>>>>     if (dump_file)
>>>>       dump_histogram (dump_file, histogram);
>>>> @@ -500,14 +545,12 @@ ipa_profile (void)
>>>>         overall_time += histogram[i]->count * histogram[i]->time;
>>>>         overall_size += histogram[i]->size;
>>>>       }
>>>> +  threshold = 0;
>>>>     if (overall_time)
>>>>       {
>>>> -      gcov_type threshold;
>>>> -
>>>>         gcc_assert (overall_size);
>>>>
>>>>         cutoff = (overall_time * PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE) + 500) / 1000;
>>>> -      threshold = 0;
>>>>         for (i = 0; cumulated < cutoff; i++)
>>>>       {
>>>>         cumulated += histogram[i]->count * histogram[i]->time;
>>>> @@ -543,7 +586,7 @@ ipa_profile (void)
>>>>     histogram.release ();
>>>>     histogram_pool.release ();
>>>>
>>>> -  /* Produce speculative calls: we saved common traget from porfiling into
>>>> +  /* Produce speculative calls: we saved common target from profiling into
>>>>        e->common_target_id.  Now, at link time, we can look up corresponding
>>>>        function node and produce speculative call.  */
>>>>
>>>> @@ -558,7 +601,8 @@ ipa_profile (void)
>>>>       {
>>>>         if (n->count.initialized_p ())
>>>>           nindirect++;
>>>> -      if (e->indirect_info->common_target_id)
>>>> +      if (e->indirect_info->common_target_id
>>>> +          || (e->indirect_info && e->indirect_info->num_of_ics == 1))
>>>>           {
>>>>             if (!node_map_initialized)
>>>>               init_node_map (false);
>>>> @@ -613,7 +657,7 @@ ipa_profile (void)
>>>>                 if (dump_file)
>>>>               fprintf (dump_file,
>>>>                    "Not speculating: "
>>>> -                 "parameter count mistmatch\n");
>>>> +                 "parameter count mismatch\n");
>>>>               }
>>>>             else if (e->indirect_info->polymorphic
>>>>                  && !opt_for_fn (n->decl, flag_devirtualize)
>>>> @@ -655,7 +699,130 @@ ipa_profile (void)
>>>>             nunknown++;
>>>>           }
>>>>           }
>>>> -     }
>>>> +      if (e->indirect_info && e->indirect_info->num_of_ics > 1)
>>>> +        {
>>>> +          if (in_lto_p)
>>>> +        {
>>>> +          if (dump_file)
>>>> +            {
>>>> +              fprintf (dump_file,
>>>> +                   "Updating hotness threshold in LTO mode.\n");
>>>> +              fprintf (dump_file, "Updated min count: %" PRId64 "\n",
>>>> +                   (int64_t) threshold);
>>>> +            }
>>>> +          set_hot_bb_threshold (threshold
>>>> +                    / e->indirect_info->num_of_ics);
>>>> +        }
>>>> +          if (!node_map_initialized)
>>>> +        init_node_map (false);
>>>> +          node_map_initialized = true;
>>>> +          ncommon++;
>>>> +          unsigned speculative = 0;
>>>> +          for (i = 0; i < (int)e->indirect_info->num_of_ics; i++)
>>>> +        {
>>>> +          n2 = find_func_by_profile_id (
>>>> +            e->indirect_info->common_target_ids[i]);
>>>> +          if (n2)
>>>> +            {
>>>> +              if (dump_file)
>>>> +            {
>>>> +              fprintf (
>>>> +                dump_file,
>>>> +                "Indirect call -> direct call from"
>>>> +                " other module %s => %s, prob %3.2f\n",
>>>> +                n->dump_name (), n2->dump_name (),
>>>> +                e->indirect_info->common_target_probabilities[i]
>>>> +                  / (float) REG_BR_PROB_BASE);
>>>> +            }
>>>> +              if (e->indirect_info->common_target_probabilities[i]
>>>> +              < REG_BR_PROB_BASE / 2)
>>>> +            {
>>>> +              nuseless++;
>>>> +              if (dump_file)
>>>> +                fprintf (
>>>> +                  dump_file,
>>>> +                  "Not speculating: probability is too low.\n");
>>>> +            }
>>>> +              else if (!e->maybe_hot_p ())
>>>> +            {
>>>> +              nuseless++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Not speculating: call is cold.\n");
>>>> +            }
>>>> +              else if (n2->get_availability () <= AVAIL_INTERPOSABLE
>>>> +                   && n2->can_be_discarded_p ())
>>>> +            {
>>>> +              nuseless++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Not speculating: target is overwritable "
>>>> +                     "and can be discarded.\n");
>>>> +            }
>>>> +              else if (ipa_node_params_sum && ipa_edge_args_sum
>>>> +                   && (!vec_safe_is_empty (
>>>> +                 IPA_NODE_REF (n2)->descriptors))
>>>> +                   && ipa_get_param_count (IPA_NODE_REF (n2))
>>>> +                    != ipa_get_cs_argument_count (
>>>> +                      IPA_EDGE_REF (e))
>>>> +                   && (ipa_get_param_count (IPA_NODE_REF (n2))
>>>> +                     >= ipa_get_cs_argument_count (
>>>> +                       IPA_EDGE_REF (e))
>>>> +                   || !stdarg_p (TREE_TYPE (n2->decl))))
>>>> +            {
>>>> +              nmismatch++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file, "Not speculating: "
>>>> +                        "parameter count mismatch\n");
>>>> +            }
>>>> +              else if (e->indirect_info->polymorphic
>>>> +                   && !opt_for_fn (n->decl, flag_devirtualize)
>>>> +                   && !possible_polymorphic_call_target_p (e, n2))
>>>> +            {
>>>> +              nimpossible++;
>>>> +              if (dump_file)
>>>> +                fprintf (dump_file,
>>>> +                     "Not speculating: "
>>>> +                     "function is not in the polymorphic "
>>>> +                     "call target list\n");
>>>> +            }
>>>> +              else
>>>> +            {
>>>> +              /* Target may be overwritable, but profile says that
>>>> +                 control flow goes to this particular implementation
>>>> +                 of N2.  Speculate on the local alias to allow
>>>> +                 inlining.
>>>> +                 */
>>>> +              if (!n2->can_be_discarded_p ())
>>>> +                {
>>>> +                  cgraph_node *alias;
>>>> +                  alias = dyn_cast <cgraph_node *> (
>>>> +                n2->noninterposable_alias ());
>>>> +                  if (alias)
>>>> +                n2 = alias;
>>>> +                }
>>>> +              nconverted++;
>>>> +              e->make_speculative (
>>>> +                n2, e->count.apply_probability (
>>>> +                  e->indirect_info
>>>> +                    ->common_target_probabilities[i]));
>>>> +              update = true;
>>>> +              speculative++;
>>>> +            }
>>>> +            }
>>>> +          else
>>>> +            {
>>>> +              if (dump_file)
>>>> +            fprintf (dump_file,
>>>> +                 "Function with profile-id %i not found.\n",
>>>> +                 e->indirect_info->common_target_ids[i]);
>>>> +              nunknown++;
>>>> +            }
>>>> +        }
>>>> +          if (speculative < e->indirect_info->num_of_ics)
>>>> +        e->indirect_info->num_of_ics = speculative;
>>>> +        }
>>>> +    }
>>>>          if (update)
>>>>        ipa_update_overall_fn_summary (n);
>>>>        }
>>>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>>>> index 79b250c3943..30347691029 100644
>>>> --- a/gcc/ipa-utils.c
>>>> +++ b/gcc/ipa-utils.c
>>>> @@ -587,6 +587,11 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>>>         update_max_bb_count ();
>>>>         compute_function_frequency ();
>>>>         pop_cfun ();
>>>> +      /* When src is speculative, clone the referrings.  */
>>>> +      if (src->indirect_call_target)
>>>> +    for (e = src->callers; e; e = e->next_caller)
>>>> +      if (e->callee == src && e->speculative)
>>>> +        dst->clone_referring (src);
>>>>         for (e = dst->callees; e; e = e->next_callee)
>>>>       {
>>>>         if (e->speculative)
>>>> diff --git a/gcc/ipa.c b/gcc/ipa.c
>>>> index 2496694124c..c1fe081a72d 100644
>>>> --- a/gcc/ipa.c
>>>> +++ b/gcc/ipa.c
>>>> @@ -166,7 +166,7 @@ process_references (symtab_node *snode,
>>>>      devirtualization happens.  After inlining still keep their declarations
>>>>      around, so we can devirtualize to a direct call.
>>>>
>>>> -   Also try to make trivial devirutalization when no or only one target is
>>>> +   Also try to make trivial devirtualization when no or only one target is
>>>>      possible.  */
>>>>
>>>>   static void
>>>> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
>>>> index 4dfa2862be3..0c8f547d44e 100644
>>>> --- a/gcc/lto-cgraph.c
>>>> +++ b/gcc/lto-cgraph.c
>>>> @@ -238,6 +238,7 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>>>     unsigned int uid;
>>>>     intptr_t ref;
>>>>     struct bitpack_d bp;
>>>> +  unsigned i;
>>>>
>>>>     if (edge->indirect_unknown_callee)
>>>>       streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
>>>> @@ -296,6 +297,25 @@ lto_output_edge (struct lto_simple_output_block *ob, struct cgraph_edge *edge,
>>>>         if (edge->indirect_info->common_target_id)
>>>>       streamer_write_hwi_stream
>>>>          (ob->main_stream, edge->indirect_info->common_target_probability);
>>>> +
>>>> +      gcc_assert (edge->indirect_info->num_of_ics
>>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2);
>>>> +
>>>> +      streamer_write_hwi_stream (ob->main_stream,
>>>> +                 edge->indirect_info->num_of_ics);
>>>> +
>>>> +      if (edge->indirect_info->num_of_ics)
>>>> +    {
>>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++)
>>>> +        {
>>>> +
        streamer_write_hwi_stream ( >>>> +        ob->main_stream, edge->indirect_info->common_target_ids[i]); >>>> +          if (edge->indirect_info->common_target_ids[i]) >>>> +        streamer_write_hwi_stream ( >>>> +          ob->main_stream, >>>> +          edge->indirect_info->common_target_probabilities[i]); >>>> +        } >>>> +    } >>>>       } >>>>   } >>>>   @@ -1438,6 +1458,7 @@ input_edge (struct lto_input_block *ib, vec nodes, >>>>     cgraph_inline_failed_t inline_failed; >>>>     struct bitpack_d bp; >>>>     int ecf_flags = 0; >>>> +  unsigned i; >>>>       caller = dyn_cast (nodes[streamer_read_hwi (ib)]); >>>>     if (caller == NULL || caller->decl == NULL_TREE) >>>> @@ -1488,6 +1509,23 @@ input_edge (struct lto_input_block *ib, vec nodes, >>>>         edge->indirect_info->common_target_id = streamer_read_hwi (ib); >>>>         if (edge->indirect_info->common_target_id) >>>>           edge->indirect_info->common_target_probability = streamer_read_hwi (ib); >>>> + >>>> +      edge->indirect_info->num_of_ics = streamer_read_hwi (ib); >>>> + >>>> +      gcc_assert (edge->indirect_info->num_of_ics >>>> +          <= GCOV_ICALL_TOPN_NCOUNTS / 2); >>>> + >>>> +      if (edge->indirect_info->num_of_ics) >>>> +    { >>>> +      for (i = 0; i < edge->indirect_info->num_of_ics; i++) >>>> +        { >>>> +          edge->indirect_info->common_target_ids[i] >>>> +        = streamer_read_hwi (ib); >>>> +          if (edge->indirect_info->common_target_ids[i]) >>>> +        edge->indirect_info->common_target_probabilities[i] >>>> +          = streamer_read_hwi (ib); >>>> +        } >>>> +    } >>>>       } >>>>   } >>>>   diff --git a/gcc/predict.c b/gcc/predict.c >>>> index 43ee91a5b13..b7f38891c72 100644 >>>> --- a/gcc/predict.c >>>> +++ b/gcc/predict.c >>>> @@ -763,7 +763,6 @@ dump_prediction (FILE *file, enum br_predictor predictor, int probability, >>>>         && bb->count.precise_p () >>>>         && reason == REASON_NONE) >>>>       { >>>> -      
gcc_assert (e->count ().precise_p ()); >>>>         fprintf (file, ";;heuristics;%s;%" PRId64 ";%" PRId64 ";%.1f;\n", >>>>              predictor_info[predictor].name, >>>>              bb->count.to_gcov_type (), e->count ().to_gcov_type (), >>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c >>>> new file mode 100644 >>>> index 00000000000..e0a83c2e067 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c >>>> @@ -0,0 +1,35 @@ >>>> +/* { dg-require-effective-target lto } */ >>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */ >>>> +/* { dg-require-profiling "-fprofile-generate" } */ >>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */ >>>> + >>>> +#include >>>> + >>>> +typedef int (*fptr) (int); >>>> +int >>>> +one (int a); >>>> + >>>> +int >>>> +two (int a); >>>> + >>>> +fptr table[] = {&one, &two}; >>>> + >>>> +int >>>> +main() >>>> +{ >>>> +  int i, x; >>>> +  fptr p = &one; >>>> + >>>> +  x = one (3); >>>> + >>>> +  for (i = 0; i < 350000000; i++) >>>> +    { >>>> +      x = (*p) (3); >>>> +      p = table[x]; >>>> +    } >>>> +  printf ("done:%d\n", x); >>>> +} >>>> + >>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */ >>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */ >>>> + >>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c >>>> new file mode 100644 >>>> index 00000000000..a8c6e365fb9 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1a.c >>>> @@ -0,0 +1,22 @@ >>>> +/* It seems there is no way to avoid the other source of mulitple >>>> +   
source testcase from being compiled independently.  Just avoid >>>> +   error.  */ >>>> +#ifdef DOJOB >>>> +int >>>> +one (int a) >>>> +{ >>>> +  return 1; >>>> +} >>>> + >>>> +int >>>> +two (int a) >>>> +{ >>>> +  return 0; >>>> +} >>>> +#else >>>> +int >>>> +main() >>>> +{ >>>> +  return 0; >>>> +} >>>> +#endif >>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c >>>> new file mode 100644 >>>> index 00000000000..aa3887fde83 >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c >>>> @@ -0,0 +1,42 @@ >>>> +/* { dg-require-effective-target lto } */ >>>> +/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */ >>>> +/* { dg-require-profiling "-fprofile-generate" } */ >>>> +/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param indir-call-topn-profile=1" } */ >>>> + >>>> +#include >>>> + >>>> +typedef int (*fptr) (int); >>>> +int >>>> +one (int a); >>>> + >>>> +int >>>> +two (int a); >>>> + >>>> +fptr table[] = {&one, &two}; >>>> + >>>> +int foo () >>>> +{ >>>> +  int i, x; >>>> +  fptr p = &one; >>>> + >>>> +  x = one (3); >>>> + >>>> +  for (i = 0; i < 350000000; i++) >>>> +    { >>>> +      x = (*p) (3); >>>> +      p = table[x]; >>>> +    } >>>> +  return x; >>>> +} >>>> + >>>> +int >>>> +main() >>>> +{ >>>> +  int x = foo (); >>>> +  printf ("done:%d\n", x); >>>> +} >>>> + >>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile_estimate" } } */ >>>> +/* { dg-final-use-not-autofdo { scan-wpa-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile_estimate" } } */ >>>> + >>>> + >>>> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c >>>> new file mode 100644 >>>> index 00000000000..951bc7ddd19 >>>> --- /dev/null >>>> +++ 
b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-topn.c >>>> @@ -0,0 +1,38 @@ >>>> +/* { dg-require-profiling "-fprofile-generate" } */ >>>> +/* { dg-options "-O2 -fdump-ipa-profile --param indir-call-topn-profile=1" } */ >>>> + >>>> +#include >>>> + >>>> +typedef int (*fptr) (int); >>>> +int >>>> +one (int a) >>>> +{ >>>> +  return 1; >>>> +} >>>> + >>>> +int >>>> +two (int a) >>>> +{ >>>> +  return 0; >>>> +} >>>> + >>>> +fptr table[] = {&one, &two}; >>>> + >>>> +int >>>> +main() >>>> +{ >>>> +  int i, x; >>>> +  fptr p = &one; >>>> + >>>> +  one (3); >>>> + >>>> +  for (i = 0; i < 350000000; i++) >>>> +    { >>>> +      x = (*p) (3); >>>> +      p = table[x]; >>>> +    } >>>> +  printf ("done:%d\n", x); >>>> +} >>>> + >>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* one transformation on insn" "profile" } } */ >>>> +/* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* two transformation on insn" "profile" } } */ >>>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c >>>> index 9017da878b1..f69b31b197e 100644 >>>> --- a/gcc/tree-inline.c >>>> +++ b/gcc/tree-inline.c >>>> @@ -2028,43 +2028,66 @@ copy_bb (copy_body_data *id, basic_block bb, >>>>             switch (id->transform_call_graph_edges) >>>>           { >>>>           case CB_CGE_DUPLICATE: >>>> -          edge = id->src_node->get_edge (orig_stmt); >>>> -          if (edge) >>>> -            { >>>> -              struct cgraph_edge *old_edge = edge; >>>> -              profile_count old_cnt = edge->count; >>>> -              edge = edge->clone (id->dst_node, call_stmt, >>>> -                      gimple_uid (stmt), >>>> -                      num, den, >>>> -                      true); >>>> - >>>> -              /* Speculative calls consist of two edges - direct and >>>> -             indirect.  Duplicate the whole thing and distribute >>>> -             frequencies accordingly.  
*/ >>>> -              if (edge->speculative) >>>> -            { >>>> -              struct cgraph_edge *direct, *indirect; >>>> -              struct ipa_ref *ref; >>>> - >>>> -              gcc_assert (!edge->indirect_unknown_callee); >>>> -              old_edge->speculative_call_info (direct, indirect, ref); >>>> - >>>> -              profile_count indir_cnt = indirect->count; >>>> -              indirect = indirect->clone (id->dst_node, call_stmt, >>>> -                              gimple_uid (stmt), >>>> -                              num, den, >>>> -                              true); >>>> - >>>> -              profile_probability prob >>>> -                 = indir_cnt.probability_in (old_cnt + indir_cnt); >>>> -              indirect->count >>>> -                 = copy_basic_block->count.apply_probability (prob); >>>> -              edge->count = copy_basic_block->count - indirect->count; >>>> -              id->dst_node->clone_reference (ref, stmt); >>>> -            } >>>> -              else >>>> -            edge->count = copy_basic_block->count; >>>> -            } >>>> +          { >>>> +            edge = id->src_node->get_edge (orig_stmt); >>>> +            struct cgraph_edge *old_edge = edge; >>>> +            struct cgraph_edge *direct, *indirect; >>>> +            bool next_speculative; >>>> +            do >>>> +              { >>>> +            next_speculative = false; >>>> +            if (edge) >>>> +              { >>>> +                profile_count old_cnt = edge->count; >>>> +                edge >>>> +                  = edge->clone (id->dst_node, call_stmt, >>>> +                         gimple_uid (stmt), num, den, true); >>>> + >>>> +                /* Speculative calls consist of two edges - direct >>>> +                   and indirect.  Duplicate the whole thing and >>>> +                   distribute frequencies accordingly.  
*/ >>>> +                if (edge->speculative) >>>> +                  { >>>> +                struct ipa_ref *ref; >>>> + >>>> +                gcc_assert (!edge->indirect_unknown_callee); >>>> +                old_edge->speculative_call_info (direct, >>>> +                                 indirect, ref); >>>> + >>>> +                profile_count indir_cnt = indirect->count; >>>> +                indirect >>>> +                  = indirect->clone (id->dst_node, call_stmt, >>>> +                             gimple_uid (stmt), num, >>>> +                             den, true); >>>> + >>>> +                profile_probability prob >>>> +                  = indir_cnt.probability_in (old_cnt >>>> +                                  + indir_cnt); >>>> +                indirect->count >>>> +                  = copy_basic_block->count.apply_probability ( >>>> +                    prob); >>>> +                edge->count >>>> +                  = copy_basic_block->count - indirect->count; >>>> +                id->dst_node->clone_reference (ref, stmt); >>>> +                  } >>>> +                else >>>> +                  edge->count = copy_basic_block->count; >>>> +              } >>>> +            /* If the indirect call contains more than one indirect >>>> +               targets, need clone all speculative edges here.  
*/ >>>> +            if (old_edge && old_edge->next_callee >>>> +                && old_edge->speculative && indirect >>>> +                && indirect->indirect_info >>>> +                && indirect->indirect_info->num_of_ics > 1) >>>> +              { >>>> +                edge = old_edge->next_callee; >>>> +                old_edge = old_edge->next_callee; >>>> +                if (edge->speculative) >>>> +                  next_speculative = true; >>>> +              } >>>> +              } >>>> +            while (next_speculative); >>>> +          } >>>>             break; >>>>             case CB_CGE_MOVE_CLONES: >>>> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c >>>> index 1c3034aac10..4964dbdebb5 100644 >>>> --- a/gcc/tree-profile.c >>>> +++ b/gcc/tree-profile.c >>>> @@ -74,8 +74,8 @@ static GTY(()) tree ic_tuple_callee_field; >>>>   /* Do initialization work for the edge profiler.  */ >>>>     /* Add code: >>>> -   __thread gcov*    __gcov_indirect_call_counters; // pointer to actual counter >>>> -   __thread void*    __gcov_indirect_call_callee; // actual callee address >>>> +   __thread gcov*    __gcov_indirect_call.counters; // pointer to actual counter >>>> +   __thread void*    __gcov_indirect_call.callee; // actual callee address >>>>      __thread int __gcov_function_counter; // time profiler function counter >>>>   */ >>>>   static void >>>> @@ -395,7 +395,7 @@ gimple_gen_ic_profiler (histogram_value value, unsigned tag, unsigned base) >>>>         f_1 = foo; >>>>         __gcov_indirect_call.counters = &__gcov4.main[0]; >>>>         PROF_9 = f_1; >>>> -      __gcov_indirect_call_callee = PROF_9; >>>> +      __gcov_indirect_call.callee = PROF_9; >>>>         _4 = f_1 (); >>>>      */ >>>>   @@ -458,11 +458,11 @@ gimple_gen_ic_func_profiler (void) >>>>       /* Insert code: >>>>   -     if (__gcov_indirect_call_callee != NULL) >>>> +     if (__gcov_indirect_call.callee != NULL) >>>>          __gcov_indirect_call_profiler_v3 (profile_id, 
&current_function_decl); >>>>          The function __gcov_indirect_call_profiler_v3 is responsible for >>>> -     resetting __gcov_indirect_call_callee to NULL.  */ >>>> +     resetting __gcov_indirect_call.callee to NULL.  */ >>>>       gimple_stmt_iterator gsi = gsi_start_bb (cond_bb); >>>>     void0 = build_int_cst (ptr_type_node, 0); >>>> @@ -904,7 +904,7 @@ pass_ipa_tree_profile::gate (function *) >>>>   { >>>>     /* When profile instrumentation, use or test coverage shall be performed. >>>>        But for AutoFDO, this there is no instrumentation, thus this pass is >>>> -     diabled.  */ >>>> +     disabled.  */ >>>>     return (!in_lto_p && !flag_auto_profile >>>>         && (flag_branch_probabilities || flag_test_coverage >>>>             || profile_arc_flag)); >>>> diff --git a/gcc/value-prof.c b/gcc/value-prof.c >>>> index 5013956cf86..4869ab8ccd6 100644 >>>> --- a/gcc/value-prof.c >>>> +++ b/gcc/value-prof.c >>>> @@ -579,8 +579,8 @@ free_histograms (struct function *fn) >>>>      somehow.  
*/ >>>>     static bool >>>> -check_counter (gimple *stmt, const char * name, >>>> -           gcov_type *count, gcov_type *all, profile_count bb_count_d) >>>> +check_counter (gimple *stmt, const char *name, gcov_type *count, gcov_type *all, >>>> +           profile_count bb_count_d, float ratio = 1.0f) >>>>   { >>>>     gcov_type bb_count = bb_count_d.ipa ().to_gcov_type (); >>>>     if (*all != bb_count || *count > *all) >>>> @@ -599,7 +599,7 @@ check_counter (gimple *stmt, const char * name, >>>>                                "count (%d)\n", name, (int)*all, (int)bb_count); >>>>         *all = bb_count; >>>>         if (*count > *all) >>>> -            *count = *all; >>>> +        *count = *all * ratio; >>>>         return false; >>>>       } >>>>         else >>>> @@ -1410,9 +1410,132 @@ gimple_ic (gcall *icall_stmt, struct cgraph_node *direct_call, >>>>     return dcall_stmt; >>>>   } >>>>   +/* If --param=indir-call-topn-profile=1 is specified when compiling, there maybe >>>> +   multiple indirect targets in histogram.  Check every indirect/virtual call >>>> +   if callee function exists, if not exit, leave it to LTO stage for later >>>> +   process.  Modify code of this indirect call to an if-else structure in >>>> +   ipa-profile finally.  
*/ >>>> +static bool >>>> +ic_transform_topn (gimple_stmt_iterator *gsi) >>>> +{ >>>> +  unsigned j; >>>> +  gcall *stmt; >>>> +  histogram_value histogram; >>>> +  gcov_type val, count, count_all, all, bb_all; >>>> +  struct cgraph_node *d_call; >>>> +  profile_count bb_count; >>>> + >>>> +  stmt = dyn_cast (gsi_stmt (*gsi)); >>>> +  if (!stmt) >>>> +    return false; >>>> + >>>> +  if (gimple_call_fndecl (stmt) != NULL_TREE) >>>> +    return false; >>>> + >>>> +  if (gimple_call_internal_p (stmt)) >>>> +    return false; >>>> + >>>> +  histogram >>>> +    = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL_TOPN); >>>> +  if (!histogram) >>>> +    return false; >>>> + >>>> +  count = 0; >>>> +  all = 0; >>>> +  bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type (); >>>> +  bb_count = gimple_bb (stmt)->count; >>>> + >>>> +  /* n_counters need be odd to avoid access violation.  */ >>>> +  gcc_assert (histogram->n_counters % 2 == 1); >>>> + >>>> +  /* For indirect call topn, accumulate all the counts first.  */ >>>> +  for (j = 1; j < histogram->n_counters; j += 2) >>>> +    { >>>> +      val = histogram->hvalue.counters[j]; >>>> +      count = histogram->hvalue.counters[j + 1]; >>>> +      if (val) >>>> +    all += count; >>>> +    } >>>> + >>>> +  count_all = all; >>>> +  /* Do the indirect call conversion if function body exists, or else leave it >>>> +     to LTO stage.  */ >>>> +  for (j = 1; j < histogram->n_counters; j += 2) >>>> +    { >>>> +      val = histogram->hvalue.counters[j]; >>>> +      count = histogram->hvalue.counters[j + 1]; >>>> +      if (val) >>>> +    { >>>> +      /* The order of CHECK_COUNTER calls is important >>>> +         since check_counter can correct the third parameter >>>> +         and we want to make count <= all <= bb_count.  
*/ >>>> +      if (check_counter (stmt, "ic", &all, &bb_all, bb_count) >>>> +          || check_counter (stmt, "ic", &count, &all, >>>> +                profile_count::from_gcov_type (all), >>>> +                (float) count / count_all)) >>>> +        { >>>> +          gimple_remove_histogram_value (cfun, stmt, histogram); >>>> +          return false; >>>> +        } >>>> + >>>> +      d_call = find_func_by_profile_id ((int) val); >>>> + >>>> +      if (d_call == NULL) >>>> +        { >>>> +          if (val) >>>> +        { >>>> +          if (dump_file) >>>> +            { >>>> +              fprintf ( >>>> +            dump_file, >>>> +            "Indirect call -> direct call from other module"); >>>> +              print_generic_expr (dump_file, gimple_call_fn (stmt), >>>> +                      TDF_SLIM); >>>> +              fprintf (dump_file, >>>> +                   "=> %i (will resolve only with LTO)\n", >>>> +                   (int) val); >>>> +            } >>>> +        } >>>> +          return false; >>>> +        } >>>> + >>>> +      if (!check_ic_target (stmt, d_call)) >>>> +        { >>>> +          if (dump_file) >>>> +        { >>>> +          fprintf (dump_file, "Indirect call -> direct call "); >>>> +          print_generic_expr (dump_file, gimple_call_fn (stmt), >>>> +                      TDF_SLIM); >>>> +          fprintf (dump_file, "=> "); >>>> +          print_generic_expr (dump_file, d_call->decl, TDF_SLIM); >>>> +          fprintf (dump_file, >>>> +               " transformation skipped because of type mismatch"); >>>> +          print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); >>>> +        } >>>> +          gimple_remove_histogram_value (cfun, stmt, histogram); >>>> +          return false; >>>> +        } >>>> + >>>> +      if (dump_file) >>>> +      { >>>> +        fprintf (dump_file, "Indirect call -> direct call "); >>>> +        print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM); >>>> +        fprintf 
(dump_file, "=> "); >>>> +        print_generic_expr (dump_file, d_call->decl, TDF_SLIM); >>>> +        fprintf (dump_file, >>>> +             " transformation on insn postponed to ipa-profile"); >>>> +        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); >>>> +        fprintf (dump_file, "hist->count %" PRId64 >>>> +        " hist->all %" PRId64"\n", count, all); >>>> +      } >>>> +    } >>>> +    } >>>> + >>>> +  return true; >>>> +} >>>>   /* >>>>     For every checked indirect/virtual call determine if most common pid of >>>> -  function/class method has probability more than 50%. If yes modify code of >>>> +  function/class method has probability more than 50%.  If yes modify code of >>>>     this call to: >>>>    */ >>>>   @@ -1423,6 +1546,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi) >>>>     histogram_value histogram; >>>>     gcov_type val, count, all, bb_all; >>>>     struct cgraph_node *direct_call; >>>> +  enum hist_type type; >>>>       stmt = dyn_cast (gsi_stmt (*gsi)); >>>>     if (!stmt) >>>> @@ -1434,18 +1558,24 @@ gimple_ic_transform (gimple_stmt_iterator *gsi) >>>>     if (gimple_call_internal_p (stmt)) >>>>       return false; >>>>   -  histogram = gimple_histogram_value_of_type (cfun, stmt, HIST_TYPE_INDIR_CALL); >>>> +  type = PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? 
HIST_TYPE_INDIR_CALL_TOPN >>>> +                             : HIST_TYPE_INDIR_CALL; >>>> + >>>> +  histogram = gimple_histogram_value_of_type (cfun, stmt, type); >>>>     if (!histogram) >>>>       return false; >>>>   +  if (type == HIST_TYPE_INDIR_CALL_TOPN) >>>> +      return ic_transform_topn (gsi); >>>> + >>>>     val = histogram->hvalue.counters [0]; >>>>     count = histogram->hvalue.counters [1]; >>>>     all = histogram->hvalue.counters [2]; >>>>       bb_all = gimple_bb (stmt)->count.ipa ().to_gcov_type (); >>>> -  /* The order of CHECK_COUNTER calls is important - >>>> +  /* The order of CHECK_COUNTER calls is important >>>>        since check_counter can correct the third parameter >>>> -     and we want to make count <= all <= bb_all. */ >>>> +     and we want to make count <= all <= bb_all.  */ >>>>     if (check_counter (stmt, "ic", &all, &bb_all, gimple_bb (stmt)->count) >>>>         || check_counter (stmt, "ic", &count, &all, >>>>                   profile_count::from_gcov_type (all))) >>>> @@ -1494,7 +1624,7 @@ gimple_ic_transform (gimple_stmt_iterator *gsi) >>>>         print_generic_expr (dump_file, gimple_call_fn (stmt), TDF_SLIM); >>>>         fprintf (dump_file, "=> "); >>>>         print_generic_expr (dump_file, direct_call->decl, TDF_SLIM); >>>> -      fprintf (dump_file, " transformation on insn postponned to ipa-profile"); >>>> +      fprintf (dump_file, " transformation on insn postponed to ipa-profile"); >>>>         print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); >>>>         fprintf (dump_file, "hist->count %" PRId64 >>>>              " hist->all %" PRId64"\n", count, all); >>>> >>> >> >
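For anyone following the thread: the "if-else structure" that the ic_transform_topn comment in the quoted patch refers to can be pictured with a small hand-written C sketch based on the indir-call-prof-topn.c testcase. This is only an illustration of the shape of the call site after both profiled targets (one and two) are speculated on. It is not actual compiler output, and the names call_promoted, run_loop and three are made up for the example.

```c
/* Sketch only: hand-written C showing the rough shape of an indirect
   call site after two profiled targets have been promoted to guarded
   direct calls.  Not generated by GCC.  */

typedef int (*fptr) (int);

int one (int a) { (void) a; return 1; }
int two (int a) { (void) a; return 0; }

/* Hypothetical target that never showed up in the profile.  */
int three (int a) { return a * 2; }

fptr table[] = { &one, &two };

/* Stands in for "x = (*p) (a)" once both hot targets are speculated:
   each guarded direct call can be inlined, and the final else keeps
   the original indirect call for any other target.  */
int
call_promoted (fptr p, int a)
{
  if (p == &one)
    return one (a);
  else if (p == &two)
    return two (a);
  else
    return p (a);
}

/* Same loop as the testcase, with a configurable iteration count
   (assumption: the testcase itself uses a fixed large bound).  */
int
run_loop (int n)
{
  int i, x;
  fptr p = &one;

  x = one (3);

  for (i = 0; i < n; i++)
    {
      x = call_promoted (p, 3);
      p = table[x];
    }
  return x;
}
```

Since p alternates between one and two in this loop, each target accounts for roughly half of the calls, so a single-target promotion would leave about half of them indirect. That is the case the top-N histogram is meant to cover.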