public inbox for gcc-patches@gcc.gnu.org
From: Teresa Johnson <tejohnson@google.com>
To: Xinliang David Li <davidxl@google.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Jan Hubicka <hubicka@ucw.cz>
Subject: Re: [PATCH] Handle more COMDAT profiling issues
Date: Wed, 12 Feb 2014 14:45:00 -0000	[thread overview]
Message-ID: <CAAe5K+Wi=PNVX4+zoBVCm-+_2eXh1cfdWZNY5jxUWXgaZffGmg@mail.gmail.com> (raw)
In-Reply-To: <CAAkRFZJcwU_nQjCKqDT=j4xWjYUKRDnkSziRp9+UiiV8t1mGrQ@mail.gmail.com>

On Tue, Feb 11, 2014 at 6:13 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, Feb 11, 2014 at 5:36 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Tue, Feb 11, 2014 at 5:16 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> Why is call graph needed to determine whether to drop the profile?
>>
>> Because we detect this situation by looking for cases where the call
>> edge counts greatly exceed the callee node count.
>>
>>>
>>> If that is needed, it might be possible to leverage the ipa_profile
>>> pass as it will walk through all function nodes to do profile
>>> annotation. With this you can make decision to drop callee profile in
>>> caller's context.
>>
>> There are 2 ipa profiling passes, which are somewhat confusingly named
>> (to me at least. =). This is being done during the first.
>>
>> The first is pass_ipa_tree_profile in tree-profile.c, which despite
>> its name is a SIMPLE_IPA_PASS and has the name "profile" in the dump.
>> The second is pass_ipa_profile in ipa-profile.c, which is an IPA_PASS
>> and has the name "profile_estimate" in the dump. I assume you are
>> suggesting to move this into the latter? But I'm not clear on what
>> benefit that gives - the functions are not being traversed in order,
>> so there is still the issue of needing to rebuild the cgraph after
>> dropping profiles, which might be best done earlier as I have in the patch.
>
>
> I meant the tree-profile one.  I think this might work: after all the
> function's profile counts are annotated, add another walk of the
> call graph nodes to drop bad profiles before the call graph is
> rebuilt (the call graph does exist at that point).

Ok, so it is already done in tree-profile. But it sounds like you are
suggesting reordering it to just above where we update the calls and
rebuild the cgraph the first time? As you noted in a follow-on email
to me, the cgraph edges don't have the profile counts at that point
(and neither do the nodes), so I would need to compare the count on
the call's bb to the entry bb count of the callee. That should be
doable, let me take a stab at it.

Thanks,
Teresa

>
> David
>>
>> Teresa
>>
>>>
>>> David
>>>
>>> On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> Is it better to add some logic in counts_to_freqs to determine if the
>>>>> profile count needs to be dropped completely to force profile
>>>>> estimation?
>>>>
>>>> This is the problem I was mentioning below where we call
>>>> counts_to_freqs before we have the cgraph and can tell that we will
>>>> need to drop the profile. When we were only dropping the profile for
>>>> functions with all 0 counts, we simply avoided doing the
>>>> counts_to_freqs when the counts were all 0, which works since the 0
>>>> counts don't leave things in an inconsistent state (counts vs
>>>> estimated frequencies).
>>>>
>>>> Teresa
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>>> This patch attempts to address the lost profile issue for COMDATs in
>>>>>> more circumstances, exposed by function splitting.
>>>>>>
>>>>>> My earlier patch handled the case where the comdat had 0 counts, since
>>>>>> the linker kept the copy in a different module. In that case we
>>>>>> prevent the guessed frequencies on 0-count functions from being
>>>>>> overwritten by counts_to_freqs, and then later mark any reached via
>>>>>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>>>>>> comdat is inlined, the call count is propagated to the callee blocks
>>>>>> using the guessed probabilities.
>>>>>>
>>>>>> However, in this case there was a comdat that had a very small
>>>>>> non-zero count and was being inlined into a much hotter callsite. This
>>>>>> can happen when a copy of the comdat was ipa-inlined
>>>>>> in the profile-gen compile, so the copy in that module gets some
>>>>>> non-zero counts from the ipa-inlined instance, but the out-of-line
>>>>>> copy was eliminated by the linker (which selected the copy from a
>>>>>> different module). In this case the inliner was scaling the bb counts
>>>>>> up quite a lot when inlining. The problem is that you most likely
>>>>>> can't trust that the 0-count bbs in such a case are really not
>>>>>> executed by the callsite the comdat is being inlined into, since the
>>>>>> counts are very small and correspond to a different callsite. In some
>>>>>> internal C++ applications I am seeing that we execute out of the
>>>>>> split cold portion of COMDATs for this reason.
>>>>>>
>>>>>> This problem is more complicated to address than the 0-count instance,
>>>>>> because we need the cgraph to determine which functions to drop the
>>>>>> profile on, and at that point the estimated frequencies have already
>>>>>> been overwritten by counts_to_freqs. To handle this broader case, I
>>>>>> have changed the drop_profile routine to simply re-estimate the
>>>>>> probabilities/frequencies (and translate these into counts scaled from
>>>>>> the incoming call edge counts). This unfortunately necessitates
>>>>>> rebuilding the cgraph, to propagate the new synthesized counts and
>>>>>> avoid checking failures downstream. But it will only be rebuilt if we
>>>>>> dropped any profiles. With this solution, some of the older approach
>>>>>> can be removed (e.g. propagating counts using the guessed
>>>>>> probabilities during inlining).
>>>>>>
>>>>>> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
>>>>>> Also tested on
>>>>>> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens?
>>>>>>
>>>>>> Thanks,
>>>>>> Teresa
>>>>>>
>>>>>> 2014-02-10  Teresa Johnson  <tejohnson@google.com>
>>>>>>
>>>>>>         * graphite.c (graphite_finalize): Pass new parameter.
>>>>>>         * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>>>>>>         * predict.c (tree_estimate_probability): New parameter.
>>>>>>         (tree_estimate_probability_worker): Renamed from
>>>>>>         tree_estimate_probability_driver.
>>>>>>         (tree_reestimate_probability): New function.
>>>>>>         (tree_estimate_probability_driver): Invoke
>>>>>>         tree_estimate_probability_worker.
>>>>>>         (freqs_to_counts): Move from tree-inline.c.
>>>>>>         (drop_profile): Re-estimate profiles when dropping counts.
>>>>>>         (handle_missing_profiles): Drop for some non-zero functions as well.
>>>>>>         (counts_to_freqs): Remove code obviated by reestimation.
>>>>>>         * predict.h (handle_missing_profiles): Update declaration.
>>>>>>         (tree_estimate_probability): Ditto.
>>>>>>         * tree-inline.c (freqs_to_counts): Move to predict.c.
>>>>>>         (copy_cfg_body): Remove code obviated by reestimation.
>>>>>>         * tree-profile.c (rebuild_cgraph): New function, with code
>>>>>>         extracted from tree_profiling to rebuild the cgraph.
>>>>>>         (tree_profiling): Invoke rebuild_cgraph as needed.
>>>>>>
>>>>>> Index: graphite.c
>>>>>> ===================================================================
>>>>>> --- graphite.c  (revision 207436)
>>>>>> +++ graphite.c  (working copy)
>>>>>> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
>>>>>>        cleanup_tree_cfg ();
>>>>>>        profile_status_for_fn (cfun) = PROFILE_ABSENT;
>>>>>>        release_recorded_exits ();
>>>>>> -      tree_estimate_probability ();
>>>>>> +      tree_estimate_probability (false);
>>>>>>      }
>>>>>>
>>>>>>    cloog_state_free (cloog_state);
>>>>>> Index: params.def
>>>>>> ===================================================================
>>>>>> --- params.def  (revision 207436)
>>>>>> +++ params.def  (working copy)
>>>>>> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>>>>>>           "Maximal estimated outcome of branch considered predictable",
>>>>>>           2, 0, 50)
>>>>>>
>>>>>> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
>>>>>> +         "min-caller-reestimate-ratio",
>>>>>> +         "Minimum caller-to-callee node count ratio to force reestimated branch "
>>>>>> +          "probabilities in callee (where 0 means only when callee count is 0)",
>>>>>> +         10, 0, 0)
>>>>>> +
>>>>>>  DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
>>>>>>           "inline-min-speedup",
>>>>>>           "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-isnsns-auto",
>>>>>> Index: predict.c
>>>>>> ===================================================================
>>>>>> --- predict.c   (revision 207436)
>>>>>> +++ predict.c   (working copy)
>>>>>> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)
>>>>>>
>>>>>>  /* Predict branch probabilities and estimate profile of the tree CFG.
>>>>>>     This function can be called from the loop optimizers to recompute
>>>>>> -   the profile information.  */
>>>>>> +   the profile information.  When REDO is true then we are forcing
>>>>>> +   re-estimation of the probabilities because the profile was deemed
>>>>>> +   insufficient.  */
>>>>>>
>>>>>>  void
>>>>>> -tree_estimate_probability (void)
>>>>>> +tree_estimate_probability (bool redo)
>>>>>>  {
>>>>>>    basic_block bb;
>>>>>>
>>>>>> @@ -2390,7 +2392,8 @@ void
>>>>>>    connect_infinite_loops_to_exit ();
>>>>>>    /* We use loop_niter_by_eval, which requires that the loops have
>>>>>>       preheaders.  */
>>>>>> -  create_preheaders (CP_SIMPLE_PREHEADERS);
>>>>>> +  if (!redo)
>>>>>> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>>>>>>    calculate_dominance_info (CDI_POST_DOMINATORS);
>>>>>>
>>>>>>    bb_predictions = pointer_map_create ();
>>>>>> @@ -2412,16 +2415,16 @@ void
>>>>>>    pointer_map_destroy (bb_predictions);
>>>>>>    bb_predictions = NULL;
>>>>>>
>>>>>> -  estimate_bb_frequencies (false);
>>>>>> +  estimate_bb_frequencies (redo);
>>>>>>    free_dominance_info (CDI_POST_DOMINATORS);
>>>>>>    remove_fake_exit_edges ();
>>>>>>  }
>>>>>>
>>>>>>  /* Predict branch probabilities and estimate profile of the tree CFG.
>>>>>> -   This is the driver function for PASS_PROFILE.  */
>>>>>> +   When REDO is true, we are forcing reestimation of the probabilities.  */
>>>>>>
>>>>>> -static unsigned int
>>>>>> -tree_estimate_probability_driver (void)
>>>>>> +static void
>>>>>> +tree_estimate_probability_worker (bool redo)
>>>>>>  {
>>>>>>    unsigned nb_loops;
>>>>>>
>>>>>> @@ -2435,7 +2438,7 @@ void
>>>>>>    if (nb_loops > 1)
>>>>>>      scev_initialize ();
>>>>>>
>>>>>> -  tree_estimate_probability ();
>>>>>> +  tree_estimate_probability (redo);
>>>>>>
>>>>>>    if (nb_loops > 1)
>>>>>>      scev_finalize ();
>>>>>> @@ -2445,6 +2448,34 @@ void
>>>>>>      gimple_dump_cfg (dump_file, dump_flags);
>>>>>>    if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>>>>>>      profile_status_for_fn (cfun) = PROFILE_GUESSED;
>>>>>> +}
>>>>>> +
>>>>>> +/* Force re-estimation of the probabilities, because the profile was
>>>>>> +   deemed insufficient.  */
>>>>>> +
>>>>>> +static void
>>>>>> +tree_reestimate_probability (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  edge e;
>>>>>> +  edge_iterator ei;
>>>>>> +
>>>>>> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
>>>>>> +  FOR_EACH_BB_FN (bb, cfun)
>>>>>> +    {
>>>>>> +      bb->count = 0;
>>>>>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>> +        e->count = 0;
>>>>>> +    }
>>>>>> +  tree_estimate_probability_worker (true);
>>>>>> +}
>>>>>> +
>>>>>> +/* Estimate probabilities.
>>>>>> +   This is the driver function for PASS_PROFILE.  */
>>>>>> +static unsigned int
>>>>>> +tree_estimate_probability_driver (void)
>>>>>> +{
>>>>>> +  tree_estimate_probability_worker (false);
>>>>>>    return 0;
>>>>>>  }
>>>>>>  ^L
>>>>>> @@ -2765,6 +2796,28 @@ estimate_loops (void)
>>>>>>    BITMAP_FREE (tovisit);
>>>>>>  }
>>>>>>
>>>>>> +/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>>>> +   with each bb's frequency. Used when NODE has an entry count that
>>>>>> +   is much lower than the caller edges reaching it.  See the comments
>>>>>> +   for handle_missing_profiles() for when this can happen for COMDATs.  */
>>>>>> +
>>>>>> +void
>>>>>> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  edge_iterator ei;
>>>>>> +  edge e;
>>>>>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>> +
>>>>>> +  FOR_ALL_BB_FN(bb, fn)
>>>>>> +    {
>>>>>> +      bb->count = apply_scale (count,
>>>>>> +                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
>>>>>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>> +        e->count = apply_probability (e->src->count, e->probability);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  /* Drop the profile for NODE to guessed, and update its frequency based on
>>>>>>     whether it is expected to be hot given the CALL_COUNT.  */
>>>>>>
>>>>>> @@ -2772,6 +2825,9 @@ static void
>>>>>>  drop_profile (struct cgraph_node *node, gcov_type call_count)
>>>>>>  {
>>>>>>    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>> +
>>>>>> +  if (profile_status_for_fn (fn) == PROFILE_GUESSED)
>>>>>> +    return;
>>>>>>    /* In the case where this was called by another function with a
>>>>>>       dropped profile, call_count will be 0. Since there are no
>>>>>>       non-zero call counts to this function, we don't know for sure
>>>>>> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>
>>>>>>    if (dump_file)
>>>>>>      fprintf (dump_file,
>>>>>> -             "Dropping 0 profile for %s/%i. %s based on calls.\n",
>>>>>> +             "Dropping %ld profile for %s/%i. %s based on calls.\n",
>>>>>> +             node->count,
>>>>>>               node->name (), node->order,
>>>>>>               hot ? "Function is hot" : "Function is normal");
>>>>>>    /* We only expect to miss profiles for functions that are reached
>>>>>> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>                   node->name (), node->order);
>>>>>>      }
>>>>>>
>>>>>> +  /* Re-estimate the probabilities for function and use the estimated
>>>>>> +     frequencies to compute the counts.  */
>>>>>> +  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>> +  tree_reestimate_probability ();
>>>>>> +  freqs_to_counts (node, call_count);
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      fprintf (dump_file, "After re-estimating probabilities and counts\n");
>>>>>> +      gimple_dump_cfg (dump_file, dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
>>>>>> +    }
>>>>>> +  pop_cfun ();
>>>>>> +
>>>>>>    profile_status_for_fn (fn)
>>>>>>        = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>>>>>    node->frequency
>>>>>> @@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>  /* In the case of COMDAT routines, multiple object files will contain the same
>>>>>>     function and the linker will select one for the binary. In that case
>>>>>>     all the other copies from the profile instrument binary will be missing
>>>>>> -   profile counts. Look for cases where this happened, due to non-zero
>>>>>> +   profile counts. This can confuse downstream optimizations such as
>>>>>> +   function splitting.
>>>>>> +
>>>>>> +   Look for cases where this happened, due to non-zero
>>>>>>     call counts going to 0-count functions, and drop the profile to guessed
>>>>>>     so that we can use the estimated probabilities and avoid optimizing only
>>>>>> -   for size.
>>>>>> +   for size. In the case where the COMDAT was inlined in some locations
>>>>>> +   within the file and not others, the profile count will be non-zero due
>>>>>> +   to the inlined instances, but may still be significantly smaller than the
>>>>>> +   call edges for the non-inlined instances. Detect that case when requested
>>>>>> +   and reestimate probabilities, since the counts will not necessarily reflect
>>>>>> +   the behavior along the more frequent call paths.
>>>>>>
>>>>>>     The other case where the profile may be missing is when the routine
>>>>>>     is not going to be emitted to the object file, e.g. for "extern template"
>>>>>>     class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>>>>>>     all other cases of non-zero calls to 0-count functions.  */
>>>>>>
>>>>>> -void
>>>>>> +bool
>>>>>>  handle_missing_profiles (void)
>>>>>>  {
>>>>>>    struct cgraph_node *node;
>>>>>>    int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>>>>>>    vec<struct cgraph_node *> worklist;
>>>>>>    worklist.create (64);
>>>>>> +  int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
>>>>>> +  bool changed = false;
>>>>>>
>>>>>> -  /* See if 0 count function has non-0 count callers.  In this case we
>>>>>> -     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>>>>> +  /* See if 0 or low count function has higher count caller edges.  In this
>>>>>> +     case we lost some profile.  Drop its function profile to
>>>>>> +     PROFILE_GUESSED.  */
>>>>>>    FOR_EACH_DEFINED_FUNCTION (node)
>>>>>>      {
>>>>>>        struct cgraph_edge *e;
>>>>>> @@ -2842,8 +2922,6 @@ handle_missing_profiles (void)
>>>>>>        gcov_type max_tp_first_run = 0;
>>>>>>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>
>>>>>> -      if (node->count)
>>>>>> -        continue;
>>>>>>        for (e = node->callers; e; e = e->next_caller)
>>>>>>        {
>>>>>>          call_count += e->count;
>>>>>> @@ -2852,6 +2930,12 @@ handle_missing_profiles (void)
>>>>>>           max_tp_first_run = e->caller->tp_first_run;
>>>>>>        }
>>>>>>
>>>>>> +      /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
>>>>>> +         profiles for 0-count functions called by non-zero call edges.  */
>>>>>> +      if ((!min_reest_ratio && node->count > 0)
>>>>>> +          || (min_reest_ratio && node->count * min_reest_ratio > call_count))
>>>>>> +        continue;
>>>>>> +
>>>>>>        /* If time profile is missing, let assign the maximum that comes from
>>>>>>          caller functions.  */
>>>>>>        if (!node->tp_first_run && max_tp_first_run)
>>>>>> @@ -2862,11 +2946,12 @@ handle_missing_profiles (void)
>>>>>>            && (call_count * unlikely_count_fraction >= profile_info->runs))
>>>>>>          {
>>>>>>            drop_profile (node, call_count);
>>>>>> +          changed = true;
>>>>>>            worklist.safe_push (node);
>>>>>>          }
>>>>>>      }
>>>>>>
>>>>>> -  /* Propagate the profile dropping to other 0-count COMDATs that are
>>>>>> +  /* Propagate the profile dropping to other low-count COMDATs that are
>>>>>>       potentially called by COMDATs we already dropped the profile on.  */
>>>>>>    while (worklist.length () > 0)
>>>>>>      {
>>>>>> @@ -2878,17 +2963,33 @@ handle_missing_profiles (void)
>>>>>>            struct cgraph_node *callee = e->callee;
>>>>>>            struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>>>>>>
>>>>>> -          if (callee->count > 0)
>>>>>> +          /* When min_reest_ratio is non-zero, if we get here we dropped
>>>>>> +             a caller's profile since it was significantly smaller than its
>>>>>> +             call edge. Drop the profile on any callees whose node count is
>>>>>> +             now exceeded by the new caller node count.  */
>>>>>> +          if ((!min_reest_ratio && callee->count > 0)
>>>>>> +              || (min_reest_ratio && callee->count >= node->count))
>>>>>>              continue;
>>>>>> +
>>>>>> +          gcov_type call_count = 0;
>>>>>> +          if (min_reest_ratio > 0)
>>>>>> +            {
>>>>>> +              struct cgraph_edge *e2;
>>>>>> +              for (e2 = node->callers; e2; e2 = e2->next_caller)
>>>>>> +                call_count += e2->count;
>>>>>> +            }
>>>>>> +
>>>>>>            if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>>>>>>                && profile_status_for_fn (fn) == PROFILE_READ)
>>>>>>              {
>>>>>> -              drop_profile (node, 0);
>>>>>> +              drop_profile (node, call_count);
>>>>>> +              changed = true;
>>>>>>                worklist.safe_push (callee);
>>>>>>              }
>>>>>>          }
>>>>>>      }
>>>>>>    worklist.release ();
>>>>>> +  return changed;
>>>>>>  }
>>>>>>
>>>>>>  /* Convert counts measured by profile driven feedback to frequencies.
>>>>>> @@ -2900,12 +3001,6 @@ counts_to_freqs (void)
>>>>>>    gcov_type count_max, true_count_max = 0;
>>>>>>    basic_block bb;
>>>>>>
>>>>>> -  /* Don't overwrite the estimated frequencies when the profile for
>>>>>> -     the function is missing.  We may drop this function PROFILE_GUESSED
>>>>>> -     later in drop_profile ().  */
>>>>>> -  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
>>>>>> -    return 0;
>>>>>> -
>>>>>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
>>>>>>      true_count_max = MAX (bb->count, true_count_max);
>>>>>>
>>>>>> Index: predict.h
>>>>>> ===================================================================
>>>>>> --- predict.h   (revision 207436)
>>>>>> +++ predict.h   (working copy)
>>>>>> @@ -47,11 +47,11 @@ enum prediction
>>>>>>
>>>>>>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>>>>>>  extern int counts_to_freqs (void);
>>>>>> -extern void handle_missing_profiles (void);
>>>>>> +extern bool handle_missing_profiles (void);
>>>>>>  extern void estimate_bb_frequencies (bool);
>>>>>>  extern const char *predictor_name (enum br_predictor);
>>>>>>  extern tree build_predict_expr (enum br_predictor, enum prediction);
>>>>>> -extern void tree_estimate_probability (void);
>>>>>> +extern void tree_estimate_probability (bool);
>>>>>>  extern void compute_function_frequency (void);
>>>>>>  extern void rebuild_frequencies (void);
>>>>>>
>>>>>> Index: tree-inline.c
>>>>>> ===================================================================
>>>>>> --- tree-inline.c       (revision 207436)
>>>>>> +++ tree-inline.c       (working copy)
>>>>>> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo
>>>>>>      }
>>>>>>  }
>>>>>>
>>>>>> -/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>>>> -   with each bb's frequency. Used when NODE has a 0-weight entry
>>>>>> -   but we are about to inline it into a non-zero count call bb.
>>>>>> -   See the comments for handle_missing_profiles() in predict.c for
>>>>>> -   when this can happen for COMDATs.  */
>>>>>> -
>>>>>> -void
>>>>>> -freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>>>> -{
>>>>>> -  basic_block bb;
>>>>>> -  edge_iterator ei;
>>>>>> -  edge e;
>>>>>> -  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>> -
>>>>>> -  FOR_ALL_BB_FN(bb, fn)
>>>>>> -    {
>>>>>> -      bb->count = apply_scale (count,
>>>>>> -                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
>>>>>> -      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>> -        e->count = apply_probability (e->src->count, e->probability);
>>>>>> -    }
>>>>>> -}
>>>>>> -
>>>>>>  /* Make a copy of the body of FN so that it can be inserted inline in
>>>>>>     another function.  Walks FN via CFG, returns new fndecl.  */
>>>>>>
>>>>>> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>>>    int incoming_frequency = 0;
>>>>>>    gcov_type incoming_count = 0;
>>>>>>
>>>>>> -  /* This can happen for COMDAT routines that end up with 0 counts
>>>>>> -     despite being called (see the comments for handle_missing_profiles()
>>>>>> -     in predict.c as to why). Apply counts to the blocks in the callee
>>>>>> -     before inlining, using the guessed edge frequencies, so that we don't
>>>>>> -     end up with a 0-count inline body which can confuse downstream
>>>>>> -     optimizations such as function splitting.  */
>>>>>> -  if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
>>>>>> -    {
>>>>>> -      /* Apply the larger of the call bb count and the total incoming
>>>>>> -         call edge count to the callee.  */
>>>>>> -      gcov_type in_count = 0;
>>>>>> -      struct cgraph_edge *in_edge;
>>>>>> -      for (in_edge = id->src_node->callers; in_edge;
>>>>>> -           in_edge = in_edge->next_caller)
>>>>>> -        in_count += in_edge->count;
>>>>>> -      freqs_to_counts (id->src_node, count > in_count ? count : in_count);
>>>>>> -    }
>>>>>> -
>>>>>>    if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
>>>>>>      count_scale
>>>>>>          = GCOV_COMPUTE_SCALE (count,
>>>>>> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>>>    else
>>>>>>      count_scale = REG_BR_PROB_BASE;
>>>>>>
>>>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>> +    fprintf (dump_file,
>>>>>> +             "Scaling entry count %ld to %ld with scale %ld while inlining "
>>>>>> +             "%s into %s\n",
>>>>>> +             count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale,
>>>>>> +             id->src_node->name (), id->dst_node->name ());
>>>>>> +
>>>>>>    /* Register specific tree functions.  */
>>>>>>    gimple_register_cfg_hooks ();
>>>>>>
>>>>>> Index: tree-profile.c
>>>>>> ===================================================================
>>>>>> --- tree-profile.c      (revision 207436)
>>>>>> +++ tree-profile.c      (working copy)
>>>>>> @@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un
>>>>>>    gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>>>>>>  }
>>>>>>
>>>>>> +/* Update call statements when UPDATE_CALLS, and rebuild the cgraph edges.  */
>>>>>> +
>>>>>> +static void
>>>>>> +rebuild_cgraph (bool update_calls)
>>>>>> +{
>>>>>> +  struct cgraph_node *node;
>>>>>> +
>>>>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>>>>> +    {
>>>>>> +      basic_block bb;
>>>>>> +
>>>>>> +      if (!gimple_has_body_p (node->decl)
>>>>>> +         || !(!node->clone_of
>>>>>> +         || node->decl != node->clone_of->decl))
>>>>>> +       continue;
>>>>>> +
>>>>>> +      /* Don't profile functions produced for builtin stuff.  */
>>>>>> +      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>>>> +       continue;
>>>>>> +
>>>>>> +      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>> +
>>>>>> +      if (update_calls)
>>>>>> +        {
>>>>>> +          FOR_EACH_BB_FN (bb, cfun)
>>>>>> +            {
>>>>>> +              gimple_stmt_iterator gsi;
>>>>>> +              for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>> +                {
>>>>>> +                  gimple stmt = gsi_stmt (gsi);
>>>>>> +                  if (is_gimple_call (stmt))
>>>>>> +                    update_stmt (stmt);
>>>>>> +                }
>>>>>> +            }
>>>>>> +
>>>>>> +          /* re-merge split blocks.  */
>>>>>> +          cleanup_tree_cfg ();
>>>>>> +          update_ssa (TODO_update_ssa);
>>>>>> +        }
>>>>>> +
>>>>>> +      rebuild_cgraph_edges ();
>>>>>> +
>>>>>> +      pop_cfun ();
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  /* Profile all functions in the callgraph.  */
>>>>>>
>>>>>>  static unsigned int
>>>>>> @@ -622,43 +668,14 @@ tree_profiling (void)
>>>>>>      }
>>>>>>
>>>>>>    /* Update call statements and rebuild the cgraph.  */
>>>>>> -  FOR_EACH_DEFINED_FUNCTION (node)
>>>>>> -    {
>>>>>> -      basic_block bb;
>>>>>> +  rebuild_cgraph (true);
>>>>>>
>>>>>> -      if (!gimple_has_body_p (node->decl)
>>>>>> -         || !(!node->clone_of
>>>>>> -         || node->decl != node->clone_of->decl))
>>>>>> -       continue;
>>>>>> +  /* If the profiles were dropped on any functions, unfortunately we
>>>>>> +     need to rebuild the cgraph to propagate the new reestimated counts
>>>>>> +     and avoid sanity failures due to inconsistencies.  */
>>>>>> +  if (handle_missing_profiles ())
>>>>>> +    rebuild_cgraph (false);
>>>>>>
>>>>>> -      /* Don't profile functions produced for builtin stuff.  */
>>>>>> -      if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>>>> -       continue;
>>>>>> -
>>>>>> -      push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>> -
>>>>>> -      FOR_EACH_BB_FN (bb, cfun)
>>>>>> -       {
>>>>>> -         gimple_stmt_iterator gsi;
>>>>>> -         for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>> -           {
>>>>>> -             gimple stmt = gsi_stmt (gsi);
>>>>>> -             if (is_gimple_call (stmt))
>>>>>> -               update_stmt (stmt);
>>>>>> -           }
>>>>>> -       }
>>>>>> -
>>>>>> -      /* re-merge split blocks.  */
>>>>>> -      cleanup_tree_cfg ();
>>>>>> -      update_ssa (TODO_update_ssa);
>>>>>> -
>>>>>> -      rebuild_cgraph_edges ();
>>>>>> -
>>>>>> -      pop_cfun ();
>>>>>> -    }
>>>>>> -
>>>>>> -  handle_missing_profiles ();
>>>>>> -
>>>>>>    del_node_map ();
>>>>>>    return 0;
>>>>>>  }
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


Thread overview: 16+ messages
2014-02-10 22:12 Teresa Johnson
2014-02-11 22:56 ` Xinliang David Li
2014-02-12  1:04   ` Teresa Johnson
2014-02-12  1:16     ` Xinliang David Li
2014-02-12  1:36       ` Teresa Johnson
2014-02-12  2:13         ` Xinliang David Li
2014-02-12 14:45           ` Teresa Johnson [this message]
2014-02-12 16:02             ` Teresa Johnson
2014-02-12 22:03               ` Xinliang David Li
2014-02-13 16:41                 ` Teresa Johnson
2014-02-13 17:49                   ` Mike Stump
2014-02-13 17:52                     ` Xinliang David Li
2014-02-28 17:13                   ` Teresa Johnson
2014-03-22  1:40                     ` Teresa Johnson
2014-05-26  5:43 ` Jan Hubicka
2014-07-07 18:31   ` Teresa Johnson
