public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] decide edge's hotness when there is profile info
@ 2013-10-14 16:13 Dehao Chen
  2013-10-14 17:15 ` Xinliang David Li
  0 siblings, 1 reply; 42+ messages in thread
From: Dehao Chen @ 2013-10-14 16:13 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jan Hubicka, David Li

This patch forces to use profile info to check if an edge is hot when
profile is available.

Bootstrapped and passed regression tests.

OK for trunk?

Thanks,
Dehao

gcc/ChangeLog:
2013-10-14  Dehao Chen  <dehao@google.com>
* predict.c(cgraph_maybe_hot_edge_p): Decide edge's hotness from profile.

Index: gcc/predict.c
===================================================================
--- gcc/predict.c (revision 203568)
+++ gcc/predict.c (working copy)
@@ -185,10 +185,8 @@ maybe_hot_bb_p (struct function *fun, const_basic_
 bool
 cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
 {
-  if (profile_info && flag_branch_probabilities
-      && !maybe_hot_count_p (NULL,
-                             edge->count))
-    return false;
+  if (profile_info && flag_branch_probabilities)
+    return maybe_hot_count_p (NULL, edge->count);
   if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
       || (edge->callee
   && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 16:13 [PATCH] decide edge's hotness when there is profile info Dehao Chen
@ 2013-10-14 17:15 ` Xinliang David Li
  2013-10-14 17:44   ` Dehao Chen
  0 siblings, 1 reply; 42+ messages in thread
From: Xinliang David Li @ 2013-10-14 17:15 UTC (permalink / raw)
  To: Dehao Chen; +Cc: GCC Patches, Jan Hubicka

Looks like there is some inconsistency between edge hotness and callee
frequency?

David

On Mon, Oct 14, 2013 at 9:08 AM, Dehao Chen <dehao@google.com> wrote:
> This patch forces to use profile info to check if an edge is hot when
> profile is available.
>
> Bootstrapped and passed regression tests.
>
> OK for trunk?
>
> Thanks,
> Dehao
>
> gcc/ChangeLog:
> 2013-10-14  Dehao Chen  <dehao@google.com>
> * predict.c(cgraph_maybe_hot_edge_p): Decide edge's hotness from profile.
>
> Index: gcc/predict.c
> ===================================================================
> --- gcc/predict.c (revision 203568)
> +++ gcc/predict.c (working copy)
> @@ -185,10 +185,8 @@ maybe_hot_bb_p (struct function *fun, const_basic_
>  bool
>  cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>  {
> -  if (profile_info && flag_branch_probabilities
> -      && !maybe_hot_count_p (NULL,
> -                             edge->count))
> -    return false;
> +  if (profile_info && flag_branch_probabilities)
> +    return maybe_hot_count_p (NULL, edge->count);
>    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>        || (edge->callee
>    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 17:15 ` Xinliang David Li
@ 2013-10-14 17:44   ` Dehao Chen
  2013-10-14 20:09     ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Dehao Chen @ 2013-10-14 17:44 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches, Jan Hubicka

Not for instrumented FDO (not as I know of). But for AutoFDO, this
could be a potential risk because some callee is marked unlikely
executed simply because they are inlined and eliminated in the O2
binary. But in ipa-inline it will not get inlined because the edge is
not hot from cgraph_maybe_hot_edge_p (because callee is
UNLIKELY_EXECUTED), while the edge->count is actually hot.

Dehao

On Mon, Oct 14, 2013 at 9:13 AM, Xinliang David Li <davidxl@google.com> wrote:
> Looks like there is some inconsistency between edge hotness and callee
> frequency?
>
> David
>
> On Mon, Oct 14, 2013 at 9:08 AM, Dehao Chen <dehao@google.com> wrote:
>> This patch forces to use profile info to check if an edge is hot when
>> profile is available.
>>
>> Bootstrapped and passed regression tests.
>>
>> OK for trunk?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog:
>> 2013-10-14  Dehao Chen  <dehao@google.com>
>> * predict.c(cgraph_maybe_hot_edge_p): Decide edge's hotness from profile.
>>
>> Index: gcc/predict.c
>> ===================================================================
>> --- gcc/predict.c (revision 203568)
>> +++ gcc/predict.c (working copy)
>> @@ -185,10 +185,8 @@ maybe_hot_bb_p (struct function *fun, const_basic_
>>  bool
>>  cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>>  {
>> -  if (profile_info && flag_branch_probabilities
>> -      && !maybe_hot_count_p (NULL,
>> -                             edge->count))
>> -    return false;
>> +  if (profile_info && flag_branch_probabilities)
>> +    return maybe_hot_count_p (NULL, edge->count);
>>    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>>        || (edge->callee
>>    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 17:44   ` Dehao Chen
@ 2013-10-14 20:09     ` Jan Hubicka
  2013-10-14 21:28       ` Dehao Chen
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-10-14 20:09 UTC (permalink / raw)
  To: Dehao Chen; +Cc: Xinliang David Li, GCC Patches, Jan Hubicka

> Not for instrumented FDO (not as I know of). But for AutoFDO, this
> could be a potential risk because some callee is marked unlikely
> executed simply because they are inlined and eliminated in the O2
> binary. But in ipa-inline it will not get inlined because the edge is
> not hot from cgraph_maybe_hot_edge_p (because callee is
> UNLIKELY_EXECUTED), while the edge->count is actually hot.

Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
It seems that having profile set incorrectly will lead to other problems later, too.
We discussed similar problem with Teresa about the missing profiles for comdat,
basically one should detect these cases as profile being lost and go with guessed
profile.  (I believe patch for that was posted, too, and so far it seems best approach
to this issue)

Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 20:09     ` Jan Hubicka
@ 2013-10-14 21:28       ` Dehao Chen
  2013-10-14 21:50         ` Xinliang David Li
  0 siblings, 1 reply; 42+ messages in thread
From: Dehao Chen @ 2013-10-14 21:28 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Xinliang David Li, GCC Patches

On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>> could be a potential risk because some callee is marked unlikely
>> executed simply because they are inlined and eliminated in the O2
>> binary. But in ipa-inline it will not get inlined because the edge is
>> not hot from cgraph_maybe_hot_edge_p (because callee is
>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>
> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
> It seems that having profile set incorrectly will lead to other problems later, too.
> We discussed similar problem with Teresa about the missing profiles for comdat,
> basically one should detect these cases as profile being lost and go with guessed
> profile.  (I believe patch for that was posted, too, and so far it seems best approach
> to this issue)

The current AutoFDO implementation will take all functions that do not
have have profile as normally executed, thus use guessed profile for
it. This is like using profile for truly hot functions, and using O2
for other functions. This works fine. However, it leads to larger code
size (approximately 10%~20% larger than FDO).

I'd like to introduce another mode for users who care about both
performance and code size, and can be sure that profile is
representative. In this mode, we will mark all functions without
sample as "unlikely executed". However, because AutoFDO use debug info
(of optimized code) to represent profile, it's possible that some hot
functions (say foo) are inlined and fully eliminated into another hot
function (say bar). So in the profile, bar is cold, and because the
profile for foo::bar is eliminated, bar will not be inlined into foo
before the profile annotation. However, after profile annotate, we can
infer from the bb count that foo->bar is hot, thus it should be
inlined in ipa-inline phase. However, because bar itself is marked
UNLIKELY_EXECUTED, it will not be inlined.

One possible workaround would be that during rebuild_cgraph_edges, if
we find an edge's callee is unlikely executed, add the edge count to
the callee's count and recalculate callee's frequency.

Dehao

>
> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 21:28       ` Dehao Chen
@ 2013-10-14 21:50         ` Xinliang David Li
  2013-10-14 22:00           ` Dehao Chen
  0 siblings, 1 reply; 42+ messages in thread
From: Xinliang David Li @ 2013-10-14 21:50 UTC (permalink / raw)
  To: Dehao Chen; +Cc: Jan Hubicka, GCC Patches

Is it possible to update the callee node summary after profile
annotate (using information from inline instances which are not
inlined in early inline)?

David

On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>> could be a potential risk because some callee is marked unlikely
>>> executed simply because they are inlined and eliminated in the O2
>>> binary. But in ipa-inline it will not get inlined because the edge is
>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>
>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>> It seems that having profile set incorrectly will lead to other problems later, too.
>> We discussed similar problem with Teresa about the missing profiles for comdat,
>> basically one should detect these cases as profile being lost and go with guessed
>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>> to this issue)
>
> The current AutoFDO implementation will take all functions that do not
> have have profile as normally executed, thus use guessed profile for
> it. This is like using profile for truly hot functions, and using O2
> for other functions. This works fine. However, it leads to larger code
> size (approximately 10%~20% larger than FDO).
>
> I'd like to introduce another mode for users who care about both
> performance and code size, and can be sure that profile is
> representative. In this mode, we will mark all functions without
> sample as "unlikely executed". However, because AutoFDO use debug info
> (of optimized code) to represent profile, it's possible that some hot
> functions (say foo) are inlined and fully eliminated into another hot
> function (say bar). So in the profile, bar is cold, and because the
> profile for foo::bar is eliminated, bar will not be inlined into foo
> before the profile annotation. However, after profile annotate, we can
> infer from the bb count that foo->bar is hot, thus it should be
> inlined in ipa-inline phase. However, because bar itself is marked
> UNLIKELY_EXECUTED, it will not be inlined.
>
> One possible workaround would be that during rebuild_cgraph_edges, if
> we find an edge's callee is unlikely executed, add the edge count to
> the callee's count and recalculate callee's frequency.
>
> Dehao
>
>>
>> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 21:50         ` Xinliang David Li
@ 2013-10-14 22:00           ` Dehao Chen
  2013-10-14 22:01             ` Jan Hubicka
  2013-10-14 22:26             ` Xinliang David Li
  0 siblings, 2 replies; 42+ messages in thread
From: Dehao Chen @ 2013-10-14 22:00 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Jan Hubicka, GCC Patches

For my test case, the entire inline instance is optimized away, so
there is no info about it in the profile. I can do some fixup in the
rebuild_cgraph_edge though.

Dehao

On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
> Is it possible to update the callee node summary after profile
> annotate (using information from inline instances which are not
> inlined in early inline)?
>
> David
>
> On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>> could be a potential risk because some callee is marked unlikely
>>>> executed simply because they are inlined and eliminated in the O2
>>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>
>>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>> It seems that having profile set incorrectly will lead to other problems later, too.
>>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>> basically one should detect these cases as profile being lost and go with guessed
>>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>> to this issue)
>>
>> The current AutoFDO implementation will take all functions that do not
>> have have profile as normally executed, thus use guessed profile for
>> it. This is like using profile for truly hot functions, and using O2
>> for other functions. This works fine. However, it leads to larger code
>> size (approximately 10%~20% larger than FDO).
>>
>> I'd like to introduce another mode for users who care about both
>> performance and code size, and can be sure that profile is
>> representative. In this mode, we will mark all functions without
>> sample as "unlikely executed". However, because AutoFDO use debug info
>> (of optimized code) to represent profile, it's possible that some hot
>> functions (say foo) are inlined and fully eliminated into another hot
>> function (say bar). So in the profile, bar is cold, and because the
>> profile for foo::bar is eliminated, bar will not be inlined into foo
>> before the profile annotation. However, after profile annotate, we can
>> infer from the bb count that foo->bar is hot, thus it should be
>> inlined in ipa-inline phase. However, because bar itself is marked
>> UNLIKELY_EXECUTED, it will not be inlined.
>>
>> One possible workaround would be that during rebuild_cgraph_edges, if
>> we find an edge's callee is unlikely executed, add the edge count to
>> the callee's count and recalculate callee's frequency.
>>
>> Dehao
>>
>>>
>>> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 22:00           ` Dehao Chen
@ 2013-10-14 22:01             ` Jan Hubicka
  2013-10-14 22:43               ` Dehao Chen
  2013-10-14 22:26             ` Xinliang David Li
  1 sibling, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-10-14 22:01 UTC (permalink / raw)
  To: Dehao Chen; +Cc: Xinliang David Li, Jan Hubicka, GCC Patches

> For my test case, the entire inline instance is optimized away, so
> there is no info about it in the profile. I can do some fixup in the
> rebuild_cgraph_edge though.

Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
and guess the profile when we detect this (i.e. we have edges with non-0 counts into
functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
and they will be inlined normal way.

Honza
> 
> Dehao
> 
> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
> > Is it possible to update the callee node summary after profile
> > annotate (using information from inline instances which are not
> > inlined in early inline)?
> >
> > David
> >
> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
> >>>> could be a potential risk because some callee is marked unlikely
> >>>> executed simply because they are inlined and eliminated in the O2
> >>>> binary. But in ipa-inline it will not get inlined because the edge is
> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
> >>>
> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
> >>> It seems that having profile set incorrectly will lead to other problems later, too.
> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
> >>> basically one should detect these cases as profile being lost and go with guessed
> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
> >>> to this issue)
> >>
> >> The current AutoFDO implementation will take all functions that do not
> >> have have profile as normally executed, thus use guessed profile for
> >> it. This is like using profile for truly hot functions, and using O2
> >> for other functions. This works fine. However, it leads to larger code
> >> size (approximately 10%~20% larger than FDO).
> >>
> >> I'd like to introduce another mode for users who care about both
> >> performance and code size, and can be sure that profile is
> >> representative. In this mode, we will mark all functions without
> >> sample as "unlikely executed". However, because AutoFDO use debug info
> >> (of optimized code) to represent profile, it's possible that some hot
> >> functions (say foo) are inlined and fully eliminated into another hot
> >> function (say bar). So in the profile, bar is cold, and because the
> >> profile for foo::bar is eliminated, bar will not be inlined into foo
> >> before the profile annotation. However, after profile annotate, we can
> >> infer from the bb count that foo->bar is hot, thus it should be
> >> inlined in ipa-inline phase. However, because bar itself is marked
> >> UNLIKELY_EXECUTED, it will not be inlined.
> >>
> >> One possible workaround would be that during rebuild_cgraph_edges, if
> >> we find an edge's callee is unlikely executed, add the edge count to
> >> the callee's count and recalculate callee's frequency.
> >>
> >> Dehao
> >>
> >>>
> >>> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 22:00           ` Dehao Chen
  2013-10-14 22:01             ` Jan Hubicka
@ 2013-10-14 22:26             ` Xinliang David Li
  2013-10-14 22:28               ` Dehao Chen
  1 sibling, 1 reply; 42+ messages in thread
From: Xinliang David Li @ 2013-10-14 22:26 UTC (permalink / raw)
  To: Dehao Chen; +Cc: Jan Hubicka, GCC Patches

On Mon, Oct 14, 2013 at 2:34 PM, Dehao Chen <dehao@google.com> wrote:
> For my test case, the entire inline instance is optimized away,

do you mean there is no out of line instance for the target function
in the profile binary?

David

> so
> there is no info about it in the profile. I can do some fixup in the
> rebuild_cgraph_edge though.
>
> Dehao
>
> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Is it possible to update the callee node summary after profile
>> annotate (using information from inline instances which are not
>> inlined in early inline)?
>>
>> David
>>
>> On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>>> could be a potential risk because some callee is marked unlikely
>>>>> executed simply because they are inlined and eliminated in the O2
>>>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>>
>>>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>> basically one should detect these cases as profile being lost and go with guessed
>>>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>> to this issue)
>>>
>>> The current AutoFDO implementation will take all functions that do not
>>> have have profile as normally executed, thus use guessed profile for
>>> it. This is like using profile for truly hot functions, and using O2
>>> for other functions. This works fine. However, it leads to larger code
>>> size (approximately 10%~20% larger than FDO).
>>>
>>> I'd like to introduce another mode for users who care about both
>>> performance and code size, and can be sure that profile is
>>> representative. In this mode, we will mark all functions without
>>> sample as "unlikely executed". However, because AutoFDO use debug info
>>> (of optimized code) to represent profile, it's possible that some hot
>>> functions (say foo) are inlined and fully eliminated into another hot
>>> function (say bar). So in the profile, bar is cold, and because the
>>> profile for foo::bar is eliminated, bar will not be inlined into foo
>>> before the profile annotation. However, after profile annotate, we can
>>> infer from the bb count that foo->bar is hot, thus it should be
>>> inlined in ipa-inline phase. However, because bar itself is marked
>>> UNLIKELY_EXECUTED, it will not be inlined.
>>>
>>> One possible workaround would be that during rebuild_cgraph_edges, if
>>> we find an edge's callee is unlikely executed, add the edge count to
>>> the callee's count and recalculate callee's frequency.
>>>
>>> Dehao
>>>
>>>>
>>>> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 22:26             ` Xinliang David Li
@ 2013-10-14 22:28               ` Dehao Chen
  0 siblings, 0 replies; 42+ messages in thread
From: Dehao Chen @ 2013-10-14 22:28 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Jan Hubicka, GCC Patches

On Mon, Oct 14, 2013 at 3:04 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Mon, Oct 14, 2013 at 2:34 PM, Dehao Chen <dehao@google.com> wrote:
>> For my test case, the entire inline instance is optimized away,
>
> do you mean there is no out of line instance for the target function
> in the profile binary?

Yes, and there is no inline instance either.

Dehao

>
> David
>
>> so
>> there is no info about it in the profile. I can do some fixup in the
>> rebuild_cgraph_edge though.
>>
>> Dehao
>>
>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> Is it possible to update the callee node summary after profile
>>> annotate (using information from inline instances which are not
>>> inlined in early inline)?
>>>
>>> David
>>>
>>> On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>>> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>>>> could be a potential risk because some callee is marked unlikely
>>>>>> executed simply because they are inlined and eliminated in the O2
>>>>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>>>
>>>>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>>> basically one should detect these cases as profile being lost and go with guessed
>>>>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>>> to this issue)
>>>>
>>>> The current AutoFDO implementation will take all functions that do not
>>>> have have profile as normally executed, thus use guessed profile for
>>>> it. This is like using profile for truly hot functions, and using O2
>>>> for other functions. This works fine. However, it leads to larger code
>>>> size (approximately 10%~20% larger than FDO).
>>>>
>>>> I'd like to introduce another mode for users who care about both
>>>> performance and code size, and can be sure that profile is
>>>> representative. In this mode, we will mark all functions without
>>>> sample as "unlikely executed". However, because AutoFDO use debug info
>>>> (of optimized code) to represent profile, it's possible that some hot
>>>> functions (say foo) are inlined and fully eliminated into another hot
>>>> function (say bar). So in the profile, bar is cold, and because the
>>>> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>> before the profile annotation. However, after profile annotate, we can
>>>> infer from the bb count that foo->bar is hot, thus it should be
>>>> inlined in ipa-inline phase. However, because bar itself is marked
>>>> UNLIKELY_EXECUTED, it will not be inlined.
>>>>
>>>> One possible workaround would be that during rebuild_cgraph_edges, if
>>>> we find an edge's callee is unlikely executed, add the edge count to
>>>> the callee's count and recalculate callee's frequency.
>>>>
>>>> Dehao
>>>>
>>>>>
>>>>> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 22:01             ` Jan Hubicka
@ 2013-10-14 22:43               ` Dehao Chen
  2013-10-15 14:59                 ` Dehao Chen
  2013-10-15 15:08                 ` Teresa Johnson
  0 siblings, 2 replies; 42+ messages in thread
From: Dehao Chen @ 2013-10-14 22:43 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Xinliang David Li, GCC Patches

On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> For my test case, the entire inline instance is optimized away, so
>> there is no info about it in the profile. I can do some fixup in the
>> rebuild_cgraph_edge though.
>
> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
> and they will be inlined normal way.

Oh, actually in AutoFDO, only functions with samples will be marked as
PROFILE_READ. Others will all be marked as PROFILE_GUESSED.

Dehao

>
> Honza
>>
>> Dehao
>>
>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>> > Is it possible to update the callee node summary after profile
>> > annotate (using information from inline instances which are not
>> > inlined in early inline)?
>> >
>> > David
>> >
>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>> >>>> could be a potential risk because some callee is marked unlikely
>> >>>> executed simply because they are inlined and eliminated in the O2
>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>> >>>
>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>> >>> basically one should detect these cases as profile being lost and go with guessed
>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>> >>> to this issue)
>> >>
>> >> The current AutoFDO implementation will take all functions that do not
>> >> have have profile as normally executed, thus use guessed profile for
>> >> it. This is like using profile for truly hot functions, and using O2
>> >> for other functions. This works fine. However, it leads to larger code
>> >> size (approximately 10%~20% larger than FDO).
>> >>
>> >> I'd like to introduce another mode for users who care about both
>> >> performance and code size, and can be sure that profile is
>> >> representative. In this mode, we will mark all functions without
>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>> >> (of optimized code) to represent profile, it's possible that some hot
>> >> functions (say foo) are inlined and fully eliminated into another hot
>> >> function (say bar). So in the profile, bar is cold, and because the
>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>> >> before the profile annotation. However, after profile annotate, we can
>> >> infer from the bb count that foo->bar is hot, thus it should be
>> >> inlined in ipa-inline phase. However, because bar itself is marked
>> >> UNLIKELY_EXECUTED, it will not be inlined.
>> >>
>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>> >> we find an edge's callee is unlikely executed, add the edge count to
>> >> the callee's count and recalculate callee's frequency.
>> >>
>> >> Dehao
>> >>
>> >>>
>> >>> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 22:43               ` Dehao Chen
@ 2013-10-15 14:59                 ` Dehao Chen
  2013-10-15 15:08                 ` Teresa Johnson
  1 sibling, 0 replies; 42+ messages in thread
From: Dehao Chen @ 2013-10-15 14:59 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Xinliang David Li, GCC Patches

The patch updated:

Index: gcc/cgraph.c
===================================================================
--- gcc/cgraph.c (revision 203609)
+++ gcc/cgraph.c (working copy)
@@ -877,6 +877,10 @@ cgraph_create_edge_1 (struct cgraph_node *caller,
   if (call_stmt && caller->call_site_hash)
     cgraph_add_edge_to_call_site_hash (edge);

+  if (count > 0 && edge->callee
+      && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)
+    edge->callee->frequency = NODE_FREQUENCY_NORMAL;
+
   return edge;
 }

Thanks,
Dehao

On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> For my test case, the entire inline instance is optimized away, so
>>> there is no info about it in the profile. I can do some fixup in the
>>> rebuild_cgraph_edge though.
>>
>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>> and they will be inlined normal way.
>
> Oh, actually in AutoFDO, only functions with samples will be marked as
> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>
> Dehao
>
>>
>> Honza
>>>
>>> Dehao
>>>
>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> > Is it possible to update the callee node summary after profile
>>> > annotate (using information from inline instances which are not
>>> > inlined in early inline)?
>>> >
>>> > David
>>> >
>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>> >>>> could be a potential risk because some callee is marked unlikely
>>> >>>> executed simply because they are inlined and eliminated in the O2
>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>> >>>
>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>> >>> to this issue)
>>> >>
>>> >> The current AutoFDO implementation will take all functions that do not
>>> >> have have profile as normally executed, thus use guessed profile for
>>> >> it. This is like using profile for truly hot functions, and using O2
>>> >> for other functions. This works fine. However, it leads to larger code
>>> >> size (approximately 10%~20% larger than FDO).
>>> >>
>>> >> I'd like to introduce another mode for users who care about both
>>> >> performance and code size, and can be sure that profile is
>>> >> representative. In this mode, we will mark all functions without
>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>> >> (of optimized code) to represent profile, it's possible that some hot
>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>> >> function (say bar). So in the profile, bar is cold, and because the
>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>> >> before the profile annotation. However, after profile annotate, we can
>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>> >>
>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>> >> the callee's count and recalculate callee's frequency.
>>> >>
>>> >> Dehao
>>> >>
>>> >>>
>>> >>> Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-14 22:43               ` Dehao Chen
  2013-10-15 14:59                 ` Dehao Chen
@ 2013-10-15 15:08                 ` Teresa Johnson
  2013-10-15 15:53                   ` Dehao Chen
  1 sibling, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-10-15 15:08 UTC (permalink / raw)
  To: Dehao Chen; +Cc: Jan Hubicka, Xinliang David Li, GCC Patches

On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> For my test case, the entire inline instance is optimized away, so
>>> there is no info about it in the profile. I can do some fixup in the
>>> rebuild_cgraph_edge though.
>>
>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>> and they will be inlined normal way.
>
> Oh, actually in AutoFDO, only functions with samples will be marked as
> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.

Here is Honza's patch that he was referring to:

Index: tree-profile.c
===================================================================
--- tree-profile.c (revision 201838)
+++ tree-profile.c (working copy)
@@ -604,6 +604,34 @@

       pop_cfun ();
     }
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      bool called = false;
+      if (node->count)
+ continue;
+      for (e = node->callers; e; e = e->next_caller)
+ {
+  if (e->count)
+    called = true;
+  if (cgraph_maybe_hot_edge_p (e))
+    break;
+ }
+      if (e || called
+  && profile_status_for_function
+      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+ {
+  if (dump_file)
+    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
+     cgraph_node_name (node), node->symbol.order,
+     e ? "function is hot" : "function is normal");
+  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+ }
+    }

   del_node_map();
   return 0;
Index: predict.c
===================================================================
--- predict.c (revision 201838)
+++ predict.c (working copy)
@@ -2715,6 +2715,9 @@
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);


Which is discussed in this email:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html

For COMDATs I need to extend this to do it a little later to do it
recursively to catch the case of COMDATs feeding other COMDATs and I
need to do some other handling to compute counts from the frequencies
when inlining. I have been meaning to work on this for awhile but
finally am getting to it this week. (Here's the last message from a
later thread that forked off the above one:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)

In the meantime, perhaps Honza's patch will suffice?

Teresa

>
> Dehao
>
>>
>> Honza
>>>
>>> Dehao
>>>
>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> > Is it possible to update the callee node summary after profile
>>> > annotate (using information from inline instances which are not
>>> > inlined in early inline)?
>>> >
>>> > David
>>> >
>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>> >>>> could be a potential risk because some callee is marked unlikely
>>> >>>> executed simply because they are inlined and eliminated in the O2
>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>> >>>
>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>> >>> to this issue)
>>> >>
>>> >> The current AutoFDO implementation will take all functions that do not
>>> >> have have profile as normally executed, thus use guessed profile for
>>> >> it. This is like using profile for truly hot functions, and using O2
>>> >> for other functions. This works fine. However, it leads to larger code
>>> >> size (approximately 10%~20% larger than FDO).
>>> >>
>>> >> I'd like to introduce another mode for users who care about both
>>> >> performance and code size, and can be sure that profile is
>>> >> representative. In this mode, we will mark all functions without
>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>> >> (of optimized code) to represent profile, it's possible that some hot
>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>> >> function (say bar). So in the profile, bar is cold, and because the
>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>> >> before the profile annotation. However, after profile annotate, we can
>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>> >>
>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>> >> the callee's count and recalculate callee's frequency.
>>> >>
>>> >> Dehao
>>> >>
>>> >>>
>>> >>> Honza



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-15 15:08                 ` Teresa Johnson
@ 2013-10-15 15:53                   ` Dehao Chen
  2013-10-15 16:06                     ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Dehao Chen @ 2013-10-15 15:53 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Xinliang David Li, GCC Patches

Thanks for the pointer to Honza's patch. The patch does exactly what I
need. But it only resides in the instrumentation based FDO path. Can
we move the code to more common place (like rebuild_cgraph_edges)?

Thanks,
Dehao

On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> For my test case, the entire inline instance is optimized away, so
>>>> there is no info about it in the profile. I can do some fixup in the
>>>> rebuild_cgraph_edge though.
>>>
>>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>>> and they will be inlined normal way.
>>
>> Oh, actually in AutoFDO, only functions with samples will be marked as
>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>
> Here is Honza's patch that he was referring to:
>
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c (revision 201838)
> +++ tree-profile.c (working copy)
> @@ -604,6 +604,34 @@
>
>        pop_cfun ();
>      }
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      bool called = false;
> +      if (node->count)
> + continue;
> +      for (e = node->callers; e; e = e->next_caller)
> + {
> +  if (e->count)
> +    called = true;
> +  if (cgraph_maybe_hot_edge_p (e))
> +    break;
> + }
> +      if (e || called
> +  && profile_status_for_function
> +      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
> + {
> +  if (dump_file)
> +    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
> +     cgraph_node_name (node), node->symbol.order,
> +     e ? "function is hot" : "function is normal");
> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
> +    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> + }
> +    }
>
>    del_node_map();
>    return 0;
> Index: predict.c
> ===================================================================
> --- predict.c (revision 201838)
> +++ predict.c (working copy)
> @@ -2715,6 +2715,9 @@
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +    return 0;
> +
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      true_count_max = MAX (bb->count, true_count_max);
>
>
> Which is discussed in this email:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>
> For COMDATs I need to extend this to do it a little later to do it
> recursively to catch the case of COMDATs feeding other COMDATs and I
> need to do some other handling to compute counts from the frequencies
> when inlining. I have been meaning to work on this for awhile but
> finally am getting to it this week. (Here's the last message from a
> later thread that forked off the above one:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>
> In the meantime, perhaps Honza's patch will suffice?
>
> Teresa
>
>>
>> Dehao
>>
>>>
>>> Honza
>>>>
>>>> Dehao
>>>>
>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> > Is it possible to update the callee node summary after profile
>>>> > annotate (using information from inline instances which are not
>>>> > inlined in early inline)?
>>>> >
>>>> > David
>>>> >
>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>> >>>> could be a potential risk because some callee is marked unlikely
>>>> >>>> executed simply because they are inlined and eliminated in the O2
>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>> >>>
>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>> >>> to this issue)
>>>> >>
>>>> >> The current AutoFDO implementation will take all functions that do not
>>>> >> have have profile as normally executed, thus use guessed profile for
>>>> >> it. This is like using profile for truly hot functions, and using O2
>>>> >> for other functions. This works fine. However, it leads to larger code
>>>> >> size (approximately 10%~20% larger than FDO).
>>>> >>
>>>> >> I'd like to introduce another mode for users who care about both
>>>> >> performance and code size, and can be sure that profile is
>>>> >> representative. In this mode, we will mark all functions without
>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>>> >> (of optimized code) to represent profile, it's possible that some hot
>>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>>> >> function (say bar). So in the profile, bar is cold, and because the
>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>> >> before the profile annotation. However, after profile annotate, we can
>>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>>> >>
>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>>> >> the callee's count and recalculate callee's frequency.
>>>> >>
>>>> >> Dehao
>>>> >>
>>>> >>>
>>>> >>> Honza
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-15 15:53                   ` Dehao Chen
@ 2013-10-15 16:06                     ` Teresa Johnson
  2013-10-15 16:53                       ` Dehao Chen
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-10-15 16:06 UTC (permalink / raw)
  To: Dehao Chen; +Cc: Jan Hubicka, Xinliang David Li, GCC Patches

I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and
doing it iteratively. Would that location work?

Teresa

On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen <dehao@google.com> wrote:
> Thanks for the pointer to Honza's patch. The patch does exactly what I
> need. But it only resides in the instrumentation based FDO path. Can
> we move the code to more common place (like rebuild_cgraph_edges)?
>
> Thanks,
> Dehao
>
> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
>>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> For my test case, the entire inline instance is optimized away, so
>>>>> there is no info about it in the profile. I can do some fixup in the
>>>>> rebuild_cgraph_edge though.
>>>>
>>>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>>>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>>>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>>>> and they will be inlined normal way.
>>>
>>> Oh, actually in AutoFDO, only functions with samples will be marked as
>>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>>
>> Here is Honza's patch that he was referring to:
>>
>> Index: tree-profile.c
>> ===================================================================
>> --- tree-profile.c (revision 201838)
>> +++ tree-profile.c (working copy)
>> @@ -604,6 +604,34 @@
>>
>>        pop_cfun ();
>>      }
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      bool called = false;
>> +      if (node->count)
>> + continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> + {
>> +  if (e->count)
>> +    called = true;
>> +  if (cgraph_maybe_hot_edge_p (e))
>> +    break;
>> + }
>> +      if (e || called
>> +  && profile_status_for_function
>> +      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
>> + {
>> +  if (dump_file)
>> +    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
>> +     cgraph_node_name (node), node->symbol.order,
>> +     e ? "function is hot" : "function is normal");
>> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
>> +    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> + }
>> +    }
>>
>>    del_node_map();
>>    return 0;
>> Index: predict.c
>> ===================================================================
>> --- predict.c (revision 201838)
>> +++ predict.c (working copy)
>> @@ -2715,6 +2715,9 @@
>>    gcov_type count_max, true_count_max = 0;
>>    basic_block bb;
>>
>> +  if (!ENTRY_BLOCK_PTR->count)
>> +    return 0;
>> +
>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>      true_count_max = MAX (bb->count, true_count_max);
>>
>>
>> Which is discussed in this email:
>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>>
>> For COMDATs I need to extend this to do it a little later to do it
>> recursively to catch the case of COMDATs feeding other COMDATs and I
>> need to do some other handling to compute counts from the frequencies
>> when inlining. I have been meaning to work on this for awhile but
>> finally am getting to it this week. (Here's the last message from a
>> later thread that forked off the above one:
>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>>
>> In the meantime, perhaps Honza's patch will suffice?
>>
>> Teresa
>>
>>>
>>> Dehao
>>>
>>>>
>>>> Honza
>>>>>
>>>>> Dehao
>>>>>
>>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> > Is it possible to update the callee node summary after profile
>>>>> > annotate (using information from inline instances which are not
>>>>> > inlined in early inline)?
>>>>> >
>>>>> > David
>>>>> >
>>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>>> >>>> could be a potential risk because some callee is marked unlikely
>>>>> >>>> executed simply because they are inlined and eliminated in the O2
>>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>>> >>>
>>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>>> >>> to this issue)
>>>>> >>
>>>>> >> The current AutoFDO implementation will take all functions that do not
>>>>> >> have have profile as normally executed, thus use guessed profile for
>>>>> >> it. This is like using profile for truly hot functions, and using O2
>>>>> >> for other functions. This works fine. However, it leads to larger code
>>>>> >> size (approximately 10%~20% larger than FDO).
>>>>> >>
>>>>> >> I'd like to introduce another mode for users who care about both
>>>>> >> performance and code size, and can be sure that profile is
>>>>> >> representative. In this mode, we will mark all functions without
>>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>>>> >> (of optimized code) to represent profile, it's possible that some hot
>>>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>>>> >> function (say bar). So in the profile, bar is cold, and because the
>>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>>> >> before the profile annotation. However, after profile annotate, we can
>>>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>>>> >>
>>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>>>> >> the callee's count and recalculate callee's frequency.
>>>>> >>
>>>>> >> Dehao
>>>>> >>
>>>>> >>>
>>>>> >>> Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-15 16:06                     ` Teresa Johnson
@ 2013-10-15 16:53                       ` Dehao Chen
  2013-10-15 21:21                         ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Dehao Chen @ 2013-10-15 16:53 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Xinliang David Li, GCC Patches

Yes, that would work. So let's discard this patch because the fix for
comdat can also fix this problem.

Thanks,
Dehao

On Tue, Oct 15, 2013 at 8:42 AM, Teresa Johnson <tejohnson@google.com> wrote:
> I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and
> doing it iteratively. Would that location work?
>
> Teresa
>
> On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen <dehao@google.com> wrote:
>> Thanks for the pointer to Honza's patch. The patch does exactly what I
>> need. But it only resides in the instrumentation based FDO path. Can
>> we move the code to more common place (like rebuild_cgraph_edges)?
>>
>> Thanks,
>> Dehao
>>
>> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
>>>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>>> For my test case, the entire inline instance is optimized away, so
>>>>>> there is no info about it in the profile. I can do some fixup in the
>>>>>> rebuild_cgraph_edge though.
>>>>>
>>>>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>>>>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>>>>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>>>>> and they will be inlined normal way.
>>>>
>>>> Oh, actually in AutoFDO, only functions with samples will be marked as
>>>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>>>
>>> Here is Honza's patch that he was referring to:
>>>
>>> Index: tree-profile.c
>>> ===================================================================
>>> --- tree-profile.c (revision 201838)
>>> +++ tree-profile.c (working copy)
>>> @@ -604,6 +604,34 @@
>>>
>>>        pop_cfun ();
>>>      }
>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>> +    {
>>> +      struct cgraph_edge *e;
>>> +      bool called = false;
>>> +      if (node->count)
>>> + continue;
>>> +      for (e = node->callers; e; e = e->next_caller)
>>> + {
>>> +  if (e->count)
>>> +    called = true;
>>> +  if (cgraph_maybe_hot_edge_p (e))
>>> +    break;
>>> + }
>>> +      if (e || called
>>> +  && profile_status_for_function
>>> +      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
>>> + {
>>> +  if (dump_file)
>>> +    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
>>> +     cgraph_node_name (node), node->symbol.order,
>>> +     e ? "function is hot" : "function is normal");
>>> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
>>> +    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>>> + }
>>> +    }
>>>
>>>    del_node_map();
>>>    return 0;
>>> Index: predict.c
>>> ===================================================================
>>> --- predict.c (revision 201838)
>>> +++ predict.c (working copy)
>>> @@ -2715,6 +2715,9 @@
>>>    gcov_type count_max, true_count_max = 0;
>>>    basic_block bb;
>>>
>>> +  if (!ENTRY_BLOCK_PTR->count)
>>> +    return 0;
>>> +
>>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>>      true_count_max = MAX (bb->count, true_count_max);
>>>
>>>
>>> Which is discussed in this email:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>>>
>>> For COMDATs I need to extend this to do it a little later to do it
>>> recursively to catch the case of COMDATs feeding other COMDATs and I
>>> need to do some other handling to compute counts from the frequencies
>>> when inlining. I have been meaning to work on this for awhile but
>>> finally am getting to it this week. (Here's the last message from a
>>> later thread that forked off the above one:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>>>
>>> In the meantime, perhaps Honza's patch will suffice?
>>>
>>> Teresa
>>>
>>>>
>>>> Dehao
>>>>
>>>>>
>>>>> Honza
>>>>>>
>>>>>> Dehao
>>>>>>
>>>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> > Is it possible to update the callee node summary after profile
>>>>>> > annotate (using information from inline instances which are not
>>>>>> > inlined in early inline)?
>>>>>> >
>>>>>> > David
>>>>>> >
>>>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>>>> >>>> could be a potential risk because some callee is marked unlikely
>>>>>> >>>> executed simply because they are inlined and eliminated in the O2
>>>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>>>> >>>
>>>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>>>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>>>> >>> to this issue)
>>>>>> >>
>>>>>> >> The current AutoFDO implementation will take all functions that do not
>>>>>> >> have have profile as normally executed, thus use guessed profile for
>>>>>> >> it. This is like using profile for truly hot functions, and using O2
>>>>>> >> for other functions. This works fine. However, it leads to larger code
>>>>>> >> size (approximately 10%~20% larger than FDO).
>>>>>> >>
>>>>>> >> I'd like to introduce another mode for users who care about both
>>>>>> >> performance and code size, and can be sure that profile is
>>>>>> >> representative. In this mode, we will mark all functions without
>>>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>>>>> >> (of optimized code) to represent profile, it's possible that some hot
>>>>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>>>>> >> function (say bar). So in the profile, bar is cold, and because the
>>>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>>>> >> before the profile annotation. However, after profile annotate, we can
>>>>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>>>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>>>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>>>>> >>
>>>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>>>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>>>>> >> the callee's count and recalculate callee's frequency.
>>>>>> >>
>>>>>> >> Dehao
>>>>>> >>
>>>>>> >>>
>>>>>> >>> Honza
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-15 16:53                       ` Dehao Chen
@ 2013-10-15 21:21                         ` Jan Hubicka
  2013-10-15 21:35                           ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-10-15 21:21 UTC (permalink / raw)
  To: Dehao Chen; +Cc: Teresa Johnson, Jan Hubicka, Xinliang David Li, GCC Patches

> Yes, that would work. So let's discard this patch because the fix for
> comdat can also fix this problem.

Unforutnately ipa-profile-estimate is an IPA pass and as such it does
not have access to profile_status_to_function.
You can probably just factor this out into a function that can be called
and for normal FDO we call it where the loop stis now and for auto-FDO we can
probably have another invocation from before early passes where auto-FDO is collected.
> >>> +      if (node->count)
> >>> + continue;
Also here we should sum the counts and consider function non unlikely executed
in the same way as probably_never_executed does.

I can prepare updated patch, but i am currently travelling, so i would not
be disapointed if you beat me ;)

Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-15 21:21                         ` Jan Hubicka
@ 2013-10-15 21:35                           ` Teresa Johnson
  2013-10-15 22:39                             ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-10-15 21:35 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Yes, that would work. So let's discard this patch because the fix for
>> comdat can also fix this problem.
>
> Unforutnately ipa-profile-estimate is an IPA pass and as such it does
> not have access to profile_status_to_function.

I see. I was coding that up today but hadn't tested it yet.

> You can probably just factor this out into a function that can be called
> and for normal FDO we call it where the loop stis now and for auto-FDO we can
> probably have another invocation from before early passes where auto-FDO is collected.

Ok, let's go with that approach for now. It won't address the 0 count
COMDAT calling another 0 count COMDAT problem, but I will probably
just find a way to deal with this when inlining.

>> >>> +      if (node->count)
>> >>> + continue;
> Also here we should sum the counts and consider function non unlikely executed
> in the same way as probably_never_executed does.

I assume you mean by doing the same comparison to the number of
profile->runs. Yes, this makes sense.

>
> I can prepare updated patch, but i am currently travelling, so i would not
> be disapointed if you beat me ;)

I'm working on it, and I think based on Dehao's needs I am going to
split up the patch into two phases, the one that does just the part
you had sent a patch for (making sure 0 count routines with non-zero
calls are marked guessed and have their node frequency set
appropriately), and a subsequent one to do the count application when
we inline a 0-count routine into a non-zero callsite. I'll shoot for
getting this ready by tomorrow.

BTW, in your original patch you are checking for both e->count or
cgraph_maybe_hot_edge_p(e), but AFAICT the call to
cgraph_maybe_hot_edge_p will never return true when e->count is zero.
When there is a profile it will return false via maybe_hot_count_p
since e->count == 0. When there is no profile it will return false
when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
checking for e->count >0 is sufficient here.

Thanks,
Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-15 21:35                           ` Teresa Johnson
@ 2013-10-15 22:39                             ` Jan Hubicka
  2013-10-16  0:20                               ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-10-15 22:39 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches

> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Yes, that would work. So let's discard this patch because the fix for
> >> comdat can also fix this problem.
> >
> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
> > not have access to profile_status_to_function.
> 
> I see. I was coding that up today but hadn't tested it yet.
> 
> > You can probably just factor this out into a function that can be called
> > and for normal FDO we call it where the loop stis now and for auto-FDO we can
> > probably have another invocation from before early passes where auto-FDO is collected.
> 
> Ok, let's go with that approach for now. It won't address the 0 count
> COMDAT calling another 0 count COMDAT problem, but I will probably
> just find a way to deal with this when inlining.

You can still propagate, since tree-profile is an simple-ipa pass.
> 
> >> >>> +      if (node->count)
> >> >>> + continue;
> > Also here we should sum the counts and consider function non unlikely executed
> > in the same way as probably_never_executed does.
> 
> I assume you mean by doing the same comparison to the number of
> profile->runs. Yes, this makes sense.

Yes.
> 
> >
> > I can prepare updated patch, but i am currently travelling, so i would not
> > be disapointed if you beat me ;)
> 
> I'm working on it, and I think based on Dehao's needs I am going to
> split up the patch into two phases, the one that does just the part
> you had sent a patch for (making sure 0 count routines with non-zero
> calls are marked guessed and have their node frequency set
> appropriately), and a subsequent one to do the count application when
> we inline a 0-count routine into a non-zero callsite. I'll shoot for
> getting this ready by tomorrow.
> 
> BTW, in your original patch you are checking for both e->count or
> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
> When there is a profile it will return false via maybe_hot_count_p
> since e->count == 0. When there is no profile it will return false
> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
> checking for e->count >0 is sufficient here.

I think I was checking caller count here (that is read) and the code
was supposed to make functoin with hot caller to be hot...

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-15 22:39                             ` Jan Hubicka
@ 2013-10-16  0:20                               ` Teresa Johnson
  2013-10-16 19:15                                 ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-10-16  0:20 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Tue, Oct 15, 2013 at 2:57 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> Yes, that would work. So let's discard this patch because the fix for
>> >> comdat can also fix this problem.
>> >
>> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
>> > not have access to profile_status_to_function.
>>
>> I see. I was coding that up today but hadn't tested it yet.
>>
>> > You can probably just factor this out into a function that can be called
>> > and for normal FDO we call it where the loop stis now and for auto-FDO we can
>> > probably have another invocation from before early passes where auto-FDO is collected.
>>
>> Ok, let's go with that approach for now. It won't address the 0 count
>> COMDAT calling another 0 count COMDAT problem, but I will probably
>> just find a way to deal with this when inlining.
>
> You can still propagate, since tree-profile is an simple-ipa pass.

Ok, I think I will leave that for the second patch, since I need to do
some testing on the effects - i.e. the propagation I had in mind would
make any 0-count routine called by a routine that was dropped to
guessed to also be dropped to guessed (and no longer unlikely). It may
be too aggressive, I need to check.

>>
>> >> >>> +      if (node->count)
>> >> >>> + continue;
>> > Also here we should sum the counts and consider function non unlikely executed
>> > in the same way as probably_never_executed does.
>>
>> I assume you mean by doing the same comparison to the number of
>> profile->runs. Yes, this makes sense.
>
> Yes.
>>
>> >
>> > I can prepare updated patch, but i am currently travelling, so i would not
>> > be disapointed if you beat me ;)
>>
>> I'm working on it, and I think based on Dehao's needs I am going to
>> split up the patch into two phases, the one that does just the part
>> you had sent a patch for (making sure 0 count routines with non-zero
>> calls are marked guessed and have their node frequency set
>> appropriately), and a subsequent one to do the count application when
>> we inline a 0-count routine into a non-zero callsite. I'll shoot for
>> getting this ready by tomorrow.
>>
>> BTW, in your original patch you are checking for both e->count or
>> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
>> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
>> When there is a profile it will return false via maybe_hot_count_p
>> since e->count == 0. When there is no profile it will return false
>> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
>> checking for e->count >0 is sufficient here.
>
> I think I was checking caller count here (that is read) and the code
> was supposed to make functoin with hot caller to be hot...

The code in your patch was just checking the edge count, not the
caller count. cgraph_maybe_hot_edge_p also doesn't check the caller
count, just the edge count. Do we want to make all functions with hot
callers also be hot, even when the edge count is not hot? This may be
to aggressive. I was simply going to make the code check e->count.

Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-16  0:20                               ` Teresa Johnson
@ 2013-10-16 19:15                                 ` Teresa Johnson
  2013-10-18 20:55                                   ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-10-16 19:15 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Tue, Oct 15, 2013 at 3:39 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Tue, Oct 15, 2013 at 2:57 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> >> Yes, that would work. So let's discard this patch because the fix for
>>> >> comdat can also fix this problem.
>>> >
>>> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
>>> > not have access to profile_status_to_function.
>>>
>>> I see. I was coding that up today but hadn't tested it yet.
>>>
>>> > You can probably just factor this out into a function that can be called
>>> > and for normal FDO we call it where the loop stis now and for auto-FDO we can
>>> > probably have another invocation from before early passes where auto-FDO is collected.
>>>
>>> Ok, let's go with that approach for now. It won't address the 0 count
>>> COMDAT calling another 0 count COMDAT problem, but I will probably
>>> just find a way to deal with this when inlining.
>>
>> You can still propagate, since tree-profile is an simple-ipa pass.
>
> Ok, I think I will leave that for the second patch, since I need to do
> some testing on the effects - i.e. the propagation I had in mind would
> make any 0-count routine called by a routine that was dropped to
> guessed to also be dropped to guessed (and no longer unlikely). It may
> be too aggressive, I need to check.
>
>>>
>>> >> >>> +      if (node->count)
>>> >> >>> + continue;
>>> > Also here we should sum the counts and consider function non unlikely executed
>>> > in the same way as probably_never_executed does.
>>>
>>> I assume you mean by doing the same comparison to the number of
>>> profile->runs. Yes, this makes sense.
>>
>> Yes.
>>>
>>> >
>>> > I can prepare updated patch, but i am currently travelling, so i would not
>>> > be disapointed if you beat me ;)
>>>
>>> I'm working on it, and I think based on Dehao's needs I am going to
>>> split up the patch into two phases, the one that does just the part
>>> you had sent a patch for (making sure 0 count routines with non-zero
>>> calls are marked guessed and have their node frequency set
>>> appropriately), and a subsequent one to do the count application when
>>> we inline a 0-count routine into a non-zero callsite. I'll shoot for
>>> getting this ready by tomorrow.
>>>
>>> BTW, in your original patch you are checking for both e->count or
>>> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
>>> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
>>> When there is a profile it will return false via maybe_hot_count_p
>>> since e->count == 0. When there is no profile it will return false
>>> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
>>> checking for e->count >0 is sufficient here.
>>
>> I think I was checking caller count here (that is read) and the code
>> was supposed to make functoin with hot caller to be hot...
>
> The code in your patch was just checking the edge count, not the
> caller count. cgraph_maybe_hot_edge_p also doesn't check the caller
> count, just the edge count. Do we want to make all functions with hot
> callers also be hot, even when the edge count is not hot? This may be
> to aggressive. I was simply going to make the code check e->count.

Here is the patch from Honza, that I revised based on discussions and
testing. Once this related patch goes in, I can change the check against the
profile_info->runs that hardcodes the threshold with the new parameter
proposed for probably_never_executed there:
  http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00743.html

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks, Teresa

2013-10-16  Jan Hubicka  <jh@suse.cz>
            Teresa Johnson  <tejohnson@google.com>

        * predict.c (handle_missing_profiles): New function.
        (counts_to_freqs): Don't overwrite estimated frequencies
        when function has no profile counts.
        * predict.h (handle_missing_profiles): Declare.
        * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.

Index: predict.c
===================================================================
--- predict.c   (revision 203633)
+++ predict.c   (working copy)
@@ -2742,6 +2742,39 @@ estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+void
+handle_missing_profiles (void)
+{
+  struct cgraph_node *node;
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      gcov_type call_count = 0;
+      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        call_count += e->count;
+      if (call_count
+          && fn && fn->cfg
+          && (call_count * 4 >= profile_info->runs))
+        {
+          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
+          if (dump_file)
+            fprintf (dump_file,
+                     "Dropping 0 profile for %s/%i. %s based on calls.\n",
+                     cgraph_node_name (node), node->symbol.order,
+                     maybe_hot ? "Function is hot" : "Function is normal");
+          profile_status_for_function (fn)
+              = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+          node->frequency
+              = maybe_hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+        }
+    }
+}
+
 /* Convert counts measured by profile driven feedback to frequencies.
    Return nonzero iff there was any nonzero execution count.  */

@@ -2751,6 +2784,9 @@ counts_to_freqs (void)
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);

Index: predict.h
===================================================================
--- predict.h   (revision 203633)
+++ predict.h   (working copy)
@@ -37,6 +37,7 @@ enum prediction

 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
+extern void handle_missing_profiles (void);
 extern void estimate_bb_frequencies (bool);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 203633)
+++ tree-profile.c      (working copy)
@@ -607,6 +607,8 @@ tree_profiling (void)
       pop_cfun ();
     }

+  handle_missing_profiles ();
+
   del_node_map ();
   return 0;
 }


>
> Teresa
>
>>
>> Honza
>>>
>>> Thanks,
>>> Teresa
>>>
>>> >
>>> > Honza
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-16 19:15                                 ` Teresa Johnson
@ 2013-10-18 20:55                                   ` Teresa Johnson
  2013-10-18 22:35                                     ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-10-18 20:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

Here is the patch updated to use the new parameter from r203830.
Passed bootstrap and regression tests.

2013-10-18  Jan Hubicka  <jh@suse.cz>
            Teresa Johnson  <tejohnson@google.com>

        * predict.c (handle_missing_profiles): New function.
        (counts_to_freqs): Don't overwrite estimated frequencies
        when function has no profile counts.
        * predict.h (handle_missing_profiles): Declare.
        * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.

Index: predict.c
===================================================================
--- predict.c   (revision 203830)
+++ predict.c   (working copy)
@@ -2758,6 +2758,40 @@ estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+void
+handle_missing_profiles (void)
+{
+  struct cgraph_node *node;
+  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      gcov_type call_count = 0;
+      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        call_count += e->count;
+      if (call_count
+          && fn && fn->cfg
+          && (call_count * unlikely_count_fraction >= profile_info->runs))
+        {
+          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
+          if (dump_file)
+            fprintf (dump_file,
+                     "Dropping 0 profile for %s/%i. %s based on calls.\n",
+                     cgraph_node_name (node), node->symbol.order,
+                     maybe_hot ? "Function is hot" : "Function is normal");
+          profile_status_for_function (fn)
+              = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+          node->frequency
+              = maybe_hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+        }
+    }
+}
+
 /* Convert counts measured by profile driven feedback to frequencies.
    Return nonzero iff there was any nonzero execution count.  */

@@ -2767,6 +2801,9 @@ counts_to_freqs (void)
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);

Index: predict.h
===================================================================
--- predict.h   (revision 203830)
+++ predict.h   (working copy)
@@ -37,6 +37,7 @@ enum prediction

 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
+extern void handle_missing_profiles (void);
 extern void estimate_bb_frequencies (bool);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 203830)
+++ tree-profile.c      (working copy)
@@ -607,6 +607,8 @@ tree_profiling (void)
       pop_cfun ();
     }

+  handle_missing_profiles ();
+
   del_node_map ();
   return 0;
 }

On Wed, Oct 16, 2013 at 12:02 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Tue, Oct 15, 2013 at 3:39 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Tue, Oct 15, 2013 at 2:57 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> >> Yes, that would work. So let's discard this patch because the fix for
>>>> >> comdat can also fix this problem.
>>>> >
>>>> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
>>>> > not have access to profile_status_to_function.
>>>>
>>>> I see. I was coding that up today but hadn't tested it yet.
>>>>
>>>> > You can probably just factor this out into a function that can be called
>>>> > and for normal FDO we call it where the loop stis now and for auto-FDO we can
>>>> > probably have another invocation from before early passes where auto-FDO is collected.
>>>>
>>>> Ok, let's go with that approach for now. It won't address the 0 count
>>>> COMDAT calling another 0 count COMDAT problem, but I will probably
>>>> just find a way to deal with this when inlining.
>>>
>>> You can still propagate, since tree-profile is an simple-ipa pass.
>>
>> Ok, I think I will leave that for the second patch, since I need to do
>> some testing on the effects - i.e. the propagation I had in mind would
>> make any 0-count routine called by a routine that was dropped to
>> guessed to also be dropped to guessed (and no longer unlikely). It may
>> be too aggressive, I need to check.
>>
>>>>
>>>> >> >>> +      if (node->count)
>>>> >> >>> + continue;
>>>> > Also here we should sum the counts and consider function non unlikely executed
>>>> > in the same way as probably_never_executed does.
>>>>
>>>> I assume you mean by doing the same comparison to the number of
>>>> profile->runs. Yes, this makes sense.
>>>
>>> Yes.
>>>>
>>>> >
>>>> > I can prepare updated patch, but i am currently travelling, so i would not
>>>> > be disapointed if you beat me ;)
>>>>
>>>> I'm working on it, and I think based on Dehao's needs I am going to
>>>> split up the patch into two phases, the one that does just the part
>>>> you had sent a patch for (making sure 0 count routines with non-zero
>>>> calls are marked guessed and have their node frequency set
>>>> appropriately), and a subsequent one to do the count application when
>>>> we inline a 0-count routine into a non-zero callsite. I'll shoot for
>>>> getting this ready by tomorrow.
>>>>
>>>> BTW, in your original patch you are checking for both e->count or
>>>> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
>>>> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
>>>> When there is a profile it will return false via maybe_hot_count_p
>>>> since e->count == 0. When there is no profile it will return false
>>>> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
>>>> checking for e->count >0 is sufficient here.
>>>
>>> I think I was checking caller count here (that is read) and the code
>>> was supposed to make functoin with hot caller to be hot...
>>
>> The code in your patch was just checking the edge count, not the
>> caller count. cgraph_maybe_hot_edge_p also doesn't check the caller
>> count, just the edge count. Do we want to make all functions with hot
>> callers also be hot, even when the edge count is not hot? This may be
>> to aggressive. I was simply going to make the code check e->count.
>
> Here is the patch from Honza, that I revised based on discussions and
> testing. Once this related patch goes in, I can change the check against the
> profile_info->runs that hardcodes the threshold with the new parameter
> proposed for probably_never_executed there:
>   http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00743.html
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>
> Thanks, Teresa
>
> 2013-10-16  Jan Hubicka  <jh@suse.cz>
>             Teresa Johnson  <tejohnson@google.com>
>
>         * predict.c (handle_missing_profiles): New function.
>         (counts_to_freqs): Don't overwrite estimated frequencies
>         when function has no profile counts.
>         * predict.h (handle_missing_profiles): Declare.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 203633)
> +++ predict.c   (working copy)
> @@ -2742,6 +2742,39 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
>
> +void
> +handle_missing_profiles (void)
> +{
> +  struct cgraph_node *node;
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      gcov_type call_count = 0;
> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
> +      if (node->count)
> +        continue;
> +      for (e = node->callers; e; e = e->next_caller)
> +        call_count += e->count;
> +      if (call_count
> +          && fn && fn->cfg
> +          && (call_count * 4 >= profile_info->runs))
> +        {
> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> +          if (dump_file)
> +            fprintf (dump_file,
> +                     "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +                     cgraph_node_name (node), node->symbol.order,
> +                     maybe_hot ? "Function is hot" : "Function is normal");
> +          profile_status_for_function (fn)
> +              = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +          node->frequency
> +              = maybe_hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> +        }
> +    }
> +}
> +
>  /* Convert counts measured by profile driven feedback to frequencies.
>     Return nonzero iff there was any nonzero execution count.  */
>
> @@ -2751,6 +2784,9 @@ counts_to_freqs (void)
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +    return 0;
> +
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      true_count_max = MAX (bb->count, true_count_max);
>
> Index: predict.h
> ===================================================================
> --- predict.h   (revision 203633)
> +++ predict.h   (working copy)
> @@ -37,6 +37,7 @@ enum prediction
>
>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>  extern int counts_to_freqs (void);
> +extern void handle_missing_profiles (void);
>  extern void estimate_bb_frequencies (bool);
>  extern const char *predictor_name (enum br_predictor);
>  extern tree build_predict_expr (enum br_predictor, enum prediction);
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c      (revision 203633)
> +++ tree-profile.c      (working copy)
> @@ -607,6 +607,8 @@ tree_profiling (void)
>        pop_cfun ();
>      }
>
> +  handle_missing_profiles ();
> +
>    del_node_map ();
>    return 0;
>  }
>
>
>>
>> Teresa
>>
>>>
>>> Honza
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> >
>>>> > Honza
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-18 20:55                                   ` Teresa Johnson
@ 2013-10-18 22:35                                     ` Jan Hubicka
  2013-10-30 20:40                                       ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-10-18 22:35 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches

> Here is the patch updated to use the new parameter from r203830.
> Passed bootstrap and regression tests.
> 
> 2013-10-18  Jan Hubicka  <jh@suse.cz>
>             Teresa Johnson  <tejohnson@google.com>
> 
>         * predict.c (handle_missing_profiles): New function.
>         (counts_to_freqs): Don't overwrite estimated frequencies
>         when function has no profile counts.
>         * predict.h (handle_missing_profiles): Declare.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
> 
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 203830)
> +++ predict.c   (working copy)
> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
> 

You need block comment. It should explain the problem of COMDATs and how they are handled.
It is not an obvious thing.

> +void
> +handle_missing_profiles (void)
> +{
> +  struct cgraph_node *node;
> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
extra line
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      gcov_type call_count = 0;
> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
Extra line
> +      if (node->count)
> +        continue;
> +      for (e = node->callers; e; e = e->next_caller)
> +        call_count += e->count;
What about checking if the sum is way off even for non-0 counts. 
I.e. for case where function was inlined to some calls but not to others?  In
that case we may want to scale up the counts (with some resonable care for
capping)

Also I think the code really should propagate - i.e. have simple worklist and
look for functions that are COMDAT and are called by other COMDAT functions
where we decided to drop the profile.  Having non-trivial chains of comdats is
common thing.

What about outputting user visible warning/error on the incosnsistency when
no COMDAT function is involved same way as we do for BB profile?

Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-18 22:35                                     ` Jan Hubicka
@ 2013-10-30 20:40                                       ` Teresa Johnson
  2013-11-05 14:08                                         ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-10-30 20:40 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Here is the patch updated to use the new parameter from r203830.
>> Passed bootstrap and regression tests.
>>
>> 2013-10-18  Jan Hubicka  <jh@suse.cz>
>>             Teresa Johnson  <tejohnson@google.com>
>>
>>         * predict.c (handle_missing_profiles): New function.
>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>         when function has no profile counts.
>>         * predict.h (handle_missing_profiles): Declare.
>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 203830)
>> +++ predict.c   (working copy)
>> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>>    BITMAP_FREE (tovisit);
>>  }
>>
>
> You need block comment. It should explain the problem of COMDATs and how they are handled.
> It is not an obvious thing.

Done.

>
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> extra line
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      gcov_type call_count = 0;
>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
> Extra line
>> +      if (node->count)
>> +        continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> +        call_count += e->count;
> What about checking if the sum is way off even for non-0 counts.
> I.e. for case where function was inlined to some calls but not to others?  In
> that case we may want to scale up the counts (with some resonable care for
> capping)

In this patch I am not changing any counts, so I am leaving this one
for follow-on work (even for the routines missing counts completely
like these, we don't apply any counts, just mark them as guessed. I
have a follow-on patch to send once this goes in that does apply
counts to these 0-count routines only when we decide to inline as we
discussed).

>
> Also I think the code really should propagate - i.e. have simple worklist and
> look for functions that are COMDAT and are called by other COMDAT functions
> where we decided to drop the profile.  Having non-trivial chains of comdats is
> common thing.

Done.

>
> What about outputting user visible warning/error on the incosnsistency when
> no COMDAT function is involved same way as we do for BB profile?

Done. This one caught the fact that we have this situation for "extern
template" functions as well when I tested on cpu2006. I added in a
check to ignore those when issuing the warning (they are not emitted
thus don't get any profile counts).

Updated patch below.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?

Thanks,
Teresa

2013-10-30  Teresa Johnson  <tejohnson@google.com>

        * predict.c (drop_profile): New function.
        (handle_missing_profiles): Ditto.
        (counts_to_freqs): Don't overwrite estimated frequencies
        when function has no profile counts.
        * predict.h (handle_missing_profiles): Declare.
        * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.

Index: predict.c
===================================================================
--- predict.c   (revision 204213)
+++ predict.c   (working copy)
@@ -2765,6 +2765,107 @@ estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+/* Drop the profile for NODE to guessed, and update its frequency based on
+   whether it is expected to be HOT.  */
+
+static void
+drop_profile (struct cgraph_node *node, bool hot)
+{
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  if (dump_file)
+    fprintf (dump_file,
+             "Dropping 0 profile for %s/%i. %s based on calls.\n",
+             cgraph_node_name (node), node->order,
+             hot ? "Function is hot" : "Function is normal");
+  /* We only expect to miss profiles for functions that are reached
+     via non-zero call edges in cases where the function may have
+     been linked from another module or library (COMDATs and extern
+     templates). See the comments below for handle_missing_profiles.  */
+  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
+    warning (0,
+             "Missing counts for called function %s/%i",
+             cgraph_node_name (node), node->order);
+
+  profile_status_for_function (fn)
+      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency
+      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+}
+
+/* In the case of COMDAT routines, multiple object files will contain the same
+   function and the linker will select one for the binary. In that case
+   all the other copies from the profile instrument binary will be missing
+   profile counts. Look for cases where this happened, due to non-zero
+   call counts going to 0-count functions, and drop the profile to guessed
+   so that we can use the estimated probabilities and avoid optimizing only
+   for size.
+
+   The other case where the profile may be missing is when the routine
+   is not going to be emitted to the object file, e.g. for "extern template"
+   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
+   all other cases of non-zero calls to 0-count functions.  */
+
+void
+handle_missing_profiles (void)
+{
+  struct cgraph_node *node;
+  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
+  vec<struct cgraph_node *> worklist;
+  worklist.create (64);
+
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      gcov_type call_count = 0;
+      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        call_count += e->count;
+      if (call_count
+          && fn && fn->cfg
+          && (call_count * unlikely_count_fraction >= profile_info->runs))
+        {
+          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
+
+          drop_profile (node, maybe_hot);
+          worklist.safe_push (node);
+        }
+    }
+
+  /* Propagate the profile dropping to other 0-count COMDATs that are
+     potentially called by COMDATs we already dropped the profile on.  */
+  while (worklist.length () > 0)
+    {
+      struct cgraph_edge *e;
+
+      node = worklist.pop ();
+      for (e = node->callees; e; e = e->next_caller)
+        {
+          struct cgraph_node *callee = e->callee;
+          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
+
+          if (callee->count > 0)
+            continue;
+          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
+              && profile_status_for_function (fn) == PROFILE_READ)
+            {
+              /* Since there are no non-0 call counts to this function,
+                 we don't know for sure whether it is hot. Indicate to
+                 the drop_profile routine that function should be marked
+                 normal, rather than hot.  */
+              drop_profile (node, false);
+              worklist.safe_push (callee);
+            }
+        }
+    }
+  worklist.release ();
+}
+
 /* Convert counts measured by profile driven feedback to frequencies.
    Return nonzero iff there was any nonzero execution count.  */

@@ -2774,6 +2875,9 @@ counts_to_freqs (void)
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);

Index: predict.h
===================================================================
--- predict.h   (revision 204213)
+++ predict.h   (working copy)
@@ -37,6 +37,7 @@ enum prediction

 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
+extern void handle_missing_profiles (void);
 extern void estimate_bb_frequencies (bool);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 204213)
+++ tree-profile.c      (working copy)
@@ -612,6 +612,8 @@ tree_profiling (void)
       pop_cfun ();
     }

+  handle_missing_profiles ();
+
   del_node_map ();
   return 0;
 }

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-10-30 20:40                                       ` Teresa Johnson
@ 2013-11-05 14:08                                         ` Teresa Johnson
  2013-11-08 22:17                                           ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-05 14:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

Ping.
Thanks, Teresa

On Wed, Oct 30, 2013 at 1:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Here is the patch updated to use the new parameter from r203830.
>>> Passed bootstrap and regression tests.
>>>
>>> 2013-10-18  Jan Hubicka  <jh@suse.cz>
>>>             Teresa Johnson  <tejohnson@google.com>
>>>
>>>         * predict.c (handle_missing_profiles): New function.
>>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>>         when function has no profile counts.
>>>         * predict.h (handle_missing_profiles): Declare.
>>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>>
>>> Index: predict.c
>>> ===================================================================
>>> --- predict.c   (revision 203830)
>>> +++ predict.c   (working copy)
>>> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>>>    BITMAP_FREE (tovisit);
>>>  }
>>>
>>
>> You need block comment. It should explain the problem of COMDATs and how they are handled.
>> It is not an obvious thing.
>
> Done.
>
>>
>>> +void
>>> +handle_missing_profiles (void)
>>> +{
>>> +  struct cgraph_node *node;
>>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> extra line
>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>> +    {
>>> +      struct cgraph_edge *e;
>>> +      gcov_type call_count = 0;
>>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
>> Extra line
>>> +      if (node->count)
>>> +        continue;
>>> +      for (e = node->callers; e; e = e->next_caller)
>>> +        call_count += e->count;
>> What about checking if the sum is way off even for non-0 counts.
>> I.e. for case where function was inlined to some calls but not to others?  In
>> that case we may want to scale up the counts (with some resonable care for
>> capping)
>
> In this patch I am not changing any counts, so I am leaving this one
> for follow-on work (even for the routines missing counts completely
> like these, we don't apply any counts, just mark them as guessed. I
> have a follow-on patch to send once this goes in that does apply
> counts to these 0-count routines only when we decide to inline as we
> discussed).
>
>>
>> Also I think the code really should propagate - i.e. have simple worklist and
>> look for functions that are COMDAT and are called by other COMDAT functions
>> where we decided to drop the profile.  Having non-trivial chains of comdats is
>> common thing.
>
> Done.
>
>>
>> What about outputting user visible warning/error on the incosnsistency when
>> no COMDAT function is involved same way as we do for BB profile?
>
> Done. This one caught the fact that we have this situation for "extern
> template" functions as well when I tested on cpu2006. I added in a
> check to ignore those when issuing the warning (they are not emitted
> thus don't get any profile counts).
>
> Updated patch below.
>
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
> profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2013-10-30  Teresa Johnson  <tejohnson@google.com>
>
>         * predict.c (drop_profile): New function.
>         (handle_missing_profiles): Ditto.
>         (counts_to_freqs): Don't overwrite estimated frequencies
>         when function has no profile counts.
>         * predict.h (handle_missing_profiles): Declare.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 204213)
> +++ predict.c   (working copy)
> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
>
> +/* Drop the profile for NODE to guessed, and update its frequency based on
> +   whether it is expected to be HOT.  */
> +
> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +    fprintf (dump_file,
> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             cgraph_node_name (node), node->order,
> +             hot ? "Function is hot" : "Function is normal");
> +  /* We only expect to miss profiles for functions that are reached
> +     via non-zero call edges in cases where the function may have
> +     been linked from another module or library (COMDATs and extern
> +     templates). See the comments below for handle_missing_profiles.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +    warning (0,
> +             "Missing counts for called function %s/%i",
> +             cgraph_node_name (node), node->order);
> +
> +  profile_status_for_function (fn)
> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency
> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> +}
> +
> +/* In the case of COMDAT routines, multiple object files will contain the same
> +   function and the linker will select one for the binary. In that case
> +   all the other copies from the profile instrument binary will be missing
> +   profile counts. Look for cases where this happened, due to non-zero
> +   call counts going to 0-count functions, and drop the profile to guessed
> +   so that we can use the estimated probabilities and avoid optimizing only
> +   for size.
> +
> +   The other case where the profile may be missing is when the routine
> +   is not going to be emitted to the object file, e.g. for "extern template"
> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
> +   all other cases of non-zero calls to 0-count functions.  */
> +
> +void
> +handle_missing_profiles (void)
> +{
> +  struct cgraph_node *node;
> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +  vec<struct cgraph_node *> worklist;
> +  worklist.create (64);
> +
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      gcov_type call_count = 0;
> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +      if (node->count)
> +        continue;
> +      for (e = node->callers; e; e = e->next_caller)
> +        call_count += e->count;
> +      if (call_count
> +          && fn && fn->cfg
> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
> +        {
> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> +
> +          drop_profile (node, maybe_hot);
> +          worklist.safe_push (node);
> +        }
> +    }
> +
> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> +     potentially called by COMDATs we already dropped the profile on.  */
> +  while (worklist.length () > 0)
> +    {
> +      struct cgraph_edge *e;
> +
> +      node = worklist.pop ();
> +      for (e = node->callees; e; e = e->next_caller)
> +        {
> +          struct cgraph_node *callee = e->callee;
> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> +
> +          if (callee->count > 0)
> +            continue;
> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> +              && profile_status_for_function (fn) == PROFILE_READ)
> +            {
> +              /* Since there are no non-0 call counts to this function,
> +                 we don't know for sure whether it is hot. Indicate to
> +                 the drop_profile routine that function should be marked
> +                 normal, rather than hot.  */
> +              drop_profile (node, false);
> +              worklist.safe_push (callee);
> +            }
> +        }
> +    }
> +  worklist.release ();
> +}
> +
>  /* Convert counts measured by profile driven feedback to frequencies.
>     Return nonzero iff there was any nonzero execution count.  */
>
> @@ -2774,6 +2875,9 @@ counts_to_freqs (void)
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +    return 0;
> +
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      true_count_max = MAX (bb->count, true_count_max);
>
> Index: predict.h
> ===================================================================
> --- predict.h   (revision 204213)
> +++ predict.h   (working copy)
> @@ -37,6 +37,7 @@ enum prediction
>
>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>  extern int counts_to_freqs (void);
> +extern void handle_missing_profiles (void);
>  extern void estimate_bb_frequencies (bool);
>  extern const char *predictor_name (enum br_predictor);
>  extern tree build_predict_expr (enum br_predictor, enum prediction);
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c      (revision 204213)
> +++ tree-profile.c      (working copy)
> @@ -612,6 +612,8 @@ tree_profiling (void)
>        pop_cfun ();
>      }
>
> +  handle_missing_profiles ();
> +
>    del_node_map ();
>    return 0;
>  }
>
>>
>> Honza
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-05 14:08                                         ` Teresa Johnson
@ 2013-11-08 22:17                                           ` Teresa Johnson
  2013-11-08 22:28                                             ` Steven Bosscher
  2013-11-11 16:24                                             ` Jan Hubicka
  0 siblings, 2 replies; 42+ messages in thread
From: Teresa Johnson @ 2013-11-08 22:17 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Tue, Nov 5, 2013 at 6:02 AM, Teresa Johnson <tejohnson@google.com> wrote:
> Ping.
> Thanks, Teresa
>

I updated the patch to include the follow-on work to apply call counts
to 0-weight functions using the estimated frequencies when inlining.
This helps avoid 0-weight blocks of inlined code which are
particularly bad for function splitting.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?

Thanks,
Teresa

2013-11-08  Teresa Johnson  <tejohnson@google.com>
            Jan Hubicka  <jh@suse.cz>

        * predict.c (drop_profile): New function.
        (handle_missing_profiles): Ditto.
        (counts_to_freqs): Don't overwrite estimated frequencies
        when function has no profile counts.
        * predict.h (handle_missing_profiles): Declare.
        * tree-inline.c (freqs_to_counts): New function.
        (copy_cfg_body): Invoke freqs_to_counts as needed.
        * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.

Index: predict.c
===================================================================
--- predict.c   (revision 204516)
+++ predict.c   (working copy)
@@ -2765,6 +2765,107 @@ estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+/* Drop the profile for NODE to guessed, and update its frequency based on
+   whether it is expected to be HOT.  */
+
+static void
+drop_profile (struct cgraph_node *node, bool hot)
+{
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  if (dump_file)
+    fprintf (dump_file,
+             "Dropping 0 profile for %s/%i. %s based on calls.\n",
+             cgraph_node_name (node), node->order,
+             hot ? "Function is hot" : "Function is normal");
+  /* We only expect to miss profiles for functions that are reached
+     via non-zero call edges in cases where the function may have
+     been linked from another module or library (COMDATs and extern
+     templates). See the comments below for handle_missing_profiles.  */
+  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
+    warning (0,
+             "Missing counts for called function %s/%i",
+             cgraph_node_name (node), node->order);
+
+  profile_status_for_function (fn)
+      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency
+      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+}
+
+/* In the case of COMDAT routines, multiple object files will contain the same
+   function and the linker will select one for the binary. In that case
+   all the other copies from the profile instrument binary will be missing
+   profile counts. Look for cases where this happened, due to non-zero
+   call counts going to 0-count functions, and drop the profile to guessed
+   so that we can use the estimated probabilities and avoid optimizing only
+   for size.
+
+   The other case where the profile may be missing is when the routine
+   is not going to be emitted to the object file, e.g. for "extern template"
+   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
+   all other cases of non-zero calls to 0-count functions.  */
+
+void
+handle_missing_profiles (void)
+{
+  struct cgraph_node *node;
+  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
+  vec<struct cgraph_node *> worklist;
+  worklist.create (64);
+
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      gcov_type call_count = 0;
+      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        call_count += e->count;
+      if (call_count
+          && fn && fn->cfg
+          && (call_count * unlikely_count_fraction >= profile_info->runs))
+        {
+          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
+
+          drop_profile (node, maybe_hot);
+          worklist.safe_push (node);
+        }
+    }
+
+  /* Propagate the profile dropping to other 0-count COMDATs that are
+     potentially called by COMDATs we already dropped the profile on.  */
+  while (worklist.length () > 0)
+    {
+      struct cgraph_edge *e;
+
+      node = worklist.pop ();
+      for (e = node->callees; e; e = e->next_caller)
+        {
+          struct cgraph_node *callee = e->callee;
+          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
+
+          if (callee->count > 0)
+            continue;
+          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
+              && profile_status_for_function (fn) == PROFILE_READ)
+            {
+              /* Since there are no non-0 call counts to this function,
+                 we don't know for sure whether it is hot. Indicate to
+                 the drop_profile routine that function should be marked
+                 normal, rather than hot.  */
+              drop_profile (node, false);
+              worklist.safe_push (callee);
+            }
+        }
+    }
+  worklist.release ();
+}
+
 /* Convert counts measured by profile driven feedback to frequencies.
    Return nonzero iff there was any nonzero execution count.  */

@@ -2774,6 +2875,9 @@ counts_to_freqs (void)
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);

Index: predict.h
===================================================================
--- predict.h   (revision 204516)
+++ predict.h   (working copy)
@@ -37,6 +37,7 @@ enum prediction

 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
+extern void handle_missing_profiles (void);
 extern void estimate_bb_frequencies (bool);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
Index: tree-inline.c
===================================================================
--- tree-inline.c       (revision 204516)
+++ tree-inline.c       (working copy)
@@ -2353,6 +2353,29 @@ redirect_all_calls (copy_body_data * id, basic_blo
     }
 }

+/* Convert estimated frequencies into counts for NODE, scaling COUNT
+   with each bb's frequency. Used when NODE has a 0-weight entry
+   but we are about to inline it into a non-zero count call bb.
+   See the comments for handle_missing_profiles() in predict.c for
+   when this can happen for COMDATs.  */
+
+void
+freqs_to_counts (struct cgraph_node *node, gcov_type count)
+{
+  basic_block bb;
+  edge_iterator ei;
+  edge e;
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  FOR_ALL_BB_FN(bb, fn)
+  {
+    bb->count = apply_scale (count,
+                             GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
+    FOR_EACH_EDGE (e, ei, bb->succs)
+      e->count = apply_probability (e->src->count, e->probability);
+  }
+}
+
 /* Make a copy of the body of FN so that it can be inserted inline in
    another function.  Walks FN via CFG, returns new fndecl.  */

@@ -2373,6 +2396,24 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
   int incoming_frequency = 0;
   gcov_type incoming_count = 0;

+  /* This can happen for COMDAT routines that end up with 0 counts
+     despite being called (see the comments for handle_missing_profiles()
+     in predict.c as to why). Apply counts to the blocks in the callee
+     before inlining, using the guessed edge frequencies, so that we don't
+     end up with a 0-count inline body which can confuse downstream
+     optimizations such as function splitting.  */
+  if (!ENTRY_BLOCK_PTR_FOR_FUNCTION (src_cfun)->count && count)
+    {
+      /* Apply the larger of the call bb count and the total incoming
+         call edge count to the callee.  */
+      gcov_type in_count = 0;
+      struct cgraph_edge *in_edge;
+      for (in_edge = id->src_node->callers; in_edge;
+           in_edge = in_edge->next_caller)
+        in_count += in_edge->count;
+      freqs_to_counts (id->src_node, count > in_count ? count : in_count);
+    }
+
   if (ENTRY_BLOCK_PTR_FOR_FUNCTION (src_cfun)->count)
     count_scale
         = GCOV_COMPUTE_SCALE (count,
Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 204516)
+++ tree-profile.c      (working copy)
@@ -612,6 +612,8 @@ tree_profiling (void)
       pop_cfun ();
     }

+  handle_missing_profiles ();
+
   del_node_map ();
   return 0;
 }

> On Wed, Oct 30, 2013 at 1:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> Here is the patch updated to use the new parameter from r203830.
>>>> Passed bootstrap and regression tests.
>>>>
>>>> 2013-10-18  Jan Hubicka  <jh@suse.cz>
>>>>             Teresa Johnson  <tejohnson@google.com>
>>>>
>>>>         * predict.c (handle_missing_profiles): New function.
>>>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>>>         when function has no profile counts.
>>>>         * predict.h (handle_missing_profiles): Declare.
>>>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>>>
>>>> Index: predict.c
>>>> ===================================================================
>>>> --- predict.c   (revision 203830)
>>>> +++ predict.c   (working copy)
>>>> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>>>>    BITMAP_FREE (tovisit);
>>>>  }
>>>>
>>>
>>> You need block comment. It should explain the problem of COMDATs and how they are handled.
>>> It is not an obvious thing.
>>
>> Done.
>>
>>>
>>>> +void
>>>> +handle_missing_profiles (void)
>>>> +{
>>>> +  struct cgraph_node *node;
>>>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>>> extra line
>>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>>> +    {
>>>> +      struct cgraph_edge *e;
>>>> +      gcov_type call_count = 0;
>>>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
>>> Extra line
>>>> +      if (node->count)
>>>> +        continue;
>>>> +      for (e = node->callers; e; e = e->next_caller)
>>>> +        call_count += e->count;
>>> What about checking if the sum is way off even for non-0 counts.
>>> I.e. for case where function was inlined to some calls but not to others?  In
>>> that case we may want to scale up the counts (with some resonable care for
>>> capping)
>>
>> In this patch I am not changing any counts, so I am leaving this one
>> for follow-on work (even for the routines missing counts completely
>> like these, we don't apply any counts, just mark them as guessed. I
>> have a follow-on patch to send once this goes in that does apply
>> counts to these 0-count routines only when we decide to inline as we
>> discussed).
>>
>>>
>>> Also I think the code really should propagate - i.e. have simple worklist and
>>> look for functions that are COMDAT and are called by other COMDAT functions
>>> where we decided to drop the profile.  Having non-trivial chains of comdats is
>>> common thing.
>>
>> Done.
>>
>>>
>>> What about outputting user visible warning/error on the incosnsistency when
>>> no COMDAT function is involved same way as we do for BB profile?
>>
>> Done. This one caught the fact that we have this situation for "extern
>> template" functions as well when I tested on cpu2006. I added in a
>> check to ignore those when issuing the warning (they are not emitted
>> thus don't get any profile counts).
>>
>> Updated patch below.
>>
>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
>> profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-10-30  Teresa Johnson  <tejohnson@google.com>
>>
>>         * predict.c (drop_profile): New function.
>>         (handle_missing_profiles): Ditto.
>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>         when function has no profile counts.
>>         * predict.h (handle_missing_profiles): Declare.
>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 204213)
>> +++ predict.c   (working copy)
>> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>>    BITMAP_FREE (tovisit);
>>  }
>>
>> +/* Drop the profile for NODE to guessed, and update its frequency based on
>> +   whether it is expected to be HOT.  */
>> +
>> +static void
>> +drop_profile (struct cgraph_node *node, bool hot)
>> +{
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (dump_file)
>> +    fprintf (dump_file,
>> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
>> +             cgraph_node_name (node), node->order,
>> +             hot ? "Function is hot" : "Function is normal");
>> +  /* We only expect to miss profiles for functions that are reached
>> +     via non-zero call edges in cases where the function may have
>> +     been linked from another module or library (COMDATs and extern
>> +     templates). See the comments below for handle_missing_profiles.  */
>> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
>> +    warning (0,
>> +             "Missing counts for called function %s/%i",
>> +             cgraph_node_name (node), node->order);
>> +
>> +  profile_status_for_function (fn)
>> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency
>> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> +}
>> +
>> +/* In the case of COMDAT routines, multiple object files will contain the same
>> +   function and the linker will select one for the binary. In that case
>> +   all the other copies from the profile instrument binary will be missing
>> +   profile counts. Look for cases where this happened, due to non-zero
>> +   call counts going to 0-count functions, and drop the profile to guessed
>> +   so that we can use the estimated probabilities and avoid optimizing only
>> +   for size.
>> +
>> +   The other case where the profile may be missing is when the routine
>> +   is not going to be emitted to the object file, e.g. for "extern template"
>> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>> +   all other cases of non-zero calls to 0-count functions.  */
>> +
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> +  vec<struct cgraph_node *> worklist;
>> +  worklist.create (64);
>> +
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      gcov_type call_count = 0;
>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +      if (node->count)
>> +        continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> +        call_count += e->count;
>> +      if (call_count
>> +          && fn && fn->cfg
>> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
>> +        {
>> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
>> +
>> +          drop_profile (node, maybe_hot);
>> +          worklist.safe_push (node);
>> +        }
>> +    }
>> +
>> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> +     potentially called by COMDATs we already dropped the profile on.  */
>> +  while (worklist.length () > 0)
>> +    {
>> +      struct cgraph_edge *e;
>> +
>> +      node = worklist.pop ();
>> +      for (e = node->callees; e; e = e->next_caller)
>> +        {
>> +          struct cgraph_node *callee = e->callee;
>> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> +
>> +          if (callee->count > 0)
>> +            continue;
>> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> +              && profile_status_for_function (fn) == PROFILE_READ)
>> +            {
>> +              /* Since there are no non-0 call counts to this function,
>> +                 we don't know for sure whether it is hot. Indicate to
>> +                 the drop_profile routine that function should be marked
>> +                 normal, rather than hot.  */
>> +              drop_profile (node, false);
>> +              worklist.safe_push (callee);
>> +            }
>> +        }
>> +    }
>> +  worklist.release ();
>> +}
>> +
>>  /* Convert counts measured by profile driven feedback to frequencies.
>>     Return nonzero iff there was any nonzero execution count.  */
>>
>> @@ -2774,6 +2875,9 @@ counts_to_freqs (void)
>>    gcov_type count_max, true_count_max = 0;
>>    basic_block bb;
>>
>> +  if (!ENTRY_BLOCK_PTR->count)
>> +    return 0;
>> +
>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>      true_count_max = MAX (bb->count, true_count_max);
>>
>> Index: predict.h
>> ===================================================================
>> --- predict.h   (revision 204213)
>> +++ predict.h   (working copy)
>> @@ -37,6 +37,7 @@ enum prediction
>>
>>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>>  extern int counts_to_freqs (void);
>> +extern void handle_missing_profiles (void);
>>  extern void estimate_bb_frequencies (bool);
>>  extern const char *predictor_name (enum br_predictor);
>>  extern tree build_predict_expr (enum br_predictor, enum prediction);
>> Index: tree-profile.c
>> ===================================================================
>> --- tree-profile.c      (revision 204213)
>> +++ tree-profile.c      (working copy)
>> @@ -612,6 +612,8 @@ tree_profiling (void)
>>        pop_cfun ();
>>      }
>>
>> +  handle_missing_profiles ();
>> +
>>    del_node_map ();
>>    return 0;
>>  }
>>
>>>
>>> Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-08 22:17                                           ` Teresa Johnson
@ 2013-11-08 22:28                                             ` Steven Bosscher
  2013-11-10 13:05                                               ` Eric Botcazou
  2013-11-11 16:24                                             ` Jan Hubicka
  1 sibling, 1 reply; 42+ messages in thread
From: Steven Bosscher @ 2013-11-08 22:28 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches

On Fri, Nov 8, 2013 at 10:58 PM, Teresa Johnson wrote:

> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +    fprintf (dump_file,
> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             cgraph_node_name (node), node->order,
> +             hot ? "Function is hot" : "Function is normal");
> +  /* We only expect to miss profiles for functions that are reached
> +     via non-zero call edges in cases where the function may have
> +     been linked from another module or library (COMDATs and extern
> +     templates). See the comments below for handle_missing_profiles.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +    warning (0,
> +             "Missing counts for called function %s/%i",
> +             cgraph_node_name (node), node->order);

Maybe OPT_Wdisabled_optimization?



> +
> +  profile_status_for_function (fn)
> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency
> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;

In GCC code style the = goes at the end of the line:

  profile_status_for_function (fn)
    (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
  node->frequency =
    hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;



> @@ -2774,6 +2875,9 @@ counts_to_freqs (void)
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +    return 0;
> +

Deserves a one-line comment ;-)




> +void
> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
> +{
> +  basic_block bb;
> +  edge_iterator ei;
> +  edge e;
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  FOR_ALL_BB_FN(bb, fn)
> +  {
> +    bb->count = apply_scale (count,
> +                             GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> +    FOR_EACH_EDGE (e, ei, bb->succs)
> +      e->count = apply_probability (e->src->count, e->probability);
> +  }

Indent +2:

  FOR_ALL_BB_FN (bb, fn)
    {
      ...
    }


Ciao!
Steven

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-08 22:28                                             ` Steven Bosscher
@ 2013-11-10 13:05                                               ` Eric Botcazou
  2013-11-10 17:16                                                 ` Steven Bosscher
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Botcazou @ 2013-11-10 13:05 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: gcc-patches, Teresa Johnson, Jan Hubicka, Dehao Chen, Xinliang David Li

> > +
> > +  profile_status_for_function (fn)
> > +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> > +  node->frequency
> > +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> 
> In GCC code style the = goes at the end of the line:
> 
>   profile_status_for_function (fn)
>     (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>   node->frequency =
>     hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;

Absolutely not, Teresa's version is the correct one, see reload.c for example.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-10 13:05                                               ` Eric Botcazou
@ 2013-11-10 17:16                                                 ` Steven Bosscher
  0 siblings, 0 replies; 42+ messages in thread
From: Steven Bosscher @ 2013-11-10 17:16 UTC (permalink / raw)
  To: Eric Botcazou
  Cc: GCC Patches, Teresa Johnson, Jan Hubicka, Dehao Chen, Xinliang David Li

On Sun, Nov 10, 2013 at 1:08 PM, Eric Botcazou wrote:
>> > +
>> > +  profile_status_for_function (fn)
>> > +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> > +  node->frequency
>> > +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>>
>> In GCC code style the = goes at the end of the line:
>>
>>   profile_status_for_function (fn)
>>     (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>   node->frequency =
>>     hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>
> Absolutely not, Teresa's version is the correct one, see reload.c for example.


Hmm, "absolutely"?

[stevenb@gcc1-power7 trunk]$ egrep -ch "^\s+= " gcc/*.[ch] | awk
'{sum=sum+$1}END{print sum}'
1797
[stevenb@gcc1-power7 trunk]$ egrep -ch "\s=$" gcc/*.[ch] | awk
'{sum=sum+$1}END{print sum}'
685

I can't find a rule/guide in the GNU or GCC coding style documents.

Anyway, not important. The "=" starting a new line is the majority, so
Theresa please forget about my comment :-)

Ciao!
Steven

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-08 22:17                                           ` Teresa Johnson
  2013-11-08 22:28                                             ` Steven Bosscher
@ 2013-11-11 16:24                                             ` Jan Hubicka
  2013-11-11 17:06                                               ` Teresa Johnson
  1 sibling, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-11-11 16:24 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches

> 2013-11-08  Teresa Johnson  <tejohnson@google.com>
>             Jan Hubicka  <jh@suse.cz>
> 
>         * predict.c (drop_profile): New function.
>         (handle_missing_profiles): Ditto.
>         (counts_to_freqs): Don't overwrite estimated frequencies
>         when function has no profile counts.
>         * predict.h (handle_missing_profiles): Declare.
>         * tree-inline.c (freqs_to_counts): New function.
>         (copy_cfg_body): Invoke freqs_to_counts as needed.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
> 
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 204516)
> +++ predict.c   (working copy)
> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
> 
> +/* Drop the profile for NODE to guessed, and update its frequency based on
> +   whether it is expected to be HOT.  */
> +
> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +    fprintf (dump_file,
> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             cgraph_node_name (node), node->order,
> +             hot ? "Function is hot" : "Function is normal");
> +  /* We only expect to miss profiles for functions that are reached
> +     via non-zero call edges in cases where the function may have
> +     been linked from another module or library (COMDATs and extern
> +     templates). See the comments below for handle_missing_profiles.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +    warning (0,
> +             "Missing counts for called function %s/%i",
> +             cgraph_node_name (node), node->order);
> +
> +  profile_status_for_function (fn)
> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency
> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> +}
> +
> +/* In the case of COMDAT routines, multiple object files will contain the same
> +   function and the linker will select one for the binary. In that case
> +   all the other copies from the profile instrument binary will be missing
> +   profile counts. Look for cases where this happened, due to non-zero
> +   call counts going to 0-count functions, and drop the profile to guessed
> +   so that we can use the estimated probabilities and avoid optimizing only
> +   for size.
> +
> +   The other case where the profile may be missing is when the routine
> +   is not going to be emitted to the object file, e.g. for "extern template"
> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
> +   all other cases of non-zero calls to 0-count functions.  */
> +
> +void
> +handle_missing_profiles (void)
> +{
> +  struct cgraph_node *node;
> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +  vec<struct cgraph_node *> worklist;
> +  worklist.create (64);
> +
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      gcov_type call_count = 0;
> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +      if (node->count)
> +        continue;
> +      for (e = node->callers; e; e = e->next_caller)
> +        call_count += e->count;
> +      if (call_count
> +          && fn && fn->cfg
> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
> +        {
> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> +
> +          drop_profile (node, maybe_hot);
> +          worklist.safe_push (node);
> +        }

Perhaps we should add an note/error on mishandled profile when function is not COMDAT?
That way we may notice further bugs in this area.
> +    }
> +
> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> +     potentially called by COMDATs we already dropped the profile on.  */
> +  while (worklist.length () > 0)
> +    {
> +      struct cgraph_edge *e;
> +
> +      node = worklist.pop ();
> +      for (e = node->callees; e; e = e->next_caller)
> +        {
> +          struct cgraph_node *callee = e->callee;
> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> +
> +          if (callee->count > 0)
> +            continue;
> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> +              && profile_status_for_function (fn) == PROFILE_READ)

Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
etc.)
> +            {
> +              /* Since there are no non-0 call counts to this function,
> +                 we don't know for sure whether it is hot. Indicate to
> +                 the drop_profile routine that function should be marked
> +                 normal, rather than hot.  */
> +              drop_profile (node, false);
> +              worklist.safe_push (callee);
> +            }
> +        }
> +    }
> +  worklist.release ();

I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
for large programs.

OK, with these changes.

Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-11 16:24                                             ` Jan Hubicka
@ 2013-11-11 17:06                                               ` Teresa Johnson
  2013-11-11 17:17                                                 ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-11 17:06 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Mon, Nov 11, 2013 at 7:23 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 2013-11-08  Teresa Johnson  <tejohnson@google.com>
>>             Jan Hubicka  <jh@suse.cz>
>>
>>         * predict.c (drop_profile): New function.
>>         (handle_missing_profiles): Ditto.
>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>         when function has no profile counts.
>>         * predict.h (handle_missing_profiles): Declare.
>>         * tree-inline.c (freqs_to_counts): New function.
>>         (copy_cfg_body): Invoke freqs_to_counts as needed.
>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 204516)
>> +++ predict.c   (working copy)
>> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>>    BITMAP_FREE (tovisit);
>>  }
>>
>> +/* Drop the profile for NODE to guessed, and update its frequency based on
>> +   whether it is expected to be HOT.  */
>> +
>> +static void
>> +drop_profile (struct cgraph_node *node, bool hot)
>> +{
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (dump_file)
>> +    fprintf (dump_file,
>> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
>> +             cgraph_node_name (node), node->order,
>> +             hot ? "Function is hot" : "Function is normal");
>> +  /* We only expect to miss profiles for functions that are reached
>> +     via non-zero call edges in cases where the function may have
>> +     been linked from another module or library (COMDATs and extern
>> +     templates). See the comments below for handle_missing_profiles.  */
>> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
>> +    warning (0,
>> +             "Missing counts for called function %s/%i",
>> +             cgraph_node_name (node), node->order);
>> +
>> +  profile_status_for_function (fn)
>> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency
>> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> +}
>> +
>> +/* In the case of COMDAT routines, multiple object files will contain the same
>> +   function and the linker will select one for the binary. In that case
>> +   all the other copies from the profile instrument binary will be missing
>> +   profile counts. Look for cases where this happened, due to non-zero
>> +   call counts going to 0-count functions, and drop the profile to guessed
>> +   so that we can use the estimated probabilities and avoid optimizing only
>> +   for size.
>> +
>> +   The other case where the profile may be missing is when the routine
>> +   is not going to be emitted to the object file, e.g. for "extern template"
>> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>> +   all other cases of non-zero calls to 0-count functions.  */
>> +
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> +  vec<struct cgraph_node *> worklist;
>> +  worklist.create (64);
>> +
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      gcov_type call_count = 0;
>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +      if (node->count)
>> +        continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> +        call_count += e->count;
>> +      if (call_count
>> +          && fn && fn->cfg
>> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
>> +        {
>> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
>> +
>> +          drop_profile (node, maybe_hot);
>> +          worklist.safe_push (node);
>> +        }
>
> Perhaps we should add an note/error on mishandled profile when function is not COMDAT?
> That way we may notice further bugs in this area.

I have a warning like that already in drop_profile(). Is that
sufficient? Also, Steven Bosscher suggested putting that warning under
OPT_Wdisabled_optimization instead of '0', what do you prefer for
that?

>> +    }
>> +
>> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> +     potentially called by COMDATs we already dropped the profile on.  */
>> +  while (worklist.length () > 0)
>> +    {
>> +      struct cgraph_edge *e;
>> +
>> +      node = worklist.pop ();
>> +      for (e = node->callees; e; e = e->next_caller)
>> +        {
>> +          struct cgraph_node *callee = e->callee;
>> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> +
>> +          if (callee->count > 0)
>> +            continue;
>> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> +              && profile_status_for_function (fn) == PROFILE_READ)
>
> Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
> to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
> etc.)

Ok, let me try this.

>> +            {
>> +              /* Since there are no non-0 call counts to this function,
>> +                 we don't know for sure whether it is hot. Indicate to
>> +                 the drop_profile routine that function should be marked
>> +                 normal, rather than hot.  */
>> +              drop_profile (node, false);
>> +              worklist.safe_push (callee);
>> +            }
>> +        }
>> +    }
>> +  worklist.release ();
>
> I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
> for large programs.

I'll add a visited_nodes set to keep track of processed nodes so they
don't get added to the worklist multiple times.

Thanks,
Teresa

>
> OK, with these changes.
>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-11 17:06                                               ` Teresa Johnson
@ 2013-11-11 17:17                                                 ` Jan Hubicka
  2013-11-11 17:57                                                   ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-11-11 17:17 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches

> I have a warning like that already in drop_profile(). Is that

I think it should be warning (or silent) for COMDAT and error/note for
other functions (depending on flag_profile_correction).
I guess drop_profile is better place for it indeed.

> sufficient? Also, Steven Bosscher suggested putting that warning under
> OPT_Wdisabled_optimization instead of '0', what do you prefer for
> that?

It is a bit different case than other disabled optimizations we have (where
the optimization does not happen because of --param tunable limits), but I
am fine with both alternatives.
The warning may end up quite noisy so having way to silence it definitely
makes sense.
> 
> >> +    }
> >> +
> >> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> >> +     potentially called by COMDATs we already dropped the profile on.  */
> >> +  while (worklist.length () > 0)
> >> +    {
> >> +      struct cgraph_edge *e;
> >> +
> >> +      node = worklist.pop ();
> >> +      for (e = node->callees; e; e = e->next_caller)
> >> +        {
> >> +          struct cgraph_node *callee = e->callee;
> >> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> >> +
> >> +          if (callee->count > 0)
> >> +            continue;
> >> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> >> +              && profile_status_for_function (fn) == PROFILE_READ)
> >
> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
> > to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
> > etc.)
> 
> Ok, let me try this.
> 
> >> +            {
> >> +              /* Since there are no non-0 call counts to this function,
> >> +                 we don't know for sure whether it is hot. Indicate to
> >> +                 the drop_profile routine that function should be marked
> >> +                 normal, rather than hot.  */
> >> +              drop_profile (node, false);
> >> +              worklist.safe_push (callee);
> >> +            }
> >> +        }
> >> +    }
> >> +  worklist.release ();
> >
> > I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
> > for large programs.
> 
> I'll add a visited_nodes set to keep track of processed nodes so they
> don't get added to the worklist multiple times.

Perhaps you can also track this by testing profile_status_for_function. Both solutions are fine for me.

Honza
> 
> Thanks,
> Teresa
> 
> >
> > OK, with these changes.
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-11 17:17                                                 ` Jan Hubicka
@ 2013-11-11 17:57                                                   ` Teresa Johnson
  2013-11-11 18:18                                                     ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-11 17:57 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> I have a warning like that already in drop_profile(). Is that
>
> I think it should be warning (or silent) for COMDAT and error/note for
> other functions (depending on flag_profile_correction).
> I guess drop_profile is better place for it indeed.

I don't want to warn for COMDAT since this is currently an expected
issue in that case, so I'm afraid it will be too noisy (but there is a
note emitted to the dump file to track how often this occurs). So
currently the warning is only emitted in cases where we don't expect
it to occur (e.g. non-comdat).

>
>> sufficient? Also, Steven Bosscher suggested putting that warning under
>> OPT_Wdisabled_optimization instead of '0', what do you prefer for
>> that?
>
> It is a bit different case than other disabled optimizations we have (where
> the optimization does not happen because of --param tunable limits), but I
> am fine with both alternatives.
> The warning may end up quite noisy so having way to silence it definitely
> makes sense.

I didn't find any warnings being emitted when running this for spec or
internal benchmarks, so how about instead of a warning, I handle it
like you suggest above: depending on the setting of
flag_profile_correction either error or emit a note to the dump file
under MSG_MISSED_OPTIMIZATION?

>>
>> >> +    }
>> >> +
>> >> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> >> +     potentially called by COMDATs we already dropped the profile on.  */
>> >> +  while (worklist.length () > 0)
>> >> +    {
>> >> +      struct cgraph_edge *e;
>> >> +
>> >> +      node = worklist.pop ();
>> >> +      for (e = node->callees; e; e = e->next_caller)
>> >> +        {
>> >> +          struct cgraph_node *callee = e->callee;
>> >> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> >> +
>> >> +          if (callee->count > 0)
>> >> +            continue;
>> >> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> >> +              && profile_status_for_function (fn) == PROFILE_READ)
>> >
>> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
>> > to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
>> > etc.)
>>
>> Ok, let me try this.
>>
>> >> +            {
>> >> +              /* Since there are no non-0 call counts to this function,
>> >> +                 we don't know for sure whether it is hot. Indicate to
>> >> +                 the drop_profile routine that function should be marked
>> >> +                 normal, rather than hot.  */
>> >> +              drop_profile (node, false);
>> >> +              worklist.safe_push (callee);
>> >> +            }
>> >> +        }
>> >> +    }
>> >> +  worklist.release ();
>> >
>> > I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
>> > for large programs.
>>
>> I'll add a visited_nodes set to keep track of processed nodes so they
>> don't get added to the worklist multiple times.
>
> Perhaps you can also track this by testing profile_status_for_function. Both solutions are fine for me.

Ah - I just realized I am already checking profile_status_for_function
above and adding to the worklist if it is still PROFILE_READ. Since I
call drop_profile before adding to the worklist, we can't end up
adding multiple times. So I don't think there is currently an issue
with this.

Thanks,
Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > OK, with these changes.
>> >
>> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-11 17:57                                                   ` Teresa Johnson
@ 2013-11-11 18:18                                                     ` Jan Hubicka
  2013-11-11 18:23                                                       ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-11-11 18:18 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches

> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> I have a warning like that already in drop_profile(). Is that
> >
> > I think it should be warning (or silent) for COMDAT and error/note for
> > other functions (depending on flag_profile_correction).
> > I guess drop_profile is better place for it indeed.
> 
> I don't want to warn for COMDAT since this is currently an expected
> issue in that case, so I'm afraid it will be too noisy (but there is a
> note emitted to the dump file to track how often this occurs). So
> currently the warning is only emitted in cases where we don't expect
> it to occur (e.g. non-comdat).

I see, I misread it.
The non-comdat warning IMO ought go the same way as the other profile confussions.
I guess something like
if (!flag_profile_correction)
  error (...);
like existing cases in profile.c
> 
> I didn't find any warnings being emitted when running this for spec or
> internal benchmarks, so how about instead of a warning, I handle it
> like you suggest above: depending on the setting of
> flag_profile_correction either error or emit a note to the dump file
> under MSG_MISSED_OPTIMIZATION?

Yep, lets just go with error with !flag_profile_correction and being silent
otherwise.
If we want to go with warnings, we need to change all the similar places
in profile.c and elsewhere.
> 
> Ah - I just realized I am already checking profile_status_for_function
> above and adding to the worklist if it is still PROFILE_READ. Since I
> call drop_profile before adding to the worklist, we can't end up
> adding multiple times. So I don't think there is currently an issue
> with this.

Yep, indeed.

The patch is OK with the error message as discussed above.

Thanks.
Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> >>
> >> Thanks,
> >> Teresa
> >>
> >> >
> >> > OK, with these changes.
> >> >
> >> > Honza
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-11 18:18                                                     ` Jan Hubicka
@ 2013-11-11 18:23                                                       ` Teresa Johnson
  2013-11-12 20:31                                                         ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-11 18:23 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> I have a warning like that already in drop_profile(). Is that
>> >
>> > I think it should be warning (or silent) for COMDAT and error/note for
>> > other functions (depending on flag_profile_correction).
>> > I guess drop_profile is better place for it indeed.
>>
>> I don't want to warn for COMDAT since this is currently an expected
>> issue in that case, so I'm afraid it will be too noisy (but there is a
>> note emitted to the dump file to track how often this occurs). So
>> currently the warning is only emitted in cases where we don't expect
>> it to occur (e.g. non-comdat).
>
> I see, I misread it.
> The non-comdat warning IMO ought go the same way as the other profile confussions.
> I guess something like
> if (!flag_profile_correction)
>   error (...);
> like existing cases in profile.c

Ok, will do (emitting a note to the dump file as in other cases in
profile.c that do this same thing).

>>
>> I didn't find any warnings being emitted when running this for spec or
>> internal benchmarks, so how about instead of a warning, I handle it
>> like you suggest above: depending on the setting of
>> flag_profile_correction either error or emit a note to the dump file
>> under MSG_MISSED_OPTIMIZATION?
>
> Yep, lets just go with error with !flag_profile_correction and being silent
> otherwise.
> If we want to go with warnings, we need to change all the similar places
> in profile.c and elsewhere.
>>
>> Ah - I just realized I am already checking profile_status_for_function
>> above and adding to the worklist if it is still PROFILE_READ. Since I
>> call drop_profile before adding to the worklist, we can't end up
>> adding multiple times. So I don't think there is currently an issue
>> with this.
>
> Yep, indeed.
>
> The patch is OK with the error message as discussed above.

Ok, will do that and fix a couple of minor issues mentioned by Steven
(missing comment and incorrect indentation in one spot). Will retest
just to make sure I didn't miss any warnings which will now be errors.

Thanks,
Teresa

>
> Thanks.
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> Teresa
>> >>
>> >> >
>> >> > OK, with these changes.
>> >> >
>> >> > Honza
>> >>
>> >>
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-11 18:23                                                       ` Teresa Johnson
@ 2013-11-12 20:31                                                         ` Teresa Johnson
  2013-11-12 21:44                                                           ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-12 20:31 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches

On Mon, Nov 11, 2013 at 9:17 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> >> I have a warning like that already in drop_profile(). Is that
>>> >
>>> > I think it should be warning (or silent) for COMDAT and error/note for
>>> > other functions (depending on flag_profile_correction).
>>> > I guess drop_profile is better place for it indeed.
>>>
>>> I don't want to warn for COMDAT since this is currently an expected
>>> issue in that case, so I'm afraid it will be too noisy (but there is a
>>> note emitted to the dump file to track how often this occurs). So
>>> currently the warning is only emitted in cases where we don't expect
>>> it to occur (e.g. non-comdat).
>>
>> I see, I misread it.
>> The non-comdat warning IMO ought go the same way as the other profile confussions.
>> I guess something like
>> if (!flag_profile_correction)
>>   error (...);
>> like existing cases in profile.c
>
> Ok, will do (emitting a note to the dump file as in other cases in
> profile.c that do this same thing).
>
>>>
>>> I didn't find any warnings being emitted when running this for spec or
>>> internal benchmarks, so how about instead of a warning, I handle it
>>> like you suggest above: depending on the setting of
>>> flag_profile_correction either error or emit a note to the dump file
>>> under MSG_MISSED_OPTIMIZATION?
>>
>> Yep, lets just go with error with !flag_profile_correction and being silent
>> otherwise.
>> If we want to go with warnings, we need to change all the similar places
>> in profile.c and elsewhere.
>>>
>>> Ah - I just realized I am already checking profile_status_for_function
>>> above and adding to the worklist if it is still PROFILE_READ. Since I
>>> call drop_profile before adding to the worklist, we can't end up
>>> adding multiple times. So I don't think there is currently an issue
>>> with this.
>>
>> Yep, indeed.
>>
>> The patch is OK with the error message as discussed above.
>
> Ok, will do that and fix a couple of minor issues mentioned by Steven
> (missing comment and incorrect indentation in one spot). Will retest
> just to make sure I didn't miss any warnings which will now be errors.
>
> Thanks,
> Teresa
>
>>
>> Thanks.
>> Honza
>>>
>>> Thanks,
>>> Teresa
>>>
>>> >
>>> > Honza
>>> >>
>>> >> Thanks,
>>> >> Teresa
>>> >>
>>> >> >
>>> >> > OK, with these changes.
>>> >> >
>>> >> > Honza

This was committed earlier this morning as r204704. Then I hit the new
error when doing an lto-bootstrap profiledbootstrap for an unrelated
change. I'd like to change this error to a warning for now to avoid
breaking the lto profiledbootstrap while I investigate. I don't think
this really needs to be a hard error at this point.

Running regression tests and lto profiledbootstrap right now. Ok for
trunk if testing succeeds?

2013-11-12  Teresa Johnson  <tejohnson@google.com>

        * predict.c (drop_profile): Error is currently too strict.

Index: predict.c
===================================================================
--- predict.c   (revision 204704)
+++ predict.c   (working copy)
@@ -2792,8 +2792,8 @@ drop_profile (struct cgraph_node *node, bool hot)
                      cgraph_node_name (node), node->order);
         }
       else
-        error ("Missing counts for called function %s/%i",
-               cgraph_node_name (node), node->order);
+        warning (0, "Missing counts for called function %s/%i",
+                 cgraph_node_name (node), node->order);
     }

   profile_status_for_function (fn)

Thanks,
Teresa

>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-12 20:31                                                         ` Teresa Johnson
@ 2013-11-12 21:44                                                           ` Teresa Johnson
  2013-11-12 22:19                                                             ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-12 21:44 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches, Uros Bizjak

On Tue, Nov 12, 2013 at 11:08 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Nov 11, 2013 at 9:17 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> >> I have a warning like that already in drop_profile(). Is that
>>>> >
>>>> > I think it should be warning (or silent) for COMDAT and error/note for
>>>> > other functions (depending on flag_profile_correction).
>>>> > I guess drop_profile is better place for it indeed.
>>>>
>>>> I don't want to warn for COMDAT since this is currently an expected
>>>> issue in that case, so I'm afraid it will be too noisy (but there is a
>>>> note emitted to the dump file to track how often this occurs). So
>>>> currently the warning is only emitted in cases where we don't expect
>>>> it to occur (e.g. non-comdat).
>>>
>>> I see, I misread it.
>>> The non-comdat warning IMO ought go the same way as the other profile confussions.
>>> I guess something like
>>> if (!flag_profile_correction)
>>>   error (...);
>>> like existing cases in profile.c
>>
>> Ok, will do (emitting a note to the dump file as in other cases in
>> profile.c that do this same thing).
>>
>>>>
>>>> I didn't find any warnings being emitted when running this for spec or
>>>> internal benchmarks, so how about instead of a warning, I handle it
>>>> like you suggest above: depending on the setting of
>>>> flag_profile_correction either error or emit a note to the dump file
>>>> under MSG_MISSED_OPTIMIZATION?
>>>
>>> Yep, lets just go with error with !flag_profile_correction and being silent
>>> otherwise.
>>> If we want to go with warnings, we need to change all the similar places
>>> in profile.c and elsewhere.
>>>>
>>>> Ah - I just realized I am already checking profile_status_for_function
>>>> above and adding to the worklist if it is still PROFILE_READ. Since I
>>>> call drop_profile before adding to the worklist, we can't end up
>>>> adding multiple times. So I don't think there is currently an issue
>>>> with this.
>>>
>>> Yep, indeed.
>>>
>>> The patch is OK with the error message as discussed above.
>>
>> Ok, will do that and fix a couple of minor issues mentioned by Steven
>> (missing comment and incorrect indentation in one spot). Will retest
>> just to make sure I didn't miss any warnings which will now be errors.
>>
>> Thanks,
>> Teresa
>>
>>>
>>> Thanks.
>>> Honza
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> >
>>>> > Honza
>>>> >>
>>>> >> Thanks,
>>>> >> Teresa
>>>> >>
>>>> >> >
>>>> >> > OK, with these changes.
>>>> >> >
>>>> >> > Honza
>
> This was committed earlier this morning as r204704. Then I hit the new
> error when doing an lto-bootstrap profiledbootstrap for an unrelated
> change. I'd like to change this error to a warning for now to avoid
> breaking the lto profiledbootstrap while I investigate. I don't think
> this really needs to be a hard error at this point.

More info on the lto bootstrap failure:

/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
warning: Missing counts for called function pex_child_error.isra.1/73
[enabled by default]
 }

This is an error handling routine that invokes _exit. Looking at the
ipa-profile dump, I can see that all the counts read in for
pex_child_error have the value 0. But there is a non-zero callsite,
that not surprisingly complains of a profile insanity in the bb's
outgoing edge weights, since pex_child_error never returns:

;;   basic block 38, loop depth 1, count 192, freq 5000
;;   Invalid sum of outgoing counts 0, should be 192
...
  pex_child_error.isra.1D.5005 (_92, executable_40(D),
[/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
677] "execv", _91);
;;    succ:       3 [100.0%]  (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)

Not sure why the counts were all 0 for this routine though, I wouldn't
think the early exit should prevent us from updating the counts. But
IMO the best thing to do here is to issue a warning, since I don't
want more issues like this to cause compile errors when we handled
them fine before.

Let me know if the patch below is ok.
Thanks,
Teresa

>
> Running regression tests and lto profiledbootstrap right now. Ok for
> trunk if testing succeeds?
>
> 2013-11-12  Teresa Johnson  <tejohnson@google.com>
>
>         * predict.c (drop_profile): Error is currently too strict.
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 204704)
> +++ predict.c   (working copy)
> @@ -2792,8 +2792,8 @@ drop_profile (struct cgraph_node *node, bool hot)
>                       cgraph_node_name (node), node->order);
>          }
>        else
> -        error ("Missing counts for called function %s/%i",
> -               cgraph_node_name (node), node->order);
> +        warning (0, "Missing counts for called function %s/%i",
> +                 cgraph_node_name (node), node->order);
>      }
>
>    profile_status_for_function (fn)
>
> Thanks,
> Teresa
>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-12 21:44                                                           ` Teresa Johnson
@ 2013-11-12 22:19                                                             ` Jan Hubicka
  2013-11-12 22:43                                                               ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-11-12 22:19 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches, Uros Bizjak

> More info on the lto bootstrap failure:
> 
> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
> warning: Missing counts for called function pex_child_error.isra.1/73
> [enabled by default]
>  }
> 
> This is an error handling routine that invokes _exit. Looking at the
> ipa-profile dump, I can see that all the counts read in for
> pex_child_error have the value 0. But there is a non-zero callsite,
> that not surprisingly complains of a profile insanity in the bb's
> outgoing edge weights, since pex_child_error never returns:
> 
> ;;   basic block 38, loop depth 1, count 192, freq 5000
> ;;   Invalid sum of outgoing counts 0, should be 192
> ...
>   pex_child_error.isra.1D.5005 (_92, executable_40(D),
> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
> 677] "execv", _91);
> ;;    succ:       3 [100.0%]  (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)
> 
> Not sure why the counts were all 0 for this routine though, I wouldn't
> think the early exit should prevent us from updating the counts. But
> IMO the best thing to do here is to issue a warning, since I don't
> want more issues like this to cause compile errors when we handled
> them fine before.
> 
> Let me know if the patch below is ok.

OK, so _exit actually makes us to miss streaming of the path leading to the
error message?  Handling of vfork is somewhat broken, as I noticed with Martin
Liska. One problem is that gcov_exit is not clearing the counts.  With vfork
being used in addition to execvs we stream out the data but the counts stays in
the oriignal process memory and then are streamed again.  If execve fails and
leads to _exit I think we can get the miscompare you see.

Does calling gcov_clear within gcov_exit help?  (It is what I have in our tree)

Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-12 22:19                                                             ` Jan Hubicka
@ 2013-11-12 22:43                                                               ` Teresa Johnson
  2013-11-12 23:49                                                                 ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-12 22:43 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches, Uros Bizjak

On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> More info on the lto bootstrap failure:
>>
>> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
>> warning: Missing counts for called function pex_child_error.isra.1/73
>> [enabled by default]
>>  }
>>
>> This is an error handling routine that invokes _exit. Looking at the
>> ipa-profile dump, I can see that all the counts read in for
>> pex_child_error have the value 0. But there is a non-zero callsite,
>> that not surprisingly complains of a profile insanity in the bb's
>> outgoing edge weights, since pex_child_error never returns:
>>
>> ;;   basic block 38, loop depth 1, count 192, freq 5000
>> ;;   Invalid sum of outgoing counts 0, should be 192
>> ...
>>   pex_child_error.isra.1D.5005 (_92, executable_40(D),
>> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
>> 677] "execv", _91);
>> ;;    succ:       3 [100.0%]  (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)
>>
>> Not sure why the counts were all 0 for this routine though, I wouldn't
>> think the early exit should prevent us from updating the counts. But
>> IMO the best thing to do here is to issue a warning, since I don't
>> want more issues like this to cause compile errors when we handled
>> them fine before.
>>
>> Let me know if the patch below is ok.
>
> OK, so _exit actually makes us to miss streaming of the path leading to the
> error message?  Handling of vfork is somewhat broken, as I noticed with Martin
> Liska. One problem is that gcov_exit is not clearing the counts.  With vfork
> being used in addition to execvs we stream out the data but the counts stays in
> the oriignal process memory and then are streamed again.  If execve fails and
> leads to _exit I think we can get the miscompare you see.
>
> Does calling gcov_clear within gcov_exit help?  (It is what I have in our tree)

I thought this was already handled by the __gcov_execv wrapper around
execv which invokes __gcov_flush that in turns does a gcov_exit
followed by gcov_clear?

I think the issue is that we register gcov_exit with atexit(), but
_exit does not execute any atexit functions by definition. So that
would explain why the counters from pex_child_error didn't get dumped.
The caller bb invokes execv before invoking pex_child_error:
          execv (executable, to_ptr32 (argv));
          pex_child_error (obj, executable, "execv", errno);
So the counters of the caller bb are dumped (hence the caller bb has
non-zero counts) and cleared, and the pex_child_error does not dump
its counters since it uses _exit instead of exit and therefore doesn't
invoke gcov_exit atexit.

If that is what is going on, then I don't see how adding a call to
gcov_clear to gcov_exit would help. Dealing with situations like this
is why I think it is better to convert this to a warning.

Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-12 22:43                                                               ` Teresa Johnson
@ 2013-11-12 23:49                                                                 ` Jan Hubicka
  2013-11-13 15:33                                                                   ` Teresa Johnson
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Hubicka @ 2013-11-12 23:49 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches, Uros Bizjak

> On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> More info on the lto bootstrap failure:
> >>
> >> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
> >> warning: Missing counts for called function pex_child_error.isra.1/73
> >> [enabled by default]
> >>  }
> >>
> >> This is an error handling routine that invokes _exit. Looking at the
> >> ipa-profile dump, I can see that all the counts read in for
> >> pex_child_error have the value 0. But there is a non-zero callsite,
> >> that not surprisingly complains of a profile insanity in the bb's
> >> outgoing edge weights, since pex_child_error never returns:
> >>
> >> ;;   basic block 38, loop depth 1, count 192, freq 5000
> >> ;;   Invalid sum of outgoing counts 0, should be 192
> >> ...
> >>   pex_child_error.isra.1D.5005 (_92, executable_40(D),
> >> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
> >> 677] "execv", _91);
> >> ;;    succ:       3 [100.0%]  (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)
> >>
> >> Not sure why the counts were all 0 for this routine though, I wouldn't
> >> think the early exit should prevent us from updating the counts. But
> >> IMO the best thing to do here is to issue a warning, since I don't
> >> want more issues like this to cause compile errors when we handled
> >> them fine before.
> >>
> >> Let me know if the patch below is ok.
> >
> > OK, so _exit actually makes us to miss streaming of the path leading to the
> > error message?  Handling of vfork is somewhat broken, as I noticed with Martin
> > Liska. One problem is that gcov_exit is not clearing the counts.  With vfork
> > being used in addition to execvs we stream out the data but the counts stays in
> > the oriignal process memory and then are streamed again.  If execve fails and
> > leads to _exit I think we can get the miscompare you see.
> >
> > Does calling gcov_clear within gcov_exit help?  (It is what I have in our tree)
> 
> I thought this was already handled by the __gcov_execv wrapper around
> execv which invokes __gcov_flush that in turns does a gcov_exit
> followed by gcov_clear?
> 
> I think the issue is that we register gcov_exit with atexit(), but
> _exit does not execute any atexit functions by definition. So that
> would explain why the counters from pex_child_error didn't get dumped.
> The caller bb invokes execv before invoking pex_child_error:
>           execv (executable, to_ptr32 (argv));
>           pex_child_error (obj, executable, "execv", errno);
> So the counters of the caller bb are dumped (hence the caller bb has
> non-zero counts) and cleared, and the pex_child_error does not dump
> its counters since it uses _exit instead of exit and therefore doesn't
> invoke gcov_exit atexit.

Hmm, I see, the problem here is that execv gets BB splitted (and profile is correct)
but before we get into ipa_profile the BBs get re-merged and we incorrectly put
count of 1 to pex_child_error.

I suppose this will always happen when you have function terminating program
(execve) before other function call.  Perhaps we can warn only when the difference in counts
is greater than number of train runs?  

Honza

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-12 23:49                                                                 ` Jan Hubicka
@ 2013-11-13 15:33                                                                   ` Teresa Johnson
  2013-11-14  0:17                                                                     ` Jan Hubicka
  0 siblings, 1 reply; 42+ messages in thread
From: Teresa Johnson @ 2013-11-13 15:33 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Dehao Chen, Xinliang David Li, GCC Patches, Uros Bizjak

On Tue, Nov 12, 2013 at 2:35 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> More info on the lto bootstrap failure:
>> >>
>> >> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
>> >> warning: Missing counts for called function pex_child_error.isra.1/73
>> >> [enabled by default]
>> >>  }
>> >>
>> >> This is an error handling routine that invokes _exit. Looking at the
>> >> ipa-profile dump, I can see that all the counts read in for
>> >> pex_child_error have the value 0. But there is a non-zero callsite,
>> >> that not surprisingly complains of a profile insanity in the bb's
>> >> outgoing edge weights, since pex_child_error never returns:
>> >>
>> >> ;;   basic block 38, loop depth 1, count 192, freq 5000
>> >> ;;   Invalid sum of outgoing counts 0, should be 192
>> >> ...
>> >>   pex_child_error.isra.1D.5005 (_92, executable_40(D),
>> >> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
>> >> 677] "execv", _91);
>> >> ;;    succ:       3 [100.0%]  (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)
>> >>
>> >> Not sure why the counts were all 0 for this routine though, I wouldn't
>> >> think the early exit should prevent us from updating the counts. But
>> >> IMO the best thing to do here is to issue a warning, since I don't
>> >> want more issues like this to cause compile errors when we handled
>> >> them fine before.
>> >>
>> >> Let me know if the patch below is ok.
>> >
>> > OK, so _exit actually makes us to miss streaming of the path leading to the
>> > error message?  Handling of vfork is somewhat broken, as I noticed with Martin
>> > Liska. One problem is that gcov_exit is not clearing the counts.  With vfork
>> > being used in addition to execvs we stream out the data but the counts stays in
>> > the oriignal process memory and then are streamed again.  If execve fails and
>> > leads to _exit I think we can get the miscompare you see.
>> >
>> > Does calling gcov_clear within gcov_exit help?  (It is what I have in our tree)
>>
>> I thought this was already handled by the __gcov_execv wrapper around
>> execv which invokes __gcov_flush that in turns does a gcov_exit
>> followed by gcov_clear?
>>
>> I think the issue is that we register gcov_exit with atexit(), but
>> _exit does not execute any atexit functions by definition. So that
>> would explain why the counters from pex_child_error didn't get dumped.
>> The caller bb invokes execv before invoking pex_child_error:
>>           execv (executable, to_ptr32 (argv));
>>           pex_child_error (obj, executable, "execv", errno);
>> So the counters of the caller bb are dumped (hence the caller bb has
>> non-zero counts) and cleared, and the pex_child_error does not dump
>> its counters since it uses _exit instead of exit and therefore doesn't
>> invoke gcov_exit atexit.
>
> Hmm, I see, the problem here is that execv gets BB splitted (and profile is correct)
> but before we get into ipa_profile the BBs get re-merged and we incorrectly put
> count of 1 to pex_child_error.
>
> I suppose this will always happen when you have function terminating program
> (execve) before other function call.  Perhaps we can warn only when the difference in counts
> is greater than number of train runs?

Ok, that sounds good. Here is the new patch. Is this ok for trunk if
testing (bootstrap regression and lto profiledbootstrap) succeeds?

Thanks,
Teresa

2013-11-13  Teresa Johnson  <tejohnson@google.com>

        * predict.c (drop_profile): Error is currently too strict.
        (handle_missing_profiles): Pass call_count to drop_profile.

Index: predict.c
===================================================================
--- predict.c   (revision 204704)
+++ predict.c   (working copy)
@@ -2766,12 +2766,17 @@ estimate_loops (void)
 }

 /* Drop the profile for NODE to guessed, and update its frequency based on
-   whether it is expected to be HOT.  */
+   whether it is expected to be hot given the CALL_COUNT.  */

 static void
-drop_profile (struct cgraph_node *node, bool hot)
+drop_profile (struct cgraph_node *node, gcov_type call_count)
 {
   struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+  /* In the case where this was called by another function with a
+     dropped profile, call_count will be 0. Since there are no
+     non-zero call counts to this function, we don't know for sure
+     whether it is hot, and therefore it will be marked normal below.  */
+  bool hot = maybe_hot_count_p (NULL, call_count);

   if (dump_file)
     fprintf (dump_file,
@@ -2781,8 +2786,13 @@ static void
   /* We only expect to miss profiles for functions that are reached
      via non-zero call edges in cases where the function may have
      been linked from another module or library (COMDATs and extern
-     templates). See the comments below for handle_missing_profiles.  */
-  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
+     templates). See the comments below for handle_missing_profiles.
+     Also, only warn in cases where the missing counts exceed the
+     number of training runs. In certain cases with an execv followed
+     by a no-return call the profile for the no-return call is not
+     dumped and there can be a mismatch.  */
+  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)
+      && call_count > profile_info->runs)
     {
       if (flag_profile_correction)
         {
@@ -2792,8 +2802,8 @@ static void
                      cgraph_node_name (node), node->order);
         }
       else
-        error ("Missing counts for called function %s/%i",
-               cgraph_node_name (node), node->order);
+        warning (0, "Missing counts for called function %s/%i",
+                 cgraph_node_name (node), node->order);
     }

   profile_status_for_function (fn)
@@ -2839,9 +2849,7 @@ handle_missing_profiles (void)
           && fn && fn->cfg
           && (call_count * unlikely_count_fraction >= profile_info->runs))
         {
-          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
-
-          drop_profile (node, maybe_hot);
+          drop_profile (node, call_count);
           worklist.safe_push (node);
         }
     }
@@ -2863,11 +2871,7 @@ handle_missing_profiles (void)
           if (DECL_COMDAT (callee->decl) && fn && fn->cfg
               && profile_status_for_function (fn) == PROFILE_READ)
             {
-              /* Since there are no non-0 call counts to this function,
-                 we don't know for sure whether it is hot. Indicate to
-                 the drop_profile routine that function should be marked
-                 normal, rather than hot.  */
-              drop_profile (node, false);
+              drop_profile (node, 0);
               worklist.safe_push (callee);
             }
         }

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] decide edge's hotness when there is profile info
  2013-11-13 15:33                                                                   ` Teresa Johnson
@ 2013-11-14  0:17                                                                     ` Jan Hubicka
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Hubicka @ 2013-11-14  0:17 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Jan Hubicka, Dehao Chen, Xinliang David Li, GCC Patches, Uros Bizjak

> Ok, that sounds good. Here is the new patch. Is this ok for trunk if
> testing (bootstrap regression and lto profiledbootstrap) succeeds?
> 
> Thanks,
> Teresa
> 
> 2013-11-13  Teresa Johnson  <tejohnson@google.com>
> 
>         * predict.c (drop_profile): Error is currently too strict.
>         (handle_missing_profiles): Pass call_count to drop_profile.

OK, thanks

Honza
> 
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 204704)
> +++ predict.c   (working copy)
> @@ -2766,12 +2766,17 @@ estimate_loops (void)
>  }
> 
>  /* Drop the profile for NODE to guessed, and update its frequency based on
> -   whether it is expected to be HOT.  */
> +   whether it is expected to be hot given the CALL_COUNT.  */
> 
>  static void
> -drop_profile (struct cgraph_node *node, bool hot)
> +drop_profile (struct cgraph_node *node, gcov_type call_count)
>  {
>    struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +  /* In the case where this was called by another function with a
> +     dropped profile, call_count will be 0. Since there are no
> +     non-zero call counts to this function, we don't know for sure
> +     whether it is hot, and therefore it will be marked normal below.  */
> +  bool hot = maybe_hot_count_p (NULL, call_count);
> 
>    if (dump_file)
>      fprintf (dump_file,
> @@ -2781,8 +2786,13 @@ static void
>    /* We only expect to miss profiles for functions that are reached
>       via non-zero call edges in cases where the function may have
>       been linked from another module or library (COMDATs and extern
> -     templates). See the comments below for handle_missing_profiles.  */
> -  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +     templates). See the comments below for handle_missing_profiles.
> +     Also, only warn in cases where the missing counts exceed the
> +     number of training runs. In certain cases with an execv followed
> +     by a no-return call the profile for the no-return call is not
> +     dumped and there can be a mismatch.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)
> +      && call_count > profile_info->runs)
>      {
>        if (flag_profile_correction)
>          {
> @@ -2792,8 +2802,8 @@ static void
>                       cgraph_node_name (node), node->order);
>          }
>        else
> -        error ("Missing counts for called function %s/%i",
> -               cgraph_node_name (node), node->order);
> +        warning (0, "Missing counts for called function %s/%i",
> +                 cgraph_node_name (node), node->order);
>      }
> 
>    profile_status_for_function (fn)
> @@ -2839,9 +2849,7 @@ handle_missing_profiles (void)
>            && fn && fn->cfg
>            && (call_count * unlikely_count_fraction >= profile_info->runs))
>          {
> -          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> -
> -          drop_profile (node, maybe_hot);
> +          drop_profile (node, call_count);
>            worklist.safe_push (node);
>          }
>      }
> @@ -2863,11 +2871,7 @@ handle_missing_profiles (void)
>            if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>                && profile_status_for_function (fn) == PROFILE_READ)
>              {
> -              /* Since there are no non-0 call counts to this function,
> -                 we don't know for sure whether it is hot. Indicate to
> -                 the drop_profile routine that function should be marked
> -                 normal, rather than hot.  */
> -              drop_profile (node, false);
> +              drop_profile (node, 0);
>                worklist.safe_push (callee);
>              }
>          }
> 
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2013-11-13 21:40 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-14 16:13 [PATCH] decide edge's hotness when there is profile info Dehao Chen
2013-10-14 17:15 ` Xinliang David Li
2013-10-14 17:44   ` Dehao Chen
2013-10-14 20:09     ` Jan Hubicka
2013-10-14 21:28       ` Dehao Chen
2013-10-14 21:50         ` Xinliang David Li
2013-10-14 22:00           ` Dehao Chen
2013-10-14 22:01             ` Jan Hubicka
2013-10-14 22:43               ` Dehao Chen
2013-10-15 14:59                 ` Dehao Chen
2013-10-15 15:08                 ` Teresa Johnson
2013-10-15 15:53                   ` Dehao Chen
2013-10-15 16:06                     ` Teresa Johnson
2013-10-15 16:53                       ` Dehao Chen
2013-10-15 21:21                         ` Jan Hubicka
2013-10-15 21:35                           ` Teresa Johnson
2013-10-15 22:39                             ` Jan Hubicka
2013-10-16  0:20                               ` Teresa Johnson
2013-10-16 19:15                                 ` Teresa Johnson
2013-10-18 20:55                                   ` Teresa Johnson
2013-10-18 22:35                                     ` Jan Hubicka
2013-10-30 20:40                                       ` Teresa Johnson
2013-11-05 14:08                                         ` Teresa Johnson
2013-11-08 22:17                                           ` Teresa Johnson
2013-11-08 22:28                                             ` Steven Bosscher
2013-11-10 13:05                                               ` Eric Botcazou
2013-11-10 17:16                                                 ` Steven Bosscher
2013-11-11 16:24                                             ` Jan Hubicka
2013-11-11 17:06                                               ` Teresa Johnson
2013-11-11 17:17                                                 ` Jan Hubicka
2013-11-11 17:57                                                   ` Teresa Johnson
2013-11-11 18:18                                                     ` Jan Hubicka
2013-11-11 18:23                                                       ` Teresa Johnson
2013-11-12 20:31                                                         ` Teresa Johnson
2013-11-12 21:44                                                           ` Teresa Johnson
2013-11-12 22:19                                                             ` Jan Hubicka
2013-11-12 22:43                                                               ` Teresa Johnson
2013-11-12 23:49                                                                 ` Jan Hubicka
2013-11-13 15:33                                                                   ` Teresa Johnson
2013-11-14  0:17                                                                     ` Jan Hubicka
2013-10-14 22:26             ` Xinliang David Li
2013-10-14 22:28               ` Dehao Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).