public inbox for gcc-patches@gcc.gnu.org
* FDO usability patch -- correct insane profile data
@ 2011-04-07 23:39 Xinliang David Li
  2011-04-08  9:16 ` Richard Guenther
  2011-04-17 17:01 ` Jan Hubicka
  0 siblings, 2 replies; 8+ messages in thread
From: Xinliang David Li @ 2011-04-07 23:39 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 241 bytes --]

Hi, please review the attached patch.

Ok when bootstrap and test finish?

Thanks,

David



2011-04-07  Xinliang David Li  <davidxl@google.com>

	* ipa-cp.c (ipcp_update_profiling): Correct
	negative scale factor due to insane profile data.

[-- Attachment #2: ipa_cp_profile_fix.p --]
[-- Type: text/x-pascal, Size: 1771 bytes --]

Index: ipa-cp.c
===================================================================
--- ipa-cp.c	(revision 171917)
+++ ipa-cp.c	(working copy)
@@ -1113,6 +1113,29 @@ ipcp_update_profiling (void)
 	  scale = ipcp_get_node_scale (orig_node);
 	  node->count = orig_node->count * scale / REG_BR_PROB_BASE;
 	  scale_complement = REG_BR_PROB_BASE - scale;
+
+          /* Negative scale complement can result from insane profile data
+             in which the total incoming edge counts in this module is
+             larger than the callee's entry count. The insane profile data
+             usually gets generated due to the following reasons:
+
+             1) in multithreaded programs, when profile data is dumped
+             to gcda files in gcov_exit, some other threads are still running.
+             The profile counters are dumped in bottom up order (call graph).
+             The caller's BB counters may still be updated while the callee's
+             counter data is already saved to disk.
+
+             2) Comdat functions: comdat functions' profile data are not
+             allocated in comdat. When a comdat callee function gets inlined
+             at some callsites after instrumentation, and the remaining calls
+             to this function resolves to a comdat copy in another module,
+             the profile counters for this function are split. This can
+             result in sum of incoming edge counts from this module being
+             larger than callee instance's entry count.  */
+
+          if (scale_complement < 0 && flag_profile_correction)
+            scale_complement = 0;
+
 	  orig_node->count =
 	    orig_node->count * scale_complement / REG_BR_PROB_BASE;
 	  for (cs = node->callees; cs; cs = cs->next_callee)
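
To make the failure mode concrete, here is a small standalone sketch with invented
counts; 10000 stands in for REG_BR_PROB_BASE, and the numbers are made up rather
than taken from any real profile:

/* Sketch only, not GCC source: shows how the arithmetic in
   ipcp_update_profiling produces a negative scale complement when the
   incoming call edge counts seen in a module exceed the callee's
   recorded entry count.  */
#include <stdio.h>

int main (void)
{
  const long base = 10000;      /* stand-in for REG_BR_PROB_BASE */
  long node_count = 1000;       /* callee entry count from its gcda data */
  long edge_count_sum = 1200;   /* incoming call edges seen in this module */

  /* Without any capping, the scale exceeds the base ...  */
  long scale = edge_count_sum * base / node_count;   /* 12000 */
  long scale_complement = base - scale;              /* -2000 */

  /* ... so the original node's count would be scaled by a negative factor.  */
  long orig_count = node_count * scale_complement / base;   /* -200 */
  printf ("scale=%ld complement=%ld orig_count=%ld\n",
          scale, scale_complement, orig_count);
  return 0;
}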


* Re: FDO usability patch -- correct insane profile data
  2011-04-07 23:39 FDO usability patch -- correct insane profile data Xinliang David Li
@ 2011-04-08  9:16 ` Richard Guenther
  2011-04-17 17:01 ` Jan Hubicka
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Guenther @ 2011-04-08  9:16 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On Fri, Apr 8, 2011 at 1:39 AM, Xinliang David Li <davidxl@google.com> wrote:
> Hi please review the attached patch.
>
> Ok when bootstrap and test finish?

Ok.

Thanks,
Richard.

> Thanks,
>
> David
>
>
>
> 2011-04-07  Xinliang David Li  <davidxl@google.com>
>
>        * ipa-cp.c (ipcp_update_profiling): Correct
>        negative scale factor due to insane profile data.
>


* Re: FDO usability patch -- correct insane profile data
  2011-04-07 23:39 FDO usability patch -- correct insane profile data Xinliang David Li
  2011-04-08  9:16 ` Richard Guenther
@ 2011-04-17 17:01 ` Jan Hubicka
  2011-04-17 18:53   ` Xinliang David Li
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Hubicka @ 2011-04-17 17:01 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

> Hi please review the attached patch.
> 
> Ok when bootstrap and test finish?
> 
> Thanks,
> 
> David
> 
> 
> 
> 2011-04-07  Xinliang David Li  <davidxl@google.com>
> 
> 	* ipa-cp.c (ipcp_update_profiling): Correct
> 	negative scale factor due to insane profile data.

> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c	(revision 171917)
> +++ ipa-cp.c	(working copy)
> @@ -1113,6 +1113,29 @@ ipcp_update_profiling (void)
>  	  scale = ipcp_get_node_scale (orig_node);
>  	  node->count = orig_node->count * scale / REG_BR_PROB_BASE;
>  	  scale_complement = REG_BR_PROB_BASE - scale;
> +
> +          /* Negative scale complement can result from insane profile data
> +             in which the total incoming edge counts in this module is
> +             larger than the callee's entry count. The insane profile data
> +             usually gets generated due to the following reasons:
> +
> +             1) in multithreaded programs, when profile data is dumped
> +             to gcda files in gcov_exit, some other threads are still running.
> +             The profile counters are dumped in bottom up order (call graph).
> +             The caller's BB counters may still be updated while the callee's
> +             counter data is already saved to disk.
> +
> +             2) Comdat functions: comdat functions' profile data are not
> +             allocated in comdat. When a comdat callee function gets inlined
> +             at some callsites after instrumentation, and the remaining calls
> +             to this function resolves to a comdat copy in another module,
> +             the profile counters for this function are split. This can
> +             result in sum of incoming edge counts from this module being
> +             larger than callee instance's entry count.  */
> +
> +          if (scale_complement < 0 && flag_profile_correction)

Please don't use flag_profile_correction in code like this. Even with !flag_profile_correction,
the profile is sane only at the time it is read; the compiler invalidates it for various non-trivial
reasons during code transformations.  Consider inlining
int t(int a)
{
  if (a<0)
    return 1;
  else return 2;
}
that is called once with constant -1 and once with constant 0.  The profile says
that the branch has probability 1/2, but in one inlined copy it has probability 0
and in the other probability 1.  We simply inline with probability 1/2 and end up
with an insane profile in the caller.
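
Spelling that out with invented counts (a sketch of the update problem, not real
profile data):

/* The profile records two calls to t (): entry count 2, the a < 0 branch
   taken once and not taken once, i.e. probability 1/2.  A caller like this:  */
int caller (void)
{
  return t (-1) + t (0);
}
/* After both call sites are inlined, each inlined copy of the branch keeps
   the 1/2 probability from the profile, although the copy from t (-1)
   always takes it and the copy from t (0) never does.  The counts inside
   caller no longer add up, so its profile is insane even though the data
   read from the gcda file was consistent.  */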

I would suggest, for brevity, also dropping the comment on why the profile can be insane. We have
many places like this; even if it might look a bit distracting at first, we could probably
document it somewhere in the internal documentation. I can't think of a better place to put it.

However, the test seems a bit symptomatic to me. It matches exactly when scale > REG_BR_PROB_BASE,
and such a scale is already nonsensical (i.e. a clone should not be executed more often than its
original).  It is computed in:

/* Compute the proper scale for NODE.  It is the ratio between the number of
   direct calls (represented on the incoming cgraph_edges) and sum of all
   invocations of NODE (represented as count in cgraph_node).

   FIXME: This code is wrong.  Since the callers can be also clones and
   the clones are not scaled yet, the sums gets unrealistically high.
   To properly compute the counts, we would need to do propagation across
   callgraph (as external call to A might imply call to non-cloned B
   if A's clone calls cloned B).  */
static void
ipcp_compute_node_scale (struct cgraph_node *node)
{ 
  gcov_type sum;
  struct cgraph_edge *cs;

  sum = 0;
  /* Compute sum of all counts of callers. */
  for (cs = node->callers; cs != NULL; cs = cs->next_caller)
    sum += cs->count;
  /* Work around the unrealistically high sum problem.  We just don't want
     the non-cloned body to have negative or very low frequency.  Since
     majority of execution time will be spent in clones anyway, this should
     give good enough profile.  */
  if (sum > node->count * 9 / 10)
    sum = node->count * 9 / 10;
  if (node->count == 0)
    ipcp_set_node_scale (node, 0);
  else
    ipcp_set_node_scale (node, sum * REG_BR_PROB_BASE / node->count);
}

We already cap sum at 9/10 of node->count, and the scale is then computed as
sum * REG_BR_PROB_BASE / node->count, so the scale should not
exceed REG_BR_PROB_BASE * 9 / 10.

How can it possibly end up being greater than REG_BR_PROB_BASE at the time it is used?
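
For concreteness, a standalone sketch of that capping with invented numbers
(10000 again standing in for REG_BR_PROB_BASE):

/* Sketch only: even an insane callers' sum larger than the entry count is
   capped to 9/10 of node->count, so the resulting scale stays below the
   base and the scale complement stays non-negative.  */
#include <stdio.h>

int main (void)
{
  const long base = 10000;   /* stand-in for REG_BR_PROB_BASE */
  long node_count = 1000;    /* callee entry count */
  long sum = 1500;           /* callers' edge counts exceed the entry count */

  if (sum > node_count * 9 / 10)
    sum = node_count * 9 / 10;                   /* capped to 900 */

  long scale = sum * base / node_count;          /* 9000 <= base * 9 / 10 */
  long scale_complement = base - scale;          /* 1000, never negative */
  printf ("scale=%ld complement=%ld\n", scale, scale_complement);
  return 0;
}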

Honza


* Re: FDO usability patch -- correct insane profile data
  2011-04-17 17:01 ` Jan Hubicka
@ 2011-04-17 18:53   ` Xinliang David Li
  2011-04-17 19:39     ` Jan Hubicka
  0 siblings, 1 reply; 8+ messages in thread
From: Xinliang David Li @ 2011-04-17 18:53 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

On Sun, Apr 17, 2011 at 8:29 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi please review the attached patch.
>>
>> Ok when bootstrap and test finish?
>>
>> Thanks,
>>
>> David
>>
>>
>>
>> 2011-04-07  Xinliang David Li  <davidxl@google.com>
>>
>>       * ipa-cp.c (ipcp_update_profiling): Correct
>>       negative scale factor due to insane profile data.
>
>> Index: ipa-cp.c
>> ===================================================================
>> --- ipa-cp.c  (revision 171917)
>> +++ ipa-cp.c  (working copy)
>> @@ -1113,6 +1113,29 @@ ipcp_update_profiling (void)
>>         scale = ipcp_get_node_scale (orig_node);
>>         node->count = orig_node->count * scale / REG_BR_PROB_BASE;
>>         scale_complement = REG_BR_PROB_BASE - scale;
>> +
>> +          /* Negative scale complement can result from insane profile data
>> +             in which the total incoming edge counts in this module is
>> +             larger than the callee's entry count. The insane profile data
>> +             usually gets generated due to the following reasons:
>> +
>> +             1) in multithreaded programs, when profile data is dumped
>> +             to gcda files in gcov_exit, some other threads are still running.
>> +             The profile counters are dumped in bottom up order (call graph).
>> +             The caller's BB counters may still be updated while the callee's
>> +             counter data is already saved to disk.
>> +
>> +             2) Comdat functions: comdat functions' profile data are not
>> +             allocated in comdat. When a comdat callee function gets inlined
>> +             at some callsites after instrumentation, and the remaining calls
>> +             to this function resolves to a comdat copy in another module,
>> +             the profile counters for this function are split. This can
>> +             result in sum of incoming edge counts from this module being
>> +             larger than callee instance's entry count.  */
>> +
>> +          if (scale_complement < 0 && flag_profile_correction)
>
> Please don't use flag_profile_correction in code like this. Even with !flag_profile_correction,
> the profile is sane only at the time it is read; the compiler invalidates it for various non-trivial
> reasons during code transformations.  Consider inlining
> int t(int a)
> {
>  if (a<0)
>    return 1;
>  else return 2;
> }
> that is called once with constant -1 and once with constant 0.  The profile says
> that the branch has probability 1/2, but in one inlined copy it has probability 0
> and in the other probability 1.  We simply inline with probability 1/2 and end up
> with an insane profile in the caller.

Yes -- this is insanity due to profile updates without context
sensitivity, which is different from the scenario that motivates the
change, namely insanity introduced during profile collection.  Dropping
the flag check is fine.


>
> I would suggest, for brevity, also dropping the comment on why the profile can be insane. We have
> many places like this; even if it might look a bit distracting at first, we could probably
> document it somewhere in the internal documentation. I can't think of a better place to put it.
>
> However, the test seems a bit symptomatic to me. It matches exactly when scale > REG_BR_PROB_BASE,
> and such a scale is already nonsensical (i.e. a clone should not be executed more often than its
> original).  It is computed in:
>
> /* Compute the proper scale for NODE.  It is the ratio between the number of
>   direct calls (represented on the incoming cgraph_edges) and sum of all
>   invocations of NODE (represented as count in cgraph_node).
>
>   FIXME: This code is wrong.  Since the callers can be also clones and
>   the clones are not scaled yet, the sums gets unrealistically high.
>   To properly compute the counts, we would need to do propagation across
>   callgraph (as external call to A might imply call to non-cloned B
>   if A's clone calls cloned B).  */
> static void
> ipcp_compute_node_scale (struct cgraph_node *node)
> {
>  gcov_type sum;
>  struct cgraph_edge *cs;
>
>  sum = 0;
>  /* Compute sum of all counts of callers. */
>  for (cs = node->callers; cs != NULL; cs = cs->next_caller)
>    sum += cs->count;
>  /* Work around the unrealistically high sum problem.  We just don't want
>     the non-cloned body to have negative or very low frequency.  Since
>     majority of execution time will be spent in clones anyway, this should
>     give good enough profile.  */
>  if (sum > node->count * 9 / 10)
>    sum = node->count * 9 / 10;
>  if (node->count == 0)
>    ipcp_set_node_scale (node, 0);
>  else
>    ipcp_set_node_scale (node, sum * REG_BR_PROB_BASE / node->count);
> }
>
> We already cap sum at 9/10 of node->count, and the scale is then computed as
> sum * REG_BR_PROB_BASE / node->count, so the scale should not
> exceed REG_BR_PROB_BASE * 9 / 10.
>
> How can it possibly end up being greater than REG_BR_PROB_BASE at the time it is used?


Good point. The change above was based on 4.4.3, where the sum
computation has no such capping. Yes, with the current capping,
a negative value cannot result. However, it is still good to guard
against a bad scale independently of the implementation of ipcp_compute_node_scale
-- it may change and break silently (the comment on the function says
the implementation is wrong...).

Thanks,

David


>
> Honza
>


* Re: FDO usability patch -- correct insane profile data
  2011-04-17 18:53   ` Xinliang David Li
@ 2011-04-17 19:39     ` Jan Hubicka
  2011-04-17 19:44       ` Xinliang David Li
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Hubicka @ 2011-04-17 19:39 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Jan Hubicka, GCC Patches

Hi,
hmm, it looks like we both ran into the same issue and developed different fixes.
> 
> Good point. The change above was based on 4.4.3, where the sum
> computation has no such capping. Yes, with the current capping,
> a negative value cannot result. However, it is still good to guard
> against a bad scale independently of the implementation of ipcp_compute_node_scale
> -- it may change and break silently (the comment on the function says
> the implementation is wrong...).

Well, if we want to add a check in anticipation that we will introduce a bug a few
lines up, I think we could just add gcc_assert (scale < BASE); with a comment
explaining that ipcp_compute_node_scale must give results in range even when
the profile is not flow consistent.
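
For concreteness, a sketch of what that could look like in ipcp_update_profiling,
assuming BASE above means REG_BR_PROB_BASE; this illustrates the suggestion and is
not necessarily the change that was eventually committed:

	  scale = ipcp_get_node_scale (orig_node);
	  /* ipcp_compute_node_scale caps the callers' count sum at 9/10 of
	     node->count, so the scale must stay within REG_BR_PROB_BASE even
	     when the profile read from the gcda files is not flow consistent.
	     Assert instead of silently correcting, so a future bug in the
	     scale computation is noticed rather than propagated further.  */
	  gcc_assert (scale < REG_BR_PROB_BASE);
	  node->count = orig_node->count * scale / REG_BR_PROB_BASE;
	  scale_complement = REG_BR_PROB_BASE - scale;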

Since we need to propagate the scales across the callgraph (at least for a
correct implementation of the function), masking a bug in ipcp_compute_node_scale
would make us propagate the nonsense further and silently produce
unnecessarily amplified insanity.

I would pre-approve a patch reverting the current change and adding the assert
with a comment.

Honza
> 
> Thanks,
> 
> David
> 
> 
> >
> > Honza
> >


* Re: FDO usability patch -- correct insane profile data
  2011-04-17 19:39     ` Jan Hubicka
@ 2011-04-17 19:44       ` Xinliang David Li
  2011-04-17 22:23         ` Jan Hubicka
  0 siblings, 1 reply; 8+ messages in thread
From: Xinliang David Li @ 2011-04-17 19:44 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

Adding an assertion sounds good to me -- the only problem is that a problem
like this is hard to trigger during testing, making the assertion
less useful than it could be. Would it be better to add the assertion
when checking is enabled while still keeping the correction code?

Thanks,

David

On Sun, Apr 17, 2011 at 11:53 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> hmm, looks we both run into same issue and developed different fixes.
>>
>> Good point. The change above was added based on 4.4.3 where the sum
>> computation has no capping like above. Yes, with the current capping,
>> the negative value won't result. However, it is still good to guard
>> the scale independent of the implementation of ipcp_compute_node_scale
>> -- it may change and break silently (the comment of the function says
>> the implementation is wrong...)
>
> Well, if we want to add check in anticipation that we will introduce bug few
> lines up, I think we could just add gcc_assert (scale < BASE); with an comment
> explaining that ipcp_compute_node_scale must give results in range even when
> profile is not flow consistent.
>
> Since we need to propagate the scales across callgraph (at least for the
> correct implementation of the function), masking bug in ipcp_compute_node_scale
> would make us to propagate the nonsenses further and silently producing
> unnecesarily aplified insanity.
>
> I would pre-approve patch reverting the current change and adding the assert
> with comment.
>
> Honza
>>
>> Thanks,
>>
>> David
>>
>>
>> >
>> > Honza
>> >
>


* Re: FDO usability patch -- correct insane profile data
  2011-04-17 19:44       ` Xinliang David Li
@ 2011-04-17 22:23         ` Jan Hubicka
  2011-04-19 17:29           ` Xinliang David Li
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Hubicka @ 2011-04-17 22:23 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Jan Hubicka, GCC Patches

> Adding an assertion sounds good to me -- the only problem is that a problem
> like this is hard to trigger during testing, making the assertion
> less useful than it could be. Would it be better to add the assertion
> when checking is enabled while still keeping the correction code?
Hi,
since we are speaking about code that has minimal to no test coverage in our
automated testers (we don't really test anything multithreaded or with mismatching
profiles), I don't think gcc_checking_assert would make any difference.

Since we don't really want a future bug to stay around unnoticed (it would
result in silent misupdates of profiles), we are only trying to make
analysis easier so that the next time someone trips over this scenario, they
won't just see a symptomatic message from the cgraph verifier.
So I would go for a normal assert and a comment about the nature of the bug.

Honza
> 
> Thanks,
> 
> David
> 
> On Sun, Apr 17, 2011 at 11:53 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> > Hi,
> > hmm, looks we both run into same issue and developed different fixes.
> >>
> >> Good point. The change above was added based on 4.4.3 where the sum
> >> computation has no capping like above. Yes, with the current capping,
> >> the negative value won't result. However, it is still good to guard
> >> the scale independent of the implementation of ipcp_compute_node_scale
> >> -- it may change and break silently (the comment of the function says
> >> the implementation is wrong...)
> >
> > Well, if we want to add check in anticipation that we will introduce bug few
> > lines up, I think we could just add gcc_assert (scale < BASE); with an comment
> > explaining that ipcp_compute_node_scale must give results in range even when
> > profile is not flow consistent.
> >
> > Since we need to propagate the scales across callgraph (at least for the
> > correct implementation of the function), masking bug in ipcp_compute_node_scale
> > would make us to propagate the nonsenses further and silently producing
> > unnecesarily aplified insanity.
> >
> > I would pre-approve patch reverting the current change and adding the assert
> > with comment.
> >
> > Honza
> >>
> >> Thanks,
> >>
> >> David
> >>
> >>
> >> >
> >> > Honza
> >> >
> >


* Re: FDO usability patch -- correct insane profile data
  2011-04-17 22:23         ` Jan Hubicka
@ 2011-04-19 17:29           ` Xinliang David Li
  0 siblings, 0 replies; 8+ messages in thread
From: Xinliang David Li @ 2011-04-19 17:29 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches

I reverted the original patch and added the assertion instead.

Thanks,

David

On Sun, Apr 17, 2011 at 2:31 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Adding assertion sounds good to me -- the only problem is that problem
>> like this hard to be triggered during testing, thus making assertion
>> less useful as it can be. Is it better to add the assertion with
>> checking is enabled while still keeping the correction code?
> Hi,
> since we speak about code that has minimal to none testing coverage in our
> automates testers (we don't really test anything multithreaded or with mismatching
> profiles), I don't think gcc_checking_assert would make any difference.
>
> Since we don't really want the future bug to stay around unnoticed (since it
> would resolut in silent misupdates of profiles), we are only shooting to make
> analysis easier so next time someone trips around the scenario he won't see
> symptomatic message from cgraph verifier.
> So I would go for normal assert and comment about nature of the bug.
>
> Honza
>>
>> Thanks,
>>
>> David
>>
>> On Sun, Apr 17, 2011 at 11:53 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > Hi,
>> > hmm, looks we both run into same issue and developed different fixes.
>> >>
>> >> Good point. The change above was added based on 4.4.3 where the sum
>> >> computation has no capping like above. Yes, with the current capping,
>> >> the negative value won't result. However, it is still good to guard
>> >> the scale independent of the implementation of ipcp_compute_node_scale
>> >> -- it may change and break silently (the comment of the function says
>> >> the implementation is wrong...)
>> >
>> > Well, if we want to add check in anticipation that we will introduce bug few
>> > lines up, I think we could just add gcc_assert (scale < BASE); with an comment
>> > explaining that ipcp_compute_node_scale must give results in range even when
>> > profile is not flow consistent.
>> >
>> > Since we need to propagate the scales across callgraph (at least for the
>> > correct implementation of the function), masking bug in ipcp_compute_node_scale
>> > would make us to propagate the nonsenses further and silently producing
>> > unnecesarily aplified insanity.
>> >
>> > I would pre-approve patch reverting the current change and adding the assert
>> > with comment.
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >>
>> >> David
>> >>
>> >>
>> >> >
>> >> > Honza
>> >> >
>> >
>

