public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [GOOGLE] Fix AutoFDO size issue
@ 2014-11-13 22:33 Dehao Chen
  2014-11-13 22:57 ` Xinliang David Li
  0 siblings, 1 reply; 10+ messages in thread
From: Dehao Chen @ 2014-11-13 22:33 UTC (permalink / raw)
  To: GCC Patches, David Li

In AutoFDO, we increase einline iterations. This could lead to
extensive code bloat if we have recursive calls like:

dtor() {
  destroy(node);
}

destroy(node) {
  destroy(left)
  destroy(right)
}

In this case, the size growth will be around 8 which is smaller than
threshold (11). However, if we allow this to happen for 2 iterations,
it will expand the size by 1024X. To fix this problem, we want to set
a much smaller threshold in the AutoFDO case. This is because AutoFDO
do not not rely on aggressive einline to gain more profile context.

And also, in AutoFDO pass, after we processed a function, we need to
recompute inline parameters because rebuild_cgraph_edges will zero out
all inline parameters.

The patch is attached below, bootstrapped and perf test on-going. OK
for google-4_9?

Thanks,
Dehao

Index: gcc/auto-profile.c
===================================================================
--- gcc/auto-profile.c (revision 217523)
+++ gcc/auto-profile.c (working copy)
@@ -1771,6 +1771,7 @@ auto_profile (void)
       free_dominance_info (CDI_DOMINATORS);
       free_dominance_info (CDI_POST_DOMINATORS);
       rebuild_cgraph_edges ();
+      compute_inline_parameters (cgraph_get_node
(current_function_decl), true);
       pop_cfun ();
     }

Index: gcc/opts.c
===================================================================
--- gcc/opts.c (revision 217523)
+++ gcc/opts.c (working copy)
@@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
       maybe_set_param_value (
  PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
  opts->x_param_values, opts_set->x_param_values);
+      maybe_set_param_value (
+        PARAM_EARLY_INLINING_INSNS, 4,
+        opts->x_param_values, opts_set->x_param_values);
+      maybe_set_param_value (
+        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
+        opts->x_param_values, opts_set->x_param_values);
       value = true;
     /* No break here - do -fauto-profile processing. */
     case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-13 22:33 [GOOGLE] Fix AutoFDO size issue Dehao Chen
@ 2014-11-13 22:57 ` Xinliang David Li
  2014-11-13 22:59   ` Xinliang David Li
  0 siblings, 1 reply; 10+ messages in thread
From: Xinliang David Li @ 2014-11-13 22:57 UTC (permalink / raw)
  To: Dehao Chen; +Cc: GCC Patches

Is there a need to have 10 iterations of early inline for autofdo?

David

On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
> In AutoFDO, we increase einline iterations. This could lead to
> extensive code bloat if we have recursive calls like:
>
> dtor() {
>   destroy(node);
> }
>
> destroy(node) {
>   destroy(left)
>   destroy(right)
> }
>
> In this case, the size growth will be around 8 which is smaller than
> threshold (11). However, if we allow this to happen for 2 iterations,
> it will expand the size by 1024X. To fix this problem, we want to set
> a much smaller threshold in the AutoFDO case. This is because AutoFDO
> do not not rely on aggressive einline to gain more profile context.
>
> And also, in AutoFDO pass, after we processed a function, we need to
> recompute inline parameters because rebuild_cgraph_edges will zero out
> all inline parameters.
>
> The patch is attached below, bootstrapped and perf test on-going. OK
> for google-4_9?
>
> Thanks,
> Dehao
>
> Index: gcc/auto-profile.c
> ===================================================================
> --- gcc/auto-profile.c (revision 217523)
> +++ gcc/auto-profile.c (working copy)
> @@ -1771,6 +1771,7 @@ auto_profile (void)
>        free_dominance_info (CDI_DOMINATORS);
>        free_dominance_info (CDI_POST_DOMINATORS);
>        rebuild_cgraph_edges ();
> +      compute_inline_parameters (cgraph_get_node
> (current_function_decl), true);
>        pop_cfun ();
>      }
>
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c (revision 217523)
> +++ gcc/opts.c (working copy)
> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>        maybe_set_param_value (
>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>   opts->x_param_values, opts_set->x_param_values);
> +      maybe_set_param_value (
> +        PARAM_EARLY_INLINING_INSNS, 4,
> +        opts->x_param_values, opts_set->x_param_values);
> +      maybe_set_param_value (
> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
> +        opts->x_param_values, opts_set->x_param_values);
>        value = true;
>      /* No break here - do -fauto-profile processing. */
>      case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-13 22:57 ` Xinliang David Li
@ 2014-11-13 22:59   ` Xinliang David Li
  2014-11-13 23:18     ` Dehao Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Xinliang David Li @ 2014-11-13 22:59 UTC (permalink / raw)
  To: Dehao Chen; +Cc: GCC Patches

After inline summary is recomputed, the large code growth problem will
also be better controlled, right?

David

On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
> Is there a need to have 10 iterations of early inline for autofdo?
>
> David
>
> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>> In AutoFDO, we increase einline iterations. This could lead to
>> extensive code bloat if we have recursive calls like:
>>
>> dtor() {
>>   destroy(node);
>> }
>>
>> destroy(node) {
>>   destroy(left)
>>   destroy(right)
>> }
>>
>> In this case, the size growth will be around 8 which is smaller than
>> threshold (11). However, if we allow this to happen for 2 iterations,
>> it will expand the size by 1024X. To fix this problem, we want to set
>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>> do not not rely on aggressive einline to gain more profile context.
>>
>> And also, in AutoFDO pass, after we processed a function, we need to
>> recompute inline parameters because rebuild_cgraph_edges will zero out
>> all inline parameters.
>>
>> The patch is attached below, bootstrapped and perf test on-going. OK
>> for google-4_9?
>>
>> Thanks,
>> Dehao
>>
>> Index: gcc/auto-profile.c
>> ===================================================================
>> --- gcc/auto-profile.c (revision 217523)
>> +++ gcc/auto-profile.c (working copy)
>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>        free_dominance_info (CDI_DOMINATORS);
>>        free_dominance_info (CDI_POST_DOMINATORS);
>>        rebuild_cgraph_edges ();
>> +      compute_inline_parameters (cgraph_get_node
>> (current_function_decl), true);
>>        pop_cfun ();
>>      }
>>
>> Index: gcc/opts.c
>> ===================================================================
>> --- gcc/opts.c (revision 217523)
>> +++ gcc/opts.c (working copy)
>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>        maybe_set_param_value (
>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>   opts->x_param_values, opts_set->x_param_values);
>> +      maybe_set_param_value (
>> +        PARAM_EARLY_INLINING_INSNS, 4,
>> +        opts->x_param_values, opts_set->x_param_values);
>> +      maybe_set_param_value (
>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>> +        opts->x_param_values, opts_set->x_param_values);
>>        value = true;
>>      /* No break here - do -fauto-profile processing. */
>>      case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-13 22:59   ` Xinliang David Li
@ 2014-11-13 23:18     ` Dehao Chen
  2014-11-13 23:40       ` Xinliang David Li
  0 siblings, 1 reply; 10+ messages in thread
From: Dehao Chen @ 2014-11-13 23:18 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
should not harm to set max iter as 10.

On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
> After inline summary is recomputed, the large code growth problem will
> also be better controlled, right?

For this case, recomputing inline summary does not help because the
code was bloated in first einline phase.

Dehao

>
> David
>
> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Is there a need to have 10 iterations of early inline for autofdo?
>>
>> David
>>
>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>> In AutoFDO, we increase einline iterations. This could lead to
>>> extensive code bloat if we have recursive calls like:
>>>
>>> dtor() {
>>>   destroy(node);
>>> }
>>>
>>> destroy(node) {
>>>   destroy(left)
>>>   destroy(right)
>>> }
>>>
>>> In this case, the size growth will be around 8 which is smaller than
>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>> it will expand the size by 1024X. To fix this problem, we want to set
>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>> do not not rely on aggressive einline to gain more profile context.
>>>
>>> And also, in AutoFDO pass, after we processed a function, we need to
>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>> all inline parameters.
>>>
>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>> for google-4_9?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> Index: gcc/auto-profile.c
>>> ===================================================================
>>> --- gcc/auto-profile.c (revision 217523)
>>> +++ gcc/auto-profile.c (working copy)
>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>        free_dominance_info (CDI_DOMINATORS);
>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>        rebuild_cgraph_edges ();
>>> +      compute_inline_parameters (cgraph_get_node
>>> (current_function_decl), true);
>>>        pop_cfun ();
>>>      }
>>>
>>> Index: gcc/opts.c
>>> ===================================================================
>>> --- gcc/opts.c (revision 217523)
>>> +++ gcc/opts.c (working copy)
>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>        maybe_set_param_value (
>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>   opts->x_param_values, opts_set->x_param_values);
>>> +      maybe_set_param_value (
>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>> +        opts->x_param_values, opts_set->x_param_values);
>>> +      maybe_set_param_value (
>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>> +        opts->x_param_values, opts_set->x_param_values);
>>>        value = true;
>>>      /* No break here - do -fauto-profile processing. */
>>>      case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-13 23:18     ` Dehao Chen
@ 2014-11-13 23:40       ` Xinliang David Li
  2014-11-13 23:55         ` Dehao Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Xinliang David Li @ 2014-11-13 23:40 UTC (permalink / raw)
  To: Dehao Chen; +Cc: GCC Patches

On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
> should not harm to set max iter as 10.
>
> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>> After inline summary is recomputed, the large code growth problem will
>> also be better controlled, right?
>
> For this case, recomputing inline summary does not help because the
> code was bloated in first einline phase.

For recursive inlining, the inline summary for the cloned edges need
to be updated to prevent the growth?

david

>
> Dehao
>
>>
>> David
>>
>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>
>>> David
>>>
>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>> extensive code bloat if we have recursive calls like:
>>>>
>>>> dtor() {
>>>>   destroy(node);
>>>> }
>>>>
>>>> destroy(node) {
>>>>   destroy(left)
>>>>   destroy(right)
>>>> }
>>>>
>>>> In this case, the size growth will be around 8 which is smaller than
>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>> do not not rely on aggressive einline to gain more profile context.
>>>>
>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>> all inline parameters.
>>>>
>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>> for google-4_9?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> Index: gcc/auto-profile.c
>>>> ===================================================================
>>>> --- gcc/auto-profile.c (revision 217523)
>>>> +++ gcc/auto-profile.c (working copy)
>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>        rebuild_cgraph_edges ();
>>>> +      compute_inline_parameters (cgraph_get_node
>>>> (current_function_decl), true);
>>>>        pop_cfun ();
>>>>      }
>>>>
>>>> Index: gcc/opts.c
>>>> ===================================================================
>>>> --- gcc/opts.c (revision 217523)
>>>> +++ gcc/opts.c (working copy)
>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>        maybe_set_param_value (
>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>   opts->x_param_values, opts_set->x_param_values);
>>>> +      maybe_set_param_value (
>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>> +      maybe_set_param_value (
>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>        value = true;
>>>>      /* No break here - do -fauto-profile processing. */
>>>>      case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-13 23:40       ` Xinliang David Li
@ 2014-11-13 23:55         ` Dehao Chen
  2014-11-17 21:14           ` Dehao Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Dehao Chen @ 2014-11-13 23:55 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

We do not do sophisticated recursive call detection in einline phase.
It only happens in ipa-inline phase.

Dehao

On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
>> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
>> should not harm to set max iter as 10.
>>
>> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> After inline summary is recomputed, the large code growth problem will
>>> also be better controlled, right?
>>
>> For this case, recomputing inline summary does not help because the
>> code was bloated in first einline phase.
>
> For recursive inlining, the inline summary for the cloned edges need
> to be updated to prevent the growth?
>
> david
>
>>
>> Dehao
>>
>>>
>>> David
>>>
>>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>>
>>>> David
>>>>
>>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>>> extensive code bloat if we have recursive calls like:
>>>>>
>>>>> dtor() {
>>>>>   destroy(node);
>>>>> }
>>>>>
>>>>> destroy(node) {
>>>>>   destroy(left)
>>>>>   destroy(right)
>>>>> }
>>>>>
>>>>> In this case, the size growth will be around 8 which is smaller than
>>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>>> do not not rely on aggressive einline to gain more profile context.
>>>>>
>>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>>> all inline parameters.
>>>>>
>>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>>> for google-4_9?
>>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> Index: gcc/auto-profile.c
>>>>> ===================================================================
>>>>> --- gcc/auto-profile.c (revision 217523)
>>>>> +++ gcc/auto-profile.c (working copy)
>>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>>        rebuild_cgraph_edges ();
>>>>> +      compute_inline_parameters (cgraph_get_node
>>>>> (current_function_decl), true);
>>>>>        pop_cfun ();
>>>>>      }
>>>>>
>>>>> Index: gcc/opts.c
>>>>> ===================================================================
>>>>> --- gcc/opts.c (revision 217523)
>>>>> +++ gcc/opts.c (working copy)
>>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>>        maybe_set_param_value (
>>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>>   opts->x_param_values, opts_set->x_param_values);
>>>>> +      maybe_set_param_value (
>>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>> +      maybe_set_param_value (
>>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>        value = true;
>>>>>      /* No break here - do -fauto-profile processing. */
>>>>>      case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-13 23:55         ` Dehao Chen
@ 2014-11-17 21:14           ` Dehao Chen
  2014-11-17 22:58             ` Xinliang David Li
  0 siblings, 1 reply; 10+ messages in thread
From: Dehao Chen @ 2014-11-17 21:14 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

The patch was updated to ignore comdat einline tuning for AutoFDO.
Performance testing is green.

OK for google-4_9?

Thanks,
Dehao

Index: gcc/auto-profile.c
===================================================================
--- gcc/auto-profile.c (revision 217523)
+++ gcc/auto-profile.c (working copy)
@@ -1771,6 +1771,7 @@ auto_profile (void)
       free_dominance_info (CDI_DOMINATORS);
       free_dominance_info (CDI_POST_DOMINATORS);
       rebuild_cgraph_edges ();
+      compute_inline_parameters (cgraph_get_node
(current_function_decl), true);
       pop_cfun ();
     }

Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c (revision 217523)
+++ gcc/ipa-inline.c (working copy)
@@ -501,7 +501,7 @@ want_early_inline_function_p (struct cgraph_edge *
      growth);
   want_inline = false;
  }
-      else if (DECL_COMDAT (callee->decl)
+      else if (!flag_auto_profile && DECL_COMDAT (callee->decl)
                && growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_COMDAT))
         ;
       else if ((n = num_calls (callee)) != 0

On Thu, Nov 13, 2014 at 3:42 PM, Dehao Chen <dehao@google.com> wrote:
> We do not do sophisticated recursive call detection in einline phase.
> It only happens in ipa-inline phase.
>
> Dehao
>
> On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
>>> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
>>> should not harm to set max iter as 10.
>>>
>>> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> After inline summary is recomputed, the large code growth problem will
>>>> also be better controlled, right?
>>>
>>> For this case, recomputing inline summary does not help because the
>>> code was bloated in first einline phase.
>>
>> For recursive inlining, the inline summary for the cloned edges need
>> to be updated to prevent the growth?
>>
>> david
>>
>>>
>>> Dehao
>>>
>>>>
>>>> David
>>>>
>>>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>>>
>>>>> David
>>>>>
>>>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>>>> extensive code bloat if we have recursive calls like:
>>>>>>
>>>>>> dtor() {
>>>>>>   destroy(node);
>>>>>> }
>>>>>>
>>>>>> destroy(node) {
>>>>>>   destroy(left)
>>>>>>   destroy(right)
>>>>>> }
>>>>>>
>>>>>> In this case, the size growth will be around 8 which is smaller than
>>>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>>>> do not not rely on aggressive einline to gain more profile context.
>>>>>>
>>>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>>>> all inline parameters.
>>>>>>
>>>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>>>> for google-4_9?
>>>>>>
>>>>>> Thanks,
>>>>>> Dehao
>>>>>>
>>>>>> Index: gcc/auto-profile.c
>>>>>> ===================================================================
>>>>>> --- gcc/auto-profile.c (revision 217523)
>>>>>> +++ gcc/auto-profile.c (working copy)
>>>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>>>        rebuild_cgraph_edges ();
>>>>>> +      compute_inline_parameters (cgraph_get_node
>>>>>> (current_function_decl), true);
>>>>>>        pop_cfun ();
>>>>>>      }
>>>>>>
>>>>>> Index: gcc/opts.c
>>>>>> ===================================================================
>>>>>> --- gcc/opts.c (revision 217523)
>>>>>> +++ gcc/opts.c (working copy)
>>>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>>>        maybe_set_param_value (
>>>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>>>   opts->x_param_values, opts_set->x_param_values);
>>>>>> +      maybe_set_param_value (
>>>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>> +      maybe_set_param_value (
>>>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>>        value = true;
>>>>>>      /* No break here - do -fauto-profile processing. */
>>>>>>      case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-17 21:14           ` Dehao Chen
@ 2014-11-17 22:58             ` Xinliang David Li
  2014-11-18  1:59               ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Xinliang David Li @ 2014-11-17 22:58 UTC (permalink / raw)
  To: Dehao Chen; +Cc: GCC Patches

Ok for now as a workraround, but this is probably not a long term fix.

David


On Mon, Nov 17, 2014 at 12:47 PM, Dehao Chen <dehao@google.com> wrote:
> The patch was updated to ignore comdat einline tuning for AutoFDO.
> Performance testing is green.
>
> OK for google-4_9?
>
> Thanks,
> Dehao
>
> Index: gcc/auto-profile.c
> ===================================================================
> --- gcc/auto-profile.c (revision 217523)
> +++ gcc/auto-profile.c (working copy)
> @@ -1771,6 +1771,7 @@ auto_profile (void)
>        free_dominance_info (CDI_DOMINATORS);
>        free_dominance_info (CDI_POST_DOMINATORS);
>        rebuild_cgraph_edges ();
> +      compute_inline_parameters (cgraph_get_node
> (current_function_decl), true);
>        pop_cfun ();
>      }
>
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c (revision 217523)
> +++ gcc/ipa-inline.c (working copy)
> @@ -501,7 +501,7 @@ want_early_inline_function_p (struct cgraph_edge *
>       growth);
>    want_inline = false;
>   }
> -      else if (DECL_COMDAT (callee->decl)
> +      else if (!flag_auto_profile && DECL_COMDAT (callee->decl)
>                 && growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_COMDAT))
>          ;
>        else if ((n = num_calls (callee)) != 0
>
> On Thu, Nov 13, 2014 at 3:42 PM, Dehao Chen <dehao@google.com> wrote:
>> We do not do sophisticated recursive call detection in einline phase.
>> It only happens in ipa-inline phase.
>>
>> Dehao
>>
>> On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
>>>> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
>>>> should not harm to set max iter as 10.
>>>>
>>>> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> After inline summary is recomputed, the large code growth problem will
>>>>> also be better controlled, right?
>>>>
>>>> For this case, recomputing inline summary does not help because the
>>>> code was bloated in first einline phase.
>>>
>>> For recursive inlining, the inline summary for the cloned edges need
>>> to be updated to prevent the growth?
>>>
>>> david
>>>
>>>>
>>>> Dehao
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>>>>> extensive code bloat if we have recursive calls like:
>>>>>>>
>>>>>>> dtor() {
>>>>>>>   destroy(node);
>>>>>>> }
>>>>>>>
>>>>>>> destroy(node) {
>>>>>>>   destroy(left)
>>>>>>>   destroy(right)
>>>>>>> }
>>>>>>>
>>>>>>> In this case, the size growth will be around 8 which is smaller than
>>>>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>>>>> do not not rely on aggressive einline to gain more profile context.
>>>>>>>
>>>>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>>>>> all inline parameters.
>>>>>>>
>>>>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>>>>> for google-4_9?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dehao
>>>>>>>
>>>>>>> Index: gcc/auto-profile.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/auto-profile.c (revision 217523)
>>>>>>> +++ gcc/auto-profile.c (working copy)
>>>>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>>>>        rebuild_cgraph_edges ();
>>>>>>> +      compute_inline_parameters (cgraph_get_node
>>>>>>> (current_function_decl), true);
>>>>>>>        pop_cfun ();
>>>>>>>      }
>>>>>>>
>>>>>>> Index: gcc/opts.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/opts.c (revision 217523)
>>>>>>> +++ gcc/opts.c (working copy)
>>>>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>>>>        maybe_set_param_value (
>>>>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>>>>   opts->x_param_values, opts_set->x_param_values);
>>>>>>> +      maybe_set_param_value (
>>>>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>>> +      maybe_set_param_value (
>>>>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>>>        value = true;
>>>>>>>      /* No break here - do -fauto-profile processing. */
>>>>>>>      case OPT_fauto_profile:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-17 22:58             ` Xinliang David Li
@ 2014-11-18  1:59               ` Andi Kleen
  2014-11-18  2:04                 ` Dehao Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2014-11-18  1:59 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Dehao Chen, GCC Patches

Xinliang David Li <davidxl@google.com> writes:

> Ok for now as a workraround, but this is probably not a long term fix.

Is the workaround needed for the mainline autofdo version too?

-Andi
>
> David
>
>
> On Mon, Nov 17, 2014 at 12:47 PM, Dehao Chen <dehao@google.com> wrote:
>> The patch was updated to ignore comdat einline tuning for AutoFDO.
>> Performance testing is green.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GOOGLE] Fix AutoFDO size issue
  2014-11-18  1:59               ` Andi Kleen
@ 2014-11-18  2:04                 ` Dehao Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Dehao Chen @ 2014-11-18  2:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Xinliang David Li, GCC Patches

There are actually two patches needed to port to mainline. I'll send
out the patch tomorrow.

Dehao

On Mon, Nov 17, 2014 at 4:58 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Xinliang David Li <davidxl@google.com> writes:
>
>> Ok for now as a workraround, but this is probably not a long term fix.
>
> Is the workaround needed for the mainline autofdo version too?
>
> -Andi
>>
>> David
>>
>>
>> On Mon, Nov 17, 2014 at 12:47 PM, Dehao Chen <dehao@google.com> wrote:
>>> The patch was updated to ignore comdat einline tuning for AutoFDO.
>>> Performance testing is green.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-11-18  1:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-13 22:33 [GOOGLE] Fix AutoFDO size issue Dehao Chen
2014-11-13 22:57 ` Xinliang David Li
2014-11-13 22:59   ` Xinliang David Li
2014-11-13 23:18     ` Dehao Chen
2014-11-13 23:40       ` Xinliang David Li
2014-11-13 23:55         ` Dehao Chen
2014-11-17 21:14           ` Dehao Chen
2014-11-17 22:58             ` Xinliang David Li
2014-11-18  1:59               ` Andi Kleen
2014-11-18  2:04                 ` Dehao Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).