public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [google] pessimize stack accounting during inlining
       [not found] <BANLkTik0OK=0ksWUosRPGW3x23-OAA34ujryD7iimFZH51x58w@mail.gmail.com>
@ 2011-06-08  1:06 ` Xinliang David Li
  2011-06-08  9:11   ` Richard Guenther
  0 siblings, 1 reply; 4+ messages in thread
From: Xinliang David Li @ 2011-06-08  1:06 UTC (permalink / raw)
  To: Mark Heffernan; +Cc: GCC Patches

Ok for google/main.   A good candidate patch for trunk too.

Thanks,

David

On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote:
> This patch pessimizes stack accounting during inlining.  This enables
> setting a firm stack size limit (via parameters "large-stack-frame" and
> "large-stack-frame-growth").  Without this patch the inliner is overly
> optimistic about potential stack reuse resulting in actual stack frames much
> larger than the parameterized limits.
> Internal benchmarks show minor performance differences with non-fdo and
> lipo, but overall neutral.  Tested/bootstrapped on x86-64.
> Ok for google-main?
> Mark
>
> 2011-06-07  Mark Heffernan  <meheff@google.com>
>         * cgraph.h (cgraph_global_info): Remove field.
>         * ipa-inline.c (cgraph_clone_inlined_nodes): Change
>         stack frame computation.
>         (cgraph_check_inline_limits): Ditto.
>         (compute_inline_parameters): Remove dead initialization.
>
> Index: gcc/cgraph.h
> ===================================================================
> --- gcc/cgraph.h        (revision 174512)
> +++ gcc/cgraph.h        (working copy)
> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info {
>  struct GTY(()) cgraph_global_info {
>    /* Estimated stack frame consumption by the function.  */
>    HOST_WIDE_INT estimated_stack_size;
> -  /* Expected offset of the stack frame of inlined function.  */
> -  HOST_WIDE_INT stack_frame_offset;
>
>    /* For inline clones this points to the function they will be
>       inlined into.  */
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c    (revision 174512)
> +++ gcc/ipa-inline.c    (working copy)
> @@ -229,8 +229,6 @@ void
>  cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate,
>                             bool update_original)
>  {
> -  HOST_WIDE_INT peak;
> -
>    if (duplicate)
>      {
>        /* We may eliminate the need for out-of-line copy to be output.
> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap
>      e->callee->global.inlined_to = e->caller->global.inlined_to;
>    else
>      e->callee->global.inlined_to = e->caller;
> -  e->callee->global.stack_frame_offset
> -    = e->caller->global.stack_frame_offset
> -      + inline_summary (e->caller)->estimated_self_stack_size;
> -  peak = e->callee->global.stack_frame_offset
> -      + inline_summary (e->callee)->estimated_self_stack_size;
> -  if (e->callee->global.inlined_to->global.estimated_stack_size < peak)
> -    e->callee->global.inlined_to->global.estimated_stack_size = peak;
> +
> +  /* Pessimistically assume no sharing of stack space.  That is, the
> +     frame size of a function is estimated as the original frame size
> +     plus the sum of the frame sizes of all inlined callees.  */
> +  e->callee->global.inlined_to->global.estimated_stack_size +=
> +    inline_summary (e->callee)->estimated_self_stack_size;
> +
>    cgraph_propagate_frequency (e->callee);
>
>    /* Recursively clone all bodies.  */
> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap
>
>    stack_size_limit += stack_size_limit * PARAM_VALUE
> (PARAM_STACK_FRAME_GROWTH) / 100;
>
> -  inlined_stack = (to->global.stack_frame_offset
> -                  + inline_summary (to)->estimated_self_stack_size
> +  inlined_stack = (to->global.estimated_stack_size
>                    + what->global.estimated_stack_size);
>    if (inlined_stack  > stack_size_limit
>        && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph
>    self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
>    inline_summary (node)->estimated_self_stack_size = self_stack_size;
>    node->global.estimated_stack_size = self_stack_size;
> -  node->global.stack_frame_offset = 0;
>
>    /* Can this function be inlined at all?  */
>    node->local.inlinable = tree_inlinable_function_p (node->decl);
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [google] pessimize stack accounting during inlining
  2011-06-08  1:06 ` [google] pessimize stack accounting during inlining Xinliang David Li
@ 2011-06-08  9:11   ` Richard Guenther
  2011-06-10  0:08     ` Mark Heffernan
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Guenther @ 2011-06-08  9:11 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Mark Heffernan, GCC Patches

On Wed, Jun 8, 2011 at 1:36 AM, Xinliang David Li <davidxl@google.com> wrote:
> Ok for google/main.   A good candidate patch for trunk too.

Well, it's still not a hard limit as we can't tell how many spill slots
or extra call argument or return value slots we need.

Richard.

> Thanks,
>
> David
>
> On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote:
>> This patch pessimizes stack accounting during inlining.  This enables
>> setting a firm stack size limit (via parameters "large-stack-frame" and
>> "large-stack-frame-growth").  Without this patch the inliner is overly
>> optimistic about potential stack reuse resulting in actual stack frames much
>> larger than the parameterized limits.
>> Internal benchmarks show minor performance differences with non-fdo and
>> lipo, but overall neutral.  Tested/bootstrapped on x86-64.
>> Ok for google-main?
>> Mark
>>
>> 2011-06-07  Mark Heffernan  <meheff@google.com>
>>         * cgraph.h (cgraph_global_info): Remove field.
>>         * ipa-inline.c (cgraph_clone_inlined_nodes): Change
>>         stack frame computation.
>>         (cgraph_check_inline_limits): Ditto.
>>         (compute_inline_parameters): Remove dead initialization.
>>
>> Index: gcc/cgraph.h
>> ===================================================================
>> --- gcc/cgraph.h        (revision 174512)
>> +++ gcc/cgraph.h        (working copy)
>> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info {
>>  struct GTY(()) cgraph_global_info {
>>    /* Estimated stack frame consumption by the function.  */
>>    HOST_WIDE_INT estimated_stack_size;
>> -  /* Expected offset of the stack frame of inlined function.  */
>> -  HOST_WIDE_INT stack_frame_offset;
>>
>>    /* For inline clones this points to the function they will be
>>       inlined into.  */
>> Index: gcc/ipa-inline.c
>> ===================================================================
>> --- gcc/ipa-inline.c    (revision 174512)
>> +++ gcc/ipa-inline.c    (working copy)
>> @@ -229,8 +229,6 @@ void
>>  cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate,
>>                             bool update_original)
>>  {
>> -  HOST_WIDE_INT peak;
>> -
>>    if (duplicate)
>>      {
>>        /* We may eliminate the need for out-of-line copy to be output.
>> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap
>>      e->callee->global.inlined_to = e->caller->global.inlined_to;
>>    else
>>      e->callee->global.inlined_to = e->caller;
>> -  e->callee->global.stack_frame_offset
>> -    = e->caller->global.stack_frame_offset
>> -      + inline_summary (e->caller)->estimated_self_stack_size;
>> -  peak = e->callee->global.stack_frame_offset
>> -      + inline_summary (e->callee)->estimated_self_stack_size;
>> -  if (e->callee->global.inlined_to->global.estimated_stack_size < peak)
>> -    e->callee->global.inlined_to->global.estimated_stack_size = peak;
>> +
>> +  /* Pessimistically assume no sharing of stack space.  That is, the
>> +     frame size of a function is estimated as the original frame size
>> +     plus the sum of the frame sizes of all inlined callees.  */
>> +  e->callee->global.inlined_to->global.estimated_stack_size +=
>> +    inline_summary (e->callee)->estimated_self_stack_size;
>> +
>>    cgraph_propagate_frequency (e->callee);
>>
>>    /* Recursively clone all bodies.  */
>> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap
>>
>>    stack_size_limit += stack_size_limit * PARAM_VALUE
>> (PARAM_STACK_FRAME_GROWTH) / 100;
>>
>> -  inlined_stack = (to->global.stack_frame_offset
>> -                  + inline_summary (to)->estimated_self_stack_size
>> +  inlined_stack = (to->global.estimated_stack_size
>>                    + what->global.estimated_stack_size);
>>    if (inlined_stack  > stack_size_limit
>>        && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
>> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph
>>    self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
>>    inline_summary (node)->estimated_self_stack_size = self_stack_size;
>>    node->global.estimated_stack_size = self_stack_size;
>> -  node->global.stack_frame_offset = 0;
>>
>>    /* Can this function be inlined at all?  */
>>    node->local.inlinable = tree_inlinable_function_p (node->decl);
>>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [google] pessimize stack accounting during inlining
  2011-06-08  9:11   ` Richard Guenther
@ 2011-06-10  0:08     ` Mark Heffernan
  2011-06-10  9:14       ` Richard Guenther
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Heffernan @ 2011-06-10  0:08 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Xinliang David Li, GCC Patches

On Wed, Jun 8, 2011 at 1:54 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Well, it's still not a hard limit as we can't tell how many spill slots
> or extra call argument or return value slots we need.

Agreed.  It's not perfect.  But I've found this does a reasonable job
of preventing the inliner from pushing the frame size much beyond the
imposed limit especially if the limit is large (eg, many K) relative
to the typical total size of spill slots, arguments, etc.

Mark

>
> Richard.
>
> > Thanks,
> >
> > David
> >
> > On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote:
> >> This patch pessimizes stack accounting during inlining.  This enables
> >> setting a firm stack size limit (via parameters "large-stack-frame" and
> >> "large-stack-frame-growth").  Without this patch the inliner is overly
> >> optimistic about potential stack reuse resulting in actual stack frames much
> >> larger than the parameterized limits.
> >> Internal benchmarks show minor performance differences with non-fdo and
> >> lipo, but overall neutral.  Tested/bootstrapped on x86-64.
> >> Ok for google-main?
> >> Mark
> >>
> >> 2011-06-07  Mark Heffernan  <meheff@google.com>
> >>         * cgraph.h (cgraph_global_info): Remove field.
> >>         * ipa-inline.c (cgraph_clone_inlined_nodes): Change
> >>         stack frame computation.
> >>         (cgraph_check_inline_limits): Ditto.
> >>         (compute_inline_parameters): Remove dead initialization.
> >>
> >> Index: gcc/cgraph.h
> >> ===================================================================
> >> --- gcc/cgraph.h        (revision 174512)
> >> +++ gcc/cgraph.h        (working copy)
> >> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info {
> >>  struct GTY(()) cgraph_global_info {
> >>    /* Estimated stack frame consumption by the function.  */
> >>    HOST_WIDE_INT estimated_stack_size;
> >> -  /* Expected offset of the stack frame of inlined function.  */
> >> -  HOST_WIDE_INT stack_frame_offset;
> >>
> >>    /* For inline clones this points to the function they will be
> >>       inlined into.  */
> >> Index: gcc/ipa-inline.c
> >> ===================================================================
> >> --- gcc/ipa-inline.c    (revision 174512)
> >> +++ gcc/ipa-inline.c    (working copy)
> >> @@ -229,8 +229,6 @@ void
> >>  cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate,
> >>                             bool update_original)
> >>  {
> >> -  HOST_WIDE_INT peak;
> >> -
> >>    if (duplicate)
> >>      {
> >>        /* We may eliminate the need for out-of-line copy to be output.
> >> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap
> >>      e->callee->global.inlined_to = e->caller->global.inlined_to;
> >>    else
> >>      e->callee->global.inlined_to = e->caller;
> >> -  e->callee->global.stack_frame_offset
> >> -    = e->caller->global.stack_frame_offset
> >> -      + inline_summary (e->caller)->estimated_self_stack_size;
> >> -  peak = e->callee->global.stack_frame_offset
> >> -      + inline_summary (e->callee)->estimated_self_stack_size;
> >> -  if (e->callee->global.inlined_to->global.estimated_stack_size < peak)
> >> -    e->callee->global.inlined_to->global.estimated_stack_size = peak;
> >> +
> >> +  /* Pessimistically assume no sharing of stack space.  That is, the
> >> +     frame size of a function is estimated as the original frame size
> >> +     plus the sum of the frame sizes of all inlined callees.  */
> >> +  e->callee->global.inlined_to->global.estimated_stack_size +=
> >> +    inline_summary (e->callee)->estimated_self_stack_size;
> >> +
> >>    cgraph_propagate_frequency (e->callee);
> >>
> >>    /* Recursively clone all bodies.  */
> >> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap
> >>
> >>    stack_size_limit += stack_size_limit * PARAM_VALUE
> >> (PARAM_STACK_FRAME_GROWTH) / 100;
> >>
> >> -  inlined_stack = (to->global.stack_frame_offset
> >> -                  + inline_summary (to)->estimated_self_stack_size
> >> +  inlined_stack = (to->global.estimated_stack_size
> >>                    + what->global.estimated_stack_size);
> >>    if (inlined_stack  > stack_size_limit
> >>        && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
> >> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph
> >>    self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
> >>    inline_summary (node)->estimated_self_stack_size = self_stack_size;
> >>    node->global.estimated_stack_size = self_stack_size;
> >> -  node->global.stack_frame_offset = 0;
> >>
> >>    /* Can this function be inlined at all?  */
> >>    node->local.inlinable = tree_inlinable_function_p (node->decl);
> >>
> >

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [google] pessimize stack accounting during inlining
  2011-06-10  0:08     ` Mark Heffernan
@ 2011-06-10  9:14       ` Richard Guenther
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Guenther @ 2011-06-10  9:14 UTC (permalink / raw)
  To: Mark Heffernan; +Cc: Xinliang David Li, GCC Patches

On Fri, Jun 10, 2011 at 1:45 AM, Mark Heffernan <meheff@google.com> wrote:
> On Wed, Jun 8, 2011 at 1:54 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> Well, it's still not a hard limit as we can't tell how many spill slots
>> or extra call argument or return value slots we need.
>
> Agreed.  It's not perfect.  But I've found this does a reasonable job
> of preventing the inliner from pushing the frame size much beyond the
> imposed limit especially if the limit is large (eg, many K) relative
> to the typical total size of spill slots, arguments, etc.

Do you have a testcase?

Richard.

> Mark
>
>>
>> Richard.
>>
>> > Thanks,
>> >
>> > David
>> >
>> > On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote:
>> >> This patch pessimizes stack accounting during inlining.  This enables
>> >> setting a firm stack size limit (via parameters "large-stack-frame" and
>> >> "large-stack-frame-growth").  Without this patch the inliner is overly
>> >> optimistic about potential stack reuse resulting in actual stack frames much
>> >> larger than the parameterized limits.
>> >> Internal benchmarks show minor performance differences with non-fdo and
>> >> lipo, but overall neutral.  Tested/bootstrapped on x86-64.
>> >> Ok for google-main?
>> >> Mark
>> >>
>> >> 2011-06-07  Mark Heffernan  <meheff@google.com>
>> >>         * cgraph.h (cgraph_global_info): Remove field.
>> >>         * ipa-inline.c (cgraph_clone_inlined_nodes): Change
>> >>         stack frame computation.
>> >>         (cgraph_check_inline_limits): Ditto.
>> >>         (compute_inline_parameters): Remove dead initialization.
>> >>
>> >> Index: gcc/cgraph.h
>> >> ===================================================================
>> >> --- gcc/cgraph.h        (revision 174512)
>> >> +++ gcc/cgraph.h        (working copy)
>> >> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info {
>> >>  struct GTY(()) cgraph_global_info {
>> >>    /* Estimated stack frame consumption by the function.  */
>> >>    HOST_WIDE_INT estimated_stack_size;
>> >> -  /* Expected offset of the stack frame of inlined function.  */
>> >> -  HOST_WIDE_INT stack_frame_offset;
>> >>
>> >>    /* For inline clones this points to the function they will be
>> >>       inlined into.  */
>> >> Index: gcc/ipa-inline.c
>> >> ===================================================================
>> >> --- gcc/ipa-inline.c    (revision 174512)
>> >> +++ gcc/ipa-inline.c    (working copy)
>> >> @@ -229,8 +229,6 @@ void
>> >>  cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate,
>> >>                             bool update_original)
>> >>  {
>> >> -  HOST_WIDE_INT peak;
>> >> -
>> >>    if (duplicate)
>> >>      {
>> >>        /* We may eliminate the need for out-of-line copy to be output.
>> >> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap
>> >>      e->callee->global.inlined_to = e->caller->global.inlined_to;
>> >>    else
>> >>      e->callee->global.inlined_to = e->caller;
>> >> -  e->callee->global.stack_frame_offset
>> >> -    = e->caller->global.stack_frame_offset
>> >> -      + inline_summary (e->caller)->estimated_self_stack_size;
>> >> -  peak = e->callee->global.stack_frame_offset
>> >> -      + inline_summary (e->callee)->estimated_self_stack_size;
>> >> -  if (e->callee->global.inlined_to->global.estimated_stack_size < peak)
>> >> -    e->callee->global.inlined_to->global.estimated_stack_size = peak;
>> >> +
>> >> +  /* Pessimistically assume no sharing of stack space.  That is, the
>> >> +     frame size of a function is estimated as the original frame size
>> >> +     plus the sum of the frame sizes of all inlined callees.  */
>> >> +  e->callee->global.inlined_to->global.estimated_stack_size +=
>> >> +    inline_summary (e->callee)->estimated_self_stack_size;
>> >> +
>> >>    cgraph_propagate_frequency (e->callee);
>> >>
>> >>    /* Recursively clone all bodies.  */
>> >> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap
>> >>
>> >>    stack_size_limit += stack_size_limit * PARAM_VALUE
>> >> (PARAM_STACK_FRAME_GROWTH) / 100;
>> >>
>> >> -  inlined_stack = (to->global.stack_frame_offset
>> >> -                  + inline_summary (to)->estimated_self_stack_size
>> >> +  inlined_stack = (to->global.estimated_stack_size
>> >>                    + what->global.estimated_stack_size);
>> >>    if (inlined_stack  > stack_size_limit
>> >>        && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
>> >> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph
>> >>    self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
>> >>    inline_summary (node)->estimated_self_stack_size = self_stack_size;
>> >>    node->global.estimated_stack_size = self_stack_size;
>> >> -  node->global.stack_frame_offset = 0;
>> >>
>> >>    /* Can this function be inlined at all?  */
>> >>    node->local.inlinable = tree_inlinable_function_p (node->decl);
>> >>
>> >
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-06-10  9:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <BANLkTik0OK=0ksWUosRPGW3x23-OAA34ujryD7iimFZH51x58w@mail.gmail.com>
2011-06-08  1:06 ` [google] pessimize stack accounting during inlining Xinliang David Li
2011-06-08  9:11   ` Richard Guenther
2011-06-10  0:08     ` Mark Heffernan
2011-06-10  9:14       ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).