* Re: [google] pessimize stack accounting during inlining [not found] <BANLkTik0OK=0ksWUosRPGW3x23-OAA34ujryD7iimFZH51x58w@mail.gmail.com> @ 2011-06-08 1:06 ` Xinliang David Li 2011-06-08 9:11 ` Richard Guenther 0 siblings, 1 reply; 4+ messages in thread From: Xinliang David Li @ 2011-06-08 1:06 UTC (permalink / raw) To: Mark Heffernan; +Cc: GCC Patches Ok for google/main. A good candidate patch for trunk too. Thanks, David On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote: > This patch pessimizes stack accounting during inlining. This enables > setting a firm stack size limit (via parameters "large-stack-frame" and > "large-stack-frame-growth"). Without this patch the inliner is overly > optimistic about potential stack reuse resulting in actual stack frames much > larger than the parameterized limits. > Internal benchmarks show minor performance differences with non-fdo and > lipo, but overall neutral. Tested/bootstrapped on x86-64. > Ok for google-main? > Mark > > 2011-06-07 Mark Heffernan <meheff@google.com> > * cgraph.h (cgraph_global_info): Remove field. > * ipa-inline.c (cgraph_clone_inlined_nodes): Change > stack frame computation. > (cgraph_check_inline_limits): Ditto. > (compute_inline_parameters): Remove dead initialization. > > Index: gcc/cgraph.h > =================================================================== > --- gcc/cgraph.h (revision 174512) > +++ gcc/cgraph.h (working copy) > @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info { > struct GTY(()) cgraph_global_info { > /* Estimated stack frame consumption by the function. */ > HOST_WIDE_INT estimated_stack_size; > - /* Expected offset of the stack frame of inlined function. */ > - HOST_WIDE_INT stack_frame_offset; > > /* For inline clones this points to the function they will be > inlined into. */ > Index: gcc/ipa-inline.c > =================================================================== > --- gcc/ipa-inline.c (revision 174512) > +++ gcc/ipa-inline.c (working copy) > @@ -229,8 +229,6 @@ void > cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate, > bool update_original) > { > - HOST_WIDE_INT peak; > - > if (duplicate) > { > /* We may eliminate the need for out-of-line copy to be output. > @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap > e->callee->global.inlined_to = e->caller->global.inlined_to; > else > e->callee->global.inlined_to = e->caller; > - e->callee->global.stack_frame_offset > - = e->caller->global.stack_frame_offset > - + inline_summary (e->caller)->estimated_self_stack_size; > - peak = e->callee->global.stack_frame_offset > - + inline_summary (e->callee)->estimated_self_stack_size; > - if (e->callee->global.inlined_to->global.estimated_stack_size < peak) > - e->callee->global.inlined_to->global.estimated_stack_size = peak; > + > + /* Pessimistically assume no sharing of stack space. That is, the > + frame size of a function is estimated as the original frame size > + plus the sum of the frame sizes of all inlined callees. */ > + e->callee->global.inlined_to->global.estimated_stack_size += > + inline_summary (e->callee)->estimated_self_stack_size; > + > cgraph_propagate_frequency (e->callee); > > /* Recursively clone all bodies. */ > @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap > > stack_size_limit += stack_size_limit * PARAM_VALUE > (PARAM_STACK_FRAME_GROWTH) / 100; > > - inlined_stack = (to->global.stack_frame_offset > - + inline_summary (to)->estimated_self_stack_size > + inlined_stack = (to->global.estimated_stack_size > + what->global.estimated_stack_size); > if (inlined_stack > stack_size_limit > && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME)) > @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph > self_stack_size = optimize ? estimated_stack_frame_size (node) : 0; > inline_summary (node)->estimated_self_stack_size = self_stack_size; > node->global.estimated_stack_size = self_stack_size; > - node->global.stack_frame_offset = 0; > > /* Can this function be inlined at all? */ > node->local.inlinable = tree_inlinable_function_p (node->decl); > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [google] pessimize stack accounting during inlining 2011-06-08 1:06 ` [google] pessimize stack accounting during inlining Xinliang David Li @ 2011-06-08 9:11 ` Richard Guenther 2011-06-10 0:08 ` Mark Heffernan 0 siblings, 1 reply; 4+ messages in thread From: Richard Guenther @ 2011-06-08 9:11 UTC (permalink / raw) To: Xinliang David Li; +Cc: Mark Heffernan, GCC Patches On Wed, Jun 8, 2011 at 1:36 AM, Xinliang David Li <davidxl@google.com> wrote: > Ok for google/main. A good candidate patch for trunk too. Well, it's still not a hard limit as we can't tell how many spill slots or extra call argument or return value slots we need. Richard. > Thanks, > > David > > On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote: >> This patch pessimizes stack accounting during inlining. This enables >> setting a firm stack size limit (via parameters "large-stack-frame" and >> "large-stack-frame-growth"). Without this patch the inliner is overly >> optimistic about potential stack reuse resulting in actual stack frames much >> larger than the parameterized limits. >> Internal benchmarks show minor performance differences with non-fdo and >> lipo, but overall neutral. Tested/bootstrapped on x86-64. >> Ok for google-main? >> Mark >> >> 2011-06-07 Mark Heffernan <meheff@google.com> >> * cgraph.h (cgraph_global_info): Remove field. >> * ipa-inline.c (cgraph_clone_inlined_nodes): Change >> stack frame computation. >> (cgraph_check_inline_limits): Ditto. >> (compute_inline_parameters): Remove dead initialization. >> >> Index: gcc/cgraph.h >> =================================================================== >> --- gcc/cgraph.h (revision 174512) >> +++ gcc/cgraph.h (working copy) >> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info { >> struct GTY(()) cgraph_global_info { >> /* Estimated stack frame consumption by the function. */ >> HOST_WIDE_INT estimated_stack_size; >> - /* Expected offset of the stack frame of inlined function. */ >> - HOST_WIDE_INT stack_frame_offset; >> >> /* For inline clones this points to the function they will be >> inlined into. */ >> Index: gcc/ipa-inline.c >> =================================================================== >> --- gcc/ipa-inline.c (revision 174512) >> +++ gcc/ipa-inline.c (working copy) >> @@ -229,8 +229,6 @@ void >> cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate, >> bool update_original) >> { >> - HOST_WIDE_INT peak; >> - >> if (duplicate) >> { >> /* We may eliminate the need for out-of-line copy to be output. >> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap >> e->callee->global.inlined_to = e->caller->global.inlined_to; >> else >> e->callee->global.inlined_to = e->caller; >> - e->callee->global.stack_frame_offset >> - = e->caller->global.stack_frame_offset >> - + inline_summary (e->caller)->estimated_self_stack_size; >> - peak = e->callee->global.stack_frame_offset >> - + inline_summary (e->callee)->estimated_self_stack_size; >> - if (e->callee->global.inlined_to->global.estimated_stack_size < peak) >> - e->callee->global.inlined_to->global.estimated_stack_size = peak; >> + >> + /* Pessimistically assume no sharing of stack space. That is, the >> + frame size of a function is estimated as the original frame size >> + plus the sum of the frame sizes of all inlined callees. */ >> + e->callee->global.inlined_to->global.estimated_stack_size += >> + inline_summary (e->callee)->estimated_self_stack_size; >> + >> cgraph_propagate_frequency (e->callee); >> >> /* Recursively clone all bodies. */ >> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap >> >> stack_size_limit += stack_size_limit * PARAM_VALUE >> (PARAM_STACK_FRAME_GROWTH) / 100; >> >> - inlined_stack = (to->global.stack_frame_offset >> - + inline_summary (to)->estimated_self_stack_size >> + inlined_stack = (to->global.estimated_stack_size >> + what->global.estimated_stack_size); >> if (inlined_stack > stack_size_limit >> && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME)) >> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph >> self_stack_size = optimize ? estimated_stack_frame_size (node) : 0; >> inline_summary (node)->estimated_self_stack_size = self_stack_size; >> node->global.estimated_stack_size = self_stack_size; >> - node->global.stack_frame_offset = 0; >> >> /* Can this function be inlined at all? */ >> node->local.inlinable = tree_inlinable_function_p (node->decl); >> > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [google] pessimize stack accounting during inlining 2011-06-08 9:11 ` Richard Guenther @ 2011-06-10 0:08 ` Mark Heffernan 2011-06-10 9:14 ` Richard Guenther 0 siblings, 1 reply; 4+ messages in thread From: Mark Heffernan @ 2011-06-10 0:08 UTC (permalink / raw) To: Richard Guenther; +Cc: Xinliang David Li, GCC Patches On Wed, Jun 8, 2011 at 1:54 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > Well, it's still not a hard limit as we can't tell how many spill slots > or extra call argument or return value slots we need. Agreed. It's not perfect. But I've found this does a reasonable job of preventing the inliner from pushing the frame size much beyond the imposed limit especially if the limit is large (eg, many K) relative to the typical total size of spill slots, arguments, etc. Mark > > Richard. > > > Thanks, > > > > David > > > > On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote: > >> This patch pessimizes stack accounting during inlining. This enables > >> setting a firm stack size limit (via parameters "large-stack-frame" and > >> "large-stack-frame-growth"). Without this patch the inliner is overly > >> optimistic about potential stack reuse resulting in actual stack frames much > >> larger than the parameterized limits. > >> Internal benchmarks show minor performance differences with non-fdo and > >> lipo, but overall neutral. Tested/bootstrapped on x86-64. > >> Ok for google-main? > >> Mark > >> > >> 2011-06-07 Mark Heffernan <meheff@google.com> > >> * cgraph.h (cgraph_global_info): Remove field. > >> * ipa-inline.c (cgraph_clone_inlined_nodes): Change > >> stack frame computation. > >> (cgraph_check_inline_limits): Ditto. > >> (compute_inline_parameters): Remove dead initialization. > >> > >> Index: gcc/cgraph.h > >> =================================================================== > >> --- gcc/cgraph.h (revision 174512) > >> +++ gcc/cgraph.h (working copy) > >> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info { > >> struct GTY(()) cgraph_global_info { > >> /* Estimated stack frame consumption by the function. */ > >> HOST_WIDE_INT estimated_stack_size; > >> - /* Expected offset of the stack frame of inlined function. */ > >> - HOST_WIDE_INT stack_frame_offset; > >> > >> /* For inline clones this points to the function they will be > >> inlined into. */ > >> Index: gcc/ipa-inline.c > >> =================================================================== > >> --- gcc/ipa-inline.c (revision 174512) > >> +++ gcc/ipa-inline.c (working copy) > >> @@ -229,8 +229,6 @@ void > >> cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate, > >> bool update_original) > >> { > >> - HOST_WIDE_INT peak; > >> - > >> if (duplicate) > >> { > >> /* We may eliminate the need for out-of-line copy to be output. > >> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap > >> e->callee->global.inlined_to = e->caller->global.inlined_to; > >> else > >> e->callee->global.inlined_to = e->caller; > >> - e->callee->global.stack_frame_offset > >> - = e->caller->global.stack_frame_offset > >> - + inline_summary (e->caller)->estimated_self_stack_size; > >> - peak = e->callee->global.stack_frame_offset > >> - + inline_summary (e->callee)->estimated_self_stack_size; > >> - if (e->callee->global.inlined_to->global.estimated_stack_size < peak) > >> - e->callee->global.inlined_to->global.estimated_stack_size = peak; > >> + > >> + /* Pessimistically assume no sharing of stack space. That is, the > >> + frame size of a function is estimated as the original frame size > >> + plus the sum of the frame sizes of all inlined callees. */ > >> + e->callee->global.inlined_to->global.estimated_stack_size += > >> + inline_summary (e->callee)->estimated_self_stack_size; > >> + > >> cgraph_propagate_frequency (e->callee); > >> > >> /* Recursively clone all bodies. */ > >> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap > >> > >> stack_size_limit += stack_size_limit * PARAM_VALUE > >> (PARAM_STACK_FRAME_GROWTH) / 100; > >> > >> - inlined_stack = (to->global.stack_frame_offset > >> - + inline_summary (to)->estimated_self_stack_size > >> + inlined_stack = (to->global.estimated_stack_size > >> + what->global.estimated_stack_size); > >> if (inlined_stack > stack_size_limit > >> && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME)) > >> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph > >> self_stack_size = optimize ? estimated_stack_frame_size (node) : 0; > >> inline_summary (node)->estimated_self_stack_size = self_stack_size; > >> node->global.estimated_stack_size = self_stack_size; > >> - node->global.stack_frame_offset = 0; > >> > >> /* Can this function be inlined at all? */ > >> node->local.inlinable = tree_inlinable_function_p (node->decl); > >> > > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [google] pessimize stack accounting during inlining 2011-06-10 0:08 ` Mark Heffernan @ 2011-06-10 9:14 ` Richard Guenther 0 siblings, 0 replies; 4+ messages in thread From: Richard Guenther @ 2011-06-10 9:14 UTC (permalink / raw) To: Mark Heffernan; +Cc: Xinliang David Li, GCC Patches On Fri, Jun 10, 2011 at 1:45 AM, Mark Heffernan <meheff@google.com> wrote: > On Wed, Jun 8, 2011 at 1:54 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> Well, it's still not a hard limit as we can't tell how many spill slots >> or extra call argument or return value slots we need. > > Agreed. It's not perfect. But I've found this does a reasonable job > of preventing the inliner from pushing the frame size much beyond the > imposed limit especially if the limit is large (eg, many K) relative > to the typical total size of spill slots, arguments, etc. Do you have a testcase? Richard. > Mark > >> >> Richard. >> >> > Thanks, >> > >> > David >> > >> > On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan <meheff@google.com> wrote: >> >> This patch pessimizes stack accounting during inlining. This enables >> >> setting a firm stack size limit (via parameters "large-stack-frame" and >> >> "large-stack-frame-growth"). Without this patch the inliner is overly >> >> optimistic about potential stack reuse resulting in actual stack frames much >> >> larger than the parameterized limits. >> >> Internal benchmarks show minor performance differences with non-fdo and >> >> lipo, but overall neutral. Tested/bootstrapped on x86-64. >> >> Ok for google-main? >> >> Mark >> >> >> >> 2011-06-07 Mark Heffernan <meheff@google.com> >> >> * cgraph.h (cgraph_global_info): Remove field. >> >> * ipa-inline.c (cgraph_clone_inlined_nodes): Change >> >> stack frame computation. >> >> (cgraph_check_inline_limits): Ditto. >> >> (compute_inline_parameters): Remove dead initialization. >> >> >> >> Index: gcc/cgraph.h >> >> =================================================================== >> >> --- gcc/cgraph.h (revision 174512) >> >> +++ gcc/cgraph.h (working copy) >> >> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info { >> >> struct GTY(()) cgraph_global_info { >> >> /* Estimated stack frame consumption by the function. */ >> >> HOST_WIDE_INT estimated_stack_size; >> >> - /* Expected offset of the stack frame of inlined function. */ >> >> - HOST_WIDE_INT stack_frame_offset; >> >> >> >> /* For inline clones this points to the function they will be >> >> inlined into. */ >> >> Index: gcc/ipa-inline.c >> >> =================================================================== >> >> --- gcc/ipa-inline.c (revision 174512) >> >> +++ gcc/ipa-inline.c (working copy) >> >> @@ -229,8 +229,6 @@ void >> >> cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate, >> >> bool update_original) >> >> { >> >> - HOST_WIDE_INT peak; >> >> - >> >> if (duplicate) >> >> { >> >> /* We may eliminate the need for out-of-line copy to be output. >> >> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap >> >> e->callee->global.inlined_to = e->caller->global.inlined_to; >> >> else >> >> e->callee->global.inlined_to = e->caller; >> >> - e->callee->global.stack_frame_offset >> >> - = e->caller->global.stack_frame_offset >> >> - + inline_summary (e->caller)->estimated_self_stack_size; >> >> - peak = e->callee->global.stack_frame_offset >> >> - + inline_summary (e->callee)->estimated_self_stack_size; >> >> - if (e->callee->global.inlined_to->global.estimated_stack_size < peak) >> >> - e->callee->global.inlined_to->global.estimated_stack_size = peak; >> >> + >> >> + /* Pessimistically assume no sharing of stack space. That is, the >> >> + frame size of a function is estimated as the original frame size >> >> + plus the sum of the frame sizes of all inlined callees. */ >> >> + e->callee->global.inlined_to->global.estimated_stack_size += >> >> + inline_summary (e->callee)->estimated_self_stack_size; >> >> + >> >> cgraph_propagate_frequency (e->callee); >> >> >> >> /* Recursively clone all bodies. */ >> >> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap >> >> >> >> stack_size_limit += stack_size_limit * PARAM_VALUE >> >> (PARAM_STACK_FRAME_GROWTH) / 100; >> >> >> >> - inlined_stack = (to->global.stack_frame_offset >> >> - + inline_summary (to)->estimated_self_stack_size >> >> + inlined_stack = (to->global.estimated_stack_size >> >> + what->global.estimated_stack_size); >> >> if (inlined_stack > stack_size_limit >> >> && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME)) >> >> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph >> >> self_stack_size = optimize ? estimated_stack_frame_size (node) : 0; >> >> inline_summary (node)->estimated_self_stack_size = self_stack_size; >> >> node->global.estimated_stack_size = self_stack_size; >> >> - node->global.stack_frame_offset = 0; >> >> >> >> /* Can this function be inlined at all? */ >> >> node->local.inlinable = tree_inlinable_function_p (node->decl); >> >> >> > > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2011-06-10 9:05 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <BANLkTik0OK=0ksWUosRPGW3x23-OAA34ujryD7iimFZH51x58w@mail.gmail.com> 2011-06-08 1:06 ` [google] pessimize stack accounting during inlining Xinliang David Li 2011-06-08 9:11 ` Richard Guenther 2011-06-10 0:08 ` Mark Heffernan 2011-06-10 9:14 ` Richard Guenther
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).