From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 25900 invoked by alias); 7 Jun 2011 23:37:08 -0000 Received: (qmail 25891 invoked by uid 22791); 7 Jun 2011 23:37:08 -0000 X-SWARE-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,SPF_HELO_PASS,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from smtp-out.google.com (HELO smtp-out.google.com) (216.239.44.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 07 Jun 2011 23:36:54 +0000 Received: from wpaz1.hot.corp.google.com (wpaz1.hot.corp.google.com [172.24.198.65]) by smtp-out.google.com with ESMTP id p57NarLi022209 for ; Tue, 7 Jun 2011 16:36:53 -0700 Received: from ywl41 (ywl41.prod.google.com [10.192.12.41]) by wpaz1.hot.corp.google.com with ESMTP id p57NaqCD031027 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Tue, 7 Jun 2011 16:36:52 -0700 Received: by ywl41 with SMTP id 41so1904ywl.18 for ; Tue, 07 Jun 2011 16:36:52 -0700 (PDT) MIME-Version: 1.0 Received: by 10.91.4.31 with SMTP id g31mr6189096agi.147.1307489811948; Tue, 07 Jun 2011 16:36:51 -0700 (PDT) Received: by 10.91.213.2 with HTTP; Tue, 7 Jun 2011 16:36:51 -0700 (PDT) In-Reply-To: References: Date: Wed, 08 Jun 2011 01:06:00 -0000 Message-ID: Subject: Re: [google] pessimize stack accounting during inlining From: Xinliang David Li To: Mark Heffernan Cc: GCC Patches Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-System-Of-Record: true X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-06/txt/msg00598.txt.bz2 Ok for google/main. A good candidate patch for trunk too. Thanks, David On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan wrote: > This patch pessimizes stack accounting during inlining. =A0This enables > setting a firm=A0stack size limit (via parameters=A0"large-stack-frame" a= nd > "large-stack-frame-growth"). =A0Without this patch the inliner is overly > optimistic about potential stack reuse resulting in actual stack frames m= uch > larger than the parameterized limits. > Internal benchmarks show minor performance differences with non-fdo and > lipo, but overall neutral. =A0Tested/bootstrapped on x86-64. > Ok for google-main? > Mark > > 2011-06-07 =A0Mark Heffernan =A0 > =A0 =A0 =A0 =A0 * cgraph.h (cgraph_global_info): Remove field. > =A0 =A0 =A0 =A0 * ipa-inline.c (cgraph_clone_inlined_nodes): Change > =A0 =A0 =A0 =A0 stack frame computation. > =A0 =A0 =A0 =A0 (cgraph_check_inline_limits): Ditto. > =A0 =A0 =A0 =A0 (compute_inline_parameters): Remove dead initialization. > > Index: gcc/cgraph.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/cgraph.h =A0 =A0 =A0 =A0(revision 174512) > +++ gcc/cgraph.h =A0 =A0 =A0 =A0(working copy) > @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info { > =A0struct GTY(()) cgraph_global_info { > =A0 =A0/* Estimated stack frame consumption by the function. =A0*/ > =A0 =A0HOST_WIDE_INT estimated_stack_size; > - =A0/* Expected offset of the stack frame of inlined function. =A0*/ > - =A0HOST_WIDE_INT stack_frame_offset; > > =A0 =A0/* For inline clones this points to the function they will be > =A0 =A0 =A0 inlined into. =A0*/ > Index: gcc/ipa-inline.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/ipa-inline.c =A0 =A0(revision 174512) > +++ gcc/ipa-inline.c =A0 =A0(working copy) > @@ -229,8 +229,6 @@ void > =A0cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 bool update_origi= nal) > =A0{ > - =A0HOST_WIDE_INT peak; > - > =A0 =A0if (duplicate) > =A0 =A0 =A0{ > =A0 =A0 =A0 =A0/* We may eliminate the need for out-of-line copy to be ou= tput. > @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap > =A0 =A0 =A0e->callee->global.inlined_to =3D e->caller->global.inlined_to; > =A0 =A0else > =A0 =A0 =A0e->callee->global.inlined_to =3D e->caller; > - =A0e->callee->global.stack_frame_offset > - =A0 =A0=3D e->caller->global.stack_frame_offset > - =A0 =A0 =A0+ inline_summary (e->caller)->estimated_self_stack_size; > - =A0peak =3D e->callee->global.stack_frame_offset > - =A0 =A0 =A0+ inline_summary (e->callee)->estimated_self_stack_size; > - =A0if (e->callee->global.inlined_to->global.estimated_stack_size < peak) > - =A0 =A0e->callee->global.inlined_to->global.estimated_stack_size =3D pe= ak; > + > + =A0/* Pessimistically assume no sharing of stack space. =A0That is, the > + =A0 =A0 frame size of a function is estimated as the original frame size > + =A0 =A0 plus the sum of the frame sizes of all inlined callees. =A0*/ > + =A0e->callee->global.inlined_to->global.estimated_stack_size +=3D > + =A0 =A0inline_summary (e->callee)->estimated_self_stack_size; > + > =A0 =A0cgraph_propagate_frequency (e->callee); > > =A0 =A0/* Recursively clone all bodies. =A0*/ > @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap > > =A0 =A0stack_size_limit +=3D stack_size_limit * PARAM_VALUE > (PARAM_STACK_FRAME_GROWTH) / 100; > > - =A0inlined_stack =3D (to->global.stack_frame_offset > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0+ inline_summary (to)->estimated_sel= f_stack_size > + =A0inlined_stack =3D (to->global.estimated_stack_size > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0+ what->global.estimated_stack_siz= e); > =A0 =A0if (inlined_stack =A0> stack_size_limit > =A0 =A0 =A0 =A0&& inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME)) > @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph > =A0 =A0self_stack_size =3D optimize ? estimated_stack_frame_size (node) := 0; > =A0 =A0inline_summary (node)->estimated_self_stack_size =3D self_stack_si= ze; > =A0 =A0node->global.estimated_stack_size =3D self_stack_size; > - =A0node->global.stack_frame_offset =3D 0; > > =A0 =A0/* Can this function be inlined at all? =A0*/ > =A0 =A0node->local.inlinable =3D tree_inlinable_function_p (node->decl); >