From: Mark Heffernan
To: Richard Guenther
Cc: Xinliang David Li, GCC Patches
Date: Fri, 10 Jun 2011 00:08:00 -0000
Subject: Re: [google] pessimize stack accounting during inlining

On Wed, Jun 8, 2011 at 1:54 AM, Richard Guenther wrote:
> Well, it's still not a hard limit as we can't tell how many spill slots
> or extra call argument or return value slots we need.

Agreed.  It's not perfect.  But I've found this does a reasonable job of
preventing the inliner from pushing the frame size much beyond the imposed
limit, especially if the limit is large (e.g., many kilobytes) relative to
the typical total size of spill slots, arguments, etc.

Mark

>
> Richard.
>
> > Thanks,
> >
> > David
> >
> > On Tue, Jun 7, 2011 at 4:29 PM, Mark Heffernan wrote:
> >> This patch pessimizes stack accounting during inlining.  This enables
> >> setting a firm stack size limit (via parameters "large-stack-frame" and
> >> "large-stack-frame-growth").  Without this patch the inliner is overly
> >> optimistic about potential stack reuse resulting in actual stack frames much
> >> larger than the parameterized limits.
> >> Internal benchmarks show minor performance differences with non-fdo and
> >> lipo, but overall neutral.  Tested/bootstrapped on x86-64.
> >> Ok for google-main?
> >> Mark
> >>
> >> 2011-06-07  Mark Heffernan
> >>         * cgraph.h (cgraph_global_info): Remove field.
> >>         * ipa-inline.c (cgraph_clone_inlined_nodes): Change
> >>         stack frame computation.
> >>         (cgraph_check_inline_limits): Ditto.
> >>         (compute_inline_parameters): Remove dead initialization.
> >>
> >> Index: gcc/cgraph.h
> >> ===================================================================
> >> --- gcc/cgraph.h        (revision 174512)
> >> +++ gcc/cgraph.h        (working copy)
> >> @@ -136,8 +136,6 @@ struct GTY(()) cgraph_local_info {
> >>  struct GTY(()) cgraph_global_info {
> >>    /* Estimated stack frame consumption by the function.  */
> >>    HOST_WIDE_INT estimated_stack_size;
> >> -  /* Expected offset of the stack frame of inlined function.  */
> >> -  HOST_WIDE_INT stack_frame_offset;
> >>
> >>    /* For inline clones this points to the function they will be
> >>       inlined into.  */
> >> Index: gcc/ipa-inline.c
> >> ===================================================================
> >> --- gcc/ipa-inline.c    (revision 174512)
> >> +++ gcc/ipa-inline.c    (working copy)
> >> @@ -229,8 +229,6 @@ void
> >>  cgraph_clone_inlined_nodes (struct cgraph_edge *e, bool duplicate,
> >>                             bool update_original)
> >>  {
> >> -  HOST_WIDE_INT peak;
> >> -
> >>    if (duplicate)
> >>      {
> >>        /* We may eliminate the need for out-of-line copy to be output.
> >> @@ -279,13 +277,13 @@ cgraph_clone_inlined_nodes (struct cgrap
> >>      e->callee->global.inlined_to = e->caller->global.inlined_to;
> >>    else
> >>      e->callee->global.inlined_to = e->caller;
> >> -  e->callee->global.stack_frame_offset
> >> -    = e->caller->global.stack_frame_offset
> >> -      + inline_summary (e->caller)->estimated_self_stack_size;
> >> -  peak = e->callee->global.stack_frame_offset
> >> -      + inline_summary (e->callee)->estimated_self_stack_size;
> >> -  if (e->callee->global.inlined_to->global.estimated_stack_size < peak)
> >> -    e->callee->global.inlined_to->global.estimated_stack_size = peak;
> >> +
> >> +  /* Pessimistically assume no sharing of stack space.  That is, the
> >> +     frame size of a function is estimated as the original frame size
> >> +     plus the sum of the frame sizes of all inlined callees.  */
> >> +  e->callee->global.inlined_to->global.estimated_stack_size +=
> >> +    inline_summary (e->callee)->estimated_self_stack_size;
> >> +
> >>    cgraph_propagate_frequency (e->callee);
> >>
> >>    /* Recursively clone all bodies.  */
> >> @@ -430,8 +428,7 @@ cgraph_check_inline_limits (struct cgrap
> >>
> >>    stack_size_limit += stack_size_limit * PARAM_VALUE (PARAM_STACK_FRAME_GROWTH) / 100;
> >>
> >> -  inlined_stack = (to->global.stack_frame_offset
> >> -                  + inline_summary (to)->estimated_self_stack_size
> >> +  inlined_stack = (to->global.estimated_stack_size
> >>                    + what->global.estimated_stack_size);
> >>    if (inlined_stack  > stack_size_limit
> >>        && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
> >> @@ -2064,7 +2061,6 @@ compute_inline_parameters (struct cgraph
> >>    self_stack_size = optimize ? estimated_stack_frame_size (node) : 0;
> >>    inline_summary (node)->estimated_self_stack_size = self_stack_size;
> >>    node->global.estimated_stack_size = self_stack_size;
> >> -  node->global.stack_frame_offset = 0;
> >>
> >>    /* Can this function be inlined at all?  */
> >>    node->local.inlinable = tree_inlinable_function_p (node->decl);
> >>
> >
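
For readers following the thread, the accounting change in the quoted patch
can be sketched outside of GCC roughly as follows.  This is only an
illustration under stated assumptions: struct fn, MAX_CALLEES,
optimistic_estimate, pessimistic_estimate and exceeds_limit are hypothetical
names, not GCC internals, and the base limit is passed in as a parameter
because the quoted hunk does not show how stack_size_limit is initialized.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4   /* hypothetical bound, for illustration only */

/* Hypothetical stand-in for a call-graph node; not GCC's cgraph_node.  */
struct fn
{
  long self_size;                         /* frame size of the body alone */
  const struct fn *callees[MAX_CALLEES];  /* inlined callees, NULL-terminated */
};

/* Old accounting (sketch): each callee's frame is placed at an offset just
   past its caller's frame, and the estimate is the deepest peak reached.
   Sibling callees are therefore assumed to reuse the same stack space.  */
long
optimistic_estimate (const struct fn *f)
{
  long peak = 0;
  assert (f != NULL);
  for (int i = 0; i < MAX_CALLEES && f->callees[i] != NULL; i++)
    {
      long c = optimistic_estimate (f->callees[i]);
      if (c > peak)
        peak = c;
    }
  return f->self_size + peak;
}

/* New accounting (sketch): assume no sharing at all, so the estimate is
   the function's own frame plus the frames of all inlined callees.  */
long
pessimistic_estimate (const struct fn *f)
{
  long total;
  assert (f != NULL);
  total = f->self_size;
  for (int i = 0; i < MAX_CALLEES && f->callees[i] != NULL; i++)
    total += pessimistic_estimate (f->callees[i]);
  return total;
}

/* Limit check modelled on the quoted cgraph_check_inline_limits hunk:
   inlining is rejected only if the combined estimate exceeds both the
   grown limit and the large-stack-frame threshold.  base_limit is an
   assumption of this sketch.  */
int
exceeds_limit (long to_size, long what_size, long base_limit,
               long growth_percent, long large_frame)
{
  long limit = base_limit + base_limit * growth_percent / 100;
  long inlined = to_size + what_size;
  return inlined > limit && inlined > large_frame;
}
```

With a 50-byte caller that inlines two leaf callees of 100 and 200 bytes, the
optimistic estimate is 250 bytes and the pessimistic one is 350 bytes.  The
second figure is the one the patch compares against the limit, which is why
the limit becomes firmer, though, as noted above, still not hard.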