From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 18816 invoked by alias); 25 Jun 2010 09:15:34 -0000 Received: (qmail 18801 invoked by uid 22791); 25 Jun 2010 09:15:30 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,TW_FN X-Spam-Check-By: sourceware.org Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 25 Jun 2010 09:15:25 +0000 Received: by wyb36 with SMTP id 36so378188wyb.20 for ; Fri, 25 Jun 2010 02:15:22 -0700 (PDT) MIME-Version: 1.0 Received: by 10.227.144.129 with SMTP id z1mr323698wbu.3.1277457322370; Fri, 25 Jun 2010 02:15:22 -0700 (PDT) Received: by 10.216.182.129 with HTTP; Fri, 25 Jun 2010 02:15:22 -0700 (PDT) In-Reply-To: <20100625083332.GA5209@atrey.karlin.mff.cuni.cz> References: <20100622105258.GA15547@kam.mff.cuni.cz> <20100625083332.GA5209@atrey.karlin.mff.cuni.cz> Date: Fri, 25 Jun 2010 11:06:00 -0000 Message-ID: Subject: Re: Partial inlining From: Richard Guenther To: Jan Hubicka Cc: Richard Guenther , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-06/txt/msg02558.txt.bz2 On Fri, Jun 25, 2010 at 10:33 AM, Jan Hubicka wrote: >> The above numbers are all with the max-inline-insns-auto reduction? >> I'd like to have that adjustment split out as a separate followup patch. > > Yes, without the change we generally tend to do more inlining (as we prod= uce > more inline candidates). =A0Not exactly in all cases: when we split at po= int > that the call to split function is cold, we reduce size even for unchanges > max-inlie-insns-auto. > > I've removed this change from patch and will commit it sequentially. >> >> The partial-inlining-entry-probability default looks quite high. =A0I wo= uld >> have expected that guessed probabilities of say >> >> =A0 if (flag) >> =A0 =A0 return; >> >> make the entry taken 50% of the time? =A0So, did you experiment with >> other values than 70 or is that something that is suggested by literatur= e? > > Well, not really. =A0It depends on what is inside if conditional. > If it is if (flag < 0), lets say, we will hit non-negative heuristics > and will predict it with 79%. > > The value of 70 seemed high enough to take care of most of random fuzz fr= om the > heruistics, but it is something I plan to experiment with in the future. > In general the predictors on tests of this type are quite ineffective. >> >> You run ipa-split late during early optimizations. =A0Will the new clone >> you create go through the early opts again? =A0Thus, is the placement > > New clone is not considered by early optimizers. It is added as new > function just afterwards. =A0We might want to change this in future, but > that is how cgraph_add_new_function works for now (it queues the function > to be inserted once IPA pass is done and early optimizations are one > IPA pass). > >> at the very end of early opts critical? =A0If so please add a comment > > Well, it will work elsewhere too, just you miss optimizations on the spli= t part > of function then. =A0In general I see little value in moving it earlier, = but I've > added coment. > >> before the NEXT_PASS. =A0I suppose we don't re-run early opts on the >> clone as otherwise we could end up splitting it again (and so for >> a very large function with cascaded if (x) cheap; if (x) cheap; .... >> we will take a very long time to compile because we split of >> each if (x) cheap; one after the other? =A0Please add a testcase >> like this.) > > Splitting is considered just once. =A0But while working on pass, just > for fun I tried to split at every possible split point and quite surprisi= ngly > it did not introduce that much of compile time penalty for such a testcas= e. > (it sped up insn-attrtab compilation that is pretty much like this) > > So multiple split points is something I plan to add, just want to do that > incrementally so the benefits can be more easilly tracked. > > What should the testcase test? That compile-time is properly bounded. So I'd use something like #define FOO if (p) return 1; #define FOO10 FOO FOO FOO FOO FOO FOO FOO FOO FOO #define FOO100 .... int foo(unsigned p) { FOO1000 return 0; } >> >> + =A0for (parm =3D DECL_ARGUMENTS (current_function_decl); parm; >> + =A0 =A0 =A0 parm =3D TREE_CHAIN (parm)) >> + =A0 =A0{ >> + =A0 =A0 =A0if (!is_gimple_reg (parm)) >> + =A0 =A0 =A0 { >> + =A0 =A0 =A0 =A0 if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm))) >> + =A0 =A0 =A0 =A0 =A0 { >> + =A0 =A0 =A0 =A0 =A0 =A0 if (dump_file) >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 fprintf (dump_file, >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0" =A0Refused: need to p= ass non-ssa param >> values\n"); >> >> I'm confused by the non_ssa_vars thing. =A0This is just tracking >> whether a non-SSA parameter is being used? > > Yes, to handle non-ssa arguments, we will need to pass them by reference > that means that we will need to change function prototype in less trivial > way than by removing its arguments. > > arg adjustments used in ipa-sra can do this, all we need is to hook them = into > clonning machinery that is something I plan to do relatively soon too (it= will > imply support in virutal clones that needs way to stream them into cgraph= opt > section for WHOPR so it is not completely trivial, but the adjustment cod= e was > meant for this from beggining). > > There are two places we check them. =A0First is for SRA arguments, the ot= her > place is checking that if non-ssa variable is used, it is either used onl= y by > header or by split part, but not by both. > > (for same reason I also do not support yet sharing values computed by hea= der > and used by split part that is something I can do by passing them as argu= ment > or recomputing them in split part if they are cheap. =A0I think especially > for casts in C++ we will want to do the recomputation in trivial cases) >> >> + =A0if (current->split_size <=3D call_overhead) >> + =A0 =A0{ >> + =A0 =A0 =A0if (dump_file) >> + =A0 =A0 =A0 fprintf (dump_file, >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0" =A0Refused: split size is smaller tha= n call overhead\n"); >> + =A0 =A0 =A0return; >> + =A0 =A0} >> >> also refuse to split if the size of the header is bigger than that >> of the tail? =A0Otherwise we're just going to inline it back, no? > > Not necesarily - especially when the split point is cold and thus we end > up inlining for size. > > This is something I want to experiment with too. =A0At the moment we prod= uce > more splitting than needed (i.e. we often inline things back) that has > negative effect on compile time and debug info. In practice I didn't found > this bad. > > Keeping the function splitting spearate from inliner heuristics, we can a= lso > play safe and split only when the function becomes inlinable after splitt= ing > or the split part is cold and its size >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (is_gimple_debug (gsi_stmt (bsi))) >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 continue; >> >> I wonder if debug gets confused if you reference vars that are not >> there... > > tree-inline will remove the debug statement (and thus we lose some debug = info > when split part is inlined back), but in general it works. >> +/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if >> none >> + =A0 found. =A0*/ >> >> What about BUILT_IN_RETURN? =A0Or returns via EH or abnormally returning >> fns like longjmp? > > EH or abnormal jumps are not problem. =A0Functions returning both via > normal return and BUILT_IN_RETURN (or functions having two return stateme= nt > one with and one without return value) are currently outlawed. > > We can support multiple return BBs if needed, I added TODO for that reaso= n. >> + =A0/* At present we can't pass non-SSA arguments to split function. >> + =A0 =A0 FIXME: this can be relaxed by passing references to arguments.= =A0*/ >> + =A0if (TREE_CODE (t) =3D=3D PARM_DECL) >> + =A0 =A0{ >> + =A0 =A0 =A0if (dump_file) >> + =A0 =A0 =A0 fprintf (dump_file, "Can not split use of non-ssa function >> parameter.\n"); >> + =A0 =A0 =A0return true; >> + =A0 =A0} >> >> I think this will do excessive dumping. =A0At least make it >> dump_flags & TDF_DETAILS. > > Done. >> + =A0if ((!node->callers || !node->callers->next_caller) >> + =A0 =A0 =A0&& !node->address_taken >> + =A0 =A0 =A0&& ((!flag_lto && !flag_whopr) || !node->local.externally_v= isible)) >> + =A0 =A0{ >> + =A0 =A0 =A0if (dump_file) >> + =A0 =A0 =A0 fprintf (dump_file, "Not splitting: not called directly or= called >> once.\n"); >> + =A0 =A0 =A0return 0; >> + =A0 =A0} >> >> Why the check for node->address_taken? =A0Why the check >> for external visibility? > > We need to decide if function can possibly end up calling more than once = when it reach > inliner. =A0When it has address taken, we can devirtualize the call. > When doing LTO, it might be called directly from other unit. >> >> Ok with the above changes (please go over all dumping and make sure >> that without -details we only say that we are splitting and where >> and dump the split function). > > You seem to be consistently on less dumping by default than me. =A0I find= dumping > of reasons why split points are not accepted quite useful, but I guess us= ing > -details is not problem. Well, I think -fdump-tree-all should just dump all functions after each pass. Maybe with very little extra info. -details is for dumping what it did (or not) and why. -fdump-tree-all already is several GB for tramp3d. Richard.