From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-266335-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 18816 invoked by alias); 25 Jun 2010 09:15:34 -0000
Received: (qmail 18801 invoked by uid 22791); 25 Jun 2010 09:15:30 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,TW_FN
X-Spam-Check-By: sourceware.org
Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 25 Jun 2010 09:15:25 +0000
Received: by wyb36 with SMTP id 36so378188wyb.20        for <gcc-patches@gcc.gnu.org>; Fri, 25 Jun 2010 02:15:22 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.227.144.129 with SMTP id z1mr323698wbu.3.1277457322370; Fri, 	25 Jun 2010 02:15:22 -0700 (PDT)
Received: by 10.216.182.129 with HTTP; Fri, 25 Jun 2010 02:15:22 -0700 (PDT)
In-Reply-To: <20100625083332.GA5209@atrey.karlin.mff.cuni.cz>
References: <20100622105258.GA15547@kam.mff.cuni.cz>	<alpine.LNX.2.00.1006221417250.1429@zhemvz.fhfr.qr>	<20100625083332.GA5209@atrey.karlin.mff.cuni.cz>
Date: Fri, 25 Jun 2010 11:06:00 -0000
Message-ID: <AANLkTinARNAdJYQPtRTgTagixgtC4_c8Yla9S-GotGTf@mail.gmail.com>
Subject: Re: Partial inlining
From: Richard Guenther <richard.guenther@gmail.com>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: Richard Guenther <rguenther@suse.de>, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-06/txt/msg02558.txt.bz2

On Fri, Jun 25, 2010 at 10:33 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> The above numbers are all with the max-inline-insns-auto reduction?
>> I'd like to have that adjustment split out as a separate followup patch.
>
> Yes, without the change we generally tend to do more inlining (as we prod=
uce
> more inline candidates). =A0Not exactly in all cases: when we split at po=
int
> that the call to split function is cold, we reduce size even for unchanges
> max-inlie-insns-auto.
>
> I've removed this change from patch and will commit it sequentially.
>>
>> The partial-inlining-entry-probability default looks quite high. =A0I wo=
uld
>> have expected that guessed probabilities of say
>>
>> =A0 if (flag)
>> =A0 =A0 return;
>>
>> make the entry taken 50% of the time? =A0So, did you experiment with
>> other values than 70 or is that something that is suggested by literatur=
e?
>
> Well, not really. =A0It depends on what is inside if conditional.
> If it is if (flag < 0), lets say, we will hit non-negative heuristics
> and will predict it with 79%.
>
> The value of 70 seemed high enough to take care of most of random fuzz fr=
om the
> heruistics, but it is something I plan to experiment with in the future.
> In general the predictors on tests of this type are quite ineffective.
>>
>> You run ipa-split late during early optimizations. =A0Will the new clone
>> you create go through the early opts again? =A0Thus, is the placement
>
> New clone is not considered by early optimizers. It is added as new
> function just afterwards. =A0We might want to change this in future, but
> that is how cgraph_add_new_function works for now (it queues the function
> to be inserted once IPA pass is done and early optimizations are one
> IPA pass).
>
>> at the very end of early opts critical? =A0If so please add a comment
>
> Well, it will work elsewhere too, just you miss optimizations on the spli=
t part
> of function then. =A0In general I see little value in moving it earlier, =
but I've
> added coment.
>
>> before the NEXT_PASS. =A0I suppose we don't re-run early opts on the
>> clone as otherwise we could end up splitting it again (and so for
>> a very large function with cascaded if (x) cheap; if (x) cheap; ....
>> we will take a very long time to compile because we split of
>> each if (x) cheap; one after the other? =A0Please add a testcase
>> like this.)
>
> Splitting is considered just once. =A0But while working on pass, just
> for fun I tried to split at every possible split point and quite surprisi=
ngly
> it did not introduce that much of compile time penalty for such a testcas=
e.
> (it sped up insn-attrtab compilation that is pretty much like this)
>
> So multiple split points is something I plan to add, just want to do that
> incrementally so the benefits can be more easilly tracked.
>
> What should the testcase test?

That compile-time is properly bounded.  So I'd use something like

#define FOO if (p) return 1;
#define FOO10 FOO FOO FOO FOO FOO FOO FOO FOO FOO
#define FOO100 ....

int foo(unsigned p)
{
  FOO1000
  return 0;
}

>>
>> + =A0for (parm =3D DECL_ARGUMENTS (current_function_decl); parm;
>> + =A0 =A0 =A0 parm =3D TREE_CHAIN (parm))
>> + =A0 =A0{
>> + =A0 =A0 =A0if (!is_gimple_reg (parm))
>> + =A0 =A0 =A0 {
>> + =A0 =A0 =A0 =A0 if (bitmap_bit_p (non_ssa_vars, DECL_UID (parm)))
>> + =A0 =A0 =A0 =A0 =A0 {
>> + =A0 =A0 =A0 =A0 =A0 =A0 if (dump_file)
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 fprintf (dump_file,
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0" =A0Refused: need to p=
ass non-ssa param
>> values\n");
>>
>> I'm confused by the non_ssa_vars thing. =A0This is just tracking
>> whether a non-SSA parameter is being used?
>
> Yes, to handle non-ssa arguments, we will need to pass them by reference
> that means that we will need to change function prototype in less trivial
> way than by removing its arguments.
>
> arg adjustments used in ipa-sra can do this, all we need is to hook them =
into
> clonning machinery that is something I plan to do relatively soon too (it=
 will
> imply support in virutal clones that needs way to stream them into cgraph=
opt
> section for WHOPR so it is not completely trivial, but the adjustment cod=
e was
> meant for this from beggining).
>
> There are two places we check them. =A0First is for SRA arguments, the ot=
her
> place is checking that if non-ssa variable is used, it is either used onl=
y by
> header or by split part, but not by both.
>
> (for same reason I also do not support yet sharing values computed by hea=
der
> and used by split part that is something I can do by passing them as argu=
ment
> or recomputing them in split part if they are cheap. =A0I think especially
> for casts in C++ we will want to do the recomputation in trivial cases)
>>
>> + =A0if (current->split_size <=3D call_overhead)
>> + =A0 =A0{
>> + =A0 =A0 =A0if (dump_file)
>> + =A0 =A0 =A0 fprintf (dump_file,
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0" =A0Refused: split size is smaller tha=
n call overhead\n");
>> + =A0 =A0 =A0return;
>> + =A0 =A0}
>>
>> also refuse to split if the size of the header is bigger than that
>> of the tail? =A0Otherwise we're just going to inline it back, no?
>
> Not necesarily - especially when the split point is cold and thus we end
> up inlining for size.
>
> This is something I want to experiment with too. =A0At the moment we prod=
uce
> more splitting than needed (i.e. we often inline things back) that has
> negative effect on compile time and debug info. In practice I didn't found
> this bad.
>
> Keeping the function splitting spearate from inliner heuristics, we can a=
lso
> play safe and split only when the function becomes inlinable after splitt=
ing
> or the split part is cold and its size
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (is_gimple_debug (gsi_stmt (bsi)))
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 continue;
>>
>> I wonder if debug gets confused if you reference vars that are not
>> there...
>
> tree-inline will remove the debug statement (and thus we lose some debug =
info
> when split part is inlined back), but in general it works.
>> +/* Return basic block containing RETURN statement, or EXIT_BLOCK_PTR if
>> none
>> + =A0 found. =A0*/
>>
>> What about BUILT_IN_RETURN? =A0Or returns via EH or abnormally returning
>> fns like longjmp?
>
> EH or abnormal jumps are not problem. =A0Functions returning both via
> normal return and BUILT_IN_RETURN (or functions having two return stateme=
nt
> one with and one without return value) are currently outlawed.
>
> We can support multiple return BBs if needed, I added TODO for that reaso=
n.
>> + =A0/* At present we can't pass non-SSA arguments to split function.
>> + =A0 =A0 FIXME: this can be relaxed by passing references to arguments.=
 =A0*/
>> + =A0if (TREE_CODE (t) =3D=3D PARM_DECL)
>> + =A0 =A0{
>> + =A0 =A0 =A0if (dump_file)
>> + =A0 =A0 =A0 fprintf (dump_file, "Can not split use of non-ssa function
>> parameter.\n");
>> + =A0 =A0 =A0return true;
>> + =A0 =A0}
>>
>> I think this will do excessive dumping. =A0At least make it
>> dump_flags & TDF_DETAILS.
>
> Done.
>> + =A0if ((!node->callers || !node->callers->next_caller)
>> + =A0 =A0 =A0&& !node->address_taken
>> + =A0 =A0 =A0&& ((!flag_lto && !flag_whopr) || !node->local.externally_v=
isible))
>> + =A0 =A0{
>> + =A0 =A0 =A0if (dump_file)
>> + =A0 =A0 =A0 fprintf (dump_file, "Not splitting: not called directly or=
 called
>> once.\n");
>> + =A0 =A0 =A0return 0;
>> + =A0 =A0}
>>
>> Why the check for node->address_taken? =A0Why the check
>> for external visibility?
>
> We need to decide if function can possibly end up calling more than once =
when it reach
> inliner. =A0When it has address taken, we can devirtualize the call.
> When doing LTO, it might be called directly from other unit.
>>
>> Ok with the above changes (please go over all dumping and make sure
>> that without -details we only say that we are splitting and where
>> and dump the split function).
>
> You seem to be consistently on less dumping by default than me. =A0I find=
 dumping
> of reasons why split points are not accepted quite useful, but I guess us=
ing
> -details is not problem.

Well, I think -fdump-tree-all should just dump all functions after
each pass.  Maybe with very little extra info.  -details is for
dumping what it did (or not) and why.

-fdump-tree-all already is several GB for tramp3d.

Richard.