From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xinliang David Li
To: Jan Hubicka
Cc: Teresa Johnson, Andi Kleen, reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org, Rong Xu
Date: Tue, 21 Aug 2012 17:10:00 -0000
Subject: Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
In-Reply-To: <20120821073420.GA24544@kam.mff.cuni.cz>
References: <20120820094833.GB5505@kam.mff.cuni.cz> <20120821012730.GC5505@kam.mff.cuni.cz> <20120821052918.GB2407@kam.mff.cuni.cz> <20120821063307.GB14457@kam.mff.cuni.cz> <20120821073420.GA24544@kam.mff.cuni.cz>
X-SW-Source: 2012-08/txt/msg01441.txt.bz2

On Tue, Aug 21, 2012 at 12:34 AM, Jan Hubicka wrote:
>> Teresa has done some tuning for the unroller so far. The inliner
>> tuning is the next step.
>>
>> >
>> > What concerns me is that it is greatly inaccurate - you have no idea how many
>> > instructions a given counter is guarding, and it can differ quite a lot. Also,
>> > inlining/optimization makes working sets significantly different (by a factor
>> > of 100 for tramp3d).
>>
>> The pre ipa-inline working set is the one that is needed for ipa
>> inliner tuning. For post-ipa-inline code increase transformations,
>> some update is probably needed.
>>
>> > But on the other hand, any solution at this level will be
>> > greatly inaccurate. So I am curious how reliable the data you can get from
>> > this is? How do you take this into account in the heuristics?
>>
>> This effort is just the first step to allow good heuristics to develop.
>>
>> >
>> > It seems to me that for this use, perhaps the simple logic in histogram
>> > merging of maximizing the number of BBs for a given bucket will work well? It
>> > is inaccurate, but we are working with greatly inaccurate data anyway.
>> > Except for degenerate cases, the small and unimportant runs will have small BB
>> > counts, while large runs will have larger counts, and those are the ones we
>> > optimize for anyway.
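[For readers following the thread: the "working set" summary under discussion answers, roughly, "how many of the hottest counters cover a given fraction of the total profiled execution count". A minimal sketch of that notion, not the actual libgcov implementation -- the function name and shape are made up for illustration:]

```c
#include <stdlib.h>

/* Comparator: sort counter values in descending order. */
static int cmp_desc(const void *a, const void *b)
{
    long long x = *(const long long *)a, y = *(const long long *)b;
    return (x < y) ? 1 : (x > y) ? -1 : 0;
}

/* Return how many of the hottest counters are needed to cover
   `percent` percent of the total execution count.  Illustrative
   sketch only; sorts `counters` in place. */
size_t working_set_size(long long *counters, size_t n, int percent)
{
    long long total = 0, running = 0;
    size_t i;

    qsort(counters, n, sizeof *counters, cmp_desc);
    for (i = 0; i < n; i++)
        total += counters[i];
    for (i = 0; i < n; i++) {
        running += counters[i];
        if (running * 100 >= total * (long long)percent)
            return i + 1;
    }
    return n;
}
```

[A profile where one counter dominates yields a tiny working set at 50% coverage, which is the kind of shape the inliner heuristics could mine.]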
>>
>> The working set curve for each type of application contains lots of
>> information that can be mined. The inaccuracy can also be mitigated by
>> more data 'calibration'.
>
> Sure, I think I am leaning towards trying solution 2) with maximizing
> counter count merging (it would probably make sense to rename it from "BB
> count", since it is not really a BB count and is thus misleading), and we will
> see how well it works in practice.
>
> We get the benefit of far fewer issues with profile locking/unlocking, and we
> lose a bit of precision on BB counts. I tend to believe that the error will
> not be that important in practice. Another cost is more histogram streaming
> into each gcda file, but with skipping of zero entries it should not be a
> major overhead problem, I hope.
>
> What do you think?
>>
>> >> > 2) Do we plan to add some features in the near future that will anyway
>> >> > require global locking? I guess LIPO itself does not count, since it
>> >> > streams its data into an independent file as you mentioned earlier, and
>> >> > locking the LIPO file is not that hard.
>> >> > Does LIPO stream everything into that common file, or does it use a
>> >> > combination of gcda files and a common summary?
>> >>
>> >> Actually, LIPO module grouping information is stored in gcda files.
>> >> It is also stored in a separate .imports file (one per object) ---
>> >> this is primarily used by our build system for dependence information.
>> >
>> > I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO
>> > behave on GCC bootstrap?
>>
>> We have not tried a gcc bootstrap with LIPO. Gcc compile time is not the
>> main problem for application builds -- the link time (for debug builds)
>> is.
>
> I was primarily curious how LIPO's runtime analysis fares in the situation
> where you do very many small train runs on a rather large app (sure, GCC is
> small relative to Google's use case ;).
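[The "maximizing counter count" merge strategy mentioned above can be pictured as a per-bucket max over two histograms instead of a sum, so that the merged summary reflects the largest training run rather than an accumulation of all runs. A hedged sketch -- the bucket layout and names here are hypothetical, not the actual gcov histogram format:]

```c
#include <stddef.h>

#define HIST_BUCKETS 64  /* hypothetical bucket count for the sketch */

struct hist_bucket {
    unsigned num_counters;          /* counters falling in this bucket */
    unsigned long long min_value;   /* smallest counter value seen here */
    unsigned long long cum_value;   /* cumulative counter sum for bucket */
};

/* Merge `src` into `dst` by keeping, per bucket, the entry with the
   larger counter count -- the "maximizing" strategy under discussion,
   which favors the dominant run over summing runs together. */
void hist_merge_max(struct hist_bucket *dst, const struct hist_bucket *src)
{
    size_t i;
    for (i = 0; i < HIST_BUCKETS; i++)
        if (src[i].num_counters > dst[i].num_counters)
            dst[i] = src[i];
}
```

[Small, unimportant runs then barely perturb the merged histogram, matching the intuition quoted above that large runs dominate and are the ones optimized for anyway.]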
There will be a race, but as Teresa mentioned, there is a good chance that the
process which finishes the merge last is also the final overrider of the LIPO
summary data.

>> > (i.e. it does a lot more work in the libgcov module per
>> > invocation, so I am curious if it is practically useful at all).
>> >
>> > With an LTO-based solution, a lot can probably be pushed to link time?
>> > Before the actual GCC starts from the linker plugin, the LIPO module can
>> > read gcov CFGs from gcda files and do all the merging/updating/CFG
>> > construction that is currently performed at runtime, right?
>>
>> The dynamic cgraph build and analysis is still done at runtime.
>> However, with the new implementation, the FE is no longer involved. The gcc
>> driver is modified to understand module grouping, and LTO is used to
>> merge the streamed output from aux modules.
>
> I see. Are there any fundamental reasons why it cannot be done at link time,
> when all gcda files are available?

For build parallelism, the decision should be made as early as possible -- that
is what makes LIPO 'light'.

> Why is the grouping not done inside the linker plugin?

It is not delayed until link time. In fact, the linker plugin is not even
involved.

David

>
> Honza
>>
>>
>> David
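[As an illustration of the per-file locking and "last merger wins" behavior discussed in this thread: each process takes an advisory write lock on one counter file, sums its in-memory counters into the on-disk ones, and writes the result back; with no global lock, the summary written by whichever process merges last is the one that survives. This is only a sketch under made-up assumptions -- the file format, function name, and fixed-size buffer are invented for the example and are not the real libgcov code:]

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sum `n` in-memory counters into the counters stored in `path`,
   element-wise, under a whole-file advisory write lock.  A missing or
   short file is treated as all-zero counters.  Returns the merged
   total, or -1 on error.  Illustrative only. */
long long merge_counters_locked(const char *path, const long long *c, size_t n)
{
    long long disk[64] = {0};   /* sketch assumes n <= 64 */
    long long total = 0;
    struct flock fl = {0};
    ssize_t r;
    size_t i;
    int fd;

    if (n > 64)
        return -1;
    fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    fl.l_type = F_WRLCK;        /* block until we own the whole file */
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }

    r = read(fd, disk, n * sizeof *disk);   /* short read => zeros */
    (void)r;
    for (i = 0; i < n; i++) {
        disk[i] += c[i];
        total += disk[i];
    }
    lseek(fd, 0, SEEK_SET);
    write(fd, disk, n * sizeof *disk);

    fl.l_type = F_UNLCK;        /* let the next finishing run merge */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return total;
}
```

[Per-file counter merging stays consistent this way; only data written outside any lock, like a shared summary, is exposed to the last-writer race described above.]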