From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xinliang David Li
To: Jan Hubicka
Cc: Teresa Johnson, Andi Kleen, reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org, Rong Xu
Date: Tue, 21 Aug 2012 17:10:00 -0000
Subject: Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
In-Reply-To: <20120821073420.GA24544@kam.mff.cuni.cz>
References: <20120820094833.GB5505@kam.mff.cuni.cz> <20120821012730.GC5505@kam.mff.cuni.cz> <20120821052918.GB2407@kam.mff.cuni.cz> <20120821063307.GB14457@kam.mff.cuni.cz> <20120821073420.GA24544@kam.mff.cuni.cz>
X-SW-Source: 2012-08/txt/msg01441.txt.bz2

On Tue, Aug 21, 2012 at 12:34 AM, Jan Hubicka wrote:
>> Teresa has done some tuning for the unroller so far. The inliner
>> tuning is the next step.
>>
>> >
>> > What concerns me is that it is greatly inaccurate - you have no idea how many
>> > instructions a given counter is guarding, and it can differ quite a lot. Also,
>> > inlining/optimization makes working sets significantly different (by a factor
>> > of 100 for tramp3d).
>>
>> The pre ipa-inline working set is the one that is needed for ipa
>> inliner tuning. For post-ipa-inline code increase transformations,
>> some update is probably needed.
>>
>> > But on the other hand, any solution at this level will be
>> > greatly inaccurate. So I am curious how reliable the data you can get from
>> > this is? How do you take this into account in the heuristics?
>>
>> This effort is just the first step to allow good heuristics to develop.
>>
>> >
>> > It seems to me that for this use, perhaps the simple logic in histogram
>> > merging of maximizing the number of BBs for a given bucket will work well? It
>> > is inaccurate, but we are working with greatly inaccurate data anyway.
>> > Except for degenerate cases, the small and unimportant runs will have small BB
>> > counts, while large runs will have larger counts, and those are the ones we
>> > optimize for anyway.
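[For readers following the thread: the "working set" summary under discussion answers, roughly, "how many of the hottest counters cover a given fraction of the total profiled execution count". A minimal sketch of that notion, not the actual libgcov implementation -- the function name and shape are made up for illustration:]

```c
#include <stdlib.h>

/* Comparator: sort counter values in descending order. */
static int cmp_desc(const void *a, const void *b)
{
    long long x = *(const long long *)a, y = *(const long long *)b;
    return (x < y) ? 1 : (x > y) ? -1 : 0;
}

/* Return how many of the hottest counters are needed to cover
   `percent` percent of the total execution count.  Illustrative
   sketch only; sorts `counters` in place. */
size_t working_set_size(long long *counters, size_t n, int percent)
{
    long long total = 0, running = 0;
    size_t i;

    qsort(counters, n, sizeof *counters, cmp_desc);
    for (i = 0; i < n; i++)
        total += counters[i];
    for (i = 0; i < n; i++) {
        running += counters[i];
        if (running * 100 >= total * (long long)percent)
            return i + 1;
    }
    return n;
}
```

[A profile where one counter dominates yields a tiny working set at 50% coverage, which is the kind of shape the inliner heuristics could mine.]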
>>
>> The working set curve for each type of application contains lots of
>> information that can be mined. The inaccuracy can also be mitigated by
>> more data 'calibration'.
>
> Sure, I think I am leaning towards trying solution 2) with maximizing
> counter count merging (it would probably make sense to rename it from "BB
> count", since it is not really a BB count and is thus misleading), and we will
> see how well it works in practice.
>
> We get the benefit of far fewer issues with profile locking/unlocking, and we
> lose a bit of precision on BB counts. I tend to believe that the error will
> not be that important in practice. Another cost is more histogram streaming
> into each gcda file, but with skipping of zero entries it should not be a
> major overhead problem, I hope.
>
> What do you think?
>>
>> >> > 2) Do we plan to add some features in the near future that will anyway
>> >> > require global locking? I guess LIPO itself does not count, since it
>> >> > streams its data into an independent file as you mentioned earlier, and
>> >> > locking the LIPO file is not that hard.
>> >> > Does LIPO stream everything into that common file, or does it use a
>> >> > combination of gcda files and a common summary?
>> >>
>> >> Actually, LIPO module grouping information is stored in gcda files.
>> >> It is also stored in a separate .imports file (one per object) ---
>> >> this is primarily used by our build system for dependence information.
>> >
>> > I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO
>> > behave on GCC bootstrap?
>>
>> We have not tried a gcc bootstrap with LIPO. Gcc compile time is not the
>> main problem for application builds -- the link time (for debug builds)
>> is.
>
> I was primarily curious how LIPO's runtime analysis fares in the situation
> where you do very many small train runs on a rather large app (sure, GCC is
> small relative to Google's use case ;).
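[The "maximizing counter count" merge strategy mentioned above can be pictured as a per-bucket max over two histograms instead of a sum, so that the merged summary reflects the largest training run rather than an accumulation of all runs. A hedged sketch -- the bucket layout and names here are hypothetical, not the actual gcov histogram format:]

```c
#include <stddef.h>

#define HIST_BUCKETS 64  /* hypothetical bucket count for the sketch */

struct hist_bucket {
    unsigned num_counters;          /* counters falling in this bucket */
    unsigned long long min_value;   /* smallest counter value seen here */
    unsigned long long cum_value;   /* cumulative counter sum for bucket */
};

/* Merge `src` into `dst` by keeping, per bucket, the entry with the
   larger counter count -- the "maximizing" strategy under discussion,
   which favors the dominant run over summing runs together. */
void hist_merge_max(struct hist_bucket *dst, const struct hist_bucket *src)
{
    size_t i;
    for (i = 0; i < HIST_BUCKETS; i++)
        if (src[i].num_counters > dst[i].num_counters)
            dst[i] = src[i];
}
```

[Small, unimportant runs then barely perturb the merged histogram, matching the intuition quoted above that large runs dominate and are the ones optimized for anyway.]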
There will be a race, but as Teresa mentioned, there is a good chance that the
process which finishes the merge last is also the final overrider of the LIPO
summary data.

>> > (i.e. it does a lot more work in the libgcov module per
>> > invocation, so I am curious if it is practically useful at all).
>> >
>> > With an LTO-based solution, a lot can probably be pushed to link time?
>> > Before the actual GCC starts from the linker plugin, the LIPO module can
>> > read gcov CFGs from gcda files and do all the merging/updating/CFG
>> > construction that is currently performed at runtime, right?
>>
>> The dynamic cgraph build and analysis is still done at runtime.
>> However, with the new implementation, the FE is no longer involved. The gcc
>> driver is modified to understand module grouping, and LTO is used to
>> merge the streamed output from aux modules.
>
> I see. Are there any fundamental reasons why it cannot be done at link time,
> when all gcda files are available?

For build parallelism, the decision should be made as early as possible -- that
is what makes LIPO 'light'.

> Why is the grouping not done inside the linker plugin?

It is not delayed until link time. In fact, the linker plugin is not even
involved.

David

>
> Honza
>>
>>
>> David
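[As an illustration of the per-file locking and "last merger wins" behavior discussed in this thread: each process takes an advisory write lock on one counter file, sums its in-memory counters into the on-disk ones, and writes the result back; with no global lock, the summary written by whichever process merges last is the one that survives. This is only a sketch under made-up assumptions -- the file format, function name, and fixed-size buffer are invented for the example and are not the real libgcov code:]

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sum `n` in-memory counters into the counters stored in `path`,
   element-wise, under a whole-file advisory write lock.  A missing or
   short file is treated as all-zero counters.  Returns the merged
   total, or -1 on error.  Illustrative only. */
long long merge_counters_locked(const char *path, const long long *c, size_t n)
{
    long long disk[64] = {0};   /* sketch assumes n <= 64 */
    long long total = 0;
    struct flock fl = {0};
    ssize_t r;
    size_t i;
    int fd;

    if (n > 64)
        return -1;
    fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    fl.l_type = F_WRLCK;        /* block until we own the whole file */
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }

    r = read(fd, disk, n * sizeof *disk);   /* short read => zeros */
    (void)r;
    for (i = 0; i < n; i++) {
        disk[i] += c[i];
        total += disk[i];
    }
    lseek(fd, 0, SEEK_SET);
    write(fd, disk, n * sizeof *disk);

    fl.l_type = F_UNLCK;        /* let the next finishing run merge */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return total;
}
```

[Per-file counter merging stays consistent this way; only data written outside any lock, like a shared summary, is exposed to the last-writer race described above.]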