From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 19501 invoked by alias); 11 Aug 2013 21:05:38 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 19491 invoked by uid 89); 11 Aug 2013 21:05:37 -0000
X-Spam-SWARE-Status: No, score=-5.3 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_NO,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RP_MATCHES_RCVD autolearn=ham version=3.3.2
Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Sun, 11 Aug 2013 21:05:35 +0000
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id 81012543951; Sun, 11 Aug 2013 23:05:32 +0200 (CEST)
Date: Sun, 11 Aug 2013 21:05:00 -0000
From: Jan Hubicka
To: Teresa Johnson
Cc: Jan Hubicka, Bernhard Reutner-Fischer, "gcc-patches@gcc.gnu.org", Steven Bosscher, Jeff Law, "marxin.liska", Sriraman Tallam, Rong Xu
Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
Message-ID: <20130811210532.GA9197@kam.mff.cuni.cz>
References: <20130808222332.GA31755@kam.mff.cuni.cz> <20130809095843.GC31755@kam.mff.cuni.cz> <20130809152804.GA6579@kam.mff.cuni.cz> <20130811122143.GE22678@kam.mff.cuni.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SW-Source: 2013-08/txt/msg00638.txt.bz2

Hi,
thinking about it a bit more, I suppose the easiest way is to

1) make separate sets of counters for each comdat and place them into a
   comdat section named as DECL_COMDAT_GROUP (node) + cfg_checksum +
   individual_counter_counts.  This will make the linker unify the
   sections for us.
2) extend the API of libgcov initialization so multiple counters can be
   recorded per file.
3) at merging time, gcov needs to merge all comdat section counters into
   temporary memory, so multiple merging won't produce bad results
4) counter streaming will need to be updated to deal with separate comdat
   sections...
5) probably we will want to update histogram production to avoid counting
   the same comdat many times (this can be done by adding a "processed"
   flag into the per-function sections)

I don't see any obvious problems with this plan, just that it is quite
some work.  If you had a chance to implement something along these lines,
I think it would help ;))

Honza

> Cc'ing Rong since he is also working on trying to address the comdat
> profile issue. Rong, you may need to see an earlier message for more
> context:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00558.html
>
> Teresa
>
> On Sun, Aug 11, 2013 at 5:21 AM, Jan Hubicka wrote:
> >>
> >> I see, yes LTO can deal with this better since it has global
> >> information. In non-LTO mode (including LIPO) we have the issue.
> >
> > Either Martin or I will implement merging of the multiple copies at
> > LTO link time. This is needed for Martin's code unification patch anyway.
> >
> > Theoretically the gcov runtime can also have the symbol names and cfg
> > checksums of comdats in its static data and at exit produce buckets
> > based on matching names+checksums+counter counts, merge all the data in
> > each bucket into one representative by the existing merging routines,
> > and then memcpy it to all the original copies. This way all compilation
> > units will receive the same results.
> >
> > I am not very keen about making the gcov runtime bigger and more complex
> > than it needs to be, but having a sane profile for comdats seems quite
> > important. Perhaps, in the GNU toolchain, ordered subsections can be
> > used to make the linker produce an ordered list of comdats, so the
> > runtime won't need to do hashing + lookups.
> >
> > Honza
> >>
> >> I take it gimp is built with LTO and therefore shouldn't be hitting
> >> this comdat issue?
> >>
> >> Let me do a couple of things:
> >> - port over my comdat inlining fix from the google branch to trunk and
> >> send it for review. If you or Martin could try it to see if it helps
> >> with function splitting to avoid the hits from the cold code that
> >> would be great
> >> - I'll add some new sanity checking to try to detect non-zero blocks
> >> in the cold section, or 0 blocks reached by non-zero edges, and see if
> >> I can flush out any problems with my tests or a profiledbootstrap or
> >> gimp.
> >> - I'll try building and profiling gimp myself to see if I can
> >> reproduce the issue with code executing out of the cold section.
> >>
> >> Thanks,
> >> Teresa
> >>
> >> >>
> >> >> Also, can you send me reproduction instructions for gimp? I don't
> >> >> think I need Martin's patch, but which version of gimp and what is the
> >> >> equivalent way for me to train it? I have some scripts to generate a
> >> >> similar type of instruction heat map graph that I have been using to
> >> >> tune partitioning and function reordering. Essentially it uses linux
> >> >> perf to sample on instructions_retired and then munge the data in
> >> >> several ways to produce various stats and graphs. One thing that has
> >> >> been useful has been to combine the perf data with nm output to
> >> >> determine which cold functions are being executed at runtime.
> >> >
> >> > Martin?
> >> >
> >> >>
> >> >> However, for this to tell me which split cold bbs are being executed I
> >> >> need to use a patch that Sri sent for review several months back that
> >> >> gives the split cold section its own name:
> >> >> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
> >> >> Steven had some follow-up comments that Sri hasn't had a chance to
> >> >> address yet:
> >> >> http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
> >> >> (cc'ing Sri as we should probably revive this patch soon to address
> >> >> gdb and other issues with detecting split functions properly)
> >> >
> >> > Interesting, I used a linker script for this purpose, but that is GNU ld only...
> >> >
> >> > Honza
> >> >>
> >> >> Thanks!
> >> >> Teresa
> >> >>
> >> >> >
> >> >> > Honza
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Teresa
> >> >> >>
> >> >> >> > I think we are really looking primarily for dead parts of the
> >> >> >> > functions (sanity checks/error handling) that should not be
> >> >> >> > visited by the train run. We can then see how to make the
> >> >> >> > heuristic more aggressive?
> >> >> >> >
> >> >> >> > Honza
> >> >> >>
> >> >> >> --
> >> >> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
> >> >>
> >> >>
> >>
> >>
> >
>
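For context, the linker-script workaround Honza mentions would amount to something like the following GNU ld fragment, which gathers the cold text that GCC emits (conventionally in .text.unlikely subsections) into one named output section so it can be identified in nm/perf output. This is an illustrative sketch only; the patch under review instead names the section at compile time and works with any linker:

```
/* GNU ld only: collect all split-out cold parts under one section name.  */
SECTIONS
{
  .text.unlikely : { *(.text.unlikely .text.unlikely.*) }
} INSERT BEFORE .text;
```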