From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 6237 invoked by alias); 9 Aug 2013 21:03:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 6227 invoked by uid 89); 9 Aug 2013 21:03:59 -0000 X-Spam-SWARE-Status: No, score=-3.6 required=5.0 tests=AWL,BAYES_00,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,RDNS_NONE,SPF_PASS autolearn=ham version=3.3.1 Received: from Unknown (HELO mail-qe0-f53.google.com) (209.85.128.53) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Fri, 09 Aug 2013 21:03:58 +0000 Received: by mail-qe0-f53.google.com with SMTP id f6so2601180qej.12 for ; Fri, 09 Aug 2013 14:03:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=6EVr155bv/juHle2q5+xc4Uj/I6JXXBlsnHi9r5l6io=; b=Fs6shzCCSIG6Oe8b+dfPvGgspI+4NIvUUdfrder2jxTlZAG84J0FiyF8pYckzGrHZS GUFKpvy3Pp9ExVKD5vjkUsKXbP1v1Gz7Y6nmHizoeN4/PENcoNRfvPv0FLAWLh0vPPHr iqjeA6yez5w7017NrXVlxJ+c2y0RS6Fnbzq0vQB7A22qKtcq1SBIfGQp3mOjkzZeFf85 NSqByHoOynHm/rXnvtG0UucrpcNERBcvH1h1S3YquIFmywxE1t6g2DZRMTFxYqKTtuf7 SC3uH+HMHSrh9P0Iusa8JKLaa5hwfWjSK3hgMibIPxnrKblvaEBG3iMfLoSf4UpT+XGa rNQw== X-Gm-Message-State: ALoCoQnqy7l/xhJsqSzfclUWPdGaVeaowrXwvCQPlpSYhGeePuYSeTHIG7BWfeiyB7co8r/qiK1+bcHkzWLL0GNI6qsvISYWfSoalzOWit1EtCPXBKH3MvXj+h/NkN/ou9oYIn249mfLgC3vcIWvF1g0vrEyXlkgHAlds6IZ/y+A/jBUnJgkYLh/K1bk0oegqgfIcmACaodcJXtHRoX4jJ0EfAcdgslMhQ== MIME-Version: 1.0 X-Received: by 10.224.169.3 with SMTP id w3mr12900686qay.13.1376082230594; Fri, 09 Aug 2013 14:03:50 -0700 (PDT) Received: by 10.49.40.162 with HTTP; Fri, 9 Aug 2013 14:03:50 -0700 (PDT) In-Reply-To: References: <20130802150529.GC15776@kam.mff.cuni.cz> <20130808222332.GA31755@kam.mff.cuni.cz> <20130809095843.GC31755@kam.mff.cuni.cz> <20130809152804.GA6579@kam.mff.cuni.cz> Date: Fri, 09 Aug 2013 21:03:00 -0000 Message-ID: Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition From: Teresa Johnson To: =?UTF-8?Q?Martin_Li=C5=A1ka?= Cc: Jan Hubicka , Bernhard Reutner-Fischer , "gcc-patches@gcc.gnu.org" , Steven Bosscher , Jeff Law , Sriraman Tallam Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-SW-Source: 2013-08/txt/msg00585.txt.bz2 On Fri, Aug 9, 2013 at 8:54 AM, Martin Li=C5=A1ka = wrote: > Hi > > On 9 August 2013 17:28, Jan Hubicka wrote: >>> > Do we sanity check that the cold partition does not contain any block= s of >>> > count 0? It may be that the profile is broken enough to make partiti= oning >>> > not work. >>> >>> Do you mean sanity check that the cold partition does not contain any >>> blocks of count > 0? (they should all be zero) I don't think that >>> sanity check is there, but I can try adding that. >> >> Thanks, lets start with this - I suppose we need to figure out if >> 1) the reachable blocks goes to cold section because partitioning decid= es >> so even if they have non-0 count. >> 2) the reachable blocks goes to cold section because they have incorrec= tly >> updated count to 0 by someone >> 3) profiling gets some blocks wrong. >>> >>> The issue with such a sanity check may be due to the later fixup I >>> have in this patch (fixup_partitions). It is invoked after certain >>> optimizations on the cfg that may make hot blocks previously reached >>> by both hot and cold edges only reachable by cold blocks. These blocks >>> are remarked cold. If the profile data hasn't been updated correctly >>> it is possible that they would still have a non-0 count, although they >>> are essentially cold after the cfg transformation. >> >> Well, or the other posibility is that the edges was updated wrong >> and the blocks are really cold. We need to figure out if that happens >> commonly enough. >> >> I will try to think of some artificial testcases. >> >>> >>> But certainly such a sanity check should always succeed after the >>> original partitioning. >>> >>> > I can think of inlining where the count gets scaled all way down to 0= . Perhaps >>> > count scaling code can be modified to never round towards 0 for block= executing >>> > non-0 times... >>> >>> This reminds me of why this situation could happen. When I have been >>> testing this on the google branch I found situations where COMDAT >>> routines have 0 profile counts (if I remember correctly, this happens >>> when profile-gen binary has call to out-of-line copy of COMDAT in >>> module A, linker chooses the out-of-line copy from module B, therefore >>> the profile data for COMDAT in module A is 0). When the COMDAT gets >>> inlined, the 0 counts on its bbs are scaled to 0, even though the >>> callsite is non-zero. I have a patch that I was planning to send as a >>> follow-up that handles this case by propagating the callsite bb's >>> count to the inlined code when it has 0 counts, scaling by the edge >>> frequencies. I can either include that patch in this one, or send it >>> for review separately right now. Do you want to give it a try with >>> this one to see if it addresses the issue? >> >> This scenario should not happen with LTO setup: the LTO symbol tables co= ntains >> code before early optimization and should be identical with profiling or >> without (modulo the new references and call from profile code). >> >> But this patch seems useful as a backup solution for non-LTO, so yes, pl= ease >> send it separately and I can try to double check that it really do not h= appen >> with LTO. >> (acutally LTO symtab may just chose COMDAT from module that has counts w= ith it. >> It has all the info for it. I was thinkin about it few weeks back. It = is >> bit hard to do - you need to verify that all references from the functio= n are >> the same or linking might fail if you overwrite linker's decisiosns). >>> >>> Also, can you send me reproduction instructions for gimp? I don't >>> think I need Martin's patch, but which version of gimp and what is the >>> equivalent way for me to train it? I have some scripts to generate a >>> similar type of instruction heat map graph that I have been using to >>> tune partitioning and function reordering. Essentially it uses linux >>> perf to sample on instructions_retired and then munge the data in >>> several ways to produce various stats and graphs. One thing that has >>> been useful has been to combine the perf data with nm output to >>> determine which cold functions are being executed at runtime. >> >> Martin? > > I use gimp from git repository, commit: > 88ecd59c3436d302b644a5d25c1938c0e7b60ae0 (from Fet 5 2013) > Link: http://www.gimp.org/source/#gimp_from_git Thanks. Were you building with LTO? And just -O2, or any other options I should use? Teresa > > Martin > >>> >>> However, for this to tell me which split cold bbs are being executed I >>> need to use a patch that Sri sent for review several months back that >>> gives the split cold section its own name: >>> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html >>> Steven had some follow up comments that Sri hasn't had a chance to addr= ess yet: >>> http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html >>> (cc'ing Sri as we should probably revive this patch soon to address >>> gdb and other issues with detecting split functions properly) >> >> Intresting, I used linker script for this purposes, but that his GNU ld = only... >> >> Honza >>> >>> Thanks! >>> Teresa >>> >>> > >>> > Honza >>> >> >>> >> Thanks, >>> >> Teresa >>> >> >>> >> > I think we are really looking primarily for dead parts of the func= tions (sanity checks/error handling) >>> >> > that should not be visited by train run. We can then see how to m= ake the heuristic more aggressive? >>> >> > >>> >> > Honza >>> >> >>> >> >>> >> >>> >> -- >>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-= 2413 >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 --=20 Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413