From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-346574-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 27001 invoked by alias); 9 Aug 2013 15:28:15 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 26975 invoked by uid 89); 9 Aug 2013 15:28:14 -0000
X-Spam-SWARE-Status: No, score=-3.6 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_NO,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RDNS_NONE autolearn=no version=3.3.1
Received: from Unknown (HELO nikam.ms.mff.cuni.cz) (195.113.20.16)    by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Fri, 09 Aug 2013 15:28:12 +0000
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)	id 99C92543925; Fri,  9 Aug 2013 17:28:04 +0200 (CEST)
Date: Fri, 09 Aug 2013 15:28:00 -0000
From: Jan Hubicka <hubicka@ucw.cz>
To: Teresa Johnson <tejohnson@google.com>
Cc: Jan Hubicka <hubicka@ucw.cz>,	Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>,	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,	Steven Bosscher <stevenb.gcc@gmail.com>, Jeff Law <law@redhat.com>,	"marxin.liska" <marxin.liska@gmail.com>,	Sriraman Tallam <tmsriram@google.com>
Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
Message-ID: <20130809152804.GA6579@kam.mff.cuni.cz>
References: <CAAe5K+U6+xyy95KSeA7+SZ0tUdFt2dmF-vSxNsBsqg53NSyU3Q@mail.gmail.com> <CAC1BbcSnwiAfzKqngLvLBGnmPDLNhYdwbyWDMJ++PzsMd=M-3Q@mail.gmail.com> <CAAe5K+XM_RCJBmtKSyQCfD86hi2Ls_BCyuZkd2WKSoGUFm84DA@mail.gmail.com> <20130802150529.GC15776@kam.mff.cuni.cz> <CAAe5K+WiR02Rs1jYMFRF2F8ey60UO7LwRa8WWq7coQ5Pq8HhiQ@mail.gmail.com> <CAAe5K+WToUznwFFfm5beapXAOOrOgxHR8LXmYBTL70C4VVsT+w@mail.gmail.com> <20130808222332.GA31755@kam.mff.cuni.cz> <CAAe5K+W+8borbPkt4BB1ayRgDbFBtd6oyZsGuUiC854o9t0Rjg@mail.gmail.com> <20130809095843.GC31755@kam.mff.cuni.cz> <CAAe5K+XXT6t5CXBDXPWMNSrLWwqfw8F_J2fNUAN2afqb5qPhzQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAAe5K+XXT6t5CXBDXPWMNSrLWwqfw8F_J2fNUAN2afqb5qPhzQ@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SW-Source: 2013-08/txt/msg00558.txt.bz2

> > Do we sanity check that the cold partition does not contain any blocks of
> > count 0?  It may be that the profile is broken enough to make partitioning
> > not work.
> 
> Do you mean sanity check that the cold partition does not contain any
> blocks of count > 0? (they should all be zero) I don't think that
> sanity check is there, but I can try adding that.

Thanks, lets start with this - I suppose we need to figure out if
 1) the reachable blocks goes to cold section because partitioning decides
    so even if they have non-0 count.
 2) the reachable blocks goes to cold section because they have incorrectly
    updated count to 0 by someone
 3) profiling gets some blocks wrong.
> 
> The issue with such a sanity check may be due to the later fixup I
> have in this patch (fixup_partitions). It is invoked after certain
> optimizations on the cfg that may make hot blocks previously reached
> by both hot and cold edges only reachable by cold blocks. These blocks
> are remarked cold. If the profile data hasn't been updated correctly
> it is possible that they would still have a non-0 count, although they
> are essentially cold after the cfg transformation.

Well, or the other posibility is that the edges was updated wrong
and the blocks are really cold.  We need to figure out if that happens
commonly enough.

I will try to think of some artificial testcases.

> 
> But certainly such a sanity check should always succeed after the
> original partitioning.
> 
> > I can think of inlining where the count gets scaled all way down to 0.  Perhaps
> > count scaling code can be modified to never round towards 0 for block executing
> > non-0 times...
> 
> This reminds me of why this situation could happen. When I have been
> testing this on the google branch I found situations where COMDAT
> routines have 0 profile counts (if I remember correctly, this happens
> when profile-gen binary has call to out-of-line copy of COMDAT in
> module A, linker chooses the out-of-line copy from module B, therefore
> the profile data for COMDAT in module A is 0). When the COMDAT gets
> inlined, the 0 counts on its bbs are scaled to 0, even though the
> callsite is non-zero. I have a patch that I was planning to send as a
> follow-up that handles this case by propagating the callsite bb's
> count to the inlined code when it has 0 counts, scaling by the edge
> frequencies. I can either include that patch in this one, or send it
> for review separately right now. Do you want to give it a try with
> this one to see if it addresses the issue?

This scenario should not happen with LTO setup: the LTO symbol tables contains
code before early optimization and should be identical with profiling or
without (modulo the new references and call from profile code).

But this patch seems useful as a backup solution for non-LTO, so yes, please
send it separately and I can try to double check that it really do not happen
with LTO.
(acutally LTO symtab may just chose COMDAT from module that has counts with it.
It has all the info for it.  I was thinkin about it few weeks back.  It is
bit hard to do - you need to verify that all references from the function are
the same or linking might fail if you overwrite linker's decisiosns).
> 
> Also, can you send me reproduction instructions for gimp? I don't
> think I need Martin's patch, but which version of gimp and what is the
> equivalent way for me to train it? I have some scripts to generate a
> similar type of instruction heat map graph that I have been using to
> tune partitioning and function reordering. Essentially it uses linux
> perf to sample on instructions_retired and then munge the data in
> several ways to produce various stats and graphs. One thing that has
> been useful has been to combine the perf data with nm output to
> determine which cold functions are being executed at runtime.

Martin?

> 
> However, for this to tell me which split cold bbs are being executed I
> need to use a patch that Sri sent for review several months back that
> gives the split cold section its own name:
>   http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01571.html
> Steven had some follow up comments that Sri hasn't had a chance to address yet:
>   http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00798.html
> (cc'ing Sri as we should probably revive this patch soon to address
> gdb and other issues with detecting split functions properly)

Intresting, I used linker script for this purposes, but that his GNU ld only...

Honza
> 
> Thanks!
> Teresa
> 
> >
> > Honza
> >>
> >> Thanks,
> >> Teresa
> >>
> >> > I think we are really looking primarily for dead parts of the functions (sanity checks/error handling)
> >> > that should not be visited by train run.  We can then see how to make the heuristic more aggressive?
> >> >
> >> > Honza
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413