From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 2071 invoked by alias); 9 Aug 2013 09:58:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 2060 invoked by uid 89); 9 Aug 2013 09:58:54 -0000 X-Spam-SWARE-Status: No, score=-3.6 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_NO,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RDNS_NONE autolearn=no version=3.3.1 Received: from Unknown (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Fri, 09 Aug 2013 09:58:52 +0000 Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id DB2235432EA; Fri, 9 Aug 2013 11:58:43 +0200 (CEST) Date: Fri, 09 Aug 2013 09:58:00 -0000 From: Jan Hubicka To: Teresa Johnson Cc: Jan Hubicka , Bernhard Reutner-Fischer , "gcc-patches@gcc.gnu.org" , Steven Bosscher , Jeff Law , marxin.liska@gmail.com Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition Message-ID: <20130809095843.GC31755@kam.mff.cuni.cz> References: <20130802150529.GC15776@kam.mff.cuni.cz> <20130808222332.GA31755@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-SW-Source: 2013-08/txt/msg00536.txt.bz2 > On Thu, Aug 8, 2013 at 3:23 PM, Jan Hubicka wrote: > > Hi, > > Martin Liska was kind enough to generate disk seeking graph of gimp statrup with his function reordering. > > His code simply measures time of firest execution of a function and orders functions in the given order. > > The functions stay in the subsections (unlikely/startup/exit/hot/normal) that are then glued together > > in this order. > > > > I am attaching disk seeking with and without -freorder-blocks-and-partition (with your patch). > > > > In 2.pdf you can see two increasing sequences in the text segment. If I am not mistaken the bottom > > one comes for hot and the top one for normal section. The big unused part on bottom is unlikely > > section since most of gimp is not trained. > > 2.pdf is reordered with Martin's technique? > > > > > Now 1.pdf is with -freorder-blocks-and-partition and your patch. You can see there is third sequence > > near bottom of the text seciton. that is beggining of unlikely section, so it tracks jumps where we > > fall into cold section of function. > > 1.pdf is generated using the usual FDO + > -freorder-blocks-and-partition (i.e. not using Martin's technique)? 2.pdf is Martin's reordering (that works ortoghonally to what we already have - it just orders the functions inside idividual subsections. This make the subsections more visible than without his patch). 1.pdf is Marting's rerodering + yours patch (I asked him to double check it is the latest verision) + -freorder-blocks-and-partition. He simply trains and measures the gimp startup, nothing else, so there should not be problem with representativity of the data. > > > > > It still seems rather bad (i.e. good part of unlikely section is actually used). I think the dominator > > based approach is not going to work too reliably (I can "fix" my testcase to contain multiple nested > > conditionals and then the heuristic about predecestors won't help). > > Yes, this doesn't look good. Did you use the latest version of my > patch that doesn't walk the dominators? > > Do you know how many training runs are done for this benchmark? I > think a lot of the issues that you pointed out with the hot loop > preceded by non-looping conditional code as in your earlier example, > or multiple nested conditionals, comes from the fact that the cold > cutoff is not 0, but some number less than the number of training > runs. Perhaps the cutoff for splitting should be 0. Then the main > issue that needs to be corrected is profile insanities, not code that > is executed once (since that would not be marked cold). Hmm, compute_function_frequency uses probably_never_executed_bb_p that requires the count of basic block to be less than number of train runs. In Martin's setup that will be 0. This is the same as what -frerorder-block-of-partition does? > > The only other issue that I can think of here is that the training > data was not representative and didn't execute these blocks. > > > > > What about simply walking the CFG from entry through all edges with non-0 counts and making all reachable > > blocks hot + forcingly make any hot blocks not reachable this way reachable? > > Is this different than what I currently have + changing the cold > cutoff to 0? In that case any blocks reachable through non-0 edges > should be non-0 and marked hot, and the current patch forces the most > frequent paths to all hot blocks to be hot. Do we sanity check that the cold partition does not contain any blocks of count 0? It may be that the profile is broken enough to make partitioning not work. I can think of inlining where the count gets scaled all way down to 0. Perhaps count scaling code can be modified to never round towards 0 for block executing non-0 times... Honza > > Thanks, > Teresa > > > I think we are really looking primarily for dead parts of the functions (sanity checks/error handling) > > that should not be visited by train run. We can then see how to make the heuristic more aggressive? > > > > Honza > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413