From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
In-Reply-To: <20140526054323.GD9540@kam.mff.cuni.cz>
References: <20140526054323.GD9540@kam.mff.cuni.cz>
Date: Mon, 07 Jul 2014 18:31:00 -0000
Subject: Re: [PATCH] Handle more
COMDAT profiling issues
From: Teresa Johnson
To: Jan Hubicka
Cc: gcc-patches@gcc.gnu.org, David Li
Content-Type: text/plain; charset=UTF-8
X-SW-Source: 2014-07/txt/msg00493.txt.bz2

Finally had some time to pick this up again. See responses and questions
below. But it looks like you are commenting on a stale version of the
patch. See the update I sent in March:
https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01176.html

On Sun, May 25, 2014 at 10:43 PM, Jan Hubicka wrote:
>> This patch attempts to address the lost profile issue for COMDATs in
>> more circumstances, exposed by function splitting.
>>
>> My earlier patch handled the case where the comdat had 0 counts since
>> the linker kept the copy in a different module. In that case we
>> prevent the guessed frequencies on 0-count functions from being
>> dropped by counts_to_freqs, and then later mark any reached via
>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>> comdat is inlined, the call count is propagated to the callee blocks
>> using the guessed probabilities.
>>
>> However, in this case there was a comdat that had a very small
>> non-zero count that was being inlined into a much hotter callsite.
>> This can happen when a copy was ipa-inlined in the profile-gen
>> compile, so the copy in that module gets some non-zero counts from the
>> ipa-inlined instance, but the out-of-line copy was eliminated by the
>> linker (selected from a different module). In this case the inliner
>> was scaling the bb counts up quite a lot when inlining. The problem is
>> that you most likely can't trust that the 0-count bbs in such a case
>> are really not executed by the callsite it is being inlined into,
>> since the counts are very small and correspond to a different
>> callsite. In some internal C++ applications I am seeing that we
>> execute out of the split cold portion of COMDATs for this reason.
>
> With all the problems we have with COMDATs, it still seems to me that
> perhaps the best approach would be to merge them at runtime, as we
> discussed some time ago, and incrementally handle the problem of how
> to get callstack-sensitive profiles.

I recently committed some functionality to a google branch to do some
COMDAT profile merging at runtime, but that was to handle all-zero count
copies, and it operates on top of the LIPO runtime infrastructure. We
don't do any fixup in non-zero count COMDATs, to avoid losing
context-sensitive profiles.

>> This problem is more complicated to address than the 0-count instance,
>> because we need the cgraph to determine which functions to drop the
>> profile on, and at that point the estimated frequencies have already
>> been overwritten by counts_to_freqs. To handle this broader case, I
>> have changed the drop_profile routine to simply re-estimate the
>> probabilities/frequencies (and translate these into counts scaled from
>> the incoming call edge counts). This unfortunately necessitates
>> rebuilding the cgraph, to propagate the new synthesized counts and
>> avoid checking failures downstream. But it will only be rebuilt if we
>> dropped any profiles. With this solution, some of the older approach
>> can be removed (e.g. propagating counts using the guessed
>> probabilities during inlining).
>>
>> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
>> Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when
>> stage 1 reopens?
>>
>> Thanks,
>> Teresa
>
> 2014-02-12  Teresa Johnson
>
>         * graphite.c (graphite_finalize): Pass new parameter.
>         * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>         * predict.c (tree_estimate_probability): New parameter.
>         (tree_estimate_probability_worker): Renamed from
>         tree_estimate_probability_driver.
>         (tree_reestimate_probability): New function.
>         (tree_estimate_probability_driver): Invoke
>         tree_estimate_probability_worker.
>         (freqs_to_counts): Move here from tree-inline.c.
>         (drop_profile): Re-estimate profiles when dropping counts.
>         (handle_missing_profiles): Drop for some non-zero functions as
>         well; get counts from bbs to support invocation before cgraph
>         rebuild.
>         (counts_to_freqs): Remove code obviated by reestimation.
>         * predict.h (tree_estimate_probability): Update declaration.
>         * tree-inline.c (freqs_to_counts): Move to predict.c.
>         (copy_cfg_body): Remove code obviated by reestimation.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles
>         before cgraph rebuild.
>
> Index: params.def
> ===================================================================
> --- params.def  (revision 207436)
> +++ params.def  (working copy)
> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>          "Maximal estimated outcome of branch considered predictable",
>          2, 0, 50)
>
> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
> +         "min-caller-reestimate-ratio",
> +         "Minimum caller-to-callee node count ratio to force reestimated branch "
> +         "probabilities in callee (where 0 means only when callee count is 0)",
> +         10, 0, 0)
>
> OK, so if callee count is 10% of call site count, we do the recomputation?

Right.

> @@ -2390,7 +2392,8 @@ void
>    connect_infinite_loops_to_exit ();
>    /* We use loop_niter_by_eval, which requires that the loops have
>       preheaders.  */
> -  create_preheaders (CP_SIMPLE_PREHEADERS);
> +  if (!redo)
> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>
> So you disable preheader construction because this all happens at
> inlining time, when we do not want to change BBs because the callgraph
> is built, right? Are you sure we have preheaders built at IPA time for
> all functions reliably? (It would make sense to do so, but I think loop
> analysis is done rather randomly by IPA analysis passes, such as the
> inliner.) Otherwise the SCEV infrastructure will explode, if I remember
> correctly.

I can't remember why I guarded this, but it appears it was just
precautionary.
I just changed the implementation to assert, before invoking
create_preheaders, that when redo is true the loops already have
preheaders; made the call to create_preheaders unconditional; and added
an assert afterwards that nothing changed when redo is true:

  gcc_assert (!redo || loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS));
  bool changed = create_preheaders (CP_SIMPLE_PREHEADERS);
  gcc_assert (!(redo && changed));

(I changed create_preheaders to return a bool indicating whether any
preheaders were created.) Neither assert fired for either the gcc
regression tests or more extensive tests I ran on internal C++
benchmarks. Should I change the patch to make the call unconditional and
add these asserts?

> +/* Force re-estimation of the probabilities, because the profile was
> +   deemed insufficient.  */
> +
> +static void
> +tree_reestimate_probability (void)
> +{
> +  basic_block bb;
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      bb->count = 0;
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        e->count = 0;
> +    }
> +  tree_estimate_probability_worker (true);
>
> You also want to remove recorded loop count estimates, since they are
> also driven by the profile we chose not to trust.

Can you point me to the recorded loop count estimates you are referring
to? I poked around and didn't find them. The places I looked used the
edge counts to estimate, and seemed downstream of this.

> @@ -2842,48 +2927,77 @@ handle_missing_profiles (void)
>        gcov_type max_tp_first_run = 0;
>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>
> -      if (node->count)
> -        continue;
>        for (e = node->callers; e; e = e->next_caller)
>          {
> -          call_count += e->count;
> +          call_count += cgraph_function_with_gimple_body_p (e->caller)
> +                        ? gimple_bb (e->call_stmt)->count : 0;
>
> I am quite confused by this change - why don't you want to trust edge
> counts?
> Other parts of the patch seem quite reasonable, but I don't really
> follow here.

It is because we are invoking this before rebuilding the cgraph. Here is
the comment to this effect that I added just above
handle_missing_profiles:

+   This is now invoked before rebuilding the cgraph after reading profile
+   counts, so the cgraph edge and node counts are still 0.  Therefore we
+   need to look at the counts on the entry bbs and the call stmt bbs.
+   That way we can avoid needing to rebuild the cgraph again to reflect
+   the nodes with dropped profiles.  */

Thanks,
Teresa

> Honza

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413