From: "Bin.Cheng"
Date: Fri, 07 Dec 2018 10:00:00 -0000
Subject: Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count
To: Jan Hubicka
Cc: bin.cheng@linux.alibaba.com, Richard Guenther, gcc-patches List

On Tue, Dec 4, 2018 at 4:40 PM Bin.Cheng wrote:
>
> On Thu, Nov 29, 2018 at 12:20 AM Jan Hubicka wrote:
> >
> > > On Tue, Nov 20, 2018 at 6:55 PM bin.cheng wrote:
> > > >
> > > > Sender:Jan Hubicka
> > > > Sent at:2018 Nov 5 (Mon) 22:21
> > > > To:Richard Biener
> > > > Cc:bin.cheng; GCC Patches
> > > > Subject:Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count
> > > >
> > > > >
> > > > > > On Wed, Oct 31, 2018 at 7:30 AM bin.cheng wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > > In the new profile probability/count infrastructure, we have different precision quality categories,
> > > > > > > and probabilities/counts of different categories are not supposed to be compared or
> > > > > > > calculated. Though this is an improvement in general, it introduces unexpected behavior.
> > > > > > > Specifically, classes profile_probability and profile_count themselves are implemented
> > > > > > > by comparing probabilities/counts against profile_count::zero(). Since zero() is of the
> > > > > > > profile_precise category, it always compares different from zeros of other precision
> > > > > > > categories, including afdo.
> > > > > > >
> > > > > > > I can see two ways of fixing this: 1) Treat zero as a common probability/count regardless
> > > > > > > of its category; 2) Provide an "is_zero" method rather than relying on "==" comparison
> > > > > > > against profile_count::zero(). 2) requires lots of code changes, so I went with 1)
> > > > > > > in this patch set. This patch doesn't handle "always", but it might be.
> > > > > > >
> > > > > > > This patch also corrects a minor issue where we try to invert an uninitialized value.
> > > > > > >
> > > > > > > Bootstrapped and tested on x86_64 as part of the patch set. Is it OK?
> > > > > >
> > > > > > I'll defer on the emit_store_flag_force change, likewise for the zero
> > > > > > handling in compares - I don't think zeros of different qualities should compare equal.
> > > > > > Would comparisons against ::always() not have the very same issue?
> > > > > > Likewise ::even(), ::likely(), etc.? Those always get guessed quality.
> > > > > >
> > > > > > The invert change looks OK to me. The related change to the always() API would
> > > > > > suggest replacing guessed_always() with always (guessed) and also making similar
> > > > > > changes throughout the whole API...
> > > > > >
> > > > > > Honza?
> > > > >
> > > > > The zeros are really different zeros. profile_count::zero makes us
> > > > > drop the basic block into the cold section because we know that it won't be
> > > > > executed in a normal run of the program (either we have accurate profile
> > > > > feedback, or we proved that the program is on its way to crash, or the user
> > > > > annotated the cold section). Having a guessed zero or an auto-fdo zero won't
> > > > > make us do such aggressive size optimization.
> > > > > This is important since those zeros happen by accident relatively
> > > > > commonly, and thus if we dropped all such code to the cold section, the cold
> > > > > section would be visited relatively often during execution of the program,
> > > > > which would defeat its purpose.
> > > > >
> > > > > Most comparisons against profile_count::zero in profile-count.h
> > > > > are really intended to pass only for this "absolute zero". They bypass
> > > > > the precision adjustments which normally happen when you merge values
> > > > > of different precision.
> > > > >
> > > > > What kind of unexpected behaviour are you seeing?
> > > > > We already have nonzero_p, which is what we use when we want to know that
> > > > > a count is non-zero in some sense of precision.
> > > >
> > > > Hi Honza,
> > > > Sorry for letting this slip away.
> > > > So in the case of AutoFDO, due to the nature of sampling, lots of
> > > > funcs/bbs are annotated with a zero profile_count in afdo precision, and
> > > > we have checks against a zero profile_count in precise precision.
> > > > All these checks end up false and cause issues. Take the code in
> > > > update_profiling_info as an example:
> > > >
> > > > update_profiling_info (struct cgraph_node *orig_node,
> > > >                        struct cgraph_node *new_node)
> > > > {
> > > >   struct cgraph_edge *cs;
> > > >   struct caller_statistics stats;
> > > >   profile_count new_sum, orig_sum;
> > > >   profile_count remainder, orig_node_count = orig_node->count;
> > > >
> > > >   if (!(orig_node_count.ipa () > profile_count::zero ()))
> > > >     return;
> > > >   //...
> > > >   for (cs = new_node->callees; cs; cs = cs->next_callee)
> > > >     cs->count = cs->count.apply_scale (new_sum, orig_node_count);
> > > >
> > > > Since we also have the code below in profile_count::operator>,
> > > >       if (other == profile_count::zero ())
> > > >         return !(*this == profile_count::zero ());
> > > >
> > > > If orig_node_count is an afdo zero, the above zero check for orig_node_count
> > > > returns false, so we end up passing a zero denominator to apply_scale and
> > > > hitting its assertion.
> > > >
> > > > In this updated patch, I restricted the changes to profile_count::operator
> > > > <, >, <= and >= only. Plus, I think there is a latent typo in operator>= because
> > > > the current code returns TRUE if '*this' is precise zero and 'other' is precise
> > > > non-zero.
> > > > @@ -879,7 +879,7 @@ public:
> > > >        if (other == profile_count::zero ())
> > > >          return true;
> > > >        if (*this == profile_count::zero ())
> > > > -        return !(other == profile_count::zero ());
> > > > +        return !other.nonzero_p ();
> >
> > We already have
> >
> > True:
> >   profile_count::zero < any other value
> >   any other value > profile_count::zero
> >   profile_count::zero <= any initialized value
> >   profile_count::zero <= profile_count::zero
> >   any initialized value >= profile_count::zero
> >
> > False:
> >   profile_count::zero > any other value
> >   any other value < profile_count::zero
> >
> > You are right about the typo in >=, it should be:
> >
> > Index: profile-count.h
> > ===================================================================
> > --- profile-count.h	(revision 266450)
> > +++ profile-count.h	(working copy)
> > @@ -879,7 +879,7 @@
> >        if (other == profile_count::zero ())
> >          return true;
> >        if (*this == profile_count::zero ())
> > -        return !(other == profile_count::zero ());
> > +        return other == profile_count::zero ();
> >        gcc_checking_assert (compatible_p (other));
> >        return m_val >= other.m_val;
> >      }
> >
> > With your patch we get false for:
> >   profile_count::zero < guessed/auto_fdo/other 0
> >   guessed/auto_fdo/other > profile_count::zero
> >   guessed/auto_fdo/other <= profile_count::zero
> >   profile_count::zero >= profile_count::zero
> >
> > The original idea was to intentionally make profile_count::zero smaller
> > than any other type of initialized value, since it is a stricter hint
> > that the path will not be taken.
> > For example, in bb_reorder if you end up with a "funny" profile with two
> > exit edges, one having profile_count::zero and the other being zero as a result
> > of (unsuccessful) profile updates, it is still a better idea to pick the
> > profile_count::zero for the taken edge. With your patch it will end up
> > picking either of the paths.
> >
> > How does the patch help in your situation?
> Hi Honza, thanks very much for elaborating.
> The issue in the autofdo case is as described in my last message.
> Given update_profiling_info, implemented as below:
>
> update_profiling_info (struct cgraph_node *orig_node,
>                        struct cgraph_node *new_node)
> {
>   struct cgraph_edge *cs;
>   struct caller_statistics stats;
>   profile_count new_sum, orig_sum;
>   profile_count remainder, orig_node_count = orig_node->count;
>
>   //*****Operator ">" returns true if orig_node_count == autofdo.zero.
>   if (!(orig_node_count.ipa () > profile_count::zero ()))
>     return;
>   //...
>   for (cs = new_node->callees; cs; cs = cs->next_callee)
>     //*****Results in apply_scale called with autofdo.zero as the 2nd argument.
>     cs->count = cs->count.apply_scale (new_sum, orig_node_count);
>
> Also, apply_scale is implemented as:
>   profile_count apply_scale (profile_count num, profile_count den) const
>     {
>       if (*this == profile_count::zero ())
>         return *this;
>       if (num == profile_count::zero ())
>         return num;
>       if (!initialized_p () || !num.initialized_p () || !den.initialized_p ())
>         return profile_count::uninitialized ();
>       if (num == den)
>         return *this;
>       gcc_checking_assert (den.m_val);
>
> Here we have (num != zero && den == autofdo.zero), so it triggers the
> gcc_checking_assert.
> According to your explanation, I guess we need to call force_nonzero for
> orig_node_count before calling apply_scale, right?

Hi Honza,
I have committed the typo fix as revision 266885.
Also, I followed your suggestion (IIUC) by calling
profile_count::adjust_for_ipa_scaling for a zero den in function
update_profiling_info. It works and makes more sense than
changing the global zero check logic.
Patch tested as before, is it ok?

Thanks,
bin

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4471bae11c7..5074ef63da1 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3715,9 +3715,11 @@ update_profiling_info (struct cgraph_node *orig_node,
   new_sum = orig_node_count.combine_with_ipa_count (new_sum);
   orig_node->count = remainder;
 
+  profile_count::adjust_for_ipa_scaling (&new_sum, &orig_node_count);
   for (cs = new_node->callees; cs; cs = cs->next_callee)
     cs->count = cs->count.apply_scale (new_sum, orig_node_count);
 
+  profile_count::adjust_for_ipa_scaling (&remainder, &orig_node_count);
   for (cs = orig_node->callees; cs; cs = cs->next_callee)
     cs->count = cs->count.apply_scale (remainder, orig_node_count);
 

2018-12-07  Bin Cheng

	* ipa-cp.c (update_profiling_info): Call adjust_for_ipa_scaling for
	zero profile count.