From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-350219-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 28578 invoked by alias); 1 Oct 2013 17:36:20 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 28568 invoked by uid 89); 1 Oct 2013 17:36:20 -0000
Received: from mail-qe0-f52.google.com (HELO mail-qe0-f52.google.com) (209.85.128.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 01 Oct 2013 17:36:20 +0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.3.2
X-HELO: mail-qe0-f52.google.com
Received: by mail-qe0-f52.google.com with SMTP id i11so5267299qej.11        for <gcc-patches@gcc.gnu.org>; Tue, 01 Oct 2013 10:36:17 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:date         :message-id:subject:from:to:cc:content-type;        bh=Tf7Vimji7ntMYBwLc6o5Dqk8niaNkXxCqJ1CuaFvLeo=;        b=hE69kRkH95xh8zepxeTDLsEw9IyRveZ24dJyFZdRT8h273WOf+W8LAIEMosO4II0on         tsr1TbnFd3iD+wdxD7Y3JMHjmjMwuRm1B7n+3oVu92GGSRw/DnmlkC6uYrcmsMyeKlRy         BwxKjZ0JlGFe9CiI24S4bWlB4jXFX+2GteP232VR7XShl537G9dgNxXk3TQwnHjF2j38         MJvSOONGaLUcNlqKiieNOxFLZr0IdygTqqJ4TLQR9A+dFbOgLVLX5TUymq9DbxO3lTMo         PrU56TTCVgaY8mpkg6SjkQRnYx/6nrd18q3B+Over9c/TcKpw6tDrRyhzrvnEXKKfBdD         ZLaw==
X-Gm-Message-State: ALoCoQkf8BluPaNhseXqsS6XT/ZNRFeZ9UkJsBh0x9KYskfjMrQAupbR5ODC7IIuGHNH4WW82InyZE9cm+yxp5vtpbJYd7DBi7gapvpRg5dUoRlHtOEN+63DbY1wvNP788x/DV6X+qqtU55Tf688y5K3Hnd5xS4nBAklo8LC0b9snc4foa5fyCGqH20csoYnAbSltof+VcCfH7mDhe7JlSGtlDJaQ26GIQ==
MIME-Version: 1.0
X-Received: by 10.229.201.67 with SMTP id ez3mr37078494qcb.1.1380648977077; Tue, 01 Oct 2013 10:36:17 -0700 (PDT)
Received: by 10.49.24.225 with HTTP; Tue, 1 Oct 2013 10:36:16 -0700 (PDT)
In-Reply-To: <CAAe5K+VEWv8Tb-zbUGQHimvXd7SYh+690RC78tgAnmppgwpG6w@mail.gmail.com>
References: <CAAe5K+UcMevsDcXOeq5tu0+u-FrLVVWR6Wd20ZhBHNWdNsQ4Zw@mail.gmail.com>	<CAAe5K+W+VEERrsaCCjCD=n40_MO2EPashDp5qnKoS8SLaSjBjQ@mail.gmail.com>	<20130817204408.GA16557@kam.mff.cuni.cz>	<CAAe5K+XGKvd+_8Ukp0kpOWvc2k165F=4fdemf-iDz+QkirLPmg@mail.gmail.com>	<20130819150942.GA28264@kam.mff.cuni.cz>	<CAAe5K+UnqBfxYrZxSkjRudq8NYni_9ih+t=us+2hH+UPsrwDLA@mail.gmail.com>	<CAAe5K+UhCzfwm_WomE1yk+ET1tBiNT5svfn_LAc57MUSqJbnaQ@mail.gmail.com>	<20130831160420.GC7492@kam.mff.cuni.cz>	<20130831214614.GA12372@kam.mff.cuni.cz>	<CAAe5K+U8kAu8epSpUEMyY4bV_rTxxw78reSvu431w1fOxKkw9A@mail.gmail.com>	<20130924175727.GA24697@kam.mff.cuni.cz>	<CAAe5K+VEWv8Tb-zbUGQHimvXd7SYh+690RC78tgAnmppgwpG6w@mail.gmail.com>
Date: Tue, 01 Oct 2013 17:36:00 -0000
Message-ID: <CAAe5K+UQaz+UAwnfKX4KZH5v6JYOKKavKXhRCgqDH0i2o1h6Wg@mail.gmail.com>
Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
From: Teresa Johnson <tejohnson@google.com>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, "marxin.liska" <marxin.liska@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-IsSubscribed: yes
X-SW-Source: 2013-10/txt/msg00059.txt.bz2

On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>
>>> I looked at one that failed after 100 as well (20031204-1.c). In this
>>> case, it was due to expansion which was creating multiple branches/bbs
>>> from a logical OR and guessing incorrectly on how to assign the
>>> counts:
>>>
>>>  if (octets == 4 && (*cp == ':' || *cp == '\0')) {
>>>
>>> The (*cp == ':' || *cp == '\0') part looked like the following going
>>> into RTL expansion:
>>>
>>>   [20031204-1.c : 31:33] _29 = _28 == 58;
>>>   [20031204-1.c : 31:33] _30 = _28 == 0;
>>>   [20031204-1.c : 31:33] _31 = _29 | _30;
>>>   [20031204-1.c : 31:18] if (_31 != 0)
>>>     goto <bb 16>;
>>>   else
>>>     goto <bb 19>;
>>>
>>> where the result of the OR was always true, so bb 16 had a count of
>>> 100 and bb 19 a count of 0. When it was expanded, the expanded version
>>> of the above turned into 2 bbs with a branch in between. Both
>>> comparisons were done in the first bb, but the first bb checked
>>> whether the result of the *cp == '\0' compare was true, and if not
>>> branched to the check for whether the *cp == ':' compare was true. It
>>> gave the branch to the second check against ':' a count of 0, so that
>>> bb got a count of 0 and was split out, and put the count of 100 on the
>>> fall through assuming the compare with '\0' always evaluated to true.
>>> In reality, this OR condition was always true because *cp was ':', not
>>> '\0'. Therefore, the count of 0 on the second block with the check for
>>> ':' was incorrect, we ended up trying to execute it, and failed.
>>
>> I see, we produce:
>> ;; if (_26 != 0)
>>
>> (insn 94 93 95 (set (reg:CCZ 17 flags)
>>         (compare:CCZ (reg:QI 107 [ D.2184 ])
>>             (const_int 0 [0]))) a.c:31 -1
>>      (nil))
>>
>> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ])
>>         (eq:QI (reg:CCZ 17 flags)
>>             (const_int 0 [0]))) a.c:31 -1
>>      (nil))
>>
>> (insn 96 95 97 (set (reg:CCZ 17 flags)
>>         (compare:CCZ (reg:QI 122 [ D.2186 ])
>>             (const_int 0 [0]))) a.c:31 -1
>>      (nil))
>>
>> (jump_insn 97 96 98 (set (pc)
>>         (if_then_else (ne (reg:CCZ 17 flags)
>>                 (const_int 0 [0]))
>>             (label_ref 100)
>>             (pc))) a.c:31 -1
>>      (expr_list:REG_BR_PROB (const_int 6100 [0x17d4])
>>         (nil)))
>>
>> (insn 98 97 99 (set (reg:CCZ 17 flags)
>>         (compare:CCZ (reg:QI 108 [ D.2186 ])
>>             (const_int 0 [0]))) a.c:31 -1
>>      (nil))
>>
>> (jump_insn 99 98 100 (set (pc)
>>         (if_then_else (eq (reg:CCZ 17 flags)
>>                 (const_int 0 [0]))
>>             (label_ref 0)
>>             (pc))) a.c:31 -1
>>      (expr_list:REG_BR_PROB (const_int 3900 [0xf3c])
>>         (nil)))
>>
>> (code_label 100 99 0 14 "" [0 uses])
>>
>> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)"
>>
>> First I think the logic of do_jump should really be moved to trees.  It is not
>> doing things that can not be adequately represented by gimple.
>>
>> I am not that certain we want to move it before profiling though.
>>>
>>> Presumably we had the correct profile data for both blocks, but the
>>> accuracy was reduced when the OR was represented as a logical
>>> computation with a single branch. We could change the expansion code
>>> to do something different, e.g. treat as a 50-50 branch. But we would
>>> still end up with integer truncation issues when there was a single
>>> training run. But that could be dealt with conservatively in the
>>
>> Yep, but it is still better than what we have now - if the test above was
>> in hot part of program (i.e. not executed once), we will end up optimizing
>> the second conditional for size.
>>
>> So I think it is do_jump bug to not distribute probabilities across the two
>> conditoinals introduced.
>>> bbpart code as I suggested for the jump threading issue above. I.e. a
>>> cold block with incoming non-cold edges conservatively not marked cold
>>> for splitting.
>>
>> Yep, we can probably do that, but we ought to fix the individual cases
>> above at least for resonable number of runs.
>
> I made this change and it removed a few of the failures.
>
> I looked at another case that still failed with 1 train run but passed
> with 100. It turned out to be another truncation issue exposed by RTL
> expansion, where we created some control flow for a memset builtin
> which was in a block with an execution count of 1. Some of the blocks
> got frequencies less than half the original block, so the count was
> rounded down or truncated to 0. I noticed that in this case (as well
> as the jump threading case I fixed by looking for non-zero incoming
> edges in partitioning) that the bb frequency was non-zero.
>
> Why not just have probably_never_executed_bb_p return simply return
> false bb->frequency is non-zero (right now it does the opposite -
> returns true when bb->frequency is 0)? Making this change removed a
> bunch of other failures. With this change as well, there are only 3
> cases that still fail with 1 train run that pass with 100. Need to
> look at those.
>
>>
>> Will you look into logic of do_jump or shall I try to dive in?
>
> I can take a look, but probably won't have a chance until late this
> week. If you don't get to it before then I will see if I can figure
> out why it is applying the branch probabilities this way.

Turned out not to be too tricky to fix the do_jump issue affecting
20031204-1.c. The patch below fixes the issue for this test case
(along with the patch I posted a couple days ago that handles profile
insanities conservatively in the case where there is 1 training run
and we truncate counts):

Index: dojump.c
===================================================================
--- dojump.c (revision 202947)
+++ dojump.c (working copy)
@@ -325,15 +325,20 @@
       break;

     case TRUTH_ORIF_EXPR:
+      /* Spread the probability evenly between the two conditions. So
+         the first condition has half the total probability of being true,
+         and therefore has half the probability of being false
+         (i.e. falls through to the second condition). If we reach the
+         second condition, it will be true with the original probability.  */
       if (if_true_label == NULL_RTX)
  {
           drop_through_label = gen_label_rtx ();
-  do_jump (op0, NULL_RTX, drop_through_label, prob);
+  do_jump (op0, NULL_RTX, drop_through_label, prob / 2);
   do_jump (op1, if_false_label, NULL_RTX, prob);
  }
       else
  {
-  do_jump (op0, NULL_RTX, if_true_label, prob);
+  do_jump (op0, NULL_RTX, if_true_label, prob / 2);
   do_jump (op1, if_false_label, if_true_label, prob);
  }
       break;

I am regression testing this now and will post the patch for review in
a separate thread.

Honza, any comments on the patch I posted a couple days ago that
treats the profile insanities conservatively?

Thanks,
Teresa

>
> Teresa
>
>>
>> Honza
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413