From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7140 invoked by alias); 26 Sep 2013 05:41:45 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 7124 invoked by uid 89); 26 Sep 2013 05:41:42 -0000 Received: from mail-qa0-f48.google.com (HELO mail-qa0-f48.google.com) (209.85.216.48) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 26 Sep 2013 05:41:42 +0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.7 required=5.0 tests=AWL,BAYES_50,NO_RELAYS,SPAM_SUBJECT autolearn=no version=3.3.2 X-HELO: mail-qa0-f48.google.com Received: by mail-qa0-f48.google.com with SMTP id hu16so543131qab.7 for ; Wed, 25 Sep 2013 22:41:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=vfJChukkWKiD00VTn5BoC041i0+X+3QTnyjTnTu0WbI=; b=JAUL6OHD2e2sw6OrkFwUaWSI8EBCUoZU0m2pHUhvIFxbeZSNM8OPwhTmb7dWDzE8cW 6St0YAXZ64+ILgFfCC98u8PQlBEnK7MNrFwsjpqgxFzwo8ejhDJ+1Mclzf9nrARUQvJr dmuWFMnsr7WXpTGLVhe2gKYJmmTzq/oAJyV7tZWq6WD7T35DcpASpjp/BD4/aYKXOzrg 2ISW9+BSw5XbqQO12B7bUGrWsTf7Kj6Z3vRU5XJgRKNNVX2tbQpm9H6oQkeqDT77PHyB AZqzWNp/Qc9EZeNcL3Es2dEnVisDw40R5n7f+XLGjkAYiClS4Xae+RlVHWWRkXo8WZpu dqhA== X-Gm-Message-State: ALoCoQnWdoIvEytVUVuqAoGj89ovGXZ28LiaPyMywi450My5hF4LXqk6uelWeIuzTrDB2PqN2QMB/30jVdZYAP/lpQTinY/bhkRLxm/AUwUZSUPJwZ4U/sBsRZDs1WB4QXyTG29IYnxyJ36Oa+t9bNyzogl1wXmNvLLtQwDNVWeFgKQPyEVWQpsSwkujUoeV8WbMOb+/6Ll11MC0ZEERsWjCrNh+OQ8XJQ== MIME-Version: 1.0 X-Received: by 10.49.30.227 with SMTP id v3mr137304qeh.92.1380174098620; Wed, 25 Sep 2013 22:41:38 -0700 (PDT) Received: by 10.49.24.225 with HTTP; Wed, 25 Sep 2013 22:41:38 -0700 (PDT) In-Reply-To: References: <20130817204408.GA16557@kam.mff.cuni.cz> <20130819150942.GA28264@kam.mff.cuni.cz> <20130831160420.GC7492@kam.mff.cuni.cz> <20130831214614.GA12372@kam.mff.cuni.cz> <20130924175727.GA24697@kam.mff.cuni.cz> Date: Thu, 26 Sep 2013 08:44:00 -0000 Message-ID: Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition From: Teresa Johnson To: Jan Hubicka Cc: "gcc-patches@gcc.gnu.org" , "marxin.liska" Content-Type: text/plain; charset=ISO-8859-1 X-IsSubscribed: yes X-SW-Source: 2013-09/txt/msg01915.txt.bz2 On Wed, Sep 25, 2013 at 2:33 PM, Teresa Johnson wrote: > On Tue, Sep 24, 2013 at 11:25 AM, Teresa Johnson wrote: >> On Tue, Sep 24, 2013 at 10:57 AM, Jan Hubicka wrote: >>>> >>>> I looked at one that failed after 100 as well (20031204-1.c). In this >>>> case, it was due to expansion which was creating multiple branches/bbs >>>> from a logical OR and guessing incorrectly on how to assign the >>>> counts: >>>> >>>> if (octets == 4 && (*cp == ':' || *cp == '\0')) { >>>> >>>> The (*cp == ':' || *cp == '\0') part looked like the following going >>>> into RTL expansion: >>>> >>>> [20031204-1.c : 31:33] _29 = _28 == 58; >>>> [20031204-1.c : 31:33] _30 = _28 == 0; >>>> [20031204-1.c : 31:33] _31 = _29 | _30; >>>> [20031204-1.c : 31:18] if (_31 != 0) >>>> goto ; >>>> else >>>> goto ; >>>> >>>> where the result of the OR was always true, so bb 16 had a count of >>>> 100 and bb 19 a count of 0. When it was expanded, the expanded version >>>> of the above turned into 2 bbs with a branch in between. Both >>>> comparisons were done in the first bb, but the first bb checked >>>> whether the result of the *cp == '\0' compare was true, and if not >>>> branched to the check for whether the *cp == ':' compare was true. It >>>> gave the branch to the second check against ':' a count of 0, so that >>>> bb got a count of 0 and was split out, and put the count of 100 on the >>>> fall through assuming the compare with '\0' always evaluated to true. >>>> In reality, this OR condition was always true because *cp was ':', not >>>> '\0'. Therefore, the count of 0 on the second block with the check for >>>> ':' was incorrect, we ended up trying to execute it, and failed. >>> >>> I see, we produce: >>> ;; if (_26 != 0) >>> >>> (insn 94 93 95 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 107 [ D.2184 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (insn 95 94 96 (set (reg:QI 122 [ D.2186 ]) >>> (eq:QI (reg:CCZ 17 flags) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (insn 96 95 97 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 122 [ D.2186 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (jump_insn 97 96 98 (set (pc) >>> (if_then_else (ne (reg:CCZ 17 flags) >>> (const_int 0 [0])) >>> (label_ref 100) >>> (pc))) a.c:31 -1 >>> (expr_list:REG_BR_PROB (const_int 6100 [0x17d4]) >>> (nil))) >>> >>> (insn 98 97 99 (set (reg:CCZ 17 flags) >>> (compare:CCZ (reg:QI 108 [ D.2186 ]) >>> (const_int 0 [0]))) a.c:31 -1 >>> (nil)) >>> >>> (jump_insn 99 98 100 (set (pc) >>> (if_then_else (eq (reg:CCZ 17 flags) >>> (const_int 0 [0])) >>> (label_ref 0) >>> (pc))) a.c:31 -1 >>> (expr_list:REG_BR_PROB (const_int 3900 [0xf3c]) >>> (nil))) >>> >>> (code_label 100 99 0 14 "" [0 uses]) >>> >>> That is because we TER together "_26 = _25 | _24" and "if (_26 != 0)" >>> >>> First I think the logic of do_jump should really be moved to trees. It is not >>> doing things that can not be adequately represented by gimple. >>> >>> I am not that certain we want to move it before profiling though. >>>> >>>> Presumably we had the correct profile data for both blocks, but the >>>> accuracy was reduced when the OR was represented as a logical >>>> computation with a single branch. We could change the expansion code >>>> to do something different, e.g. treat as a 50-50 branch. But we would >>>> still end up with integer truncation issues when there was a single >>>> training run. But that could be dealt with conservatively in the >>> >>> Yep, but it is still better than what we have now - if the test above was >>> in hot part of program (i.e. not executed once), we will end up optimizing >>> the second conditional for size. >>> >>> So I think it is do_jump bug to not distribute probabilities across the two >>> conditoinals introduced. >>>> bbpart code as I suggested for the jump threading issue above. I.e. a >>>> cold block with incoming non-cold edges conservatively not marked cold >>>> for splitting. >>> >>> Yep, we can probably do that, but we ought to fix the individual cases >>> above at least for resonable number of runs. >> >> I made this change and it removed a few of the failures. >> >> I looked at another case that still failed with 1 train run but passed >> with 100. It turned out to be another truncation issue exposed by RTL >> expansion, where we created some control flow for a memset builtin >> which was in a block with an execution count of 1. Some of the blocks >> got frequencies less than half the original block, so the count was >> rounded down or truncated to 0. I noticed that in this case (as well >> as the jump threading case I fixed by looking for non-zero incoming >> edges in partitioning) that the bb frequency was non-zero. >> >> Why not just have probably_never_executed_bb_p return simply return >> false bb->frequency is non-zero (right now it does the opposite - >> returns true when bb->frequency is 0)? Making this change removed a >> bunch of other failures. With this change as well, there are only 3 >> cases that still fail with 1 train run that pass with 100. Need to >> look at those. > > FYI, These turned out to be more jump threading issues. I am currently > working on getting jump threading profile updates to work properly, I > think I'm pretty close. Haven't had a chance to look at do_jump yet. Correction, it was not the 3 tests I mentioned above, but a different set of tests being affected by this jump threading issue. I have a patch I am regression testing. But there are other profile insanities being caused upstream of jump threading that I haven't tracked down. So I will also test and send the patch to handle some of these in the splitting/cold cold detection code. It also contains the change to probably_never_executed_bb_p that I mention above to return false when bb->frequency is non-zero. Teresa > > Teresa > >> >>> >>> Will you look into logic of do_jump or shall I try to dive in? >> >> I can take a look, but probably won't have a chance until late this >> week. If you don't get to it before then I will see if I can figure >> out why it is applying the branch probabilities this way. >> >> Teresa >> >>> >>> Honza >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413