From: Segher Boessenkool
To: Bernd Schmidt
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Date: Tue, 19 Jul 2016 14:46:00 -0000
Subject: Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
Message-ID: <20160719144602.GA26941@gate.crashing.org>

On Mon, Jul 18, 2016 at 07:03:04PM +0200, Bernd Schmidt wrote:
> >>>+  /* The frequency of executing the prologue for this BB and all BBs
> >>>+     dominated by it.  */
> >>>+  gcov_type cost;
> >>
> >>Is this frequency consideration the only thing that attempts to prevent
> >>placing prologue insns into loops?
> >
> >Yes.  The algorithm makes sure the prologues are executed as infrequently
> >as possible.  If a block that would get a prologue has the same frequency
> >as a predecessor does, and that predecessor always has that first block as
> >eventual successor, the prologue is moved to the earlier block (this
> >handles the case where both have a frequency of zero, and other cases
> >where the range of freq is too limited).
>
> Ugh, that is really scaring me. I'd much prefer a classification of
> valid blocks based on cfg structure alone - I'll need serious convincing
> that the frequency data is reliable enough for what you are trying to do.

But you need the profile to make even reasonably good decisions.

The standard example:

    1
   / \
  2   3
   \ /
    4
   / \
  5   6
   \ /
    7

where 3 and 6 need some prologue, the rest do not.  If freq(3) + freq(6)
> freq(1), it is better to put the prologue at 1; if not, it is better
to place it at 3 and 6.  If you do not use the profile, you cannot do
better than the status quo, i.e. always place it at 1.

In the general case, you have the choice between putting the prologue at
some basic block X, or at certain blocks dominated by X.  This algorithm
chooses the case that has the prologue executed the least often in total,
and that is really all there is to it.

Yes, our profile data sometimes is, uh, less than optimal.  But:
- All our other passes use it, too;
- What matters most here is comparing the execution frequency locally,
  and that is not usually messed up so badly;
- All our other passes use it, too;
- The important cases (loops, exceptional cases) normally have a pretty
  reasonable profile;
- All our other passes use it, too;
- Benchmarking shows big wins with this patch.


Segher
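
The choice described above (prologue at X, or at certain blocks dominated
by X) can be sketched as a recursion over the dominator tree.  This is
only an illustration under stated assumptions, not the code from the
patch: struct block, place_prologue and diamond_cost are made-up names,
the frequencies are invented, and the sketch ignores everything else a
real pass has to handle (epilogues, placement on edges, more than one
component).  It uses the comparison from the diamond example: X gets the
prologue only if the blocks below it would execute it more often in total.

```c
#include <assert.h>

#define MAX_BLOCKS 16

/* Hypothetical, simplified block record; not GCC's data structure.  */
struct block
{
  int idom;            /* Index of the immediate dominator, -1 for entry.  */
  long freq;           /* Execution frequency taken from the profile.  */
  int needs_prologue;  /* Nonzero if this block itself needs the prologue.  */
};

/* Return the lowest total prologue frequency for the dominator subtree
   rooted at X, and set PLACE[i] for every block chosen to get a prologue.
   PLACE must be zero-initialized by the caller.  */
static long
place_prologue (const struct block *bb, int n, int x, int *place)
{
  assert (n <= MAX_BLOCKS && x >= 0 && x < n);

  /* A block that needs the prologue itself has no choice to make.  */
  if (bb[x].needs_prologue)
    {
      place[x] = 1;
      return bb[x].freq;
    }

  /* Otherwise total up the best placements in the blocks X immediately
     dominates, recording them separately in case they are discarded.  */
  int below[MAX_BLOCKS] = { 0 };
  long sum = 0;
  for (int c = 0; c < n; c++)
    if (bb[c].idom == x)
      sum += place_prologue (bb, n, c, below);

  /* Placing at X wins only if the dominated blocks cost strictly more.  */
  if (sum > bb[x].freq)
    {
      place[x] = 1;
      return bb[x].freq;
    }

  for (int i = 0; i < n; i++)
    place[i] |= below[i];
  return sum;
}

/* The diamond example, blocks 1..7 stored at indices 0..6, with
   freq(1) fixed at 100 and blocks 3 and 6 needing the prologue.  */
static long
diamond_cost (long freq3, long freq6, int *place)
{
  struct block bb[7] = {
    { -1, 100, 0 },          /* 1 */
    { 0, 100 - freq3, 0 },   /* 2 */
    { 0, freq3, 1 },         /* 3 */
    { 0, 100, 0 },           /* 4 */
    { 3, 100 - freq6, 0 },   /* 5 */
    { 3, freq6, 1 },         /* 6 */
    { 3, 100, 0 },           /* 7 */
  };
  return place_prologue (bb, 7, 0, place);
}
```

With freq(3) = 10 and freq(6) = 20 the sketch leaves the prologue at 3
and 6; with freq(3) = 60 and freq(6) = 70 the sum exceeds freq(1), so it
ends up at 1.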