Date: Fri, 26 Aug 2016 16:27:00 -0000
From: Segher Boessenkool
To: Bernd Schmidt
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v2 0/9] Separate shrink-wrapping
Message-ID: <20160826162709.GA30044@gate.crashing.org>
References: <81710c02-05bf-fb65-dedc-8ba389c0d8e8@redhat.com> <20160826145001.GA21746@gate.crashing.org>

On Fri, Aug 26, 2016 at 05:03:34PM +0200, Bernd Schmidt wrote:
> On 08/26/2016 04:50 PM, Segher Boessenkool wrote:
> >The head comment starts with
> >
> >+/* Separate shrink-wrapping
> >+
> >+   Instead of putting all of the prologue and epilogue in one spot, we
> >+   can put parts of it in places where those components are executed less
> >+   frequently.
> >
> >and that is the long and short of it.
>
> And that comment puzzles me. Surely prologue and epilogue are executed
> only once currently, so how does frequency come into it? Again - please
> provide an example.

If some component is only needed for 0.01% of executions of a function,
running it once for every execution is 10000 times too much.

The trivial example is a function that does an early exit, but uses one
or a few non-volatile registers before that exit.  This happens in e.g.
glibc's malloc, if you want an easily accessed example.  With the current
code, *all* components will be saved and then restored shortly afterwards.

> >The full-prologue algorithm makes as many blocks run without prologue as
> >possible, by duplicating blocks where that helps.  If you do this for
> >every component you can end up with 2**40 blocks for just 40 components,
>
> Ok, so why wouldn't we use the existing code with the duplication part
> disabled?

That would not perform nearly as well.

> That's a later addition anyway and isn't necessary to do
> shrink-wrapping in the first place.

No, it always did that, just not as often (it only duplicated
straight-line code before).


Segher
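
To put numbers on the 0.01% argument above, here is a toy cost model in C.
It is only a sketch and not GCC code: the struct, the function names, the
CALLS constant and the per-million counts are all made up for illustration,
and it assumes the idealized case where a component's save/restore runs
exactly on the calls that need it.

```c
#include <assert.h>

/* Hypothetical toy model, not GCC code.  A "component" is one piece of
   prologue/epilogue work, e.g. saving and restoring one non-volatile
   register.  */
struct component {
    const char *name;       /* illustrative label only */
    long uses_per_million;  /* calls (out of 1000000) that need it */
};

#define CALLS 1000000L

/* Save/restore pairs executed per million calls when every component
   is handled in the single prologue at function entry.  */
static long cost_single_prologue(const struct component *c, int n)
{
    assert(c != 0 && n >= 0);
    return (long)n * CALLS;
}

/* Same count when each component is handled only on the calls that
   actually need it (an idealized lower bound, assumed here).  */
static long cost_separate(const struct component *c, int n)
{
    long total = 0;
    assert(c != 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        assert(c[i].uses_per_million >= 0
               && c[i].uses_per_million <= CALLS);
        total += c[i].uses_per_million;
    }
    return total;
}
```

With a single component needed on 100 of every 1000000 calls (0.01%),
cost_single_prologue gives 1000000 and cost_separate gives 100, which is
the factor of 10000 mentioned above.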