From: Jeff Law
To: Segher Boessenkool, Bernd Schmidt
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v2 0/9] Separate shrink-wrapping
Date: Thu, 08 Sep 2016 16:58:00 -0000
Message-ID: <2c1fee68-4753-779c-5d75-90e6c7f86776@redhat.com>
In-Reply-To: <20160826162709.GA30044@gate.crashing.org>
References: <81710c02-05bf-fb65-dedc-8ba389c0d8e8@redhat.com> <20160826145001.GA21746@gate.crashing.org> <20160826162709.GA30044@gate.crashing.org>

On
08/26/2016 10:27 AM, Segher Boessenkool wrote:
> On Fri, Aug 26, 2016 at 05:03:34PM +0200, Bernd Schmidt wrote:
>> On 08/26/2016 04:50 PM, Segher Boessenkool wrote:
>>> The head comment starts with
>>>
>>> +/* Separate shrink-wrapping
>>> +
>>> +   Instead of putting all of the prologue and epilogue in one spot, we
>>> +   can put parts of it in places where those components are executed less
>>> +   frequently.
>>>
>>> and that is the long and short of it.
>>
>> And that comment puzzles me. Surely prologue and epilogue are executed
>> only once currently, so how does frequency come into it? Again - please
>> provide an example.
>
> If some component is only needed for 0.01% of executions of a function,
> running it once for every execution is 10000 times too much.
>
> The trivial example is a function that does an early exit, but uses one
> or a few non-volatile registers before that exit.  This happens in e.g.
> glibc's malloc, if you want an easily accessed example.  With the current
> code, *all* components will be saved and then restored shortly afterwards.

So can you expand on the malloc example a bit -- I'm pretty sure I
understand what you're trying to do, but a concrete example may help
Bernd and be useful for archival purposes.

I also know that Carlos is interested in the malloc example -- so I'd
like to be able to pass that along to him.

Given the multiple early-exit and fast paths through the allocator, I'm
not at all surprised that sinking different components of the prologue
to different locations is useful.

Also if there's a case where sinking into a loop occurs, definitely
point that out.

>
>>> The full-prologue algorithm makes as many blocks run without prologue as
>>> possible, by duplicating blocks where that helps.  If you do this for
>>> every component you can end up with 2**40 blocks for just 40 components,
>>
>> Ok, so why wouldn't we use the existing code with the duplication part
>> disabled?
>
> That would not perform nearly as well.
>
>> That's a later addition anyway and isn't necessary to do
>> shrink-wrapping in the first place.
>
> No, it always did that, just not as often (it only duplicated straight-line
> code before).

Presumably (I haven't looked yet), the duplication is so that we can
isolate one or more paths which in turn allows sinking the prologue
further on some of those paths.

This is something I'll definitely want to look at -- block duplication
to facilitate code elimination (or in this case avoid code insertion)
hits several areas of interest to me -- and how we balance duplication
vs runtime savings is always interesting.

Jeff
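
For archival purposes, here is a minimal sketch of the early-exit shape
described in the quoted text. It is hypothetical C, not code from
glibc's malloc, and the names lookup, mix and slow_path_runs are made
up for illustration. It assumes the calls to mix are not inlined and
that the register allocator keeps values that are live across a call in
non-volatile registers; neither is guaranteed.

```c
#include <stddef.h>

/* Counts slow-path executions so the two paths can be told apart. */
static size_t slow_path_runs;

/* Illustrative helper; assume the compiler does not inline it. */
static size_t mix(size_t x)
{
    return x * 31 + 7;
}

/* Hypothetical lookup with an early exit (not glibc code). */
size_t lookup(const size_t *table, size_t len, size_t key)
{
    /* table, len and key are live across this call, so they would
       typically be kept in a few non-volatile registers that have to
       be saved before this point. */
    size_t h = mix(key);

    /* Early exit: the common case returns here. */
    if (table[h % len] == key)
        return h;

    /* Rare path: h, a and b are live across further calls and may want
       additional non-volatile registers.  Only this path needs those
       extra saves and restores. */
    slow_path_runs++;
    size_t a = mix(h);
    size_t b = mix(a);
    return a + b + h;
}
```

With a single prologue, every register the function uses anywhere is
saved on entry, including the ones only the rare path touches. Treating
each register save as its own component would let the saves needed only
by the rare path move below the early exit.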