From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-435491-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 92230 invoked by alias); 8 Sep 2016 16:41:42 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Subject: Re: [PATCH v2 0/9] Separate shrink-wrapping
To: Segher Boessenkool <segher@kernel.crashing.org>,        Bernd Schmidt <bschmidt@redhat.com>
References: <cover.1470015604.git.segher@kernel.crashing.org> <81710c02-05bf-fb65-dedc-8ba389c0d8e8@redhat.com> <20160826145001.GA21746@gate.crashing.org> <cd56e044-1061-ea55-8e2a-2932c76a64aa@redhat.com> <20160826162709.GA30044@gate.crashing.org>
Cc: gcc-patches@gcc.gnu.org
From: Jeff Law <law@redhat.com>
Message-ID: <2c1fee68-4753-779c-5d75-90e6c7f86776@redhat.com>
Date: Thu, 08 Sep 2016 16:58:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <20160826162709.GA30044@gate.crashing.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2016-09/txt/msg00454.txt.bz2

On 08/26/2016 10:27 AM, Segher Boessenkool wrote:
> On Fri, Aug 26, 2016 at 05:03:34PM +0200, Bernd Schmidt wrote:
>> On 08/26/2016 04:50 PM, Segher Boessenkool wrote:
>>> The head comment starts with
>>>
>>> +/* Separate shrink-wrapping
>>> +
>>> +   Instead of putting all of the prologue and epilogue in one spot, we
>>> +   can put parts of it in places where those components are executed less
>>> +   frequently.
>>>
>>> and that is the long and short of it.
>>
>> And that comment puzzles me. Surely prologue and epilogue are executed
>> only once currently, so how does frequency come into it? Again - please
>> provide an example.
>
> If some component is only needed for 0.01% of executions of a function,
> running it once for every execution is 10000 times too much.
>
> The trivial example is a function that does an early exit, but uses one
> or a few non-volatile registers before that exit.  This happens in e.g.
> glibc's malloc, if you want an easily accessed example.  With the current
> code, *all* components will be saved and then restored shortly afterwards.
So can you expand on the malloc example a bit -- I'm pretty sure I 
understand what you're trying to do, but a concrete example may help 
Bernd and be useful for archival purposes.

I also know that Carlos is interested in the malloc example -- so I'd 
like to be able to pass that along to him.
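
To make the shape of such a function concrete for the archive, here is a
made-up sketch. It is NOT glibc's malloc; toy_alloc, toy_free, refill,
cache and SLOTS are hypothetical names invented for illustration. The
point is only that the two early returns need nothing preserved across a
call, while the slow path does.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8

/* Hypothetical one-entry-per-slot cache; not glibc's data structure. */
static void *cache[SLOTS];
static char arena[SLOTS][64];
static unsigned long refills[SLOTS];
static unsigned long slow_path_calls;

/* Hypothetical slow path, kept out of line so the caller contains a
   real call.  */
static __attribute__((noinline)) void *refill(size_t slot)
{
    assert(slot < SLOTS);
    slow_path_calls++;
    return arena[slot];
}

void *toy_alloc(size_t slot)
{
    void *p;

    if (slot >= SLOTS)
        return NULL;        /* early exit: no call is made on this path */

    p = cache[slot];
    if (p != NULL) {        /* fast path: still no call */
        cache[slot] = NULL;
        return p;
    }

    /* Slow path: `slot` is live across the call to refill(), so it has
       to sit in a callee-saved register or a stack slot here, and only
       here.  */
    p = refill(slot);
    refills[slot]++;
    return p;
}

void toy_free(size_t slot, void *p)
{
    if (slot < SLOTS)
        cache[slot] = p;
}
```

Whether a given compiler and target actually save a register ahead of
the early exits in code like this depends on register allocation; the
sketch only shows where the need for the save arises.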

Given the multiple early exit and fast paths through the allocator, I'm 
not at all surprised that sinking different components of the prologue 
to different locations is useful.
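
As a back-of-the-envelope check on the "0.01% of executions" figure
quoted above, the following small model counts how often one prologue
component (say, the save of a single register) would run under each
placement. It is a counting sketch under the stated assumptions, not a
measurement; simulate and struct counts are names made up here.

```c
#include <assert.h>

/* How many times one prologue component executes under each placement. */
struct counts {
    unsigned long whole;     /* component placed in the function prologue */
    unsigned long separate;  /* component sunk to the path that needs it  */
};

/* Model `calls` invocations of a function whose slow path is taken on
   every `slow_every`-th call; all other calls take the early exit.  */
static struct counts simulate(unsigned long calls, unsigned long slow_every)
{
    struct counts c = { 0, 0 };

    assert(slow_every > 0);
    for (unsigned long i = 0; i < calls; i++) {
        c.whole++;              /* whole prologue: runs on every call */
        if (i % slow_every != 0)
            continue;           /* early exit: component never needed */
        c.separate++;           /* sunk component: runs only here     */
    }
    return c;
}
```

With simulate(1000000, 10000) the slow path is taken on 0.01% of calls,
giving whole = 1000000 and separate = 100, i.e. the factor of 10000 from
the quoted text.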

Also if there's a case where sinking into a loop occurs, definitely 
point that out.

>
>>> The full-prologue algorithm makes as many blocks run without prologue as
>>> possible, by duplicating blocks where that helps.  If you do this for
>>> every component you can end up with 2**40 blocks for just 40 components,
>>
>> Ok, so why wouldn't we use the existing code with the duplication part
>> disabled?
>
> That would not perform nearly as well.
>
>> That's a later addition anyway and isn't necessary to do
>> shrink-wrapping in the first place.
>
> No, it always did that, just not as often (it only duplicated straight-line
> code before).
Presumably (I haven't looked yet), the duplication is so that we can 
isolate one or more paths, which in turn allows sinking the prologue 
further on some of those paths.

This is something I'll definitely want to look at -- block duplication 
to facilitate code elimination (or in this case avoid code insertion) 
hits several areas of interest to me -- and how we balance duplication 
vs runtime savings is always interesting.
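
For a sense of scale on the 2**40 figure quoted above, one way to read
it (my assumption, not something stated in the patch) is that a block
might need a separate copy for every subset of components already set up
on entry. worst_case_copies is a hypothetical helper that just does that
arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case number of copies of a single block if each of `components`
   independently placed components may or may not be set up on entry,
   and every combination gets its own copy.  */
static uint64_t worst_case_copies(unsigned components)
{
    assert(components < 64);    /* keep the shift well defined */
    return (uint64_t)1 << components;
}
```

worst_case_copies(40) is 1099511627776, which is why duplicating per
component the way the full-prologue code does is not an option.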

Jeff