Date: Fri, 02 Mar 2018 23:00:00 -0000
From: Ondřej Bílka
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
Subject: Re: Possible inline malloc alternatives: bitmap
Message-ID: <20180302230009.GA5448@domone>
References: <20180301213318.GB3062@domone> <085a3206-4ae8-5d0a-800c-134d9d508ba1@redhat.com>
In-Reply-To: <085a3206-4ae8-5d0a-800c-134d9d508ba1@redhat.com>

On Fri, Mar 02, 2018 at 08:27:50AM -0800, Carlos O'Donell wrote:
> On 03/01/2018 01:33 PM, Ondřej Bílka wrote:
> > Like I mentioned in other thread I plan to make improved malloc+friends.
> > It will be split to several aspects.
>
> Great to hear that!
>
> Please keep in mind that at Red Hat we are working on 3 relatively new
> things for malloc:
>
> * Trace functionality (part of our whole-system trace work)
> * Replay functionality (our simulator)

These two are wasted work, as any relation to actual performance is
purely coincidental. Running a program first with allocator A, then
with allocator B, and comparing total run times is the only way to get
accurate results (unless the change is a pure micro-optimization that
doesn't alter any allocated address).

A replay doesn't capture the majority of performance factors, because
most of the cost occurs outside of malloc and depends on the resulting
cache layout, so it obviously cannot measure real performance.

A simple case to demonstrate this:

#include <stdlib.h>

int i, j, al;
int *ary;

/* Toy allocator: each variant of updating `al` below produces a
   different address layout for the same allocation sequence.  */
int *tmalloc (int size)
{
  // al += 1;
  // al += 10;
  // al = rand () % 20000;
  al = rand () % 1000;
  return ary + al;
}

int main (void)
{
  al = 0;
  ary = malloc (100000000);
  int *x[1000];
  for (i = 0; i < 1000; i++)
    x[i] = tmalloc (4);
  /* Hammer the first word of every "allocation"; run time now depends
     almost entirely on the cache layout the allocator produced.  */
  for (j = 0; j < 1000000; j++)
    for (i = 0; i < 1000; i++)
      x[i][0] = 42;
  return 0;
}

Depending on which of these four layouts the simple allocator produces,
the write loop takes four times longer in the last case than in the
first, even though the allocation trace is identical. These cache
factors are ubiquitous. For example, keeping metadata outside the
allocated chunk could look faster simply because the prefetching it
does on behalf of the application significantly reduces overall
latency.

> The harder work that I face today is not about performance though, the harder
> work, and I would hope you could help us on this, is *limiting* RSS consumption
> and improving the core algorithms used in malloc to be able to limit RSS
> without an immense cost to performance.
>
> Developers want to deploy memory limited systems and glibc's malloc does not
> always consider RSS costs. Some issues around this are:
>
> * fastbins prevent free'ing down of heap from top.
> * fastbins can grow unbounded.
> * heaps can only be freed from top down (unless you call malloc_trim() which
>   can find sub-pages in the heap to free back).

The real problem with the first two is that the current code is a
convoluted mess.
With a different representation of small allocations it is easy to
bound their space usage with a tunable giving the maximal number of
pages devoted to them. When no page is available, allocation simply
falls back to the heap. This would be easy to add with clean code.

As for the third problem, freeing pages: after thinking about it for a
few hours I realized that bitmaps are the only sane way to keep track
of that (I also found five insane ways).

The idea is that we madvise(MADV_DONTNEED) every page that wasn't used
for a specified amount of time. You need four bitmaps:

  free    - denotes a page that is completely free
  cleared - a free page on which we already called madvise
  live1   - these two capture recent activity on a page; after the
  live2     specified time period we copy live1 to live2 and zero live1

Pages that need madvise are then given by the formula (see the sketch
below):

  free & ~cleared & ~(live1 | live2)

Then there is a trick for large allocations: one should always clear
the upper half of the memory, because a buffer rarely uses its whole
area.
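To make this concrete, here is a minimal sketch of the bitmap
bookkeeping, under some invented assumptions: a single tracked region
of NPAGES pages starting at page_base, with the bitmaps held in flat
arrays of 64-bit words. None of the names come from glibc; they are
placeholders for illustration only.

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define NPAGES 1024                 /* pages tracked; invented for the sketch */
#define NWORDS (NPAGES / 64)

static uint64_t free_map[NWORDS];   /* page is completely free */
static uint64_t cleared[NWORDS];    /* free page already returned via madvise */
static uint64_t live1[NWORDS];      /* pages touched in the current period */
static uint64_t live2[NWORDS];      /* pages touched in the previous period */

/* Called from the allocation/free paths whenever a page sees activity.  */
static void touch_page (size_t pg)
{
  live1[pg / 64] |= 1ULL << (pg % 64);
}

/* Called when a page becomes completely free.  */
static void mark_free (size_t pg)
{
  free_map[pg / 64] |= 1ULL << (pg % 64);
  touch_page (pg);
}

/* Called when a free page is handed out again.  If it had been
   MADV_DONTNEED-ed, the kernel will fault in fresh zero pages.  */
static void mark_used (size_t pg)
{
  free_map[pg / 64] &= ~(1ULL << (pg % 64));
  cleared[pg / 64] &= ~(1ULL << (pg % 64));
  touch_page (pg);
}

/* Periodic sweep: give idle free pages back to the kernel, then age
   the liveness bitmaps (copy live1 to live2, zero live1).  */
static void sweep (char *page_base, size_t pagesize)
{
  for (size_t w = 0; w < NWORDS; w++)
    {
      /* The formula above: free, not yet cleared, no recent activity.  */
      uint64_t doomed = free_map[w] & ~cleared[w] & ~(live1[w] | live2[w]);

      for (uint64_t bits = doomed; bits != 0; bits &= bits - 1)
        {
          size_t pg = w * 64 + (size_t) __builtin_ctzll (bits);
          madvise (page_base + pg * pagesize, pagesize, MADV_DONTNEED);
        }

      cleared[w] |= doomed;
      live2[w] = live1[w];
      live1[w] = 0;
    }
}

A real allocator would presumably run sweep from a timer or every N-th
free, and could batch runs of adjacent doomed pages into a single
madvise call instead of issuing one syscall per page.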