Date: Fri, 02 Mar 2018 23:00:00 -0000
From: Ondřej Bílka
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
Subject: Re: Possible inline malloc alternatives: bitmap
Message-ID: <20180302230009.GA5448@domone>
References: <20180301213318.GB3062@domone> <085a3206-4ae8-5d0a-800c-134d9d508ba1@redhat.com>
In-Reply-To: <085a3206-4ae8-5d0a-800c-134d9d508ba1@redhat.com>

On Fri, Mar 02, 2018 at 08:27:50AM -0800, Carlos O'Donell wrote:
> On 03/01/2018 01:33 PM, Ondřej Bílka wrote:
> > Like I mentioned in other thread I plan to make improved malloc+friends.
> > It will be split to several aspects.
>
> Great to hear that!
>
> Please keep in mind that at Red Hat we are working on 3 relatively new
> things for malloc:
>
> * Trace functionality (part of our whole-system trace work)
> * Replay functionality (our simulator)

These two are wasted work, as any relation to actual performance is
purely coincidental. Running a program first with allocator A, then
with allocator B, and comparing total run times is the only way to get
accurate results (unless the change is a pure micro-optimization that
doesn't alter any allocated address).

A replay doesn't capture the majority of performance factors, because
most of the cost occurs outside of malloc and depends on the resulting
cache layout, so it obviously cannot measure real performance.

A simple case to demonstrate this:

#include <stdlib.h>

int i, j, al;
int *ary;

/* Toy allocator: each variant of updating `al` below produces a
   different address layout for the same allocation sequence.  */
int *tmalloc (int size)
{
  // al += 1;
  // al += 10;
  // al = rand () % 20000;
  al = rand () % 1000;
  return ary + al;
}

int main (void)
{
  al = 0;
  ary = malloc (100000000);
  int *x[1000];
  for (i = 0; i < 1000; i++)
    x[i] = tmalloc (4);
  /* Hammer the first word of every "allocation"; run time now depends
     almost entirely on the cache layout the allocator produced.  */
  for (j = 0; j < 1000000; j++)
    for (i = 0; i < 1000; i++)
      x[i][0] = 42;
  return 0;
}

Depending on which of these four layouts the simple allocator produces,
the write loop takes four times longer in the last case than in the
first, even though the allocation trace is identical. These cache
factors are ubiquitous. For example, keeping metadata outside the
allocated chunk could look faster simply because the prefetching it
does on behalf of the application significantly reduces overall
latency.

> The harder work that I face today is not about performance though, the harder
> work, and I would hope you could help us on this, is *limiting* RSS consumption
> and improving the core algorithms used in malloc to be able to limit RSS
> without an immense cost to performance.
>
> Developers want to deploy memory limited systems and glibc's malloc does not
> always consider RSS costs. Some issues around this are:
>
> * fastbins prevent free'ing down of heap from top.
> * fastbins can grow unbounded.
> * heaps can only be freed from top down (unless you call malloc_trim() which
>   can find sub-pages in the heap to free back).

The real problem with the first two is that the current code is a
convoluted mess.
With a different representation of small allocations it is easy to
bound their space usage with a tunable giving the maximal number of
pages devoted to them. When no page is available, allocation simply
falls back to the heap. This would be easy to add with clean code.

As for the third problem, freeing pages: after thinking about it for a
few hours I realized that bitmaps are the only sane way to keep track
of that (I also found five insane ways).

The idea is that we madvise(MADV_DONTNEED) every page that wasn't used
for a specified amount of time. You need four bitmaps:

  free    - denotes a page that is completely free
  cleared - a free page on which we already called madvise
  live1   - these two capture recent activity on a page; after the
  live2     specified time period we copy live1 to live2 and zero live1

Pages that need madvise are then given by the formula (see the sketch
below):

  free & ~cleared & ~(live1 | live2)

Then there is a trick for large allocations: one should always clear
the upper half of the memory, because a buffer rarely uses its whole
area.
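To make this concrete, here is a minimal sketch of the bitmap
bookkeeping, under some invented assumptions: a single tracked region
of NPAGES pages starting at page_base, with the bitmaps held in flat
arrays of 64-bit words. None of the names come from glibc; they are
placeholders for illustration only.

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define NPAGES 1024                 /* pages tracked; invented for the sketch */
#define NWORDS (NPAGES / 64)

static uint64_t free_map[NWORDS];   /* page is completely free */
static uint64_t cleared[NWORDS];    /* free page already returned via madvise */
static uint64_t live1[NWORDS];      /* pages touched in the current period */
static uint64_t live2[NWORDS];      /* pages touched in the previous period */

/* Called from the allocation/free paths whenever a page sees activity.  */
static void touch_page (size_t pg)
{
  live1[pg / 64] |= 1ULL << (pg % 64);
}

/* Called when a page becomes completely free.  */
static void mark_free (size_t pg)
{
  free_map[pg / 64] |= 1ULL << (pg % 64);
  touch_page (pg);
}

/* Called when a free page is handed out again.  If it had been
   MADV_DONTNEED-ed, the kernel will fault in fresh zero pages.  */
static void mark_used (size_t pg)
{
  free_map[pg / 64] &= ~(1ULL << (pg % 64));
  cleared[pg / 64] &= ~(1ULL << (pg % 64));
  touch_page (pg);
}

/* Periodic sweep: give idle free pages back to the kernel, then age
   the liveness bitmaps (copy live1 to live2, zero live1).  */
static void sweep (char *page_base, size_t pagesize)
{
  for (size_t w = 0; w < NWORDS; w++)
    {
      /* The formula above: free, not yet cleared, no recent activity.  */
      uint64_t doomed = free_map[w] & ~cleared[w] & ~(live1[w] | live2[w]);

      for (uint64_t bits = doomed; bits != 0; bits &= bits - 1)
        {
          size_t pg = w * 64 + (size_t) __builtin_ctzll (bits);
          madvise (page_base + pg * pagesize, pagesize, MADV_DONTNEED);
        }

      cleared[w] |= doomed;
      live2[w] = live1[w];
      live1[w] = 0;
    }
}

A real allocator would presumably run sweep from a timer or every N-th
free, and could batch runs of adjacent doomed pages into a single
madvise call instead of issuing one syscall per page.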