public inbox for libc-alpha@sourceware.org
From: "Ondřej Bílka" <neleai@seznam.cz>
To: Carlos O'Donell <carlos@redhat.com>
Cc: libc-alpha@sourceware.org
Subject: Re: Possible inline malloc alternatives: bitmap
Date: Fri, 02 Mar 2018 23:00:00 -0000	[thread overview]
Message-ID: <20180302230009.GA5448@domone> (raw)
In-Reply-To: <085a3206-4ae8-5d0a-800c-134d9d508ba1@redhat.com>

On Fri, Mar 02, 2018 at 08:27:50AM -0800, Carlos O'Donell wrote:
> On 03/01/2018 01:33 PM, Ondřej Bílka wrote:
> > Like I mentioned in other thread I plan to make improved malloc+friends.
> > It will be split to several aspects.
> 
> Great to hear that!
> 
> Please keep in mind that at Red Hat we are working on 3 relatively new
> things for malloc:
> 
> * Trace functionality (part of our whole-system trace work)
> * Replay functionality (our simulator)

These two are wasted work as any relation to actual performance is
purely coincidental. 

Running a program first with allocator A, then with allocator B, and
comparing total run time is the only way to get accurate results
(unless it is just a micro-optimization that doesn't change any
allocated address).

A trace/replay harness misses the majority of performance factors,
because most of the cost arises outside of malloc itself, in the cache
layout produced by the allocator's placement decisions, so it cannot
measure real performance.

A simple case to demonstrate this:

#include <stdlib.h>

int i, j, al;

int *ary;

/* Toy allocator: each commented-out variant produces a different
   address layout for the same allocation sequence.  */
int *tmalloc (int size)
{
  /* al += 1; */
  /* al += 10; */
  /* al = rand () % 20000; */
  al = rand () % 1000;
  return ary + al;
}

int main (void)
{
  al = 0;
  ary = malloc (100000000);
  int *x[1000];
  for (i = 0; i < 1000; i++)
    x[i] = tmalloc (4);
  for (j = 0; j < 1000000; j++)
    for (i = 0; i < 1000; i++)
      x[i][0] = 42;
  return 0;
}

Depending on which of the four layout variants this toy allocator
uses, the last case takes four times longer than the first. These
cache factors are ubiquitous.
For example, keeping metadata outside the allocated chunk could appear
faster simply because the prefetching it does on the application's
behalf significantly reduces overall latency.


> The harder work that I face today is not about performance though, the harder
> work, and I would hope you could help us on this, is *limiting* RSS consumption
> and improving the core algorithms used in malloc to be able to limit RSS
> without an immense cost to performance.
> 
> Developers want to deploy memory limited systems and glibc's malloc does not
> always consider RSS costs. Some issues around this are:
> 
> * fastbins prevent free'ing down of heap from top.
> * fastbins can grow unbounded.
> * heaps can only be freed from top down (unless you call malloc_trim() which
>   can find sub-pages in the heap to free back).
> 
The real problem with the first two is that the current code is a
convoluted mess.

For different representations of small allocations it is easy to bound
space usage with a tunable giving the maximal number of pages devoted
to them. When no such page is available, allocation simply falls back
to the heap. This would be easy to add with clean code.

As for the third problem, freeing pages: after thinking about it for a
few hours I realized that bitmaps are the only sane way to keep track
of that (I also found five insane ways).

The idea is to call madvise(MADV_DONTNEED) on every page that wasn't
used for a specified amount of time. You need four bitmaps:

free    - marks pages that are completely free
cleared - free pages on which we already called madvise
live1   - these two capture recent activity on a page; after each time
live2     period we copy live1 to live2 and zero live1

The pages needing madvise then fall out of the formula: free & ~cleared & ~(live1 | live2)
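The aging step above can be sketched as follows (a toy model, not
glibc code; the bitmap names and the 256-page arena size are my
assumptions):

```c
#include <stdint.h>

#define NPAGES 256            /* pages tracked per arena (assumed) */
#define NWORDS (NPAGES / 64)

/* One bit per page in each map.  */
static uint64_t free_map[NWORDS]; /* page is completely free          */
static uint64_t cleared[NWORDS];  /* madvise already called           */
static uint64_t live1[NWORDS];    /* touched in the current interval  */
static uint64_t live2[NWORDS];    /* touched in the previous interval */

/* On every alloc/free touching page p, mark it live.  */
static void touch_page (int p)
{
  live1[p / 64] |= (uint64_t) 1 << (p % 64);
}

/* Once per time interval: find pages eligible for
   madvise(MADV_DONTNEED), then age the liveness bits.  */
static void age_and_scan (void)
{
  for (int w = 0; w < NWORDS; w++)
    {
      uint64_t evict = free_map[w] & ~cleared[w] & ~(live1[w] | live2[w]);
      if (evict)
        {
          /* Call madvise on each set bit here, then record that
             those pages were cleared.  */
          cleared[w] |= evict;
        }
      live2[w] = live1[w];
      live1[w] = 0;
    }
}
```

A page freed and then left untouched for two full intervals ages out
of both live maps and gets evicted exactly once; touching it again
would require clearing its `cleared` bit on reuse.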

There is also a trick for large allocations: always clear the upper
half of the memory, because a buffer rarely uses its whole area.

