public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Possible inline malloc alternatives: bitmap
@ 2018-03-02  5:54 Ondřej Bílka
  2018-03-02  5:54 ` Ondřej Bílka
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ondřej Bílka @ 2018-03-02  5:54 UTC (permalink / raw)
  To: libc-alpha

Like I mentioned in other thread I plan to make improved malloc+friends.
It will be split to several aspects.

This one is about option to inline small size allocations which would
give same performance as using memory pool.

I planned to make this generic but it isn't case now as I realized
posibility of using bit tricks which would lead to more effective free
as it doesn't need to determine size of allocation.

Representation of small sizes is simple, for each 1024 page there would
be metadata with 64-bit integer which has i-th bit 1 if and only if
there is free chunk at offset 16*i of page which turns both allocation
and free into simple bit manipulation. 

Sample code how would inline look like is below. To handle space usage
one would in refill_bucket set up bitmap pages.

For free a case to call noninline_free if bitmap is zero serves to
register that there are some free chunks at that page.

For returning memory to different thread that created it one would use
atomic operations with basically same algorithm.


For medium same idea of denoting free into bitmap could work with same
code , with bit
operations I could quickly merge all adjacent free chunks into one and
delete old entries.

This one is at idea stage, I don't know how effective it will be to
merge multiple adjacent frees. I could do it with following bit tricks:

You need three bitmaps

allocated - 1 where allocated chunk starts
old_free - 1 where free chunk that is in data structure starts
new_free - 1 where chunk that was free'd but isn't in data structure
yet.
To handle are first construct mask with ones upto position of lowest
1 in new_free 

low_mask = new_free ^ (new_free - 1)

To get lower end you must compare 

old_free & low_mask > allocated & low_mask

if this is true then area starts on highest 1 bit of old_free &
low_mask, otherwise it starts with lowest bit of new_free. In first case
we need to remove that old_free element from data structure.

Then its easy to see that free area would end at position of lowest 1 in
formula

allocated & (~low_mask)

Additional elements that need to be deleted from data structure are
determined by following but we accumulate them into bitmap to_delete to handle
them all together at end. 

to_delete |= old_free & high_end_mask & ~low_mask;

then we mask handled elements of new_free and repeat while new_free is
nonzero.


struct page_map {
  int thread;
  uint64_t map;
  uint64_t atomic_map;
};

extern inline struct page_map *get_page_map(void *p)
{
  size_t m = (size_t) p;
  m = 1024 * (m / 1024) - 32;
  return (struct page_map *) m;
}

extern inline size_t base_addr (struct page_map *p)
{
   return ((size_t) p) + 32;
}

struct page_map *buckets[256/16];

extern inline void *inl_alloc(size_t size)
{
  if (size < 256)
   {
     int bucket = (size + 15) / 16;
     uint64_t map = buckets[bucket]->map;
     /* note that we handle case of malloc(0) 
        by always calling refill_buckets(0) with suitably filled  bucket[0] */
     if (map == 0)
       return refill_buckets (size);
     uint64_t new_map = map & (map - 1);
     size_t address = base_addr (buckets[bucket]) + 16 * __builtin_ctzl (map);
     buckets[bucket]->map = new_map;
     return (void *) address;
   }
  else
    return big_alloc (size);
}

extern inline void inl_free (void *p)
{
#ifndef PAGEMAP_NULL_VALID
  if (p == NULL) return;
#endif
  struct page_map *m = get_page_map (p);
  if (m->thread == thread_id && m->map != 0)
    m->map |= 1 << ((((size_t) p) / 16) % 64);
  else 
    {
      uint64_t n = m->atomic_map;
      uint64_t nm = n | (1 << ((((size_t) p) / 16) % 64));
      if (n == 0 || !atomic_compare_swap(&(m->atomic_map), n, nm))
        noninline_free(p);
    }
}


^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: Possible inline malloc alternatives: bitmap
@ 2018-03-04 14:07 Wilco Dijkstra
  2018-03-05 18:57 ` Carlos O'Donell
  0 siblings, 1 reply; 17+ messages in thread
From: Wilco Dijkstra @ 2018-03-04 14:07 UTC (permalink / raw)
  To: eggert, Ondřej Bílka; +Cc: nd, Carlos O'Donell, libc-alpha

Paul Eggert wrote:

> You can freely read an older NightWatch paper here:
>
> Although NightWatch's source code is freely readable (it's on GitHub), it does not specify a software licence so I have not read it and don't recommend that you read it. 

> > One problem is that it merges several different issues.
>
> Yes, a problem common to many systems papers.

Indeed, it is not relevant to the issues we are having with GLIBC malloc.  The paper is about
system wide cache partitioning which is very system specific and between processes.

> I didn't quite follow your profiling proposals. As near as I can make out you'd like to log calls to malloc, free, etc. But for good results don't you also need to log memory accesses? Otherwise how will you know which objects are hot?

I don't believe it is feasible to profile malloc more accurately, but neither is it necessary
for a quick evaluation of malloc improvements like (de)allocation performance and
fragmentation reduction. You always need to run the original benchmark to prove the
gains are real.

Wilco

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-03-06 21:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-02  5:54 Possible inline malloc alternatives: bitmap Ondřej Bílka
2018-03-02  5:54 ` Ondřej Bílka
2018-03-02 18:20   ` DJ Delorie
2018-03-02 16:28 ` Carlos O'Donell
2018-03-02 20:21   ` Paul Eggert
2018-03-03 12:56     ` Ondřej Bílka
2018-03-03 20:24       ` Paul Eggert
2018-03-03 23:37         ` Carlos O'Donell
2018-03-03 23:40           ` Carlos O'Donell
2018-03-04 15:51           ` Ondřej Bílka
2018-03-04 11:59         ` Per call-site malloc specialization Ondřej Bílka
2018-03-02 23:00   ` Possible inline malloc alternatives: bitmap Ondřej Bílka
2018-03-05 20:32     ` DJ Delorie
2018-03-06 20:42 ` best-fit algorithm with bitmap heap Ondřej Bílka
2018-03-06 21:26   ` Ondřej Bílka
2018-03-04 14:07 Possible inline malloc alternatives: bitmap Wilco Dijkstra
2018-03-05 18:57 ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).