Subject: Possible inline malloc alternatives: bitmap
From: Ondřej Bílka @ 2018-03-02  5:54 UTC
  To: libc-alpha

As I mentioned in another thread, I plan to make an improved malloc and
friends. The work will be split into several aspects.

This one is about an option to inline small-size allocations, which would
give the same performance as using a memory pool.

I planned to make this generic, but that isn't the case now, as I realized
the possibility of using bit tricks which lead to a more effective free:
it doesn't need to determine the size of the allocation.

The representation of small sizes is simple: for each 1024-byte page there
would be metadata with a 64-bit integer whose i-th bit is 1 if and only if
there is a free chunk at offset 16*i of the page. This turns both
allocation and free into simple bit manipulation.
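
For concreteness, here is a minimal standalone illustration of those two
bit operations (the bitmap value is just an example; the actual inline
sketch is at the end of this mail):

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  /* Free chunks at offsets 3*16 = 48 and 7*16 = 112 of the page.  */
  uint64_t map = (1UL << 3) | (1UL << 7);

  /* Allocation: take the lowest free chunk and clear its bit.  */
  size_t offset = 16 * __builtin_ctzl (map);    /* 48 */
  map &= map - 1;

  /* Free: set the bit for the chunk's offset again.  */
  map |= 1UL << (offset / 16);

  printf ("allocated offset %zu, map now %#lx\n", offset,
          (unsigned long) map);
  return 0;
}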

Sample code of how the inline version would look is below. To handle space
usage, one would set up the bitmap pages in refill_buckets.

For free, the case of calling noninline_free when the bitmap is zero serves
to register that there are now some free chunks on that page.

For returning memory to a thread different from the one that created it,
one would use atomic operations with basically the same algorithm.


For medium sizes the same idea of recording frees in a bitmap could work
with the same code; with bit operations I could quickly merge all adjacent
free chunks into one and delete the old entries.

This one is at the idea stage; I don't know how effective it will be to
merge multiple adjacent frees. I could do it with the following bit tricks
(a code sketch of the whole pass follows after the description):

You need three bitmaps:

allocated - 1 where an allocated chunk starts
old_free  - 1 where a free chunk that is already in the data structure starts
new_free  - 1 where a chunk that was free'd but isn't in the data structure
            yet starts
To handle an area, first construct a mask with ones up to the position of
the lowest 1 in new_free:

low_mask = new_free ^ (new_free - 1)

To get the lower end you must check whether

(old_free & low_mask) > (allocated & low_mask)

If this is true, then the free area starts at the highest 1 bit of
old_free & low_mask; otherwise it starts at the lowest 1 bit of new_free.
In the first case we need to remove that old_free element from the data
structure.

Then it's easy to see that the free area would end at the position of the
lowest 1 in

allocated & (~low_mask)

Additional elements that need to be deleted from the data structure are
determined by the following; we accumulate them into a bitmap to_delete so
we can handle them all together at the end:

to_delete |= old_free & high_end_mask & ~low_mask;

Then we mask off the handled elements of new_free and repeat while new_free
is nonzero.
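
A rough, self-contained sketch of that coalescing pass over a single page
(64 chunk slots), mostly to check the bit logic. I read high_end_mask as
the mask of bits strictly below the merged area's end; how the merged area
and to_delete get applied to the real data structure is left out:

#include <stdint.h>
#include <stdio.h>

static void
coalesce (uint64_t allocated, uint64_t old_free, uint64_t new_free)
{
  uint64_t to_delete = 0;

  while (new_free != 0)
    {
      /* Ones up to and including the lowest 1 of new_free.  */
      uint64_t low_mask = new_free ^ (new_free - 1);

      /* Lower end: either an existing free chunk just below us...  */
      int start;
      if ((old_free & low_mask) > (allocated & low_mask))
        {
          start = 63 - __builtin_clzl (old_free & low_mask);
          to_delete |= 1UL << start;   /* old entry gets replaced */
        }
      else
        /* ... or the newly freed chunk itself.  */
        start = __builtin_ctzl (new_free);

      /* Upper end: next allocated chunk above, or end of page.  */
      uint64_t above = allocated & ~low_mask;
      int end = (above == 0) ? 64 : __builtin_ctzl (above);

      /* Ones strictly below the end, i.e. high_end_mask.  */
      uint64_t high_end_mask = (end == 64) ? ~0UL : (1UL << end) - 1;

      /* Old entries swallowed by the merged area.  */
      to_delete |= old_free & high_end_mask & ~low_mask;

      printf ("merged free area: chunks [%d, %d)\n", start, end);

      /* Mask off the handled new_free bits and continue.  */
      new_free &= ~high_end_mask;
    }

  printf ("entries to delete from data structure: %#lx\n",
          (unsigned long) to_delete);
}

int main (void)
{
  /* Tiny example: one allocated chunk at slots 0-1, old free chunk at 2,
     allocated at 3, newly freed chunks at 4 and 5, old free at 6,
     allocated at 7.  */
  uint64_t allocated = (1UL << 0) | (1UL << 3) | (1UL << 7);
  uint64_t old_free  = (1UL << 2) | (1UL << 6);
  uint64_t new_free  = (1UL << 4) | (1UL << 5);
  coalesce (allocated, old_free, new_free);
  return 0;
}

On this small example, chunks 4-6 merge into one area [4, 7) and the
old_free entry at 6 ends up in to_delete, which looks right.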


/* The out-of-line parts are not shown; these declarations are only here
   so the sketch is self-contained.  */
#include <stddef.h>
#include <stdint.h>

void *refill_buckets (size_t size);
void *big_alloc (size_t size);
void noninline_free (void *p);
/* Thread-local id of the current thread.  */
extern __thread int thread_id;
/* Placeholder CAS; returns nonzero on success.  */
int atomic_compare_swap (uint64_t *p, uint64_t oldval, uint64_t newval);

struct page_map {
  int thread;
  uint64_t map;
  uint64_t atomic_map;
};

/* Metadata lives in the 32 bytes just before the 1024-byte page.  */
extern inline struct page_map *get_page_map (void *p)
{
  size_t m = (size_t) p;
  m = 1024 * (m / 1024) - 32;
  return (struct page_map *) m;
}

/* Start of the 1024-byte page described by P.  */
extern inline size_t base_addr (struct page_map *p)
{
  return ((size_t) p) + 32;
}

/* Per-thread buckets, one page with free chunks per size class
   (sizes 241-255 round up to bucket 16, hence the + 1).  */
struct page_map *buckets[256 / 16 + 1];

extern inline void *inl_alloc (size_t size)
{
  if (size < 256)
    {
      int bucket = (size + 15) / 16;
      uint64_t map = buckets[bucket]->map;
      /* Note that we handle the case of malloc (0) by always calling
         refill_buckets (0), with bucket[0] suitably filled so that its
         map stays zero.  */
      if (map == 0)
        return refill_buckets (size);
      /* Take the lowest free chunk and clear its bit.  */
      uint64_t new_map = map & (map - 1);
      size_t address = base_addr (buckets[bucket]) + 16 * __builtin_ctzl (map);
      buckets[bucket]->map = new_map;
      return (void *) address;
    }
  else
    return big_alloc (size);
}

extern inline void inl_free (void *p)
{
#ifndef PAGEMAP_NULL_VALID
  if (p == NULL)
    return;
#endif
  struct page_map *m = get_page_map (p);
  if (m->thread == thread_id && m->map != 0)
    /* Fast path: the page belongs to this thread and is already
       registered as having free chunks.  */
    m->map |= 1UL << ((((size_t) p) / 16) % 64);
  else
    {
      uint64_t n = m->atomic_map;
      uint64_t nm = n | (1UL << ((((size_t) p) / 16) % 64));
      /* A zero atomic_map means the page isn't registered yet, so fall
         back to noninline_free to register it; likewise on CAS failure.  */
      if (n == 0 || !atomic_compare_swap (&(m->atomic_map), n, nm))
        noninline_free (p);
    }
}

