public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Re: best-fit algorithm with bitmap heap
       [not found] <DB6PR0801MB2053B88AEF84C7772505072B83D90@DB6PR0801MB2053.eurprd08.prod.outlook.com>
@ 2018-03-06 23:25 ` Wilco Dijkstra
  2018-03-07  7:28   ` Ondřej Bílka
  0 siblings, 1 reply; 6+ messages in thread
From: Wilco Dijkstra @ 2018-03-06 23:25 UTC (permalink / raw)
  To: libc-alpha; +Cc: nd

Hi,

I think before going into the details it is worth considering whether it is even
feasible to use a bitmap search for first-fit, let alone best-fit. What you seem
to be saying is there will be lots of small pages for allocation, each of which
has its own bitmap. If say we have 100MBytes worth of data allocated, you will
need to start searching every single bitmap that has free space. That would imply
adding a fast data structure that keeps track of all the free space across all
the bitmaps. Then this needs to be kept up to date by both malloc and free, as
well as supporting multiple threads. It seems to me this would become the key
bottleneck - so it's essential to sort out the overall allocation strategy first.

Wilco


     

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: best-fit algorithm with bitmap heap
  2018-03-06 23:25 ` best-fit algorithm with bitmap heap Wilco Dijkstra
@ 2018-03-07  7:28   ` Ondřej Bílka
  2018-03-07 14:43     ` Wilco Dijkstra
  0 siblings, 1 reply; 6+ messages in thread
From: Ondřej Bílka @ 2018-03-07  7:28 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha, nd

On Tue, Mar 06, 2018 at 11:25:31PM +0000, Wilco Dijkstra wrote:
> Hi,
> 
> I think before going into the details it is worth considering whether it is even
> feasible to use a bitmap search for first-fit, let alone best-fit. What you seem
> to be saying is there will be lots of small pages for allocation, each of which
> has its own bitmap. If say we have 100MBytes worth of data allocated, you will
> need to start searching every single bitmap that has free space. That would imply
> adding a fast data structure that keeps track of all the free space across all
> the bitmaps. Then this needs to be kept up to date by both malloc and free, as
> well as supporting multiple threads. It seems to me this would become the key
> bottleneck - so it's essential to sort out the overall allocation strategy first.
> 
No, there is no searching and as I described it is really easy.

I could do this for all sizes by making these structures two-level,
larger size are above mmap threshold.

For multiple frees done with bitmap one uses simple trick that if on
free free_bitmap is zero and add it to list pending_pages. When it is
time to do free it just traverses pending_pages, does free and resets 
corresponding free_bitmap to zero. 

For given capacity best fit would just point you to array with all free
pointers of given capacity. Here idea is that instead of deleting
pointer from linked list on free free we delete it in malloc with few 
bit operations if free region starts and ends at claimed points.

Problem of this is that we need to store those pointers separately and
to limit space usage delete invalid pointers when we exceed capacity.

Without this idea free is more complicated, it needs another bitmap with
free chunks that are double linked list, following formula should
identify those excluding start of new free region which should be handled
separately.

free_in_list & (alloc_first_updated ^ (alloc_first_updated - free_bitmap)).

In my previous mail I wrote check if chunk is valid incorrecty, previous one should be correct.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: best-fit algorithm with bitmap heap
  2018-03-07  7:28   ` Ondřej Bílka
@ 2018-03-07 14:43     ` Wilco Dijkstra
  2018-03-07 17:56       ` Ondřej Bílka
  0 siblings, 1 reply; 6+ messages in thread
From: Wilco Dijkstra @ 2018-03-07 14:43 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: libc-alpha, nd

Ondřej Bílka wrote:
> No, there is no searching and as I described it is really easy.

There is always a search here and that is completely unavoidable.

If you have a free list which contains multiple different sizes, you need
to walk it until you get a block that fits. This could be slow if there are
thousands of entries in that free list. If you use individual free lists for
every supported allocation size (only practical for small sizes), and the
free list for the requested size is empty, you now need to search all larger
free lists until you find one which isn't empty (if you cache which sizes
have free blocks then that's yet another data structure to keep up to date).

I think bitmaps are a good choice for allocation of small fixed size blocks,
but for medium sized blocks it's less clear.

Wilco



    

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: best-fit algorithm with bitmap heap
  2018-03-07 14:43     ` Wilco Dijkstra
@ 2018-03-07 17:56       ` Ondřej Bílka
  0 siblings, 0 replies; 6+ messages in thread
From: Ondřej Bílka @ 2018-03-07 17:56 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha, nd

On Wed, Mar 07, 2018 at 02:43:19PM +0000, Wilco Dijkstra wrote:
> Ondřej Bílka wrote:
> > No, there is no searching and as I described it is really easy.
> 
> There is always a search here and that is completely unavoidable.
> 
> If you have a free list which contains multiple different sizes, you need
> to walk it until you get a block that fits. This could be slow if there are
> thousands of entries in that free list. If you use individual free lists for
> every supported allocation size (only practical for small sizes), and the
> free list for the requested size is empty, you now need to search all larger
> free lists until you find one which isn't empty (if you cache which sizes
> have free blocks then that's yet another data structure to keep up to date).
> 
> I think bitmaps are a good choice for allocation of small fixed size blocks,
> but for medium sized blocks it's less clear.
>
No, that is pretty easy to do if you know data structures. Well I
thought that I also send mail with how I do best fit but didn't so I do
that now.

If there are 64 bins then one just needs bitmap bin_map which bins are
empty and which not. Getting first nonempty bin is just three
instructions.

int best_fit_bin(int bin)
{
  return __builtin_ctzl (bin_map & (-1L << bin));
}

For 64*64 bins you need two-level structure with 64 individual bitmaps
and one high bitmap if lower bitmaps are nonzero. Same idea, only done
in two levels.

int best_fit_bin(int bin)
{
  int bin_hi = bin / 64;
  bin = bin % 64;
  if  (bin_map[bin_hi] & (-1L << bin))
    return 64*bin_hi + __builtin_ctzl (bin_map[bin_hi] & (-1L << bin))
  else {
    bin_hi = __builtin_ctzl (high_bin_map & (-1L << (bin_hi+1)));
    return 64*bin_hi + __builtin_ctzl (bin_map[bin_hi]);
  }
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: best-fit algorithm with bitmap heap
  2018-03-06 20:42 ` best-fit algorithm with bitmap heap Ondřej Bílka
@ 2018-03-06 21:26   ` Ondřej Bílka
  0 siblings, 0 replies; 6+ messages in thread
From: Ondřej Bílka @ 2018-03-06 21:26 UTC (permalink / raw)
  To: libc-alpha

Correction as I initially discovered better solution but forgot about
that when writing previous post. Operations are lot simpler.

Description below is mostly same except that free is simpler. I could do
multiple frees with free_bitmap with 1 at start of each chunk to be freed.
Bitmaps alloc_first and alloc_last with 1 at start/end of allocated
chunk stay. Removing bits corresponding to freed chunk is actually
simple with following formulas

alloc_first = alloc_first - free_bitmap;
alloc_last = alloc_last & (alloc_last - free_bitmap);

Endpoints (noninclusive) of free areas that changed are given by following bitmap

free_last = alloc_first & (alloc_first ^ (alloc_first - free_bitmap))

When processing these you get starts from highest bit in alloc_last & mask

Forgot to add that cross page allocation is possible by adding more
cases but I don't know if bit better space utilization is worth worse
performance.

Check if block is valid could be also simplified and we could use same
formulas as those to remove bits, to allocate offset s and size l we
check and change bitmaps with following checks:

if (((alloc_first ^ (~ (1<<s))) != alloc_first) &&
    alloc_last & (alloc_last - (1<<s)) == alloc_last & (~(1<<(s+l))))
  {
    alloc_first = alloc_first ^ (~ (1<<s));
    alloc_last = alloc_last & (~(1<<(s+l)));
    /* valid, do allocation */
  }


On Tue, Mar 06, 2018 at 09:42:43PM +0100, Ondřej Bílka wrote:
> On Thu, Mar 01, 2018 at 10:33:18PM +0100, Ondřej Bílka wrote:
> > Like I mentioned in other thread I plan to make improved malloc+friends.
> > It will be split to several aspects.
> > 
> I got idea when I pondered about merging chunks and deleting old data
> that you don't have to delete them from data structure. 
> It requires to save data about free chunks separately and data to check 
> if chunk information is up-to-date.
> To limit space usage one would need to use fixed size buffer with these, when its full one
> needs to delete all invalid entries from buffer, and add/remove elements from bigger
> storage to make buffer half full.
> 
> I could do this quickly with bitmaps and some conditions for sake of simplicity.
> 1. It would consider only allocation where capacity is 1-16 integer multiple of block 
> 2. Data would be in 64-block pages where first and last block are reserved by allocator.
> 
> As data structure for page start of allocated chunk would be denoted by
> 1 in corresponding  bit of alloc_first and end in corresponding bit of alloc_last. 
> Reserved blocks are there to put sentinel 1 at end of bitmaps. 
> For faster operations bits in alloc_first could be in reversed order.
> 
> When one has address and capacity of b blocks testing if it is valid free area is easy,
> from address we determine header and number s of starting block.
> First we check that free space starts there with formula (alloc_last & (1<<(s-1)))
> To test that it is free space until s+b which should be start of chunk
> it should be formula like this(maybe its off by one somewhere, didn't test that)
> ((alloc_first>>s)<<(64-b)) == 1<<63
> 
> On free with offset s one zeroes allocated bits with 
> 
> alloc_first = alloc_first ^ (1<<s);
> 
> Start of free region is just after index of highest 1 bit of alloc_first & ((1<<s) - 1)
> Here comes comment that alloc_first could have reversed order as lowest
> 1 bit could be easier to compute
> 
> End of allocation could be zeroes just with easy bit operations like
> following:
> 
> ml = alloc_last & (~((1<<s)-1));
> mask = ml ^ (ml - 1);
> alloc_last = alloc_last ^ mask ^ (mask / 2);
> 
> end of free region is last 1 bit of alloc_last & (~mask)
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* best-fit algorithm with bitmap heap
  2018-03-02  5:54 Possible inline malloc alternatives: bitmap Ondřej Bílka
@ 2018-03-06 20:42 ` Ondřej Bílka
  2018-03-06 21:26   ` Ondřej Bílka
  0 siblings, 1 reply; 6+ messages in thread
From: Ondřej Bílka @ 2018-03-06 20:42 UTC (permalink / raw)
  To: libc-alpha

On Thu, Mar 01, 2018 at 10:33:18PM +0100, Ondřej Bílka wrote:
> Like I mentioned in other thread I plan to make improved malloc+friends.
> It will be split to several aspects.
> 
I got idea when I pondered about merging chunks and deleting old data
that you don't have to delete them from data structure. 
It requires to save data about free chunks separately and data to check 
if chunk information is up-to-date.
To limit space usage one would need to use fixed size buffer with these, when its full one
needs to delete all invalid entries from buffer, and add/remove elements from bigger
storage to make buffer half full.

I could do this quickly with bitmaps and some conditions for sake of simplicity.
1. It would consider only allocation where capacity is 1-16 integer multiple of block 
2. Data would be in 64-block pages where first and last block are reserved by allocator.

As data structure for page start of allocated chunk would be denoted by
1 in corresponding  bit of alloc_first and end in corresponding bit of alloc_last. 
Reserved blocks are there to put sentinel 1 at end of bitmaps. 
For faster operations bits in alloc_first could be in reversed order.

When one has address and capacity of b blocks testing if it is valid free area is easy,
from address we determine header and number s of starting block.
First we check that free space starts there with formula (alloc_last & (1<<(s-1)))
To test that it is free space until s+b which should be start of chunk
it should be formula like this(maybe its off by one somewhere, didn't test that)
((alloc_first>>s)<<(64-b)) == 1<<63

On free with offset s one zeroes allocated bits with 

alloc_first = alloc_first ^ (1<<s);

Start of free region is just after index of highest 1 bit of alloc_first & ((1<<s) - 1)
Here comes comment that alloc_first could have reversed order as lowest
1 bit could be easier to compute

End of allocation could be zeroes just with easy bit operations like
following:

ml = alloc_last & (~((1<<s)-1));
mask = ml ^ (ml - 1);
alloc_last = alloc_last ^ mask ^ (mask / 2);

end of free region is last 1 bit of alloc_last & (~mask)


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-03-07 17:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <DB6PR0801MB2053B88AEF84C7772505072B83D90@DB6PR0801MB2053.eurprd08.prod.outlook.com>
2018-03-06 23:25 ` best-fit algorithm with bitmap heap Wilco Dijkstra
2018-03-07  7:28   ` Ondřej Bílka
2018-03-07 14:43     ` Wilco Dijkstra
2018-03-07 17:56       ` Ondřej Bílka
2018-03-02  5:54 Possible inline malloc alternatives: bitmap Ondřej Bílka
2018-03-06 20:42 ` best-fit algorithm with bitmap heap Ondřej Bílka
2018-03-06 21:26   ` Ondřej Bílka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).