From: Ondřej Bílka
To: libc-alpha@sourceware.org
Subject: Possible inline malloc alternatives: bitmap
Date: Fri, 02 Mar 2018 05:54:00 -0000
Message-ID: <20180301213318.GB3062@domone>

As I mentioned in another thread, I plan to work on an improved malloc and
friends. The work will be split into several aspects. This one is about an
option to inline small-size allocations, which would give the same
performance as using a memory pool.

I planned to make this generic, but that isn't the case now, because I
realized there is a possibility of using bit tricks that lead to a more
effective free: it doesn't need to determine the size of the allocation.

The representation of small sizes is simple: for each 1024-byte page there
would be metadata containing a 64-bit integer whose i-th bit is 1 if and
only if there is a free chunk at offset 16*i of the page. That turns both
allocation and free into simple bit manipulation. Sample code for how the
inline part could look is below.

To handle space usage, refill_buckets would set up the bitmap pages.

For free, the case that calls noninline_free when the bitmap is zero serves
to register that there are now some free chunks on that page.

For returning memory to a thread different from the one that created it,
one would use atomic operations with basically the same algorithm.

For medium sizes the same idea of recording frees in a bitmap could work
with the same code; with bit operations I could quickly merge all adjacent
free chunks into one and delete the old entries. This part is at the idea
stage; I don't know how effective it will be to merge multiple adjacent
frees. I could do it with the following bit tricks.

You need three bitmaps:

  allocated - a 1 where an allocated chunk starts
  old_free  - a 1 where a free chunk that is already in the data structure
              starts
  new_free  - a 1 where a chunk that was freed but isn't in the data
              structure yet starts

To handle an area, first construct a mask with ones up to the position of
the lowest 1 in new_free:

  low_mask = new_free ^ (new_free - 1)

To get the lower end you must check whether

  (old_free & low_mask) > (allocated & low_mask)

If this is true, the area starts at the highest 1 bit of
old_free & low_mask; otherwise it starts at the lowest bit of new_free. In
the first case we need to remove that old_free element from the data
structure.

Then it is easy to see that the free area ends at the position of the
lowest 1 in

  allocated & ~low_mask

Additional elements that need to be deleted from the data structure are
determined as follows, but we accumulate them into a bitmap to_delete to
handle them all together at the end (high_end_mask here has ones up to the
end position of the area):

  to_delete |= old_free & high_end_mask & ~low_mask;

Then we mask off the handled elements of new_free and repeat while new_free
is nonzero.
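Roughly, the whole loop over one 64-bit bitmap word could look like the
sketch below. This is only an illustration of the bit tricks above, not
tested code; remove_free_chunks and insert_free_area stand in for whatever
operations the free-chunk data structure ends up having, and the exact
definition of high_end_mask (ones below the end position) is my assumption.

#include <stdint.h>

/* Placeholders for the data-structure operations.  */
extern void remove_free_chunks (uint64_t to_delete);
extern void insert_free_area (unsigned start_bit, unsigned end_bit);

/* Merge each newly freed chunk in new_free with adjacent free areas.
   Bit i of every bitmap refers to the chunk at offset 16 * i.  */
static void
coalesce_new_frees (uint64_t allocated, uint64_t old_free, uint64_t new_free)
{
  uint64_t to_delete = 0;

  while (new_free != 0)
    {
      /* Ones up to and including the lowest freed bit.  */
      uint64_t low_mask = new_free ^ (new_free - 1);

      unsigned start;
      if ((old_free & low_mask) > (allocated & low_mask))
        {
          /* A free chunk already in the data structure directly precedes
             the freed one: the merged area starts there, and that entry
             has to be deleted.  */
          start = 63 - __builtin_clzll (old_free & low_mask);
          to_delete |= (uint64_t) 1 << start;
        }
      else
        /* Otherwise the area starts at the chunk we just freed.  */
        start = __builtin_ctzll (new_free);

      /* The area ends where the next allocated chunk starts.  */
      uint64_t next_alloc = allocated & ~low_mask;
      unsigned end = next_alloc ? __builtin_ctzll (next_alloc) : 64;
      uint64_t high_end_mask
        = end == 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << end) - 1;

      /* Old entries swallowed by the merged area, deleted in one go at
         the end.  */
      to_delete |= old_free & high_end_mask & ~low_mask;

      insert_free_area (start, end);

      /* Mask off the handled part of new_free and continue.  */
      new_free &= ~high_end_mask;
    }

  remove_free_chunks (to_delete);
}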
And here is the sample code for the small-size inline allocation and free:

#include <stdint.h>
#include <stddef.h>

/* Out-of-line parts, only declared here.  */
extern void *refill_buckets (size_t size);
extern void *big_alloc (size_t size);
extern void noninline_free (void *p);
extern int atomic_compare_swap (uint64_t *p, uint64_t oldval, uint64_t newval);
extern __thread int thread_id;

/* Per-page metadata, stored in the 32 bytes before each 1024-byte page.  */
struct page_map
{
  int thread;
  uint64_t map;
  uint64_t atomic_map;
};

extern inline struct page_map *
get_page_map (void *p)
{
  size_t m = (size_t) p;
  m = 1024 * (m / 1024) - 32;
  return (struct page_map *) m;
}

extern inline size_t
base_addr (struct page_map *p)
{
  return ((size_t) p) + 32;
}

/* One page per 16-byte size class; sizes 0..255 map to buckets 0..16.  */
struct page_map *buckets[256 / 16 + 1];

extern inline void *
inl_alloc (size_t size)
{
  if (size < 256)
    {
      int bucket = (size + 15) / 16;
      uint64_t map = buckets[bucket]->map;
      /* Note that we handle the case of malloc (0) by always calling
         refill_buckets (0) with a suitably filled buckets[0].  */
      if (map == 0)
        return refill_buckets (size);
      /* Take the lowest free chunk on the page and clear its bit.  */
      uint64_t new_map = map & (map - 1);
      size_t address = base_addr (buckets[bucket])
                       + 16 * __builtin_ctzll (map);
      buckets[bucket]->map = new_map;
      return (void *) address;
    }
  else
    return big_alloc (size);
}

extern inline void
inl_free (void *p)
{
#ifndef PAGEMAP_NULL_VALID
  if (p == NULL)
    return;
#endif
  struct page_map *m = get_page_map (p);
  if (m->thread == thread_id && m->map != 0)
    /* Fast path: same thread and the page already has free chunks.  */
    m->map |= (uint64_t) 1 << ((((size_t) p) / 16) % 64);
  else
    {
      /* Cross-thread free, or map == 0: publish the bit through
         atomic_map, falling back to noninline_free.  */
      uint64_t n = m->atomic_map;
      uint64_t nm = n | ((uint64_t) 1 << ((((size_t) p) / 16) % 64));
      if (n == 0 || !atomic_compare_swap (&(m->atomic_map), n, nm))
        noninline_free (p);
    }
}
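To give a better picture, refill_buckets could look something like the
rough, untested sketch below. It builds on the definitions above, and two
assumptions here are mine rather than part of the design: each 1024-byte
page serves a single size class (so its bitmap only has bits at multiples
of chunk_size / 16), and pages are obtained one mmap at a time just to keep
the sketch short; serving malloc (0) from 16-byte chunks is likewise only
one possible choice.

#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

void *
refill_buckets (size_t size)
{
  int bucket = (size + 15) / 16;
  size_t chunk = bucket ? (size_t) 16 * bucket : 16;

  /* Map space for one 1024-byte data page plus the metadata that
     get_page_map expects in the 32 bytes before it.  mmap returns
     page-aligned memory, so p + 1024 is 1024-aligned.  */
  char *p = mmap (NULL, 2048, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;

  char *page = p + 1024;
  struct page_map *m = get_page_map (page);
  m->thread = thread_id;
  m->atomic_map = 0;

  /* Mark every chunk start on the page as free ...  */
  uint64_t map = 0;
  for (size_t off = 0; off + chunk <= 1024; off += chunk)
    map |= (uint64_t) 1 << (off / 16);

  /* ... except the first chunk, which is handed out right away.  */
  m->map = map & (map - 1);
  buckets[bucket] = m;
  return page;
}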