Date: Tue, 9 Apr 2024 09:33:52 +0000
From: Eric Wong
To: libc-alpha@sourceware.org
Subject: [RFT] malloc: reduce largebin granularity to reduce fragmentation
Message-ID: <20240409093352.M757838@dcvr>

Would anybody volunteer to help test this and get reproducible results?
Thanks in advance.

Testing and getting reproducible results is proving extremely expensive
due to the trace sizes and CPU usage required to handle my existing
web/IMAP/NNTP-facing traffic.  But the theory is sound in my mind, and
it could be a solution to a problem I've noticed for decades across
long-lived Ruby and Perl web daemons.

I'm also considering exposing this as a tunable and a mallopt(3)
option.  Perhaps limiting the alignment to the page size could work,
too, since 20% of a large sliding mmap_threshold on 64-bit amounts to
several megabytes...
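To illustrate the arithmetic, here is a standalone sketch I wrote for
this mail; it is not code from the patch, and next_pow2() plus the
example numbers are my own:

/* Size-class sketch: round a request up to a multiple of 1/8th of the
   next power of two, mirroring the rounding strategy proposed below.
   Build: cc -o class-demo class-demo.c && ./class-demo */
#include <limits.h>
#include <stdio.h>

static size_t
next_pow2 (size_t x)
{
  size_t n = x - 1;
  for (size_t i = 1; i < sizeof (size_t) * CHAR_BIT; i <<= 1)
    n |= n >> i;
  return n + 1;
}

int
main (void)
{
  size_t req = 100000;                  /* example request */
  size_t align = next_pow2 (req) >> 3;  /* 131072 / 8 == 16384 */
  size_t rounded = (req + align - 1) & ~(align - 1); /* 114688 */

  /* prints: req=100000 rounded=114688 overhead=12.8% */
  printf ("req=%zu rounded=%zu overhead=%.1f%%\n",
          req, rounded, 100.0 * (rounded - req) / rounded);
  return 0;
}

So a 100000-byte request would be served from the 114688-byte class,
and every request in (98304, 114688] lands in that same class, so a
freed chunk of that class can be reused by many differently-sized
later requests.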
------8<------
From: Eric Wong
Subject: [PATCH] malloc: reduce largebin granularity to reduce fragmentation

TL;DR: trade initial best fits for better fits over long process lifetimes

Size classes in modern versions of jemalloc add 0-20% overhead in an
attempt to get better fits over time when dealing with
unpredictably-sized allocations and frees.  Initially, this can be more
wasteful than the best (initial) fit strategy used by dlmalloc, but it
is less wasteful than buddy allocators (which can have 0-99% overhead).

While the dlmalloc "best fit" strategy is ideal for one-off permanent
allocations and for applications that only deal in uniform allocation
sizes, "best fit" gets bogged down over time when dealing with
variably-sized allocations with mixed and interleaving lifetimes, as
commonly seen in high-level language runtimes.  Such allocation
patterns are common in long-lived web app, NNTP, and IMAP servers doing
text processing for many concurrent clients ("C10K").  Unpredictable
lifetimes are often a result of resource sharing between network
clients of disparate lifetimes, and some of it is outside the
application developer's control.

Fragmentation is further exacerbated by long-lived allocations made
late in the life of high-level language runtime processes (e.g. Perl
and Ruby).  These runtimes and their standard libraries can trigger
long-lived allocations late in process life due to their use of lazy
loading, caching, and internal slab allocators that use malloc to
create slabs.

This change only affects largebin allocations which are too small for
mmap and too large for smallbins and fastbins.
---
 malloc/malloc.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index bcb6e5b83c..9a1057f5a7 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3288,6 +3288,56 @@ tcache_thread_shutdown (void)
 
 #endif /* !USE_TCACHE */
 
+/*
+ * Use jemalloc-inspired size classes for largebin allocations to
+ * minimize fragmentation.  This means we pay a 0-20% overhead on
+ * allocations to improve the likelihood of reuse across many
+ * odd-sized allocations and frees.  This should reduce fragmentation
+ * in long-lived applications processing various text fragments
+ * (e.g. mail, news, and web services).
+ */
+static inline void
+size_class_align (size_t *nb, size_t *req)
+{
+  // TODO: make this a mallopt + tunable?
+
+  if (in_smallbin_range (*nb))
+    return;
+
+  size_t mm_thresh = MAX (DEFAULT_MMAP_THRESHOLD_MAX, mp_.mmap_threshold);
+
+  if (*nb >= mm_thresh && mp_.n_mmaps < mp_.n_mmaps_max)
+    return;
+
+  size_t n = *req - 1;
+  for (size_t i = 1; i < sizeof (size_t) * CHAR_BIT; i <<= 1)
+    n |= n >> i;
+
+  size_t next_power_of_two = n + 1;
+
+  /*
+   * This size alignment causes up to 20% overhead which can be several
+   * megabytes on 64-bit systems with high mmap_threshold.  Perhaps we
+   * can clamp alignment to pagesize or similar to save space.
+   */
+  size_t align = next_power_of_two >> 3;
+  size_t areq = ALIGN_UP (*req, align);
+  size_t anb = checked_request2size (areq);
+
+  if (anb == 0)
+    return; // aligned size is too big, but unaligned wasn't
+
+  if (anb < mm_thresh || mp_.n_mmaps >= mp_.n_mmaps_max)
+    {
+      *nb = anb;
+      *req = areq;
+    }
+  else // too big for largebins, force it to mmap
+    {
+      *nb = mm_thresh;
+    }
+}
+
 #if IS_IN (libc)
 void *
 __libc_malloc (size_t bytes)
@@ -3308,6 +3358,7 @@ __libc_malloc (size_t bytes)
       __set_errno (ENOMEM);
       return NULL;
     }
+  size_class_align (&tbytes, &bytes);
   size_t tc_idx = csize2tidx (tbytes);
 
   MAYBE_INIT_TCACHE ();
@@ -3503,6 +3554,8 @@ __libc_realloc (void *oldmem, size_t bytes)
 	  return newmem;
 	}
 
+  size_class_align (&nb, &bytes);
+
   if (SINGLE_THREAD_P)
     {
       newp = _int_realloc (ar_ptr, oldp, oldsize, nb);
@@ -3713,6 +3766,13 @@ __libc_calloc (size_t n, size_t elem_size)
     }
 
   sz = bytes;
+  size_t nb = checked_request2size (sz);
+  if (nb == 0)
+    {
+      __set_errno (ENOMEM);
+      return NULL;
+    }
+  size_class_align (&nb, &sz);
 
   if (!__malloc_initialized)
     ptmalloc_init ();
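
For anyone willing to test: below is a rough sketch of the kind of
churn workload I have in mind.  It is hypothetical and not part of the
patch; the file name, slot count, size range, iteration count, and seed
are arbitrary choices of mine, so adjust them to taste.

/* churn.c: hypothetical workload for comparing fragmentation with and
   without the patch.  Interleaves odd-sized largebin allocations and
   frees with a fixed seed, then prints glibc allocator statistics.
   Build: gcc -O2 -o churn churn.c */
#include <malloc.h>
#include <stdlib.h>

#define SLOTS 4096

int
main (void)
{
  static void *slot[SLOTS]; /* zero-initialized; free (NULL) is a no-op */

  srand (42); /* fixed seed so runs are comparable */
  for (long i = 0; i < 2000000; i++)
    {
      size_t idx = (size_t) rand () % SLOTS;
      free (slot[idx]);
      /* odd-sized requests roughly 4 KiB .. 124 KiB: large enough for
         largebins, small enough to stay below the default mmap threshold */
      slot[idx] = malloc (4096 + (size_t) rand () % (120 * 1024));
    }

  malloc_stats (); /* glibc: writes system/in-use byte counts to stderr */

  for (size_t i = 0; i < SLOTS; i++)
    free (slot[i]);
  return 0;
}

The idea would be to compare the "system bytes" lines reported by
malloc_stats() between an unpatched and a patched glibc for the same
seed, and across several seeds and size ranges.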