From: DJ Delorie
To: Florian Weimer
Cc: libc-alpha@sourceware.org, toolybird@tuta.io
Subject: Re: [PATCH] malloc: Enable merging of remainders in memalign (bug 30723)
In-Reply-To: <877cq4yrpg.fsf@oldenburg.str.redhat.com> (message from Florian Weimer on Wed, 09 Aug 2023 20:51:39 +0200)
Date: Wed, 09 Aug 2023 17:58:57 -0400

Florian Weimer writes:
> Previously, calling _int_free from _int_memalign could put remainders
> into the tcache or into fastbins, where they are invisible to the
> low-level allocator.  This results in missed merge opportunities
> because once these freed chunks become available to the low-level
> allocator, further memalign allocations (even of the same size) are
> likely obstructing merges.

I would note that everything else malloc does has this problem, too.
Without limits or checks on fastbin size, anything that puts a chunk on
the fastbin could "permanently" block consolidation somewhere.  This is
why tcache has a fixed upper limit.
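(Not part of the review, but for anyone who wants to poke at the
fragmentation side from user space: a rough probe along these lines can
watch the arena's free-chunk counts while exercising the interleaved
memalign/free pattern Florian describes.  It needs glibc 2.33+ for
mallinfo2, and the exact numbers depend on version and heap layout, so
treat the output as illustrative only.)

    /* Rough user-space probe, not part of the patch: interleave
       memalign allocations and frees of one size and watch how many
       free chunks the arena accumulates.  */
    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main (void)
    {
      enum { N = 1000 };
      void *ptrs[N];

      for (int round = 0; round < 10; round++)
        {
          for (int i = 0; i < N; i++)
            ptrs[i] = memalign (64, 200);
          /* Free every other chunk so the survivors keep the holes apart.  */
          for (int i = 0; i < N; i += 2)
            free (ptrs[i]);
          struct mallinfo2 mi = mallinfo2 ();
          printf ("round %d: free chunks %zu, free bytes %zu\n",
                  round, mi.ordblks, mi.fordblks);
          for (int i = 1; i < N; i += 2)
            free (ptrs[i]);
        }
      return 0;
    }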
If we had a more robust malloc benchmark suite, I'd suggest re-visiting
tcache (potentially a bigger one) vs fastbins to try to reduce
fragmentation, as memory footprint seems to be becoming more important
relative to speed these days.

LGTM.
Reviewed-by: DJ Delorie

> +static void _int_free_merge_chunk (mstate, mchunkptr, INTERNAL_SIZE_T);
> +static INTERNAL_SIZE_T _int_free_create_chunk (mstate,
> +					       mchunkptr, INTERNAL_SIZE_T,
> +					       mchunkptr, INTERNAL_SIZE_T);
> +static void _int_free_maybe_consolidate (mstate, INTERNAL_SIZE_T);

So what we're doing is breaking out some of the functionality of
_int_free() into separately-callable functions, and these are the
prototypes for those new functions.  Ok.

I'll review the "diff -bw" for the remainder, to simplify it...

-    nextchunk = chunk_at_offset(p, size);
+      _int_free_merge_chunk (av, p, size);
+
+      if (!have_lock)
+	__libc_lock_unlock (av->mutex);
+    }
+  /*
+    If the chunk was allocated via mmap, release via munmap().
+  */
+
+  else {
+    munmap_chunk (p);
+  }
+}
+
+/* Try to merge chunk P of SIZE bytes with its neighbors.  Put the
+   resulting chunk on the appropriate bin list.  P must not be on a
+   bin list yet, and it can be in use.  */
+static void
+_int_free_merge_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
+{
+  mchunkptr nextchunk = chunk_at_offset(p, size);

Split off the end of _int_free into separate functions, and call them.
Ok.

@@ -4652,16 +4677,17 @@ _int_free (mstate av, mchunkptr p, int h
   if (__glibc_unlikely (!prev_inuse(nextchunk)))
     malloc_printerr ("double free or corruption (!prev)");

-  nextsize = chunksize(nextchunk);
+  INTERNAL_SIZE_T nextsize = chunksize(nextchunk);

Because this is the first use in the new function, it needs a
declaration here.  OK.

-  /* consolidate backward */
-  if (!prev_inuse(p)) {
-    prevsize = prev_size (p);
+  /* Consolidate backward.  */
+  if (!prev_inuse(p))
+    {
+      INTERNAL_SIZE_T prevsize = prev_size (p);

Ok.

-  if (nextchunk != av->top) {
+  /* Write the chunk header, maybe after merging with the following chunk.  */
+  size = _int_free_create_chunk (av, p, size, nextchunk, nextsize);
+  _int_free_maybe_consolidate (av, size);
+}
+
+/* Create a chunk at P of SIZE bytes, with SIZE potentially increased
+   to cover the immediately following chunk NEXTCHUNK of NEXTSIZE
+   bytes (if NEXTCHUNK is unused).  The chunk at P is not actually
+   read and does not have to be initialized.  After creation, it is
+   placed on the appropriate bin list.  The function returns the size
+   of the new chunk.  */
+static INTERNAL_SIZE_T
+_int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
+			mchunkptr nextchunk, INTERNAL_SIZE_T nextsize)
+{
+  if (nextchunk != av->top)
+    {

Again, breaking out this block of code into a separate function, and
calling it.  Ok.

       /* get and clear inuse bit */
-      nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
+      bool nextinuse = inuse_bit_at_offset (nextchunk, nextsize);

Ok.

-      bck = unsorted_chunks(av);
-      fwd = bck->fd;
+      mchunkptr bck = unsorted_chunks (av);
+      mchunkptr fwd = bck->fd;

Ok.

-    /*
-      If the chunk borders the current high end of memory,
-      consolidate into top
-    */
-
-    else {
+  else
+    {
+      /* If the chunk borders the current high end of memory,
+	 consolidate into top.  */

ok.
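(Restating the new shape for my own benefit: _int_free now hands
non-mmapped chunks to _int_free_merge_chunk, which does the backward
merge and then defers to _int_free_create_chunk for the forward merge
and binning, followed by _int_free_maybe_consolidate.  A standalone
stand-in, with fake types and the merge bodies elided -- this is not
the glibc code, just the call structure as I read the diff:)

    #include <stddef.h>

    /* Stand-in types only; the real ones live in malloc.c.  */
    typedef size_t INTERNAL_SIZE_T;
    typedef struct malloc_chunk { INTERNAL_SIZE_T prev_size, size; } *mchunkptr;
    typedef struct malloc_state { int unused; } *mstate;

    /* Forward-merge P with NEXTCHUNK if possible and file the result on
       a bin list; returns the final size.  Body elided.  */
    static INTERNAL_SIZE_T
    _int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
                            mchunkptr nextchunk, INTERNAL_SIZE_T nextsize)
    {
      (void) av; (void) p; (void) nextchunk; (void) nextsize;
      return size;
    }

    /* Consolidate fastbins / trim top if SIZE crosses the threshold.
       Body elided.  */
    static void
    _int_free_maybe_consolidate (mstate av, INTERNAL_SIZE_T size)
    {
      (void) av; (void) size;
    }

    /* The new helper that both _int_free and _int_memalign call:
       backward-merge P, then hand off to the two functions above.
       (Flag-bit masking and the actual merging are elided.)  */
    static void
    _int_free_merge_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
    {
      mchunkptr nextchunk = (mchunkptr) ((char *) p + size);
      INTERNAL_SIZE_T nextsize = nextchunk->size;
      size = _int_free_create_chunk (av, p, size, nextchunk, nextsize);
      _int_free_maybe_consolidate (av, size);
    }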
-    /*
-      If freeing a large space, consolidate possibly-surrounding
-      chunks. Then, if the total unused topmost memory exceeds trim
-      threshold, ask malloc_trim to reduce top.
-
-      Unless max_fast is 0, we don't know if there are fastbins
-      bordering top, so we cannot tell for sure whether threshold
-      has been reached unless fastbins are consolidated.  But we
-      don't want to consolidate on each free.  As a compromise,
-      consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
-      is reached.
-    */
+  return size;
+}

Ok.

-    if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
+/* If freeing a large space, consolidate possibly-surrounding
+   chunks.  Then, if the total unused topmost memory exceeds trim
+   threshold, ask malloc_trim to reduce top.  */
+static void
+_int_free_maybe_consolidate (mstate av, INTERNAL_SIZE_T size)
+{
+  /* Unless max_fast is 0, we don't know if there are fastbins
+     bordering top, so we cannot tell for sure whether threshold has
+     been reached unless fastbins are consolidated.  But we don't want
+     to consolidate on each free.  As a compromise, consolidation is
+     performed if FASTBIN_CONSOLIDATION_THRESHOLD is reached.  */
+  if (size >= FASTBIN_CONSOLIDATION_THRESHOLD)
+    {

Ok.  More breakout.

-      if (av == &main_arena) {
+      if (av == &main_arena)
+	{

Ok.

 #ifndef MORECORE_CANNOT_TRIM
-	if ((unsigned long)(chunksize(av->top)) >=
-	    (unsigned long)(mp_.trim_threshold))
+	  if (chunksize (av->top) >= mp_.trim_threshold)

Ok.

-      } else {
-	/* Always try heap_trim(), even if the top chunk is not
-	   large, because the corresponding heap might go away.  */
+	}
+      else
+	{
+	  /* Always try heap_trim, even if the top chunk is not large,
+	     because the corresponding heap might go away.  */

Ok.

-
-    if (!have_lock)
-      __libc_lock_unlock (av->mutex);
-  }
-  /*
-    If the chunk was allocated via mmap, release via munmap().
-  */
-
-  else {
-    munmap_chunk (p);
-  }

This was moved above.  Ok.

@@ -5221,7 +5254,7 @@ _int_memalign (mstate av, size_t alignme
 	      (av != &main_arena ? NON_MAIN_ARENA : 0));
       set_inuse_bit_at_offset (newp, newsize);
       set_head_size (p, leadsize | (av != &main_arena ? NON_MAIN_ARENA : 0));
-      _int_free (av, p, 1);
+      _int_free_merge_chunk (av, p, leadsize);

Ok.

@@ -5232,14 +5265,26 @@ _int_memalign (mstate av, size_t alignme
   if (!chunk_is_mmapped (p))
     {
       size = chunksize (p);
-      if ((unsigned long) (size) > (unsigned long) (nb + MINSIZE))
+      mchunkptr nextchunk = chunk_at_offset(p, size);
+      INTERNAL_SIZE_T nextsize = chunksize(nextchunk);
+      if (size > nb)

Ok.  The minsize check is moved below.

 	  remainder_size = size - nb;
+	  if (remainder_size >= MINSIZE
+	      || nextchunk == av->top
+	      || !inuse_bit_at_offset (nextchunk, nextsize))
+	    {
+	      /* We can only give back the tail if it is larger than
+		 MINSIZE, or if the following chunk is unused (top
+		 chunk or unused in-heap chunk).  Otherwise we would
+		 create a chunk that is smaller than MINSIZE.  */

Ok.

 	  remainder = chunk_at_offset (p, nb);
-	  set_head (remainder, remainder_size | PREV_INUSE |
-		    (av != &main_arena ? NON_MAIN_ARENA : 0));
 	  set_head_size (p, nb);
-	  _int_free (av, remainder, 1);
+	      remainder_size = _int_free_create_chunk (av, remainder,
+						       remainder_size,
+						       nextchunk, nextsize);
+	      _int_free_maybe_consolidate (av, remainder_size);
+	    }

The conditional changes from "big enough" to "big enough, or
mergeable", so OK.
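(Also not something the patch needs, but a cheap user-space way to see
the "give back the tail" decision is to compare the requested size with
malloc_usable_size: when the tail is not given back, it shows up as
extra usable bytes on the chunk.  Exact numbers depend on glibc version
and heap layout, so this is only illustrative:)

    /* Illustrative probe, not part of the patch.  */
    #include <malloc.h>
    #include <stdio.h>

    int
    main (void)
    {
      for (size_t req = 96; req <= 128; req += 8)
        {
          void *p = memalign (64, req);
          printf ("request %3zu  usable %3zu\n", req, malloc_usable_size (p));
          free (p);
        }
      return 0;
    }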