* [PATCH] malloc: Move mmap code out of __libc_free hotpath
@ 2025-03-24 19:29 Wilco Dijkstra
2025-03-24 20:41 ` Florian Weimer
2025-03-24 22:16 ` DJ Delorie
0 siblings, 2 replies; 9+ messages in thread
From: Wilco Dijkstra @ 2025-03-24 19:29 UTC (permalink / raw)
To: 'GNU C Library'
Currently __libc_free checks for a freed mmap chunk in the fast path.
Also errno is always saved and restored to preserve it. Since mmap chunks
are larger than the largest tcache chunk, it is safe to delay this and
handle tcache, smallbin and medium bin blocks first. Move saving of errno
to cases that actually need it.
Performance of bench-malloc-thread improves by 6.4% for 1 thread and
4.9% for 32 threads on Neoverse V2.
Passes regress, OK for commit?
---
diff --git a/malloc/malloc.c b/malloc/malloc.c
index b73ddbf554461da34d99258fae87c6ece6d175ba..f362227e68b78e25583aa971cb6ee131f44a92e5 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3313,6 +3313,9 @@ tcache_init(void)
if (tcache_shutting_down)
return;
+ /* Preserve errno when called from free() - _int_malloc may corrupt it. */
+ int err = errno;
+
arena_get (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
if (!victim && ar_ptr != NULL)
@@ -3321,10 +3324,11 @@ tcache_init(void)
victim = _int_malloc (ar_ptr, bytes);
}
-
if (ar_ptr != NULL)
__libc_lock_unlock (ar_ptr->mutex);
+ __set_errno (err);
+
/* In a low memory situation, we may not be able to allocate memory
- in which case, we just keep trying later. However, we
typically do this very early, so either there is sufficient
@@ -3446,37 +3450,15 @@ __libc_free (void *mem)
if (__glibc_unlikely (mtag_enabled))
*(volatile char *)mem;
- int err = errno;
-
p = mem2chunk (mem);
- if (chunk_is_mmapped (p)) /* release mmapped memory. */
- {
- /* See if the dynamic brk/mmap threshold needs adjusting.
- Dumped fake mmapped chunks do not affect the threshold. */
- if (!mp_.no_dyn_threshold
- && chunksize_nomask (p) > mp_.mmap_threshold
- && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
- {
- mp_.mmap_threshold = chunksize (p);
- mp_.trim_threshold = 2 * mp_.mmap_threshold;
- LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
- mp_.mmap_threshold, mp_.trim_threshold);
- }
- munmap_chunk (p);
- }
- else
- {
- MAYBE_INIT_TCACHE ();
-
- /* Mark the chunk as belonging to the library again. */
- (void)tag_region (chunk2mem (p), memsize (p));
+ MAYBE_INIT_TCACHE ();
- ar_ptr = arena_for_chunk (p);
- _int_free (ar_ptr, p, 0);
- }
+ /* Mark the chunk as belonging to the library again. */
+ (void)tag_region (chunk2mem (p), memsize (p));
- __set_errno (err);
+ ar_ptr = arena_for_chunk (p);
+ _int_free (ar_ptr, p, 0);
}
libc_hidden_def (__libc_free)
@@ -4665,6 +4647,9 @@ _int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
else if (!chunk_is_mmapped(p)) {
+ /* Preserve errno in case block merging results in munmap. */
+ int err = errno;
+
/* If we're single-threaded, don't lock the arena. */
if (SINGLE_THREAD_P)
have_lock = true;
@@ -4676,13 +4661,33 @@ _int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
if (!have_lock)
__libc_lock_unlock (av->mutex);
+
+ __set_errno (err);
}
/*
If the chunk was allocated via mmap, release via munmap().
*/
else {
+
+ /* Preserve errno in case munmap sets it. */
+ int err = errno;
+
+ /* See if the dynamic brk/mmap threshold needs adjusting.
+ Dumped fake mmapped chunks do not affect the threshold. */
+ if (!mp_.no_dyn_threshold
+ && chunksize_nomask (p) > mp_.mmap_threshold
+ && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
+ {
+ mp_.mmap_threshold = chunksize (p);
+ mp_.trim_threshold = 2 * mp_.mmap_threshold;
+ LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
+ mp_.mmap_threshold, mp_.trim_threshold);
+ }
+
munmap_chunk (p);
+
+ __set_errno (err);
}
}
* Re: [PATCH] malloc: Move mmap code out of __libc_free hotpath
2025-03-24 19:29 [PATCH] malloc: Move mmap code out of __libc_free hotpath Wilco Dijkstra
@ 2025-03-24 20:41 ` Florian Weimer
2025-03-24 21:15 ` Wilco Dijkstra
2025-03-24 22:16 ` DJ Delorie
1 sibling, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2025-03-24 20:41 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: 'GNU C Library'
* Wilco Dijkstra:
> Currently __libc_free checks for a freed mmap chunk in the fast path.
> Also errno is always saved and restored to preserve it. Since mmap chunks
> are larger than the largest tcache chunk, it is safe to delay this and
> handle tcache, smallbin and medium bin blocks first. Move saving of errno
> to cases that actually need it.
I think the “Since mmap chunks are larger than the largest tcache
chunk” part is not correct. We can fall back to performing mmap for a
smallish allocation. But I think it's still fine to put such chunks
into tcache. Maybe add a comment to this effect?
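For reference, the fallback in question is in sysmalloc: when the arena
cannot supply memory, even a small request may end up mmap-backed, so
"mmapped" does not imply "larger than the tcache range". A rough sketch of
that control flow (simplified and paraphrased, not verbatim glibc code; the
helpers marked hypothetical do not exist under these names):

static void *
sysmalloc_sketch (size_t nb, mstate av)
{
  /* Large requests (or no usable arena) go straight to mmap.  */
  if (av == NULL
      || (nb >= mp_.mmap_threshold && mp_.n_mmaps < mp_.n_mmaps_max))
    {
      void *mem = mmap_new_chunk (nb);      /* hypothetical helper */
      if (mem != NULL)
        return mem;
    }

  /* Otherwise try to extend the arena via brk/MORECORE.  */
  void *mem = extend_arena (av, nb);        /* hypothetical helper */
  if (mem != NULL)
    return mem;

  /* brk failed: fall back to mmap even for a smallish request.  */
  return mmap_new_chunk (nb);
}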
* Re: [PATCH] malloc: Move mmap code out of __libc_free hotpath
2025-03-24 20:41 ` Florian Weimer
@ 2025-03-24 21:15 ` Wilco Dijkstra
2025-03-25 9:11 ` Florian Weimer
0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2025-03-24 21:15 UTC (permalink / raw)
To: Florian Weimer; +Cc: 'GNU C Library'
Hi Florian,
>> Currently __libc_free checks for a freed mmap chunk in the fast path.
>> Also errno is always saved and restored to preserve it. Since mmap chunks
>> are larger than the largest tcache chunk, it is safe to delay this and
>> handle tcache, smallbin and medium bin blocks first. Move saving of errno
>> to cases that actually need it.
>
> I think the “Since mmap chunks are larger than the largest tcache
> chunk” part is not correct. We can fall back to performing mmap for a
> smallish allocation. But I think it's still fine to put such chunks
> into tcache. Maybe add a comment to this effect?
Well, assuming the minimum page size is 4KB, it should not currently be possible
for the smallest mmap chunk to get into tcache. However, the available size
of an mmap chunk is different, so it would go wrong unless we make the
chunk layouts compatible (that would be a simplification that removes the
special cases in memsize and musable).
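For concreteness, a back-of-the-envelope check with the default tcache
parameters on a typical 64-bit configuration (values written out from memory
for illustration, not copied from malloc.c):

#include <stddef.h>

/* Typical 64-bit values, shown for illustration only.  */
#define SIZE_SZ           8
#define MALLOC_ALIGNMENT  16
#define MINSIZE           32
#define TCACHE_MAX_BINS   64
#define tidx2usize(idx)   (((size_t) (idx)) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)
#define MAX_TCACHE_SIZE   tidx2usize (TCACHE_MAX_BINS - 1)

/* MAX_TCACHE_SIZE = 63 * 16 + 32 - 8 = 1032 bytes, well below one
   4096-byte page, so the smallest mmap chunk (at least a full page)
   is always out of tcache range.  */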
Cheers,
Wilco
* Re: [PATCH] malloc: Move mmap code out of __libc_free hotpath
2025-03-24 19:29 [PATCH] malloc: Move mmap code out of __libc_free hotpath Wilco Dijkstra
2025-03-24 20:41 ` Florian Weimer
@ 2025-03-24 22:16 ` DJ Delorie
2025-03-25 1:17 ` Wilco Dijkstra
2025-03-26 16:53 ` [PATCH v2] " Wilco Dijkstra
1 sibling, 2 replies; 9+ messages in thread
From: DJ Delorie @ 2025-03-24 22:16 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: libc-alpha
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Currently __libc_free checks for a freed mmap chunk in the fast path.
> Also errno is always saved and restored to preserve it. Since mmap chunks
> are larger than the largest tcache chunk, it is safe to delay this and
> handle tcache, smallbin and medium bin blocks first. Move saving of errno
> to cases that actually need it.
> + /* Preserve errno when called from free() - _int_malloc may corrupt it. */
> + int err = errno;
We're in tcache_init so the performance hit here is negligible.
>
> -
> if (ar_ptr != NULL)
> __libc_lock_unlock (ar_ptr->mutex);
>
> + __set_errno (err);
> +
Ok.
> @@ -3446,37 +3450,15 @@ __libc_free (void *mem)
> if (__glibc_unlikely (mtag_enabled))
> *(volatile char *)mem;
>
> - int err = errno;
> -
> p = mem2chunk (mem);
>
> - if (chunk_is_mmapped (p)) /* release mmapped memory. */
> - {
> - /* See if the dynamic brk/mmap threshold needs adjusting.
> - Dumped fake mmapped chunks do not affect the threshold. */
> - if (!mp_.no_dyn_threshold
> - && chunksize_nomask (p) > mp_.mmap_threshold
> - && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
> - {
... the rest of this looks ok, but it leaves __libc_free and _int_free as
being very small functions. Do we need them? Or are we relying on the
inlining to make them efficient?
LGTM
Reviewed-by: DJ Delorie <dj@redhat.com>
* Re: [PATCH] malloc: Move mmap code out of __libc_free hotpath
2025-03-24 22:16 ` DJ Delorie
@ 2025-03-25 1:17 ` Wilco Dijkstra
2025-03-25 3:59 ` DJ Delorie
2025-03-26 16:53 ` [PATCH v2] " Wilco Dijkstra
1 sibling, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2025-03-25 1:17 UTC (permalink / raw)
To: DJ Delorie; +Cc: libc-alpha
Hi DJ,
> ... the rest of this looks ok, but it leaves __libc_free and _int_free as
> being very small functions. Do we need them? Or are we relying on the
> inlining to make them efficient?
No, we don't need them - the next step is to move the _int_free* functions into __libc_free,
simplify the code and use a similar tailcall approach to __libc_malloc. This
gives similar speedups to malloc!
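Roughly, the follow-up would look like this (an illustrative sketch of the
intent only, not the actual patch; __libc_free_cold is a hypothetical name
for the slow path):

void
__libc_free (void *mem)
{
  if (mem == NULL)
    return;

  mchunkptr p = mem2chunk (mem);
  INTERNAL_SIZE_T size = chunksize (p);
  size_t tc_idx = csize2tidx (size);

  /* Fast path: non-mmapped chunk that fits a non-full tcache bin.  */
  if (tcache != NULL
      && tc_idx < mp_.tcache_bins
      && tcache->counts[tc_idx] < mp_.tcache_count
      && !chunk_is_mmapped (p))
    {
      tcache_put (p, tc_idx);
      return;
    }

  /* Everything else (bins, consolidation, mmap) in a cold tail call.  */
  __libc_free_cold (p, size);   /* hypothetical helper */
}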
Cheers,
Wilco
* Re: [PATCH] malloc: Move mmap code out of __libc_free hotpath
2025-03-25 1:17 ` Wilco Dijkstra
@ 2025-03-25 3:59 ` DJ Delorie
2025-03-26 17:06 ` Wilco Dijkstra
0 siblings, 1 reply; 9+ messages in thread
From: DJ Delorie @ 2025-03-25 3:59 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: libc-alpha
Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
>> ... the rest of this looks ok, but it leaves __libc_free and _int_free as
>> being very small functions. Do we need them? Or are we relying on the
>> inlining to make them efficient?
>
> No, we don't need them - the next step is to move the _int_free* functions into __libc_free,
> simplify the code and use a similar tailcall approach to __libc_malloc. This
> gives similar speedups to malloc!
Does this imply that the tcache stuff will only live in __libc_free and
nothing else will populate tcache internally?
* Re: [PATCH] malloc: Move mmap code out of __libc_free hotpath
2025-03-24 21:15 ` Wilco Dijkstra
@ 2025-03-25 9:11 ` Florian Weimer
0 siblings, 0 replies; 9+ messages in thread
From: Florian Weimer @ 2025-03-25 9:11 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: 'GNU C Library'
* Wilco Dijkstra:
> Hi Florian,
>
>>> Currently __libc_free checks for a freed mmap chunk in the fast path.
>>> Also errno is always saved and restored to preserve it. Since mmap chunks
>>> are larger than the largest tcache chunk, it is safe to delay this and
>>> handle tcache, smallbin and medium bin blocks first. Move saving of errno
>>> to cases that actually need it.
>>
>> I think the “Since mmap chunks are larger than the largest tcache
>> chunk” part is not correct. We can fall back to performing mmap for a
>> smallish allocation. But I think it's still fine to put such chunks
>> into tcache. Maybe add a comment to this effect?
>
> Well, assuming the minimum page size is 4KB, it should not currently be possible
> for the smallest mmap chunk to get into tcache.
I think 2K alignment will produce a 2K chunk. Still outside the range
of tcache, though. I thought it reached further.
> However, the available size of an mmap chunk is different, so it would
> go wrong unless we make the chunk layouts compatible (that would be a
> simplification that removes the special cases in memsize and musable).
Oh, I had forgotten about that.
In this case, we probably want some sort of assert that this doesn't go
wrong if someone changes TCACHE_MAX_BINS. And the comment needs
updating to note that the number is no longer arbitrary.
Thanks,
Florian
* [PATCH v2] malloc: Move mmap code out of __libc_free hotpath
2025-03-24 22:16 ` DJ Delorie
2025-03-25 1:17 ` Wilco Dijkstra
@ 2025-03-26 16:53 ` Wilco Dijkstra
1 sibling, 0 replies; 9+ messages in thread
From: Wilco Dijkstra @ 2025-03-26 16:53 UTC (permalink / raw)
To: DJ Delorie; +Cc: libc-alpha
v2: Fix x86 failures with mmap chunks, also ensure mmap chunks cannot end up in tcache
Currently __libc_free checks for a freed mmap chunk in the fast path.
Also errno is always saved and restored to preserve it. Since mmap chunks
are larger than the largest tcache chunk, it is safe to delay this and
handle tcache, smallbin and medium bin blocks first. Move saving of errno
to cases that actually need it. Remove a safety check that fails on mmap
chunks, and add a check that the smallest mmap chunk is too large for tcache.
Performance of bench-malloc-thread improves by 9.2% for 1 thread and
6.9% for 32 threads on Neoverse V2.
Passes regress, OK for commit?
---
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 7e4c1399385051b1989cbc0ac14266d2138695af..ddf22ffe7cc10c11035a1b31c80589cfaef0a06d 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3313,6 +3313,14 @@ tcache_init(void)
if (tcache_shutting_down)
return;
+ /* Check minimum mmap chunk is larger than max tcache size. This means
+ mmap chunks with their different layout are never added to tcache. */
+ if (MAX_TCACHE_SIZE >= GLRO (dl_pagesize) / 2)
+ malloc_printerr ("max tcache size too large");
+
+ /* Preserve errno when called from free() - _int_malloc may corrupt it. */
+ int err = errno;
+
arena_get (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
if (!victim && ar_ptr != NULL)
@@ -3321,10 +3329,11 @@ tcache_init(void)
victim = _int_malloc (ar_ptr, bytes);
}
-
if (ar_ptr != NULL)
__libc_lock_unlock (ar_ptr->mutex);
+ __set_errno (err);
+
/* In a low memory situation, we may not be able to allocate memory
- in which case, we just keep trying later. However, we
typically do this very early, so either there is sufficient
@@ -3446,37 +3455,15 @@ __libc_free (void *mem)
if (__glibc_unlikely (mtag_enabled))
*(volatile char *)mem;
- int err = errno;
-
p = mem2chunk (mem);
- if (chunk_is_mmapped (p)) /* release mmapped memory. */
- {
- /* See if the dynamic brk/mmap threshold needs adjusting.
- Dumped fake mmapped chunks do not affect the threshold. */
- if (!mp_.no_dyn_threshold
- && chunksize_nomask (p) > mp_.mmap_threshold
- && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
- {
- mp_.mmap_threshold = chunksize (p);
- mp_.trim_threshold = 2 * mp_.mmap_threshold;
- LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
- mp_.mmap_threshold, mp_.trim_threshold);
- }
- munmap_chunk (p);
- }
- else
- {
- MAYBE_INIT_TCACHE ();
-
- /* Mark the chunk as belonging to the library again. */
- (void)tag_region (chunk2mem (p), memsize (p));
+ MAYBE_INIT_TCACHE ();
- ar_ptr = arena_for_chunk (p);
- _int_free (ar_ptr, p, 0);
- }
+ /* Mark the chunk as belonging to the library again. */
+ (void)tag_region (chunk2mem (p), memsize (p));
- __set_errno (err);
+ ar_ptr = arena_for_chunk (p);
+ _int_free (ar_ptr, p, 0);
}
libc_hidden_def (__libc_free)
@@ -4563,9 +4550,8 @@ _int_free_check (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
|| __builtin_expect (misaligned_chunk (p), 0))
malloc_printerr ("free(): invalid pointer");
- /* We know that each chunk is at least MINSIZE bytes in size or a
- multiple of MALLOC_ALIGNMENT. */
- if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size)))
+ /* We know that each chunk is at least MINSIZE bytes. */
+ if (__glibc_unlikely (size < MINSIZE))
malloc_printerr ("free(): invalid size");
check_inuse_chunk (av, p);
@@ -4662,6 +4648,9 @@ _int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
else if (!chunk_is_mmapped(p)) {
+ /* Preserve errno in case block merging results in munmap. */
+ int err = errno;
+
/* If we're single-threaded, don't lock the arena. */
if (SINGLE_THREAD_P)
have_lock = true;
@@ -4673,13 +4662,33 @@ _int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
if (!have_lock)
__libc_lock_unlock (av->mutex);
+
+ __set_errno (err);
}
/*
If the chunk was allocated via mmap, release via munmap().
*/
else {
+
+ /* Preserve errno in case munmap sets it. */
+ int err = errno;
+
+ /* See if the dynamic brk/mmap threshold needs adjusting.
+ Dumped fake mmapped chunks do not affect the threshold. */
+ if (!mp_.no_dyn_threshold
+ && chunksize_nomask (p) > mp_.mmap_threshold
+ && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
+ {
+ mp_.mmap_threshold = chunksize (p);
+ mp_.trim_threshold = 2 * mp_.mmap_threshold;
+ LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
+ mp_.mmap_threshold, mp_.trim_threshold);
+ }
+
munmap_chunk (p);
+
+ __set_errno (err);
}
}
* Re: [PATCH] malloc: Move mmap code out of __libc_free hotpath
2025-03-25 3:59 ` DJ Delorie
@ 2025-03-26 17:06 ` Wilco Dijkstra
0 siblings, 0 replies; 9+ messages in thread
From: Wilco Dijkstra @ 2025-03-26 17:06 UTC (permalink / raw)
To: DJ Delorie; +Cc: libc-alpha
Hi DJ,
> Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
>>> ... the rest of this looks ok, but it leaves _libc_free and _int_free as
>>> being very small functions. Do we need them? Or are we relying on the
>>> inlining to make them efficient?
>>
>> No we don't need them - the next step is move _int_free-* into __libc_free,
>> simplify the code and use a similar tailcall approach as __libc_malloc. This
>> gives similar speedups as malloc!
>
> Does this imply that the tcache stuff will only live in __libc_free and
> nothing else will populate tcache from internally?
Yes, for free there doesn't seem to be a need for another interface that frees into
tcache besides __libc_free. On current trunk the _int_free functions are used once.
So basically there is no need to use _int_free: either you want tcache and thus use
__libc_free, or if you don't, use _int_free_chunk. Note tcache_thread_shutdown
calls __libc_free even though it should be calling _int_free_chunk, and then has to
jump through hoops to avoid infinite recursion...
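The hoops in question look roughly like this (paraphrased from memory, not a
verbatim quote of malloc.c): tcache is cleared and tcache_shutting_down is set
before the frees, so __libc_free cannot re-initialise or refill the cache it
is tearing down.

static void
tcache_thread_shutdown (void)
{
  tcache_perthread_struct *tcache_tmp = tcache;

  tcache_shutting_down = true;

  if (tcache_tmp == NULL)
    return;

  /* Disable tcache for this thread so the frees below bypass it.  */
  tcache = NULL;

  for (int i = 0; i < TCACHE_MAX_BINS; ++i)
    while (tcache_tmp->entries[i] != NULL)
      {
        tcache_entry *e = tcache_tmp->entries[i];
        tcache_tmp->entries[i] = REVEAL_PTR (e->next);
        __libc_free (e);
      }

  __libc_free (tcache_tmp);
}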
Cheers,
Wilco