public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] malloc: Improve malloc initialization
@ 2025-03-24 23:12 Wilco Dijkstra
  2025-03-24 23:50 ` DJ Delorie
  2025-03-31 14:21 ` Florian Weimer
  0 siblings, 2 replies; 7+ messages in thread
From: Wilco Dijkstra @ 2025-03-24 23:12 UTC (permalink / raw)
  To: 'GNU C Library'


Move malloc initialization out of hot paths - it is only required after checking
fastbins in _int_malloc or before tcache initialization. Bench-malloc-thread
improves by 1.8% for 1 thread and 1.3% for 32 threads on Neoverse V2.

Passes regress, OK for commit?

---

diff --git a/malloc/malloc.c b/malloc/malloc.c
index b73ddbf554461da34d99258fae87c6ece6d175ba..b8eb4180570d20223109b1f0084f4a01bb97b9d6 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3313,6 +3313,10 @@ tcache_init(void)
   if (tcache_shutting_down)
     return;
 
+  /* Ensure malloc is initialized before tcache.  */
+  if (!__malloc_initialized)
+    ptmalloc_init ();
+
   arena_get (ar_ptr, bytes);
   victim = _int_malloc (ar_ptr, bytes);
   if (!victim && ar_ptr != NULL)
@@ -3389,8 +3393,6 @@ __libc_malloc (size_t bytes)
   _Static_assert (PTRDIFF_MAX <= SIZE_MAX / 2,
                   "PTRDIFF_MAX is not more than half of SIZE_MAX");
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
 #if USE_TCACHE
   bool err = tcache_try_malloc (bytes, &victim);
 
@@ -3488,9 +3490,6 @@ __libc_realloc (void *oldmem, size_t bytes)
 
   void *newp;             /* chunk to return */
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
 #if REALLOC_ZERO_BYTES_FREES
   if (bytes == 0 && oldmem != NULL)
     {
@@ -3619,9 +3618,6 @@ libc_hidden_def (__libc_realloc)
 void *
 __libc_memalign (size_t alignment, size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   void *address = RETURN_ADDRESS (0);
   return _mid_memalign (alignment, bytes, address);
 }
@@ -3632,9 +3628,6 @@ void *
 weak_function
 aligned_alloc (size_t alignment, size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
 /* Similar to memalign, but starting with ISO C17 the standard
    requires an error for alignments that are not supported by the
    implementation.  Valid alignments for the current implementation
@@ -3742,9 +3735,6 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
 void *
 __libc_valloc (size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   void *address = RETURN_ADDRESS (0);
   size_t pagesize = GLRO (dl_pagesize);
   return _mid_memalign (pagesize, bytes, address);
@@ -3753,9 +3743,6 @@ __libc_valloc (size_t bytes)
 void *
 __libc_pvalloc (size_t bytes)
 {
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   void *address = RETURN_ADDRESS (0);
   size_t pagesize = GLRO (dl_pagesize);
   size_t rounded_bytes;
@@ -3790,9 +3777,6 @@ __libc_calloc (size_t n, size_t elem_size)
 
   sz = bytes;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
 #if USE_TCACHE
   bool err = tcache_try_malloc (bytes, &mem);
 
@@ -3816,7 +3800,7 @@ __libc_calloc (size_t n, size_t elem_size)
   else
     arena_get (av, sz);
 
-  if (av)
+  if (av && top (av) != NULL)
     {
       /* Check if we hand out the top chunk, in which case there may be no
 	 need to clear. */
@@ -4029,6 +4013,10 @@ _int_malloc (mstate av, size_t bytes)
 	}
     }
 
+  /* Ensure malloc is initialized before accessing small/large bins.  */
+  if (!__malloc_initialized)
+    ptmalloc_init ();
+
   /*
      If a small request, check regular bin.  Since these "smallbins"
      hold one size each, no searching within bins is necessary.
@@ -5848,9 +5836,6 @@ __posix_memalign (void **memptr, size_t alignment, size_t size)
 {
   void *mem;
 
-  if (!__malloc_initialized)
-    ptmalloc_init ();
-
   /* Test whether the SIZE argument is valid.  It must be a power of
      two multiple of sizeof (void *).  */
   if (alignment % sizeof (void *) != 0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Improve malloc initialization
  2025-03-24 23:12 [PATCH] malloc: Improve malloc initialization Wilco Dijkstra
@ 2025-03-24 23:50 ` DJ Delorie
  2025-03-25  0:30   ` Wilco Dijkstra
  2025-03-31 14:21 ` Florian Weimer
  1 sibling, 1 reply; 7+ messages in thread
From: DJ Delorie @ 2025-03-24 23:50 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Move malloc initialization out of hot paths - it is only required after checking
> fastbins in _int_malloc or before tcache initialization. Bench-malloc-thread
> improves by 1.8% for 1 thread and 1.3% for 32 threads on Neoverse V2.

ptmalloc_init sets up the tunables needed to properly manage tcache and
fastbins...


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Improve malloc initialization
  2025-03-24 23:50 ` DJ Delorie
@ 2025-03-25  0:30   ` Wilco Dijkstra
  2025-03-25  4:02     ` DJ Delorie
  2025-03-25  4:57     ` Florian Weimer
  0 siblings, 2 replies; 7+ messages in thread
From: Wilco Dijkstra @ 2025-03-25  0:30 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha

Hi DJ,

> Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
>> Move malloc initialization out of hot paths - it is only required after checking
>> fastbins in _int_malloc or before tcache initialization. Bench-malloc-thread
>> improves by 1.8% for 1 thread and 1.3% for 32 threads on Neoverse V2.
>
> ptmalloc_init sets up the tunables needed to properly manage tcache and
> fastbins...

And that's fine because one can't enter tcache code when it has not been
initialized yet. The same is true for the fastbins - global_max_fast is zero
intialized, so we can't enter the fastbin case until initialization has been run.

Or would you prefer calling ptmalloc_init explicitly during startup?

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Improve malloc initialization
  2025-03-25  0:30   ` Wilco Dijkstra
@ 2025-03-25  4:02     ` DJ Delorie
  2025-03-25  4:57     ` Florian Weimer
  1 sibling, 0 replies; 7+ messages in thread
From: DJ Delorie @ 2025-03-25  4:02 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
>>> Move malloc initialization out of hot paths - it is only required after checking
>>> fastbins in _int_malloc or before tcache initialization. Bench-malloc-thread
>>> improves by 1.8% for 1 thread and 1.3% for 32 threads on Neoverse V2.
>>
>> ptmalloc_init sets up the tunables needed to properly manage tcache and
>> fastbins...
>
> And that's fine because one can't enter tcache code when it has not been
> initialized yet. The same is true for the fastbins - global_max_fast is zero
> intialized, so we can't enter the fastbin case until initialization has been run.
>
> Or would you prefer calling ptmalloc_init explicitly during startup?

I think I brought that up in the distant past but don't recall the
reason.  My concern is that ptmalloc_init does a *lot* and malloc is
complicated; if we're going to defer calling it we need to be really
sure that anything we do access is in some stable state without
ptmalloc_init being called.  I mean, do we even know if there will be
any arenas to have an uninitialized fastbin in?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Improve malloc initialization
  2025-03-25  0:30   ` Wilco Dijkstra
  2025-03-25  4:02     ` DJ Delorie
@ 2025-03-25  4:57     ` Florian Weimer
       [not found]       ` <PAWPR08MB89825A7349835D46FAEB687783A12@PAWPR08MB8982.eurprd08.prod.outlook.com>
  1 sibling, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2025-03-25  4:57 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: DJ Delorie, libc-alpha

* Wilco Dijkstra:

> Or would you prefer calling ptmalloc_init explicitly during startup?

That means the initialization could would run even if malloc is
interposed.  I wonder if that has adverse effects.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Improve malloc initialization
       [not found]       ` <PAWPR08MB89825A7349835D46FAEB687783A12@PAWPR08MB8982.eurprd08.prod.outlook.com>
@ 2025-03-27 14:29         ` Wilco Dijkstra
  0 siblings, 0 replies; 7+ messages in thread
From: Wilco Dijkstra @ 2025-03-27 14:29 UTC (permalink / raw)
  To: Florian Weimer; +Cc: DJ Delorie, libc-alpha, Florian Weimer

Hi Florian,

>> Or would you prefer calling ptmalloc_init explicitly during startup?
>
> That means the initialization could would run even if malloc is
> interposed.  I wonder if that has adverse effects.

It would read some tunables and write a lots of global variables which are
never used afterwards. It's basically wasting cycles, but it's harmless.

We could avoid that by using function that would be overridden. Unfortunately
mallopt() is not supported by either Mimalloc or jemalloc, but we could call
malloc_usable_size (NULL) or posix_memalign (NULL, 0, 0) which could do
initialization without having any sideeffects in external allocators.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] malloc: Improve malloc initialization
  2025-03-24 23:12 [PATCH] malloc: Improve malloc initialization Wilco Dijkstra
  2025-03-24 23:50 ` DJ Delorie
@ 2025-03-31 14:21 ` Florian Weimer
  1 sibling, 0 replies; 7+ messages in thread
From: Florian Weimer @ 2025-03-31 14:21 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GNU C Library'

* Wilco Dijkstra:

> Move malloc initialization out of hot paths - it is only required after checking
> fastbins in _int_malloc or before tcache initialization. Bench-malloc-thread
> improves by 1.8% for 1 thread and 1.3% for 32 threads on Neoverse V2.
>
> Passes regress, OK for commit?
>
> ---
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index b73ddbf554461da34d99258fae87c6ece6d175ba..b8eb4180570d20223109b1f0084f4a01bb97b9d6 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -3313,6 +3313,10 @@ tcache_init(void)
>    if (tcache_shutting_down)
>      return;
>  
> +  /* Ensure malloc is initialized before tcache.  */
> +  if (!__malloc_initialized)
> +    ptmalloc_init ();
> +
>    arena_get (ar_ptr, bytes);
>    victim = _int_malloc (ar_ptr, bytes);
>    if (!victim && ar_ptr != NULL)
> @@ -3389,8 +3393,6 @@ __libc_malloc (size_t bytes)
>    _Static_assert (PTRDIFF_MAX <= SIZE_MAX / 2,
>                    "PTRDIFF_MAX is not more than half of SIZE_MAX");
>  
> -  if (!__malloc_initialized)
> -    ptmalloc_init ();

I think we should document the invariant that tache_init is called as
part of pthread_create, before the new thread starts running.  This way,
ptmalloc_init does not need to be thread-safe.  I think it's still the
case after this patch becaue MAYBE_INIT_TCACHE is called unconditionally
early and does not depend on the allocation size.  (Not sure why this is
a macro and not a function.)  I'd suggest comments on ptmalloc_init and
MAYBE_INIT_TCACHE that reference each other, and one of them should
describe the expected sequence of events.  Maybe also put a comment on
MAYBE_INIT_TCACHE in __libc_malloc2?

Regarding doing this initialization early and unconditionally: This
could well work today, especially if we do it early via
__libc_early_init for dynamically linked builds (so that malloc is
available in PREINIT and LD_PRELOAD objects).

For static builds, additional work will be required, or we could keep
the current scheme for that case.  (I think in general, malloc can be
called before ELF constructors have run in the static case.)

If we ever start allocating during initialization (I expect us to
reserve address space for all possible struct malloc_state objects, for
example) and we want to avoid that for interposed mallocs, then I think
a lazy scheme is the only feasible way.  We can't call into the
interposed malloc from __libc_early_init because its ELF constructor has
not run yet.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-03-31 14:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-24 23:12 [PATCH] malloc: Improve malloc initialization Wilco Dijkstra
2025-03-24 23:50 ` DJ Delorie
2025-03-25  0:30   ` Wilco Dijkstra
2025-03-25  4:02     ` DJ Delorie
2025-03-25  4:57     ` Florian Weimer
     [not found]       ` <PAWPR08MB89825A7349835D46FAEB687783A12@PAWPR08MB8982.eurprd08.prod.outlook.com>
2025-03-27 14:29         ` Wilco Dijkstra
2025-03-31 14:21 ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).