public inbox for libc-help@sourceware.org
* Malloc NUMA Aware
@ 2020-01-03  4:18 Nicholas Krause
  2020-01-04 14:54 ` Carlos O'Donell
  0 siblings, 1 reply; 3+ messages in thread
From: Nicholas Krause @ 2020-01-03  4:18 UTC (permalink / raw)
  To: libc-help

Greetings,
With Ryzen and other microprocessors becoming more NUMA-based in design due to
chiplets etc., I'm curious about the support for NUMA-aware allocations
in glibc.

Thanks,
Nick


* Re: Malloc NUMA Aware
  2020-01-03  4:18 Malloc NUMA Aware Nicholas Krause
@ 2020-01-04 14:54 ` Carlos O'Donell
  2020-01-04 17:11   ` Nicholas Krause
  0 siblings, 1 reply; 3+ messages in thread
From: Carlos O'Donell @ 2020-01-04 14:54 UTC (permalink / raw)
  To: Nicholas Krause; +Cc: libc-help

On Thu, Jan 2, 2020 at 11:18 PM Nicholas Krause <xerofoify@gmail.com> wrote:
> With Ryzen and other microprocessors becoming more NUMA-based in design due to
> chiplets etc., I'm curious about the support for NUMA-aware allocations
> in glibc.

NUMA awareness means different things to different people.

(1) A general purpose allocator that is optimal on NUMA systems?

(2) A new API that allows finer grained control over the location of
the memory requested within the hierarchy?

In general glibc's malloc binds a thread to a given arena (collection
of heaps) and attempts to avoid moving the thread during its lifetime.
This behaviour should be helpful on NUMA systems. This resistance to
moving the threads should mean that a thread's own allocations are
more likely to come from a local node and perform well (data
locality). In summary, we are aware that there could be NUMA problems;
we think about them as we redesign and refine glibc's malloc, but we
haven't seen any reports of bad performance. Likewise, there are issues to
consider with transparent huge pages (THP).
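
If you want to experiment with how the arena count interacts with your
NUMA layout, glibc lets you cap the number of arenas. A rough, untested
sketch (M_ARENA_MAX is a GNU extension; the value 4 is only an example,
e.g. one arena per node on a four-node machine):

#include <malloc.h>   /* mallopt, M_ARENA_MAX (GNU extension) */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Cap the number of malloc arenas. The same limit can be set
       without recompiling via the tunable
       GLIBC_TUNABLES=glibc.malloc.arena_max=4. */
    if (mallopt(M_ARENA_MAX, 4) != 1)
        fprintf(stderr, "mallopt(M_ARENA_MAX) failed\n");

    void *p = malloc(1024);  /* served from the calling thread's arena */
    free(p);
    return 0;
}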

When using a producer-consumer model with threads on different nodes,
this becomes problematic. No matter which node you allocate from, if
your producer and consumer threads are on different nodes, then this
will cause performance degradation. Therefore it's often better to pin
such threads on a single node given knowledge of the system hierarchy.
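
As an illustration of that pinning, here is a rough, untested sketch
using pthread_setaffinity_np (CPUs 0-3 are only assumed to share a
node here; the real mapping is system-specific, see lscpu or
/sys/devices/system/node):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Pin the calling thread to CPUs 0-3 (assumed to be on one node). */
static void pin_to_one_node(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 4; cpu++)
        CPU_SET(cpu, &set);
    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0)
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
}

static void *producer(void *arg) { pin_to_one_node(); /* ... produce ... */ return arg; }
static void *consumer(void *arg) { pin_to_one_node(); /* ... consume ... */ return arg; }

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

With both threads kept on one node, the buffers they share stay local
to that node's memory under the kernel's default first-touch policy.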

There is a lot of discussion about this around the use of OpenMP on
large >1024 CPU systems. There is also a discussion about a NUMA-aware
allocator for OpenMP as part of the standard.

The only NUMA-aware allocator that I know about with new APIs is
being developed by Intel as memkind
(https://github.com/memkind/memkind) but this is completely different
from the current glibc malloc which is a general-purpose system
allocator.
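
For a feel of that model, a minimal memkind sketch might look like this
(untested; link with -lmemkind; MEMKIND_DEFAULT behaves like regular
malloc, while kinds such as MEMKIND_HBW target specific memory types,
if present):

#include <memkind.h>
#include <stdio.h>

int main(void)
{
    /* Allocate from an explicitly chosen "kind" of memory. */
    void *p = memkind_malloc(MEMKIND_DEFAULT, 4096);
    if (p == NULL) {
        fprintf(stderr, "memkind_malloc failed\n");
        return 1;
    }
    memkind_free(MEMKIND_DEFAULT, p);
    return 0;
}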

Does that answer your questions?

Cheers,
Carlos.


* Re: Malloc NUMA Aware
  2020-01-04 14:54 ` Carlos O'Donell
@ 2020-01-04 17:11   ` Nicholas Krause
  0 siblings, 0 replies; 3+ messages in thread
From: Nicholas Krause @ 2020-01-04 17:11 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help



On 1/4/20 9:54 AM, Carlos O'Donell wrote:
> On Thu, Jan 2, 2020 at 11:18 PM Nicholas Krause <xerofoify@gmail.com> wrote:
>> With Ryzen and other microprocessors becoming more NUMA-based in design due to
>> chiplets etc., I'm curious about the support for NUMA-aware allocations
>> in glibc.
> NUMA awareness means different things to different people.
>
> (1) A general purpose allocator that is optimal on NUMA systems?
>
> (2) A new API that allows finer grained control over the location of
> the memory requested within the hierarchy?
>
> In general glibc's malloc binds a thread to a given arena (collection
> of heaps) and attempts to avoid moving the thread during its lifetime.
> This behaviour should be helpful on NUMA systems. This resistance to
> moving the threads should mean that a thread's own allocations are
> more likely to come from a local node and perform well (data
> locality). In summary, we are aware that there could be NUMA problems;
> we think about them as we redesign and refine glibc's malloc, but we
> haven't seen any reports of bad performance. Likewise, there are issues to
> consider with transparent huge pages (THP).
>
> When using a producer-consumer model with threads on different nodes,
> this becomes problematic. No matter which node you allocate from, if
> your producer and consumer threads are on different nodes, then this
> will cause performance degradation. Therefore it's often better to pin
> such threads on a single node given knowledge of the system hierarchy.
>
> There is a lot of discussion about this around the use of OpenMP on
> large >1024 CPU systems. There is also a discussion about a NUMA-aware
> allocator for OpenMP as part of the standard.
>
> The only NUMA-aware allocator that I know about with new APIs is
> being developed by Intel as memkind
> (https://github.com/memkind/memkind) but this is completely different
> from the current glibc malloc which is a general-purpose system
> allocator.
>
> Does that answer your questions?
>
> Cheers,
> Carlos.
I wasn't sure how much the arenas helped, so that does answer my
question.
Nick
