public inbox for glibc-bugs@sourceware.org
* [Bug malloc/30945] New: Core affinity setting incurs lock contentions between threads
@ 2023-10-06  0:24 mail at roychan dot org
  2023-10-06  0:27 ` [Bug malloc/30945] " mail at roychan dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: mail at roychan dot org @ 2023-10-06  0:24 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30945

            Bug ID: 30945
           Summary: Core affinity setting incurs lock contentions between
                    threads
           Product: glibc
           Version: 2.38
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: mail at roychan dot org
  Target Milestone: ---

Created attachment 15156
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15156&action=edit
the example program to reproduce the issue

Hi,

I recently encountered poor malloc/free performance while building a
data-intensive application: the deserialization library we used ran 10x slower
than expected. Investigation showed that this is because the arena_get2
function uses __get_nprocs_sched instead of __get_nprocs. Without changing core
affinity settings, this call returns the real number of cores, so the upper
limit on total arenas is set correctly. However, if a thread is pinned to a
core, subsequent malloc calls see only n = 1 because the function counts only
schedulable cores. As a result, the maximum number of arenas is capped at 8 on
64-bit platforms.

This leads to arena lock contention between threads if:

- The program spans multiple cores (say, more than 8 cores).
- Threads are pinned to cores before any malloc calls, so they have not
  attached to any arenas.
- Later memory allocations are served from the arenas.
- The glibc.malloc.arena_max tunable is not set to manually raise the limit.

A mail thread on libc-alpha briefly discussed this issue last year:
https://sourceware.org/pipermail/libc-alpha/2022-June/140123.html
However, it did not include a program that can easily reproduce the
(un)expected behavior. Here I would like to provide a minimal example that
exposes the problem and, if possible, to initiate further discussion about
whether the core counting in arena_get2 can be better implemented.

The program accepts 3 arguments: the number of cores, whether each thread is
pinned to a core right after its creation, and whether to apply a small "fix".
The fix is to add a free(malloc(8)) right before setting the affinity in each
thread. In that case, each thread still sees all the cores, so it can create
and attach to a "local" arena that is not shared. The output is the average
time each thread takes to finish a batch of malloc/free calls.

The following results were collected on my PC with a 16-core Ryzen 9 5950X,
running Linux kernel 6.5.5 and glibc 2.38. The program was compiled with gcc
13.2.1 without optimization flags.

    ./a.out 32 false false
    ---
    nr_cpu: 32 pin: no fix: no
    thread average (ms): 16.233663

    ./a.out 32 true false
    ---
    nr_cpu: 32 pin: yes fix: no
    thread average (ms): 1360.919047

    ./a.out 32 true true
    ---
    nr_cpu: 32 pin: yes fix: yes
    thread average (ms): 15.505453

    env GLIBC_TUNABLES='glibc.malloc.arena_max=32' ./a.out 32 true false
    ---
    nr_cpu: 32 pin: yes fix: no
    thread average (ms): 16.036667

I also recorded a few runs with perf. They show massive overhead in
__lll_lock_wait_private and __lll_lock_wake_private.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


end of thread, other threads:[~2024-02-12 21:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-06  0:24 [Bug malloc/30945] New: Core affinity setting incurs lock contentions between threads mail at roychan dot org
2023-10-06  0:27 ` [Bug malloc/30945] " mail at roychan dot org
2023-10-11 16:11 ` adhemerval.zanella at linaro dot org
2023-10-12  2:32 ` sam at gentoo dot org
2023-11-22 14:19 ` adhemerval.zanella at linaro dot org
2024-01-11  9:41 ` fweimer at redhat dot com
2024-02-11 22:04 ` kuganv at gmail dot com
2024-02-12 10:14 ` sam at gentoo dot org
2024-02-12 13:24 ` adhemerval.zanella at linaro dot org
2024-02-12 21:49 ` kuganv at gmail dot com
