From: "mail at roychan dot org" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug malloc/30945] New: Core affinity setting incurs lock contentions between threads
Date: Fri, 06 Oct 2023 00:24:43 +0000	[thread overview]
Message-ID: <bug-30945-131@http.sourceware.org/bugzilla/> (raw)

https://sourceware.org/bugzilla/show_bug.cgi?id=30945

            Bug ID: 30945
           Summary: Core affinity setting incurs lock contentions between
                    threads
           Product: glibc
           Version: 2.38
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: mail at roychan dot org
  Target Milestone: ---

Created attachment 15156
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15156&action=edit
the example program to reproduce the issue

Hi,

I recently encountered poor malloc/free performance while building a
data-intensive application: the deserialization library we use ran about 10x
slower than expected. Investigation showed that this is because the arena_get2
function uses __get_nprocs_sched instead of __get_nprocs. Without any core
affinity settings, this call returns the real number of cores, so the upper
limit on the total number of arenas is set correctly. However, if a thread is
pinned to a core before its first allocation, arena_get2 sees n = 1 because the
function counts only the cores the thread is currently allowed to run on. The
maximum number of arenas is therefore 8 on 64-bit platforms.
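
For context, the relevant logic in malloc/arena.c looks roughly like the
following (paraphrased from my reading of the 2.38 sources, so details may
differ slightly):

    /* Paraphrased from malloc/arena.c in glibc 2.38; not a verbatim copy.  */
    #define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))

    /* Inside arena_get2, when the arena limit has not been fixed via the
       arena_max/arena_test tunables:  */
    int n = __get_nprocs_sched ();   /* cores in the caller's affinity mask */
    if (n >= 1)
      narenas_limit = NARENAS_FROM_NCORES (n);
    else
      /* No information about the system; assume two cores.  */
      narenas_limit = NARENAS_FROM_NCORES (2);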

This leads to arena lock contention between threads if:

- The program spans multiple cores (say, more than 8).
- Threads are pinned to cores before making any malloc calls, so they have
  not yet attached to any arena.
- Later memory allocations are then served from this small pool of arenas.
- The MALLOC_ARENA_MAX tunable is not set to manually raise the limit.

A thread on libc-alpha briefly discussed this issue last year:
https://sourceware.org/pipermail/libc-alpha/2022-June/140123.html
However, it did not include a program that can be used to easily reproduce the
(un)expected behavior. Here I would like to provide a minimal example that will
expose the problem and, if possible, initiate further discussion about whether
the core counting in arena_get can be implemented better.

The program accepts three arguments: the number of cores, whether each thread
is pinned to a core right after its creation, and whether to apply a small
"fix". The fix is to add a free(malloc(8)) right before setting the affinity in
each thread; at that point each thread still sees all the cores, so it can
create and attach to a "local" arena that is not shared. The output is the
average time each thread takes to finish a batch of malloc/free calls.
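
The attached program is the authoritative version; a simplified sketch of its
per-thread logic (names and constants here are illustrative, not copied from
the attachment) looks like this:

    /* Simplified sketch of the reproducer's worker thread; see the
       attachment for the real program.  */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdlib.h>

    struct targ { int cpu; int pin; int fix; };

    static void *
    worker (void *p)
    {
      struct targ *a = p;

      if (a->fix)
        /* Touch malloc before pinning: the arena limit is computed while
           the full affinity mask is still visible.  */
        free (malloc (8));

      if (a->pin)
        {
          cpu_set_t set;
          CPU_ZERO (&set);
          CPU_SET (a->cpu, &set);
          pthread_setaffinity_np (pthread_self (), sizeof (set), &set);
        }

      /* Timed loop of allocations; with too few arenas this serializes
         on the arena locks.  */
      for (int i = 0; i < 1000000; i++)
        free (malloc (128));

      return NULL;
    }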

The following are the results collected on my PC with a 16-core Ryzen 9 5950X,
running Linux kernel 6.5.5 and glibc 2.38. The program was compiled with gcc
13.2.1 without optimization flags.

    ./a.out 32 false false
    ---
    nr_cpu: 32 pin: no fix: no
    thread average (ms): 16.233663

    ./a.out 32 true false
    ---
    nr_cpu: 32 pin: yes fix: no
    thread average (ms): 1360.919047

    ./a.out 32 true true
    ---
    nr_cpu: 32 pin: yes fix: yes
    thread average (ms): 15.505453

    env GLIBC_TUNABLES='glibc.malloc.arena_max=32' ./a.out 32 true false
    ---
    nr_cpu: 32 pin: yes fix: no
    thread average (ms): 16.036667

I also recorded a few runs with perf; the profiles show massive overhead in
__lll_lock_wait_private and __lll_lock_wake_private.
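
For anyone who wants to reproduce the profile, a call-graph recording along
these lines shows the same hot spots (the exact perf options are not
important):

    perf record -g ./a.out 32 true false
    perf report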

-- 
You are receiving this mail because:
You are on the CC list for the bug.


Thread overview: 10+ messages
2023-10-06  0:24 mail at roychan dot org [this message]
2023-10-06  0:27 ` [Bug malloc/30945] " mail at roychan dot org
2023-10-11 16:11 ` adhemerval.zanella at linaro dot org
2023-10-12  2:32 ` sam at gentoo dot org
2023-11-22 14:19 ` adhemerval.zanella at linaro dot org
2024-01-11  9:41 ` fweimer at redhat dot com
2024-02-11 22:04 ` kuganv at gmail dot com
2024-02-12 10:14 ` sam at gentoo dot org
2024-02-12 13:24 ` adhemerval.zanella at linaro dot org
2024-02-12 21:49 ` kuganv at gmail dot com
