From: Nikolay Shustov <nikolay.shustov@gmail.com>
To: "Ben Woodard" <woodard@redhat.com>,
"Paulo César Pereira de Andrade"
<paulo.cesar.pereira.de.andrade@gmail.com>
Cc: libc-alpha@sourceware.org
Subject: Re: GLIBC malloc behavior question
Date: Tue, 7 Feb 2023 13:01:40 -0500 [thread overview]
Message-ID: <1ccd66cd-7d6e-1825-95e8-38b49320737f@gmail.com> (raw)
In-Reply-To: <ceb50abd-a5bf-8ac9-a23f-57dfe60be1f6@redhat.com>
Got it, thanks.
For now, I am using this tunable merely to help surface whatever I
might have missed in terms of long-lived objects and heap fragmentation.
But I am definitely going to play with it once I am reasonably sure
it is what makes the major impact.
On 2/7/23 12:26, Ben Woodard wrote:
>
> On 2/7/23 08:55, Nikolay Shustov via Libc-alpha wrote:
>>> There is no garbage collector thread or anything similar running in
>>> a worker thread. But maybe something similar could be done in your
>>> code.
>>
>> No, there is nothing of the kind in the application.
>>
>>> You might experiment with the tradeoff of speed vs. memory usage.
>>> The minimum memory usage should be achieved with MALLOC_ARENA_MAX=1;
>>> see 'man mallopt' for other options.
>>
>> MALLOC_ARENA_MAX=1 made a huge difference.
> I just wanted to point out that the sweet spot isn't necessarily 1 or
> the default. Setting it to 1 was most likely a simple test of a
> hypothesis about what could be going wrong. This is a tunable knob,
> and your application could have a sweet spot. For some of the
> applications that I help support, we have empirically found that a
> good number is slightly lower than the number of processors the
> system has; e.g., if there are 16 cores, giving it 12 arenas doesn't
> impact speed but makes the memory footprint more compact.
>> The initial memory allocations went down by an order of magnitude.
>> In fact, I do not see that much of an application slowdown, but this
>> will need more profiling.
>> The stable allocation growth is still ~2 MB/second.
>>
>> I am going to investigate your idea of long living objects
>> contention/memory fragmentation.
>>
>> This sounds very probable, even though I do not see real memory leaks
>> even after all the aux threads died.
>> I have TLS instances in use, maybe those really get in the way.
>>
>> Thanks a lot for your help.
>> If/when I find something new or interesting, I will send an update -
>> hope it will help someone else, too.
>>
>> Regards,
>> - Nikolay
>>
>> On 2/7/23 11:16, Paulo César Pereira de Andrade wrote:
>>> On Tue, Feb 7, 2023 at 12:07, Nikolay Shustov via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>> Hi,
>>>> I have a question about the malloc() behavior which I observe.
>>>> The synopsis is that during stress load, the application
>>>> aggressively allocates virtual memory without any upper limit.
>>>> Just to note: after the application goes through the peak of
>>>> activity and becomes idle, its virtual memory doesn't scale back
>>>> (I do not expect much of that, though - should I?).
>>> There is no garbage collector thread or anything similar running in
>>> a worker thread. But maybe something similar could be done in your
>>> code.
>>>
>>>> The application is heavily multithreaded; at the peak of its
>>>> activity it creates new threads and destroys them at a pace of
>>>> approx. 100/second.
>>>> After a long and tedious investigation, I dare say there are no
>>>> memory leaks involved.
>>>> (Well, there were memory leaks and I first went after those; found
>>>> and fixed them - but the result did not change much.)
>>> You might experiment with the tradeoff of speed vs. memory usage.
>>> The minimum memory usage should be achieved with MALLOC_ARENA_MAX=1;
>>> see 'man mallopt' for other options.
>>>
>>>> The application is cross-platform and runs on Windows and some other
>>>> platforms too.
>>>> There is an OS abstraction layer that provides the unified thread and
>>>> memory allocation API for business logic, but the business logic that
>>>> triggers memory allocations is platform-independent.
>>>> There are no progressive memory allocations in OS abstraction layer
>>>> which could be blamed for the memory growth.
>>>>
>>>> The thing is, on Windows, for the same activity there is no such
>>>> application memory growth at all.
>>>> It allocates memory moderately and scales back after peak of activity.
>>>> This makes me think it is not the business logic to be blamed (to
>>>> the extent that it does not leak memory).
>>>>
>>>> I used valgrind to profile for memory leaks and heap usage.
>>>> Please see massif outputs attached (some callstacks had to be
>>>> trimmed out).
>>>> I am also attaching the memory map for the application (run without
>>>> valgrind); the snapshot was taken after all threads but main were
>>>> destroyed and the application was idle.
>>>>
>>>> The pace of the virtual memory growth is not quite linear.
>>> Most likely there are long-lived objects causing contention, and
>>> probably also memory fragmentation, preventing memory from being
>>> returned to the system after a free call.
>>>
>>>> From my observation, it allocates a big chunk at the beginning of
>>>> the peak load, then after some time starts to grow in steps of
>>>> ~80 MB / 10 seconds, then after some more time starts to grow
>>>> steadily at a pace of ~2 MB/second.
>>>>
>>>> Some stats from the host:
>>>>
>>>> OS: Red Hat Enterprise Linux Server release 7.9 (Maipo)
>>>>
>>>> ldd -version
>>>>
>>>> ldd (GNU libc) 2.17
>>>> Copyright (C) 2012 Free Software Foundation, Inc.
>>>> This is free software; see the source for copying conditions.
>>>> There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR
>>>> A PARTICULAR PURPOSE.
>>>> Written by Roland McGrath and Ulrich Drepper.
>>>>
>>>> uname -a
>>>>
>>>> Linux <skipped> 3.10.0-1160.53.1.el7.x86_64 #1 SMP Thu Dec 16
>>>> 10:19:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>>
>>>> At a peak load, the number of application threads is ~180.
>>>> If the application is left running, I did not observe it hitting
>>>> any maximum virtual memory threshold; it keeps growing until it
>>>> eventually hits the ulimit.
>>>>
>>>> My questions are:
>>>>
>>>> - Is this memory growth an expected behavior?
>>> It should eventually stabilize. But it is possible that some
>>> allocation pattern is causing both fragmentation and long-lived
>>> objects that prevent consolidation of memory chunks.
>>>
>>>> - What can be done to prevent it from happening?
>>> The first approach is MALLOC_ARENA_MAX. After that, some coding
>>> patterns might help; for example, have large long-lived objects
>>> allocated from the same thread, preferably at startup.
>>> You can also attempt to cache some memory, but note that caching is
>>> an easy way to get contention. To avoid this, you could use buffers
>>> obtained directly from mmap.
>>>
>>> Depending on your code, you can also experiment with jemalloc or
>>> tcmalloc. I would suggest tcmalloc, as its main feature is to work
>>> in multithreaded environments:
>>>
>>> https://gperftools.github.io/gperftools/tcmalloc.html
>>>
>>> Glibc newer than 2.17 has a per-thread cache, but the issue you
>>> are experiencing is not malloc being slow, but memory usage. AFAIK
>>> tcmalloc has a kind of garbage collector, but it should not be much
>>> different from glibc's consolidation logic; it only runs during
>>> free, and if there is some contention, it might not be able to
>>> release memory.
>>>
>>>> Thanks in advance,
>>>> - Nikolay
>>> Thanks!
>>>
>>> Paulo
>>
>
Thread overview: 8+ messages
2023-02-07 15:06 Nikolay Shustov
2023-02-07 16:16 ` Paulo César Pereira de Andrade
2023-02-07 16:55 ` Nikolay Shustov
2023-02-07 17:26 ` Ben Woodard
2023-02-07 18:01 ` Nikolay Shustov [this message]
2023-02-07 20:56 ` Nikolay Shustov
2023-02-07 21:38 ` Paulo César Pereira de Andrade
2023-02-07 23:41 ` Nikolay Shustov