public inbox for libc-alpha@sourceware.org
From: Ben Woodard <woodard@redhat.com>
To: Nikolay.Shustov@gmail.com,
	"Paulo César Pereira de Andrade"
	<paulo.cesar.pereira.de.andrade@gmail.com>
Cc: libc-alpha@sourceware.org
Subject: Re: GLIBC malloc behavior question
Date: Tue, 7 Feb 2023 09:26:53 -0800	[thread overview]
Message-ID: <ceb50abd-a5bf-8ac9-a23f-57dfe60be1f6@redhat.com> (raw)
In-Reply-To: <c902439a-48e5-7f7e-be47-44c4ce644d65@gmail.com>


On 2/7/23 08:55, Nikolay Shustov via Libc-alpha wrote:
>>  There is no garbage collector thread or anything similar running in
>> a worker thread. But maybe something similar could be done in your
>> code.
>
> No, there is nothing of the kind in the application.
>
>> You might experiment with the tradeoff between speed and memory usage.
>> Minimum memory usage should be achieved with MALLOC_ARENA_MAX=1;
>> see 'man mallopt' for other options.
>
> MALLOC_ARENA_MAX=1 made a huge difference.
I just wanted to point out that the best value isn't necessarily 1 or 
the default. Setting it to 1 was most likely a simple test of a 
hypothesis about what could be going wrong. This is a tunable knob, and 
your application could have a sweet spot. For some of the applications 
that I help support, we have empirically found that a good number is 
slightly lower than the number of processors the system has, e.g. if 
there are 16 cores, giving it 12 arenas doesn't impact speed but makes 
the memory footprint more compact.
> The initial memory allocations went down by an order of magnitude.
> In fact, I do not see much application slowdown, but this will need 
> more profiling.
> The steady allocation growth is still ~2MB/second.
>
> I am going to investigate your idea of long-lived object 
> contention/memory fragmentation.
>
> This sounds very probable, even though I do not see real memory leaks 
> even after all the aux threads died.
> I have TLS instances in use, maybe those really get in the way.
>
> Thanks a lot for your help.
> If/when I find something new or interesting, I will send an update - 
> hope it will help someone else, too.
>
> Regards,
> - Nikolay
>
> On 2/7/23 11:16, Paulo César Pereira de Andrade wrote:
>> On Tue, Feb 7, 2023 at 12:07, Nikolay Shustov via Libc-alpha
>> <libc-alpha@sourceware.org>  wrote:
>>> Hi,
>>> I have a question about the malloc() behavior which I observe.
>>> The synopsis is that during stress load, the application
>>> aggressively allocates virtual memory without any upper limit.
>>> Just to note, after the application passes the peak of activity
>>> and goes idle, its virtual memory doesn't scale back (I do not expect
>>> much of that though - should I?).
>>    There is no garbage collector thread or anything similar running in
>> a worker thread. But maybe something similar could be done in your
>> code.
>>
>>> The application is heavily multithreaded; at the peak of its activity
>>> it creates new threads and destroys them at a pace of approx. 
>>> 100/second.
>>> After a long and tedious investigation, I dare say that there are no
>>> memory leaks involved.
>>> (Well, there were memory leaks and I first went after those; found and
>>> fixed - but the result did not change much.)
>>    You might experiment with the tradeoff between speed and memory
>> usage. Minimum memory usage should be achieved with MALLOC_ARENA_MAX=1;
>> see 'man mallopt' for other options.
>>
>>> The application is cross-platform and runs on Windows and some other
>>> platforms too.
>>> There is an OS abstraction layer that provides the unified thread and
>>> memory allocation API for business logic, but the business logic that
>>> triggers memory allocations is platform-independent.
>>> There are no progressive memory allocations in OS abstraction layer
>>> which could be blamed for the memory growth.
>>>
>>> The thing is, on Windows, for the same activity there is no such
>>> application memory growth at all.
>>> It allocates memory moderately and scales back after peak of activity.
>>> This makes me think it is not the business logic to be blamed (to the
>>> extent that it does not leak memory).
>>>
>>> I used valgrind to profile for memory leaks and heap usage.
>>> Please see the massif outputs attached (some call stacks had to be 
>>> trimmed out).
>>> I am also attaching the memory map for the application (run without
>>> valgrind); the snapshot is taken after all threads but main were
>>> destroyed and the application is idle.
>>>
>>> The pace of the virtual memory growth is not quite linear.
>>    Most likely there are long-lived objects causing contention, and
>> probably also memory fragmentation, preventing memory from being
>> returned to the system after a free call.
>>
>>>   From my observation, it allocates a big chunk at the beginning of the
>>> peak load, then after some time starts to grow in steps of ~80MB / 10
>>> seconds, then after some time starts to grow steadily at a pace of
>>> ~2MB/second.
>>>
>>> Some stats from the host:
>>>
>>>      OS: Red Hat Enterprise Linux Server release 7.9 (Maipo)
>>>
>>> ldd --version
>>>
>>>      ldd (GNU libc) 2.17
>>>      Copyright (C) 2012 Free Software Foundation, Inc.
>>>      This is free software; see the source for copying conditions. 
>>> There
>>>      is NO
>>>      warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>>>      PURPOSE.
>>>      Written by Roland McGrath and Ulrich Drepper.
>>>
>>> uname -a
>>>
>>>      Linux <skipped> 3.10.0-1160.53.1.el7.x86_64 #1 SMP Thu Dec 16
>>>      10:19:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> At a peak load, the number of application threads is ~180.
>>> If the application is left running, I did not observe it reaching any
>>> maximum virtual-memory threshold; it eventually ends up hitting the
>>> ulimit.
>>>
>>> My questions are:
>>>
>>> - Is this memory growth an expected behavior?
>>    It should eventually stabilize. But it is possible that some
>> allocation pattern is causing fragmentation, with long-lived objects
>> preventing consolidation of memory chunks.
>>
>>> - What can be done to prevent it from happening?
>>    A first approach is MALLOC_ARENA_MAX. After that, some coding
>> patterns might help; for example, have large, long-lived objects
>> allocated from the same thread, preferably at startup.
>>    You can also attempt to cache some memory, but note that caching is
>> also an easy way to get contention. To avoid this, you could use
>> buffers obtained from mmap.
>>
>>    Depending on your code, you can also experiment with jemalloc or
>> tcmalloc. I would suggest tcmalloc, as its main feature is working
>> well in multithreaded environments:
>>
>> https://gperftools.github.io/gperftools/tcmalloc.html
>>
>>    Glibc newer than 2.17 has a per-thread cache, but the issue you
>> are experiencing is not the allocator being slow, but memory usage.
>> AFAIK tcmalloc has a kind of garbage collector, but it should not be
>> much different from glibc's consolidation logic; it should only run
>> during free, and if there is some contention, it might not be able
>> to release memory.
>>
>>> Thanks in advance,
>>> - Nikolay
>> Thanks!
>>
>> Paulo
>


Thread overview: 8+ messages
2023-02-07 15:06 Nikolay Shustov
2023-02-07 16:16 ` Paulo César Pereira de Andrade
2023-02-07 16:55   ` Nikolay Shustov
2023-02-07 17:26     ` Ben Woodard [this message]
2023-02-07 18:01       ` Nikolay Shustov
2023-02-07 20:56         ` Nikolay Shustov
2023-02-07 21:38           ` Paulo César Pereira de Andrade
2023-02-07 23:41             ` Nikolay Shustov
