From: Paulo César Pereira de Andrade
Date: Tue, 7 Feb 2023 13:16:10 -0300
Subject: Re: GLIBC malloc behavior question
To: Nikolay.Shustov@gmail.com
Cc: libc-alpha@sourceware.org

On Tue, Feb 7, 2023 at 12:07, Nikolay Shustov via Libc-alpha wrote:
>
> Hi,
> I have a question about the malloc() behavior which I observe.
> The synopsis is that during a stress load, the application
> aggressively allocates virtual memory without any upper limit.
> Just to note, after the application passes the peak of activity
> and goes idle, its virtual memory does not scale back (I do not
> expect much of that, though - should I?).

There is no garbage-collector thread or anything similar running in a
worker thread. But something like that could be implemented in your
own code.

> The application is heavily multithreaded; at the peak of its activity
> it creates new threads and destroys them at a pace of approx.
> 100/second.
> After a long and tedious investigation I dare to say that there are
> no memory leaks involved.
> (Well, there were memory leaks and I first went after those; found
> and fixed them - but the result did not change much.)

You might experiment with the tradeoff of speed vs. memory usage.
The minimum memory usage should be achieved with MALLOC_ARENA_MAX=1;
see 'man mallopt' for other options.

> The application is cross-platform and runs on Windows and some other
> platforms too.
> There is an OS abstraction layer that provides a unified thread and
> memory allocation API for the business logic, but the business logic
> that triggers memory allocations is platform-independent.
> There are no progressive memory allocations in the OS abstraction
> layer which could be blamed for the memory growth.
>
> The thing is, on Windows there is no such application memory growth
> at all for the same activity.
> It allocates memory moderately and scales back after the peak of
> activity.
> This makes me think the business logic is not to blame (to the
> extent that it does not leak memory).
>
> I used valgrind to profile for memory leaks and heap usage.
> Please see the massif outputs attached (some callstacks had to be
> trimmed out).
> I am also attaching the memory map for the application (run without
> valgrind); the snapshot was taken after all threads but main were
> destroyed and the application was idle.
>
> The pace of the virtual memory growth is not quite linear.

Most likely there are long-lived objects causing contention, and
probably also memory fragmentation, preventing memory from being
returned to the system after a free call.

> From my observation, it allocates a big chunk at the beginning of
> the peak load, then after some time starts to grow in steps of
> ~80Mb / 10 seconds, then after some time starts to grow steadily at
> a pace of ~2Mb/second.
>
> Some stats from the host:
>
> OS: Red Hat Enterprise Linux Server release 7.9 (Maipo)
>
> ldd --version
>
> ldd (GNU libc) 2.17
> Copyright (C) 2012 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There
> is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
> PARTICULAR PURPOSE.
> Written by Roland McGrath and Ulrich Drepper.
>
> uname -a
>
> Linux 3.10.0-1160.53.1.el7.x86_64 #1 SMP Thu Dec 16
> 10:19:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>
>
> At peak load, the number of application threads is ~180.
> If the application is left running, I did not observe it hit any
> maximum virtual memory threshold; it eventually ends up hitting the
> ulimit.
>
> My questions are:
>
> - Is this memory growth an expected behavior?

It should eventually stabilize. But it is possible that some
allocation pattern is causing both fragmentation and long-lived
objects that prevent consolidation of free memory chunks.

> - What can be done to prevent it from happening?

The first approach is MALLOC_ARENA_MAX. After that, some coding
patterns might help; for example, have large, long-lived objects
allocated from the same thread, preferably at startup. You can also
attempt to cache some memory, but note that caching is an easy way to
get contention. To avoid that, you could use buffers obtained
directly from mmap. Depending on your code, you can also experiment
with jemalloc or tcmalloc. I would suggest tcmalloc, as its main
feature is to work well in multithreaded environments:

https://gperftools.github.io/gperftools/tcmalloc.html

Glibc newer than 2.17 has a per-thread cache, but the issue you are
experiencing is not malloc being slow, but memory usage. AFAIK
tcmalloc has a kind of garbage collector, but it should not be much
different from glibc's consolidation logic; it only runs during free,
and if there is contention, it might not be able to release memory.

> Thanks in advance,
> - Nikolay

Thanks!

Paulo