From: Carlos O'Donell
Organization: Red Hat
To: Christian Hoff, libc-help
Subject: Re: Excessive memory consumption when using malloc()
Date: Thu, 25 Nov 2021 13:20:18 -0500

On 11/25/21 12:20, Christian Hoff via Libc-help wrote:
> Hello all,
>
> we are facing a problem with the memory allocator in glibc 2.17 on
> RHEL 7.9. Our application allocates about 10 GB of memory (split into
> chunks that are each around 512 KB large).
> This memory is used for some computations and released afterwards. After
> a while, the application runs the same computations again, but this time
> in different threads. The first issue we are seeing is that - after the
> computations are done - the 10 GB of memory is not released back to the
> operating system. Only after calling malloc_trim() manually with GDB does
> the size of the process shrink dramatically from ~10 GB to 400 MB. So, at
> this point, the unused memory from the computations is finally returned
> to the OS.

How many CPUs does the system have? How many threads do you create? Is
this 10 GiB of RSS or VSS?

For very large systems glibc malloc will create up to 8 arenas per CPU.
Each arena starts with a default 64 MiB VMA reservation. On a 128-core
system this appears as a ~65 GiB VSS reservation.

> Our wish would be that the memory is returned to the OS without us
> having to call malloc_trim(). And I understand that glibc also trims the
> heap when there is sufficient free space at the top of it (the
> M_TRIM_THRESHOLD in mallopt() controls when this should happen). What
> could be the reason why this is not working in our case? Could it be
> related to heap fragmentation? But assuming that is the reason, why is
> malloc_trim() nevertheless able to free this memory?

The normal trimming strategy is to trim from the top of the heap down.
Chunks at the top of the heap are coalesced, and eventually, when the top
chunk is big enough, the heap is freed downwards. This coalescing and
freeing is prevented if there are in-use chunks in the heap.

Consider this scenario:

- Make many large allocations that have a short lifetime.
- Make one small allocation that has a very long lifetime.
- Free all the large allocations.

The heap cannot be freed downwards because of the small long-lifetime
allocation. The call to malloc_trim() walks the heap chunks and frees
page-sized chunks or larger without the requirement that they come from
the top of the heap. In glibc's allocator, mixing lifetimes for
allocations will cause heap growth.

I have an important question to ask now: do you use aligned allocations?

We currently have an outstanding defect where aligned allocations create
small residual free chunks; when those are freed back and then allocated
again as an aligned chunk, we are forced to split chunks again, which can
lead to ratcheting effects with certain aligned allocation patterns. We
had a prototype patch for this in Fedora in 2019:
https://lists.fedoraproject.org/archives/list/glibc@lists.fedoraproject.org/thread/2PCHP5UWONIOAEUG34YBAQQYD7JL5JJ4/

> And then we also have one other problem. The first run of the
> computations is always fine: we allocate 10 GB of memory and the
> application grows to 10 GB. Afterwards, we release those 10 GB of memory
> since the computations are now done, and at this point the freed memory
> is returned back to the allocator (however, the size of the process
> remains 10 GB unless we call malloc_trim()). But if we now re-run the
> same computations a second time (this time using different threads), a
> problem occurs. In this case, the size of the application grows well
> beyond 10 GB. It can reach 20 GB or more, and the process is eventually
> killed because the system runs out of memory.

You need to determine what is going on under the hood here. You may want
to just use malloc_info() to get a routine dump of the heap state. This
will give us a starting point to see what is growing.
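For example, something like this minimal sketch can be dropped into your
application to snapshot the allocator state before and after a computation
run (the helper name and output path are just placeholders; malloc_info()
itself is available in glibc 2.17 and writes an XML report covering every
arena):

#include <malloc.h>
#include <stdio.h>

/* Write malloc's internal state (per-arena totals, free chunks, mmap use)
   as XML to PATH.  Call this before and after a computation run and diff
   the two reports to see which arenas grow and how much free space stays
   trapped inside them.  */
static void
dump_heap_state (const char *path)
{
  FILE *fp = fopen (path, "w");
  if (fp == NULL)
    return;
  malloc_info (0, fp);   /* The first argument must currently be 0.  */
  fclose (fp);
}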
We also have a malloc allocation tracer that you can use to capture a
workload and share a snapshot of it with upstream:
https://pagure.io/glibc-malloc-trace-utils

Sharing the workload might be hard, though, because it is a full API trace
and such traces get difficult to share.

> Do you have any idea why this happens? To me it seems like the threads
> are assigned to different arenas and therefore the previously freed 10
> GB of memory cannot be re-used as they are in different arenas. Is that
> possible?

I don't know why this happens. Threads, once bound to an arena, normally
never move unless an allocation fails.

> A workaround I have found is to set M_MMAP_THRESHOLD to 128 KB - then
> the memory for the computations is always allocated using mmap() and
> returned back to the system immediately when it is free()'ed. This
> solves both of the issues. But I am afraid that this workaround could
> degrade the performance of our application. So, we are grateful for any
> better solution to this problem.

It will degrade performance because every such allocation now requires a
syscall. You can try raising the value.

-- 
Cheers,
Carlos.
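P.S. If you want to experiment with the knobs discussed above, here is a
minimal sketch. The specific values are only illustrative starting points,
not recommendations, so please measure with your real workload:

#include <malloc.h>

int
main (void)
{
  /* Serve allocations of 512 KiB and larger directly from mmap() so that
     free() returns them to the kernel immediately.  This is the workaround
     you already tried; raising the threshold trades syscall overhead for
     heap reuse.  Setting it explicitly also disables malloc's dynamic
     threshold adjustment.  */
  mallopt (M_MMAP_THRESHOLD, 512 * 1024);

  /* Optionally cap the number of arenas so that threads share heaps
     instead of spreading across many of them (the MALLOC_ARENA_MAX
     environment variable does the same thing).  */
  mallopt (M_ARENA_MAX, 2);

  /* ... run the computations here ...  */

  /* Ask the allocator to release unused pages back to the OS, as you did
     by hand under GDB.  */
  malloc_trim (0);
  return 0;
}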