From: Carlos O'Donell
Organization: Red Hat
To: Christian Hoff, libc-help
Subject: Re: Excessive memory consumption when using malloc()
Date: Thu, 25 Nov 2021 13:20:18 -0500

On 11/25/21 12:20, Christian Hoff via Libc-help wrote:
> Hello all,
>
> we are facing a problem with the memory allocator in glibc 2.17 on
> RHEL 7.9. Our application allocates about 10 GB of memory (split into
> chunks that are each around 512 KB large).
> This memory is used for some computations and released afterwards. After
> a while, the application runs the same computations again, but this time
> in different threads. The first issue we are seeing is that - after the
> computations are done - the 10 GB of memory is not released back to the
> operating system. Only after calling malloc_trim() manually with GDB does
> the size of the process shrink dramatically from ~10 GB to 400 MB. So, at
> this point, the unused memory from the computations is finally returned
> to the OS.

How many CPUs does the system have? How many threads do you create? Is
this 10 GiB of RSS or VSS?

For very large systems glibc malloc will create up to 8 arenas per CPU.
Each arena starts with a default 64 MiB VMA reservation. On a 128-core
system this appears as a ~65 GiB VSS reservation.

> Our wish would be that the memory is returned to the OS without us
> having to call malloc_trim(). And I understand that glibc also trims the
> heap when there is sufficient free space at the top of it (the
> M_TRIM_THRESHOLD in mallopt() controls when this should happen). What
> could be the reason why this is not working in our case? Could it be
> related to heap fragmentation? But assuming that is the reason, why is
> malloc_trim() nevertheless able to free this memory?

The normal trimming strategy is to trim from the top of the heap down.
Chunks at the top of the heap are coalesced, and eventually, when the top
chunk is big enough, the heap is freed downwards. This coalescing and
freeing is prevented if there are in-use chunks in the heap.

Consider this scenario:

- Make many large allocations that have a short lifetime.
- Make one small allocation that has a very long lifetime.
- Free all the large allocations.

The heap cannot be freed downwards because of the small long-lifetime
allocation. The call to malloc_trim() walks the heap chunks and frees
page-sized chunks or larger without the requirement that they come from
the top of the heap. In glibc's allocator, mixing lifetimes for
allocations will cause heap growth.

I have an important question to ask now: do you use aligned allocations?

We currently have an outstanding defect where aligned allocations create
small residual free chunks; when those are freed back and then allocated
again as an aligned chunk, we are forced to split chunks again, which can
lead to ratcheting effects with certain aligned allocation patterns. We
had a prototype patch for this in Fedora in 2019:
https://lists.fedoraproject.org/archives/list/glibc@lists.fedoraproject.org/thread/2PCHP5UWONIOAEUG34YBAQQYD7JL5JJ4/

> And then we also have one other problem. The first run of the
> computations is always fine: we allocate 10 GB of memory and the
> application grows to 10 GB. Afterwards, we release those 10 GB of memory
> since the computations are now done, and at this point the freed memory
> is returned back to the allocator (however, the size of the process
> remains 10 GB unless we call malloc_trim()). But if we now re-run the
> same computations a second time (this time using different threads), a
> problem occurs. In this case, the size of the application grows well
> beyond 10 GB. It can reach 20 GB or more, and the process is eventually
> killed because the system runs out of memory.

You need to determine what is going on under the hood here. You may want
to just use malloc_info() to get a routine dump of the heap state. This
will give us a starting point to see what is growing.
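For example, something like this minimal sketch can be dropped into your
application to snapshot the allocator state before and after a computation
run (the helper name and output path are just placeholders; malloc_info()
itself is available in glibc 2.17 and writes an XML report covering every
arena):

#include <malloc.h>
#include <stdio.h>

/* Write malloc's internal state (per-arena totals, free chunks, mmap use)
   as XML to PATH.  Call this before and after a computation run and diff
   the two reports to see which arenas grow and how much free space stays
   trapped inside them.  */
static void
dump_heap_state (const char *path)
{
  FILE *fp = fopen (path, "w");
  if (fp == NULL)
    return;
  malloc_info (0, fp);   /* The first argument must currently be 0.  */
  fclose (fp);
}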
We also have a malloc allocation tracer that you can use to capture a
workload and share a snapshot of it with upstream:
https://pagure.io/glibc-malloc-trace-utils

Sharing the workload might be hard, though, because it is a full API trace
and such traces get difficult to share.

> Do you have any idea why this happens? To me it seems like the threads
> are assigned to different arenas and therefore the previously freed 10
> GB of memory cannot be re-used as they are in different arenas. Is that
> possible?

I don't know why this happens. Threads, once bound to an arena, normally
never move unless an allocation fails.

> A workaround I have found is to set M_MMAP_THRESHOLD to 128 KB - then
> the memory for the computations is always allocated using mmap() and
> returned back to the system immediately when it is free()'ed. This
> solves both of the issues. But I am afraid that this workaround could
> degrade the performance of our application. So, we are grateful for any
> better solution to this problem.

It will degrade performance because every such allocation now requires a
syscall. You can try raising the value.

-- 
Cheers,
Carlos.
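P.S. If you want to experiment with the knobs discussed above, here is a
minimal sketch. The specific values are only illustrative starting points,
not recommendations, so please measure with your real workload:

#include <malloc.h>

int
main (void)
{
  /* Serve allocations of 512 KiB and larger directly from mmap() so that
     free() returns them to the kernel immediately.  This is the workaround
     you already tried; raising the threshold trades syscall overhead for
     heap reuse.  Setting it explicitly also disables malloc's dynamic
     threshold adjustment.  */
  mallopt (M_MMAP_THRESHOLD, 512 * 1024);

  /* Optionally cap the number of arenas so that threads share heaps
     instead of spreading across many of them (the MALLOC_ARENA_MAX
     environment variable does the same thing).  */
  mallopt (M_ARENA_MAX, 2);

  /* ... run the computations here ...  */

  /* Ask the allocator to release unused pages back to the OS, as you did
     by hand under GDB.  */
  malloc_trim (0);
  return 0;
}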