From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <06a84799-3a73-2bff-e157-281eed68febf@linaro.org>
Date: Wed, 8 Mar 2023 14:19:55 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.8.0
Subject: Re: [RFC] Stack allocation, hugepages and RSS implications
Content-Language: en-US
To: Cupertino Miranda, libc-alpha@sourceware.org
Cc: "Jose E.
 Marchesi", Elena Zannoni, Cupertino Miranda
References: <87pm9j4azf.fsf@oracle.com> <87mt4n49ak.fsf@oracle.com>
From: Adhemerval Zanella Netto
Organization: Linaro
In-Reply-To: <87mt4n49ak.fsf@oracle.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>
> Hi everyone,
>
> For performance purposes, one of our in-house applications requires enabling
> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, which makes the
> kernel force all big enough and aligned memory allocations to
> reside in hugepages.  I believe the reason behind this decision is to
> have more control over data location.

We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
enables MADV_HUGEPAGE madvise for mmap allocated pages if the THP mode is set
to 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).  One option would
be to use that mode instead of 'always', together with glibc.malloc.hugetlb=1.

The main drawback of this strategy is that the THP mode is a system-wide
setting, so it might affect other users/programs as well.

>
> For stack allocation, it seems that hugepages make resident set size
> (RSS) increase significantly, and without any apparent benefit, as the
> huge page will be split into small pages even before leaving the glibc stack
> allocation code.
>
> As an example, this is what happens in case of a pthread_create with 2MB
> stack size:
> 1. mmap request for the 2MB allocation with PROT_NONE;
>    a huge page is "registered" by the kernel
> 2. the thread descriptor is written at the end of the stack.
>    this will trigger a page fault in the kernel, which will make the actual
>    memory allocation of the 2MB.
> 3. an mprotect changes protection on the guard (one of the small pages of the
>    allocated space):
>    at this point the kernel needs to break the 2MB page into many small pages
>    in order to change the protection on that memory region.
>    This will eliminate any benefit of having huge pages for stack allocation,
>    but also makes RSS increase by 2MB even though nothing was
>    written to most of the small pages.
>
> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
> the __mmap in nptl/allocatestack.c.  As expected, RSS was significantly
> reduced for the application.
>
> At this point I am very confident that there is a real benefit in our
> particular use case to enforce that stacks never use hugepages.
>
> This RFC is to understand if I have missed some option in glibc that would
> allow better control of stack allocation.
> If not, I am tempted to propose/submit a change, in the form of a tunable, to
> enforce NOHUGEPAGES for stacks.
>
> In any case, I wonder if there is an actual use case where a hugepage would
> survive glibc stack allocation and bring an actual benefit.
>
> Looking forward to your comments.

Maybe we could also use a similar strategy for pthread stack allocation, where
if transparent hugepages is 'always' and glibc.malloc.hugetlb is 3 we set
MADV_NOHUGEPAGE on internal mmaps.  So a value of '3' would mean disable THP,
which might be confusing, but currently we have '0' as 'use system default'.
It could also be another tunable, like glibc.hugetlb, to decouple it from the
malloc code.

Ideally it will require caching the __malloc_thp_mode result, so we avoid the
unnecessary madvise calls, similar to what we need on malloc do_set_hugetlb
(which also assumes that once the program calls the initial malloc, any
system-wide change to THP won't take effect).
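
To make the idea a bit more concrete, here is a rough standalone sketch in
plain C.  It is not the code in nptl/allocatestack.c, and it does not use the
glibc internals mentioned above: the names thp_mode, parse_thp_mode,
read_thp_mode and alloc_stack_nothp are made up for illustration, and the
order of the protection changes is simplified (the usable part is made
read/write with a single mprotect, leaving the guard as PROT_NONE).  It
assumes Linux, and that guardsize is a multiple of the page size.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* For MAP_STACK and MADV_NOHUGEPAGE.  */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical type, only for this sketch.  */
enum thp_mode { THP_UNKNOWN, THP_ALWAYS, THP_MADVISE, THP_NEVER };

/* Parse the contents of /sys/kernel/mm/transparent_hugepage/enabled,
   for instance "always [madvise] never".  The active mode is the one
   between brackets.  */
enum thp_mode
parse_thp_mode (const char *s)
{
  const char *start = strchr (s, '[');
  const char *end = start != NULL ? strchr (start, ']') : NULL;
  if (end == NULL)
    return THP_UNKNOWN;

  size_t len = (size_t) (end - start - 1);
  if (len == 6 && strncmp (start + 1, "always", len) == 0)
    return THP_ALWAYS;
  if (len == 7 && strncmp (start + 1, "madvise", len) == 0)
    return THP_MADVISE;
  if (len == 5 && strncmp (start + 1, "never", len) == 0)
    return THP_NEVER;
  return THP_UNKNOWN;
}

/* Read the system-wide THP mode.  A caller would do this once and reuse
   the result, so later system-wide changes are not noticed.  */
enum thp_mode
read_thp_mode (void)
{
  char buf[64];
  FILE *f = fopen ("/sys/kernel/mm/transparent_hugepage/enabled", "r");
  if (f == NULL)
    return THP_UNKNOWN;
  char *line = fgets (buf, sizeof buf, f);
  fclose (f);
  return line != NULL ? parse_thp_mode (line) : THP_UNKNOWN;
}

/* Map SIZE bytes for a stack with a PROT_NONE guard of GUARDSIZE bytes at
   the low end.  If THP is 'always', ask the kernel not to back the mapping
   with huge pages before any page is touched.  Returns NULL on failure.  */
void *
alloc_stack_nothp (size_t size, size_t guardsize, enum thp_mode mode)
{
  assert (guardsize < size);

  void *mem = mmap (NULL, size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;

  /* Best effort: the call is skipped unless the mode is 'always', and a
     failure (e.g. a kernel built without THP) is ignored.  */
  if (mode == THP_ALWAYS)
    (void) madvise (mem, size, MADV_NOHUGEPAGE);

  /* Make everything above the guard usable.  */
  if (mprotect ((char *) mem + guardsize, size - guardsize,
                PROT_READ | PROT_WRITE) != 0)
    {
      munmap (mem, size);
      return NULL;
    }
  return mem;
}
```

A caller would do something like alloc_stack_nothp (2 * 1024 * 1024,
guardsize, read_thp_mode ()) and munmap the region when done.  Passing the
mode in, rather than reading sysfs on every allocation, is the part that
corresponds to the caching mentioned above.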