Re: [RFC] Stack allocation, hugepages and RSS implications

From: Cupertino Miranda <cupertino.miranda@oracle.com>
To: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org,
	"Jose E. Marchesi" <jose.marchesi@oracle.com>,
	Elena Zannoni <elena.zannoni@oracle.com>,
	Cupertino Miranda <cupertinomiranda@gmail.com>
Subject: Re: [RFC] Stack allocation, hugepages and RSS implications
Date: Thu, 09 Mar 2023 09:38:07 +0000
Message-ID: <87edpy464g.fsf@oracle.com>
In-Reply-To: <06a84799-3a73-2bff-e157-281eed68febf@linaro.org>


Adhemerval Zanella Netto writes:

> On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>>
>> Hi everyone,
>>
>> For performance reasons, one of our in-house applications requires enabling
>> the TRANSPARENT_HUGEPAGE_ALWAYS option in the Linux kernel, which makes the
>> kernel force all sufficiently large and aligned memory allocations to
>> reside in hugepages.  I believe the reason behind this decision is to
>> have more control over data location.
>
> We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
> enables MADV_HUGEPAGE madvise for mmap-allocated pages if the mode is set to
> 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).  One option would
> be to use it instead of 'always' and use glibc.malloc.hugetlb=1.
>
> The main drawback of this strategy is that this is a system-wide setting,
> so it might affect other users/programs as well.
>
>>
>> For stack allocation, it seems that hugepages make the resident set size
>> (RSS) increase significantly, without any apparent benefit, as the
>> huge page will be split into small pages even before leaving the glibc
>> stack allocation code.
>>
>> As an example, this is what happens in the case of a pthread_create with
>> a 2MB stack size:
>>  1. mmap request for the 2MB allocation with PROT_NONE;
>>       a huge page is "registered" by the kernel
>>  2. the thread descriptor is written at the end of the stack.
>>       this will trigger a page fault in the kernel, which will perform the
>>       actual memory allocation of the 2MB.
>>  3. an mprotect changes protection on the guard (one of the small pages of the
>>     allocated space):
>>       at this point the kernel needs to break the 2MB page into many small pages
>>       in order to change the protection on that memory region.
>>       This will eliminate any benefit of having huge pages for stack allocation,
>>       but also makes RSS increase by 2MB even though nothing was
>>       written to most of the small pages.
>>
>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
>> the __mmap in nptl/allocatestack.c. As expected, RSS was significantly reduced for
>> the application.
>>
>> At this point I am quite confident that there is a real benefit in our
>> particular use case to ensuring that stacks never use hugepages.
>>
>> This RFC is to understand if I have missed some option in glibc that would
>> allow better control over stack allocation.
>> If not, I am tempted to propose/submit a change, in the form of a tunable, to
>> enforce NOHUGEPAGES for stacks.
>>
>> In any case, I wonder if there is an actual use case where a hugepage would
>> survive glibc stack allocation and bring an actual benefit.
>>
>> Looking forward to your comments.
>
> Maybe also use a similar strategy for pthread stack allocation, where if
> transparent hugepages is 'always' and glibc.malloc.hugetlb is 3 we set
> MADV_NOHUGEPAGE on internal mmaps.  So a value of '3' means disable THP, which
> might be confusing, but currently we have '0' as 'use system default'.  It
> could also be another tunable, like glibc.hugetlb, to decouple it from the
> malloc code.
>
The intent would not be to disable hugepages on all internal mmaps, as I
think you suggested, but rather to do it just for stack allocations.
Although it is more work, I would say that if we add this to a tunable then
maybe we should move it out of the malloc namespace.
If moving it out of malloc is not OK for backward-compatibility reasons, then
I would say we create a new tunable specific to this purpose, like
glibc.stack_nohugetlb?

The more I think about this, the less I feel we will ever be able to
practically use hugepages in stacks. We can declare them as such, but
soon enough the kernel will split them into small pages.
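
For illustration only, here is a minimal application-side sketch of the
idea, assuming Linux and glibc. It is not the nptl/allocatestack.c change
itself: the helper names spawn_thread_small_page_stack and demo_worker are
made up for this example. It maps a stack, marks it MADV_NOHUGEPAGE, and
hands it to pthread_create through pthread_attr_setstack. Note that with a
caller-provided stack glibc does not install a guard page, so this only
shows the madvise part.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper, not glibc code: map a thread stack, ask the kernel
   not to back it with transparent huge pages, and start a thread on it.
   Returns 0 on success or an errno-style code.  *stack_out receives the
   mapping so the caller can munmap() it after pthread_join().  */
int
spawn_thread_small_page_stack (pthread_t *thread, size_t stack_size,
                               void *(*fn) (void *), void *arg,
                               void **stack_out)
{
  assert (thread != NULL && fn != NULL && stack_out != NULL);

  void *stack = mmap (NULL, stack_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  if (stack == MAP_FAILED)
    return errno;

#ifdef MADV_NOHUGEPAGE
  /* Advisory only: the call fails with EINVAL on kernels built without
     THP support, which is harmless here, so the result is ignored.  */
  (void) madvise (stack, stack_size, MADV_NOHUGEPAGE);
#endif

  pthread_attr_t attr;
  int rc = pthread_attr_init (&attr);
  if (rc == 0)
    {
      rc = pthread_attr_setstack (&attr, stack, stack_size);
      if (rc == 0)
        rc = pthread_create (thread, &attr, fn, arg);
      pthread_attr_destroy (&attr);
    }
  if (rc != 0)
    {
      munmap (stack, stack_size);
      return rc;
    }

  *stack_out = stack;
  return 0;
}

/* Trivial worker used only to exercise the helper above.  */
void *
demo_worker (void *arg)
{
  *(int *) arg = 42;
  return NULL;
}
```

A caller would pass a stack size such as 2MB, join the thread, and then
munmap the returned mapping. Inside glibc the equivalent would be a single
__madvise call next to the existing __mmap, as described in the RFC.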

> Ideally it will require caching the __malloc_thp_mode, so we avoid the
> unneeded mprotect calls, similar to what we do in malloc's do_set_hugetlb
> (it also assumes that once the program calls the initial malloc, any
> system-wide change to THP won't take effect).
Very good point. Did not think about this before.
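
To make the caching idea concrete, here is a rough sketch of reading the
system THP mode once and reusing it afterwards. It assumes the usual sysfs
format, where the active mode is the bracketed word (for example
"always [madvise] never"). The names thp_mode, parse_thp_mode and
cached_thp_mode are invented for this example and only stand in for
whatever glibc does internally around __malloc_thp_mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum thp_mode
{
  THP_MODE_UNKNOWN,
  THP_MODE_ALWAYS,
  THP_MODE_MADVISE,
  THP_MODE_NEVER
};

/* Parse the contents of /sys/kernel/mm/transparent_hugepage/enabled.
   The active mode is the word enclosed in square brackets.  */
enum thp_mode
parse_thp_mode (const char *text)
{
  assert (text != NULL);

  const char *lb = strchr (text, '[');
  const char *rb = lb != NULL ? strchr (lb, ']') : NULL;
  if (rb == NULL)
    return THP_MODE_UNKNOWN;

  const char *word = lb + 1;
  size_t len = (size_t) (rb - word);

  if (len == 6 && strncmp (word, "always", len) == 0)
    return THP_MODE_ALWAYS;
  if (len == 7 && strncmp (word, "madvise", len) == 0)
    return THP_MODE_MADVISE;
  if (len == 5 && strncmp (word, "never", len) == 0)
    return THP_MODE_NEVER;
  return THP_MODE_UNKNOWN;
}

/* Read the mode on first use and cache it.  Later system-wide changes
   are deliberately not observed.  Not thread-safe; a sketch only.  */
enum thp_mode
cached_thp_mode (void)
{
  static int cached = 0;
  static enum thp_mode mode = THP_MODE_UNKNOWN;

  if (!cached)
    {
      char buf[64] = "";
      FILE *f = fopen ("/sys/kernel/mm/transparent_hugepage/enabled", "r");
      if (f != NULL)
        {
          if (fgets (buf, sizeof buf, f) != NULL)
            mode = parse_thp_mode (buf);
          fclose (f);
        }
      cached = 1;
    }
  return mode;
}
```

With something like this, the stack allocation path would issue the
madvise call only when cached_thp_mode () returns THP_MODE_ALWAYS, and
skip it otherwise.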
