From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oo1-xc31.google.com (mail-oo1-xc31.google.com [IPv6:2607:f8b0:4864:20::c31]) by sourceware.org (Postfix) with ESMTPS id A32413858D28 for ; Tue, 11 Apr 2023 19:56:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A32413858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oo1-xc31.google.com with SMTP id x19-20020a056820105300b0053e1205c84bso1186758oot.9 for ; Tue, 11 Apr 2023 12:56:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1681243017; x=1683835017; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=AmMzFS5/Pgc5tHZaf84gbgj8itlseKd2buaqxS6ZR4w=; b=AfTv9BEudRkkq8AAHYhRrW7PtPySatT2H6mpjEO6kZDorWO0rVZCrwXjZFRsAYnH55 0uy7TgGF7ryhrhCt6SeBtFmSpSCJnRo9ywInKNSBLFkwY84Tjgucf0XKMKzvc+G4HGpO 57MCMGHjHYUWVjdqw8yRtqPpxngDWV/HR8q9KQPm/IGJSlJ7BCfOzWuFQWxxHXkd/KPW 3+9wcHOZxQwPHV9AvcUtpf3O9giFMtHva5ydrqWVDIbq5a4M5+Rwb7I5mc2QZG5Fc5Vh Hh3jQ3dr4eHxJIkFCZYKiUzGbEMqAvgpn/dBlRNcrneckrIazsLzkr/9dpm7B4UE/q+q Vu0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1681243017; x=1683835017; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=AmMzFS5/Pgc5tHZaf84gbgj8itlseKd2buaqxS6ZR4w=; b=rGvlG8+sBozMS8AcAa1xoQyZ+pm8lB56XB1J8iodm/1k+O4uXLYoNID0B4xQVX28qR YSrpW+i0M3+FmlMFw5NR+T7gQAdQ23LYEif1o7u+rEyh9n5KL6hjz5D0hW9Q45BaX2Po 15Lo6u81msSIrWwF5nEFUwdvuNnJ3w+M8phok35tCXSZa5GblNgPIDXnVU+AVK5CE5ir livs2ZR0/OqMaTQCRPgKz2ScC+9wIohAJr24mCPUb8mc3cwfjMPh29pHgL7OhGKNJk4e lvCyqUCktDlxrkffoQI9+VLqBMN3Lo76SCmTdeybupDLmUlXPqmMirVHYsRPdLoVlfuG 4Xww== X-Gm-Message-State: AAQBX9f1CFl23mTVqtUNoBQijemT/WWO6WfprR9VfQ0Rfdl8n8N6Hu45 W/BlOL28gOCRA8/m5fZ2LZ+GmQ== X-Google-Smtp-Source: AKy350Zw5j2OazkEaCjTSs+mouzX59WPhIazXJtahsSL9JdCnGN3jQOmezfBdxw0Bjx6UnabBUtAjQ== X-Received: by 2002:a4a:8963:0:b0:538:b486:3414 with SMTP id g32-20020a4a8963000000b00538b4863414mr5333998ooi.6.1681243016739; Tue, 11 Apr 2023 12:56:56 -0700 (PDT) Received: from ?IPV6:2804:1b3:a7c2:55a1:7428:425e:4ee7:30b6? ([2804:1b3:a7c2:55a1:7428:425e:4ee7:30b6]) by smtp.gmail.com with ESMTPSA id q65-20020a4a3344000000b0053bb063121asm6188084ooq.9.2023.04.11.12.56.54 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 11 Apr 2023 12:56:55 -0700 (PDT) Message-ID: <8f313a5d-f16a-d682-1d78-f216c446099f@linaro.org> Date: Tue, 11 Apr 2023 16:56:52 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.9.1 Subject: Re: [PATCH v5 1/1] Created tunable to force small pages on stack allocation. Content-Language: en-US To: Cupertino Miranda , libc-alpha@sourceware.org, Florian Weimer Cc: jose.marchesi@oracle.com, elena.zannoni@oracle.com References: <20230328152258.54844-1-cupertino.miranda@oracle.com> <20230328152258.54844-2-cupertino.miranda@oracle.com> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: <20230328152258.54844-2-cupertino.miranda@oracle.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-14.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 28/03/23 12:22, Cupertino Miranda via Libc-alpha wrote: > Created tunable glibc.pthread.stack_hugetlb to control when hugepages > can be used for stack allocation. > In case THP are enabled and glibc.pthread.stack_hugetlb is set to > 0, glibc will madvise the kernel not to use allow hugepages for stack > allocations. > > Changed from v1: > - removed the __malloc_thp_mode calls to check if hugetlb is > enabled. > > Changed from v2: > - Added entry in manual/tunables.texi > - Fixed tunable default to description > - Code style corrections. > > Changes from v3: > - Improve tunables.texi. > > Changes from v4: > - Improved text in tunables.texi by suggestion of Adhemerval. Florian has raised some concern [1] that reported RSS increase is not technically correct because the once the kernel need to split the Huge Page, it does not need keep all of them (only the one that actually generate the soft fault). However this is not what I see using the previous testcase that creates lot of threads to force the THP usage and checking the /proc/self/smaps_rollout. The resulting 'Private_Dirty' still accounts for *all* the default smalls pages once kernel decides to split the page, and it seems to be same outcome from a recent OpenJDK thread [2]. Afaiu the kernel does not keep track which possible small pages from the THP has been already hit when the guard page is mprotect (which forces the split), so when the kernel reverts back to using default pages it keeps all the pages. This is also a recent kernel discussion which similar conclusion [3]. So this patch is LGTM, and I will install this shortly. I also discussed on the same call if it would be better to make the m advise the *default* behavior if the pthread stack usage will always ended up requiring the kernel to split up to use default pages, i.e: 1. THP (/sys/kernel/mm/transparent_hugepage/enabled) is set to 'always'. 2. The stack size is multiple of THP size (/sys/kernel/mm/transparent_hugepage/hpage_pmd_size). 3. And if stack size minus guard pages is still multiple of THP size ((stack_size - guard_size) % thp_size == 0). It does not mean that the stack will automatically backup by THP, but it also means that depending of the process VMA it might generate some RSS waste once kernel decides to use THP for the stack. And it should also make the tunables not required. [1] https://sourceware.org/glibc/wiki/PatchworkReviewMeetings [2] https://bugs.openjdk.org/browse/JDK-8303215?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&showAll=true [3] https://lore.kernel.org/linux-mm/278ec047-4c5d-ab71-de36-094dbed4067c@redhat.com/T/ > --- > manual/tunables.texi | 15 +++++++++++++++ > nptl/allocatestack.c | 6 ++++++ > nptl/nptl-stack.c | 1 + > nptl/nptl-stack.h | 3 +++ > nptl/pthread_mutex_conf.c | 8 ++++++++ > sysdeps/nptl/dl-tunables.list | 6 ++++++ > 6 files changed, 39 insertions(+) > > diff --git a/manual/tunables.texi b/manual/tunables.texi > index 70dd2264c5..130f94b2bc 100644 > --- a/manual/tunables.texi > +++ b/manual/tunables.texi > @@ -459,6 +459,21 @@ registration on behalf of the application. > Restartable sequences are a Linux-specific extension. > @end deftp > > +@deftp Tunable glibc.pthread.stack_hugetlb > +This tunable controls whether to use Huge Pages in the stacks created by > +@code{pthread_create}. This tunable only affects the stacks created by > +@theglibc{}, it has no effect on stack assigned with > +@code{pthread_attr_setstack}. > + > +The default is @samp{1} where the system default value is used. Setting > +its value to @code{0} enables the use of @code{madvise} with > +@code{MADV_NOHUGEPAGE} after stack creation with @code{mmap}. > + > +This is a memory utilization optimization, since internal glibc setup of either > +the thread descriptor and the guard page might force the kernel to move the > +thread stack originally backup by Huge Pages to default pages. > +@end deftp > + > @node Hardware Capability Tunables > @section Hardware Capability Tunables > @cindex hardware capability tunables > diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c > index c7adbccd6f..f9d8cdfd08 100644 > --- a/nptl/allocatestack.c > +++ b/nptl/allocatestack.c > @@ -369,6 +369,12 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp, > if (__glibc_unlikely (mem == MAP_FAILED)) > return errno; > > + /* Do madvise in case the tunable glibc.pthread.stack_hugetlb is > + set to 0, disabling hugetlb. */ > + if (__glibc_unlikely (__nptl_stack_hugetlb == 0) > + && __madvise (mem, size, MADV_NOHUGEPAGE) != 0) > + return errno; > + > /* SIZE is guaranteed to be greater than zero. > So we can never get a null pointer back from mmap. */ > assert (mem != NULL); > diff --git a/nptl/nptl-stack.c b/nptl/nptl-stack.c > index 5eb7773575..e829711cb5 100644 > --- a/nptl/nptl-stack.c > +++ b/nptl/nptl-stack.c > @@ -21,6 +21,7 @@ > #include > > size_t __nptl_stack_cache_maxsize = 40 * 1024 * 1024; > +int32_t __nptl_stack_hugetlb = 1; > > void > __nptl_stack_list_del (list_t *elem) > diff --git a/nptl/nptl-stack.h b/nptl/nptl-stack.h > index 34f8bbb15e..cf90b27c2b 100644 > --- a/nptl/nptl-stack.h > +++ b/nptl/nptl-stack.h > @@ -27,6 +27,9 @@ > /* Maximum size of the cache, in bytes. 40 MiB by default. */ > extern size_t __nptl_stack_cache_maxsize attribute_hidden; > > +/* Should allow stacks to use hugetlb. (1) is default. */ > +extern int32_t __nptl_stack_hugetlb; > + > /* Check whether the stack is still used or not. */ > static inline bool > __nptl_stack_in_use (struct pthread *pd) > diff --git a/nptl/pthread_mutex_conf.c b/nptl/pthread_mutex_conf.c > index 329c4cbb8f..60ef9095aa 100644 > --- a/nptl/pthread_mutex_conf.c > +++ b/nptl/pthread_mutex_conf.c > @@ -45,6 +45,12 @@ TUNABLE_CALLBACK (set_stack_cache_size) (tunable_val_t *valp) > __nptl_stack_cache_maxsize = valp->numval; > } > > +static void > +TUNABLE_CALLBACK (set_stack_hugetlb) (tunable_val_t *valp) > +{ > + __nptl_stack_hugetlb = (int32_t) valp->numval; > +} > + > void > __pthread_tunables_init (void) > { > @@ -52,5 +58,7 @@ __pthread_tunables_init (void) > TUNABLE_CALLBACK (set_mutex_spin_count)); > TUNABLE_GET (stack_cache_size, size_t, > TUNABLE_CALLBACK (set_stack_cache_size)); > + TUNABLE_GET (stack_hugetlb, int32_t, > + TUNABLE_CALLBACK (set_stack_hugetlb)); > } > #endif > diff --git a/sysdeps/nptl/dl-tunables.list b/sysdeps/nptl/dl-tunables.list > index bd1ddb121d..4cde9500b6 100644 > --- a/sysdeps/nptl/dl-tunables.list > +++ b/sysdeps/nptl/dl-tunables.list > @@ -33,5 +33,11 @@ glibc { > maxval: 1 > default: 1 > } > + stack_hugetlb { > + type: INT_32 > + minval: 0 > + maxval: 1 > + default: 1 > + } > } > }