Subject: Re: [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages
To: Adhemerval Zanella, libc-alpha@sourceware.org
Cc: Norbert Manthey, Guillaume Morin
References: <20210818142000.128752-1-adhemerval.zanella@linaro.org> <20210818142000.128752-2-adhemerval.zanella@linaro.org>
From: Siddhesh Poyarekar
Message-ID: <0f4f0950-a262-ac78-55f3-f566cd63e416@sourceware.org>
Date: Thu, 19 Aug 2021 00:12:03 +0530
In-Reply-To: <20210818142000.128752-2-adhemerval.zanella@linaro.org>

On 8/18/21 7:49 PM, Adhemerval Zanella wrote:
> Linux Transparent Huge Pages (THP) currently supports three different
> states: 'never', 'madvise', and 'always'. The 'never' is
> self-explanatory and 'always' will enable THP for all anonymous
> memory. However, 'madvise' is still the default for some systems and
> in such cases THP will only be used if the memory range is explicitly
> advertised by the program through a madvise(MADV_HUGEPAGE) call.
>
> To enable it a new tunable is provided, 'glibc.malloc.thp_madvise',
> where setting it to a value different from 0 enables the madvise call.
> Linux currently supports only one page size for THP, even if the
> architecture supports multiple sizes.
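
For anyone following along, the 'madvise' mode described above looks roughly like this from a program's point of view. This is a standalone sketch and not code from the patch: map_with_thp_hint is a made-up name, and it assumes a Linux system whose headers define MADV_HUGEPAGE.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE	/* For MAP_ANONYMOUS and MADV_HUGEPAGE.  */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper, not part of the patch: map LEN bytes of anonymous
   memory and hint that the range should be backed by huge pages.  */
static void *
map_with_thp_hint (size_t len)
{
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
#ifdef MADV_HUGEPAGE
  /* Purely advisory: the mapping stays usable if the kernel rejects the
     hint, so the return value is deliberately ignored.  */
  (void) madvise (p, len, MADV_HUGEPAGE);
#endif
  return p;
}
```

In 'always' mode the kernel considers the range for huge pages without the hint, and in 'never' mode the hint has no effect, which is why the call only matters in 'madvise' mode.
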
>
> This patch issues the madvise(MADV_HUGEPAGE) call after a successful
> mmap() call at sysmalloc() with sizes larger than the default huge
> page size. The madvise() call is disabled if the system does not support
> THP or if it has the mode set to "never".
>
> Checked on x86_64-linux-gnu.
> ---
>  NEWS                                       |  5 +-
>  elf/dl-tunables.list                       |  5 ++
>  elf/tst-rtld-list-tunables.exp             |  1 +
>  malloc/arena.c                             |  5 ++
>  malloc/malloc-internal.h                   |  1 +
>  malloc/malloc.c                            | 48 ++++++++++++++
>  manual/tunables.texi                       |  9 +++
>  sysdeps/generic/Makefile                   |  8 +++
>  sysdeps/generic/malloc-hugepages.c         | 31 +++++++++
>  sysdeps/generic/malloc-hugepages.h         | 37 +++++++++++
>  sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++
>  11 files changed, 225 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/generic/malloc-hugepages.c
>  create mode 100644 sysdeps/generic/malloc-hugepages.h
>  create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c
>
> diff --git a/NEWS b/NEWS
> index 79c895e382..9b2345d08c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,10 @@ Version 2.35
>
>  Major new features:
>
> -  [Add new features here]
> +* On Linux, a new tunable, glibc.malloc.thp_madvise, can be used to
> +  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
> +  It might improve performance with Transparent Huge Pages madvise mode
> +  depending on the workload.
>
>  Deprecated and removed features, and other changes affecting compatibility:
>
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 8ddd4a2314..67df6dbc2c 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -92,6 +92,11 @@ glibc {
>        minval: 0
>        security_level: SXID_IGNORE
>      }
> +    thp_madvise {
> +      type: INT_32
> +      minval: 0
> +      maxval: 1
> +    }
>    }
>    cpu {
>      hwcap_mask {
> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
> index 9f66c52885..d8109fa31c 100644
> --- a/elf/tst-rtld-list-tunables.exp
> +++ b/elf/tst-rtld-list-tunables.exp
> @@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255)
>  glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+)
>  glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+)
>  glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+)
> +glibc.malloc.thp_madvise: 0 (min: 0, max: 1)
>  glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+)
>  glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>  glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 667484630e..81bff54303 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
>  TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>  #endif
>  TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
> +TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
>  #else
>  /* Initialization routine. */
>  #include
> @@ -331,6 +332,7 @@ ptmalloc_init (void)
>                 TUNABLE_CALLBACK (set_tcache_unsorted_limit));
>  # endif
>    TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
> +  TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
>  #else
>    if (__glibc_likely (_environ != NULL))
>      {
> @@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
>        __munmap (p2, HEAP_MAX_SIZE);
>        return 0;
>      }
> +
> +  sysmadvise_thp (p2, size);
> +
>    h = (heap_info *) p2;
>    h->size = size;
>    h->mprotect_size = size;
> diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
> index 0c7b5a183c..7493e34d86 100644
> --- a/malloc/malloc-internal.h
> +++ b/malloc/malloc-internal.h
> @@ -22,6 +22,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  /* Called in the parent process before a fork. */
>  void __malloc_fork_lock_parent (void) attribute_hidden;
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index e065785af7..ad3eec41ac 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1881,6 +1881,11 @@ struct malloc_par
>    INTERNAL_SIZE_T arena_test;
>    INTERNAL_SIZE_T arena_max;
>
> +#if HAVE_TUNABLES
> +  /* Transparent Large Page support. */
> +  INTERNAL_SIZE_T thp_pagesize;
> +#endif
> +
>    /* Memory map support */
>    int n_mmaps;
>    int n_mmaps_max;
> @@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
>
>  #include
>
> +/* ----------- Routines dealing with transparent huge pages ----------- */
> +
> +static inline void
> +sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
> +{
> +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
> +  /* Do not consider areas smaller than a huge page or if the tunable is
> +     not active. */

You also shouldn't bother setting it if
/sys/kernel/mm/transparent_hugepage/enabled is set to "always", since
the madvise call is redundant in that mode.
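
Roughly what I have in mind, as a standalone sketch rather than a tested change. The enum repeats the one from the patch's malloc-hugepages.h so that the snippet compiles on its own; thp_madvise_is_useful is a made-up name, not something in the patch.

```c
#include <stdbool.h>

/* Same enumerators as in the patch's malloc-hugepages.h, repeated here
   only so that the sketch is self-contained.  */
enum malloc_thp_mode_t
{
  malloc_thp_mode_always,
  malloc_thp_mode_madvise,
  malloc_thp_mode_never,
  malloc_thp_mode_not_supported
};

/* Hypothetical predicate: the madvise call only changes anything in
   'madvise' mode.  In 'always' mode the kernel already considers every
   anonymous mapping, and in 'never' mode (or without THP support) the
   hint has no effect.  */
static bool
thp_madvise_is_useful (enum malloc_thp_mode_t mode)
{
  return mode == malloc_thp_mode_madvise;
}
```

In do_set_thp_madvise this would amount to setting mp_.thp_pagesize only when __malloc_thp_mode () returns malloc_thp_mode_madvise.
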
> +  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
> +    return;
> +  __madvise (p, size, MADV_HUGEPAGE);
> +#endif
> +}
> +
>  /* ------------------- Support for multiple arenas -------------------- */
>  #include "arena.c"
>
> @@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>
>            if (mm != MAP_FAILED)
>              {
> +              sysmadvise_thp (mm, size);
> +
>                /*
>                   The offset to the start of the mmapped region is stored
>                   in the prev_size field of the chunk. This allows us to adjust
> @@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>        if (size > 0)
>          {
>            brk = (char *) (MORECORE (size));
> +          if (brk != (char *) (MORECORE_FAILURE))
> +            sysmadvise_thp (brk, size);
>            LIBC_PROBE (memory_sbrk_more, 2, brk, size);
>          }
>
> @@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>
>                if (mbrk != MAP_FAILED)
>                  {
> +                  sysmadvise_thp (mbrk, size);
> +
>                    /* We do not need, and cannot use, another sbrk call to find end */
>                    brk = mbrk;
>                    snd_brk = brk + size;
> @@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>                    correction = 0;
>                    snd_brk = (char *) (MORECORE (0));
>                  }
> +              else
> +                sysmadvise_thp (snd_brk, correction);
>              }
>
>            /* handle non-contiguous cases */
> @@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
>    if (cp == MAP_FAILED)
>      return 0;
>
> +  sysmadvise_thp (cp, new_size);
> +
>    p = (mchunkptr) (cp + offset);
>
>    assert (aligned_OK (chunk2mem (p)));
> @@ -5325,6 +5354,25 @@ do_set_mxfast (size_t value)
>    return 0;
>  }
>
> +#if HAVE_TUNABLES
> +static __always_inline int
> +do_set_thp_madvise (int32_t value)
> +{
> +  if (value > 0)
> +    {
> +      enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
> +      /*
> +         Only enable THP usage if the system supports it and has at least
> +         always or madvise mode. Otherwise the madvise() call is wasteful.
> +       */
> +      if (thp_mode != malloc_thp_mode_not_supported
> +          && thp_mode != malloc_thp_mode_never)
> +        mp_.thp_pagesize = __malloc_default_thp_pagesize ();
> +    }
> +  return 0;
> +}
> +#endif
> +
>  int
>  __libc_mallopt (int param_number, int value)
>  {
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 658547c613..93c46807f9 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
>  passed to @code{malloc} for the largest bin size to enable.
>  @end deftp
>
> +@deftp Tunable glibc.malloc.thp_madvise
> +This tunable enables the use of @code{madvise} with @code{MADV_HUGEPAGE} after
> +the system allocator allocates memory through @code{mmap} if the system supports
> +Transparent Huge Pages (currently only Linux).
> +
> +The default value of this tunable is @code{0}, which disables its usage.
> +Setting it to a positive value enables the @code{madvise} call.
> +@end deftp
> +
>  @node Dynamic Linking Tunables
>  @section Dynamic Linking Tunables
>  @cindex dynamic linking tunables
> diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile
> index a209e85cc4..8eef83c94d 100644
> --- a/sysdeps/generic/Makefile
> +++ b/sysdeps/generic/Makefile
> @@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe
>  shared-only-routines += framestate unwind-pe
>  endif
>  endif
> +
> +ifeq ($(subdir),malloc)
> +sysdep_malloc_debug_routines += malloc-hugepages
> +endif
> +
> +ifeq ($(subdir),misc)
> +sysdep_routines += malloc-hugepages
> +endif
> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
> new file mode 100644
> index 0000000000..262bcdbeb8
> --- /dev/null
> +++ b/sysdeps/generic/malloc-hugepages.c
> @@ -0,0 +1,31 @@
> +/* Huge Page support. Generic implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB. If
> +   not, see . */
> +
> +#include
> +
> +size_t
> +__malloc_default_thp_pagesize (void)
> +{
> +  return 0;
> +}
> +
> +enum malloc_thp_mode_t
> +__malloc_thp_mode (void)
> +{
> +  return malloc_thp_mode_not_supported;
> +}
> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
> new file mode 100644
> index 0000000000..664cda9b67
> --- /dev/null
> +++ b/sysdeps/generic/malloc-hugepages.h
> @@ -0,0 +1,37 @@
> +/* Malloc huge page support. Generic implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB. If
> +   not, see . */
> +
> +#ifndef _MALLOC_HUGEPAGES_H
> +#define _MALLOC_HUGEPAGES_H
> +
> +#include
> +
> +/* Return the default transparent huge page size. */
> +size_t __malloc_default_thp_pagesize (void) attribute_hidden;
> +
> +enum malloc_thp_mode_t
> +{
> +  malloc_thp_mode_always,
> +  malloc_thp_mode_madvise,
> +  malloc_thp_mode_never,
> +  malloc_thp_mode_not_supported
> +};
> +
> +enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
> +
> +#endif /* _MALLOC_HUGEPAGES_H */
> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> new file mode 100644
> index 0000000000..66589127cd
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> @@ -0,0 +1,76 @@
> +/* Huge Page support. Linux implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB. If
> +   not, see . */
> +
> +#include
> +#include
> +#include
> +
> +size_t
> +__malloc_default_thp_pagesize (void)
> +{

Likewise for page size; this could be cached too.
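
A minimal sketch of the caching idea, under stated assumptions: probe_thp_pagesize stands in for the sysfs read and simply returns 2 MiB, and both function names and the probe_calls counter are made up for illustration. It is not thread-safe as written, so it only fits a path that runs before other threads exist.

```c
#include <stddef.h>

/* Counts how often the (pretend) sysfs read happens.  */
static size_t probe_calls;

/* Stand-in for reading .../transparent_hugepage/hpage_pmd_size.  The
   2 MiB result is an assumption made for the example only.  */
static size_t
probe_thp_pagesize (void)
{
  probe_calls++;
  return (size_t) 2 * 1024 * 1024;
}

/* Query the page size once and reuse the result on later calls.  */
static size_t
cached_thp_pagesize (void)
{
  static size_t cached;
  static int valid;

  if (!valid)
    {
      cached = probe_thp_pagesize ();
      valid = 1;
    }
  return cached;
}
```
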
> +  int fd = __open64_nocancel (
> +    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
> +  if (fd == -1)
> +    return 0;
> +
> +
> +  char str[INT_BUFSIZE_BOUND (size_t)];
> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
> +  __close_nocancel (fd);
> +
> +  if (s < 0)
> +    return 0;
> +
> +  int r = 0;
> +  for (ssize_t i = 0; i < s; i++)
> +    {
> +      if (str[i] == '\n')
> +        break;
> +      r *= 10;
> +      r += str[i] - '0';
> +    }
> +  return r;
> +}
> +
> +enum malloc_thp_mode_t
> +__malloc_thp_mode (void)
> +{
> +  int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
> +                              O_RDONLY);
> +  if (fd == -1)
> +    return malloc_thp_mode_not_supported;
> +
> +  static const char mode_always[] = "[always] madvise never\n";
> +  static const char mode_madvise[] = "always [madvise] never\n";
> +  static const char mode_never[] = "always madvise [never]\n";
> +
> +  char str[sizeof(mode_always)];
> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
> +  __close_nocancel (fd);
> +
> +  if (s == sizeof (mode_always) - 1)
> +    {
> +      if (strcmp (str, mode_always) == 0)
> +        return malloc_thp_mode_always;
> +      else if (strcmp (str, mode_madvise) == 0)
> +        return malloc_thp_mode_madvise;
> +      else if (strcmp (str, mode_never) == 0)
> +        return malloc_thp_mode_never;
> +    }
> +  return malloc_thp_mode_not_supported;
> +}
>
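
For reference, the two sysfs parsers above can be tried outside glibc. The following is only a sketch under stated assumptions: it works on plain buffers instead of the __read_nocancel results, and the names (thp_mode, parse_thp_mode, parse_thp_pagesize) as well as the digit and overflow checks are additions for illustration, not part of the patch. Since read does not NUL-terminate the buffer, the mode comparison here is length-bounded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the patch's malloc_thp_mode_t.  */
enum thp_mode { thp_always, thp_madvise, thp_never, thp_unknown };

/* Classify the LEN bytes in BUF as read from .../enabled.  BUF does not
   need to be NUL-terminated.  */
static enum thp_mode
parse_thp_mode (const char *buf, size_t len)
{
  static const char mode_always[] = "[always] madvise never\n";
  static const char mode_madvise[] = "always [madvise] never\n";
  static const char mode_never[] = "always madvise [never]\n";

  /* All three strings have the same length.  */
  if (len != sizeof (mode_always) - 1)
    return thp_unknown;
  if (memcmp (buf, mode_always, len) == 0)
    return thp_always;
  if (memcmp (buf, mode_madvise, len) == 0)
    return thp_madvise;
  if (memcmp (buf, mode_never, len) == 0)
    return thp_never;
  return thp_unknown;
}

/* Parse the decimal number at the start of BUF, stopping at a newline.
   Returns 0 for non-digit input or on overflow.  */
static size_t
parse_thp_pagesize (const char *buf, size_t len)
{
  size_t r = 0;
  for (size_t i = 0; i < len && buf[i] != '\n'; i++)
    {
      if (buf[i] < '0' || buf[i] > '9')
	return 0;
      size_t digit = (size_t) (buf[i] - '0');
      if (r > (SIZE_MAX - digit) / 10)
	return 0;
      r = r * 10 + digit;
    }
  return r;
}
```
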