From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages
To: Siddhesh Poyarekar, libc-alpha@sourceware.org
Cc: Norbert Manthey, Guillaume Morin
References: <20210818142000.128752-1-adhemerval.zanella@linaro.org> <20210818142000.128752-2-adhemerval.zanella@linaro.org> <0f4f0950-a262-ac78-55f3-f566cd63e416@sourceware.org>
From: Adhemerval Zanella
Message-ID: <704b7a6c-805e-5177-9265-c63500711d03@linaro.org>
Date: Thu, 19 Aug 2021 09:00:15 -0300
In-Reply-To: <0f4f0950-a262-ac78-55f3-f566cd63e416@sourceware.org>
List-Id: Libc-alpha mailing list

On 18/08/2021 15:42, Siddhesh Poyarekar wrote:
> On 8/18/21 7:49 PM, Adhemerval Zanella wrote:
>> Linux Transparent Huge Pages (THP) current support three different
>> states: 'never', 'madvise', and 'always'.  The 'never' is
>> self-explanatory and 'always' will enable THP for all anonymous
>> memory.
>> However, 'madvise' is still the default for some system and
>> for such case THP will be only used if the memory range is explicity
>> advertise by the program through a madvise(MADV_HUGEPAGE) call.
>>
>> To enable it a new tunable is provided, 'glibc.malloc.thp_madvise',
>> where setting to a value diffent than 0 enables the madvise call.
>> Linux current only support one page size for THP, even if the
>> architecture supports multiple sizes.
>>
>> This patch issues the madvise(MADV_HUGEPAGE) call after a successful
>> mmap() call at sysmalloc() with sizes larger than the default huge
>> page size.  The madvise() call is disable is system does not support
>> THP or if it has the mode set to "never".
>>
>> Checked on x86_64-linux-gnu.
>> ---
>>   NEWS                                       |  5 +-
>>   elf/dl-tunables.list                       |  5 ++
>>   elf/tst-rtld-list-tunables.exp             |  1 +
>>   malloc/arena.c                             |  5 ++
>>   malloc/malloc-internal.h                   |  1 +
>>   malloc/malloc.c                            | 48 ++++++++++++++
>>   manual/tunables.texi                       |  9 +++
>>   sysdeps/generic/Makefile                   |  8 +++
>>   sysdeps/generic/malloc-hugepages.c         | 31 +++++++++
>>   sysdeps/generic/malloc-hugepages.h         | 37 +++++++++++
>>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++
>>   11 files changed, 225 insertions(+), 1 deletion(-)
>>   create mode 100644 sysdeps/generic/malloc-hugepages.c
>>   create mode 100644 sysdeps/generic/malloc-hugepages.h
>>   create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c
>>
>> diff --git a/NEWS b/NEWS
>> index 79c895e382..9b2345d08c 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -9,7 +9,10 @@ Version 2.35
>>
>>   Major new features:
>>
>> -  [Add new features here]
>> +* On Linux, a new tunable, glibc.malloc.thp_madvise, can be used to
>> +  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
>> +  It might improve performance with Transparent Huge Pages madvise mode
>> +  depending of the workload.
>>
>>   Deprecated and removed features, and other changes affecting compatibility:
>>
>> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
>> index 8ddd4a2314..67df6dbc2c 100644
>> --- a/elf/dl-tunables.list
>> +++ b/elf/dl-tunables.list
>> @@ -92,6 +92,11 @@ glibc {
>>         minval: 0
>>         security_level: SXID_IGNORE
>>       }
>> +    thp_madvise {
>> +      type: INT_32
>> +      minval: 0
>> +      maxval: 1
>> +    }
>>     }
>>     cpu {
>>       hwcap_mask {
>> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
>> index 9f66c52885..d8109fa31c 100644
>> --- a/elf/tst-rtld-list-tunables.exp
>> +++ b/elf/tst-rtld-list-tunables.exp
>> @@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255)
>>   glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+)
>> +glibc.malloc.thp_madvise: 0 (min: 0, max: 1)
>>   glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
>> diff --git a/malloc/arena.c b/malloc/arena.c
>> index 667484630e..81bff54303 100644
>> --- a/malloc/arena.c
>> +++ b/malloc/arena.c
>> @@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
>>   TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>>   #endif
>>   TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
>> +TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
>>   #else
>>   /* Initialization routine.  */
>>   #include
>> @@ -331,6 +332,7 @@ ptmalloc_init (void)
>>              TUNABLE_CALLBACK (set_tcache_unsorted_limit));
>>   # endif
>>     TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
>> +  TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
>>   #else
>>     if (__glibc_likely (_environ != NULL))
>>       {
>> @@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
>>         __munmap (p2, HEAP_MAX_SIZE);
>>         return 0;
>>       }
>> +
>> +  sysmadvise_thp (p2, size);
>> +
>>     h = (heap_info *) p2;
>>     h->size = size;
>>     h->mprotect_size = size;
>> diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
>> index 0c7b5a183c..7493e34d86 100644
>> --- a/malloc/malloc-internal.h
>> +++ b/malloc/malloc-internal.h
>> @@ -22,6 +22,7 @@
>>   #include
>>   #include
>>   #include
>> +#include
>>
>>   /* Called in the parent process before a fork.  */
>>   void __malloc_fork_lock_parent (void) attribute_hidden;
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index e065785af7..ad3eec41ac 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -1881,6 +1881,11 @@ struct malloc_par
>>     INTERNAL_SIZE_T arena_test;
>>     INTERNAL_SIZE_T arena_max;
>>
>> +#if HAVE_TUNABLES
>> +  /* Transparent Large Page support.  */
>> +  INTERNAL_SIZE_T thp_pagesize;
>> +#endif
>> +
>>     /* Memory map support */
>>     int n_mmaps;
>>     int n_mmaps_max;
>> @@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
>>
>>   #include
>>
>> +/* ----------- Routines dealing with transparent huge pages ----------- */
>> +
>> +static inline void
>> +sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
>> +{
>> +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
>> +  /* Do not consider areas smaller than a huge page or if the tunable is
>> +     not active.  */
>
> You also shouldn't bother setting it if /sys/kernel/mm/transparent_hugepage/enabled is set to "enabled" since it's redundant.
I think you mean 'always', and that should be handled through
__malloc_thp_mode() (which would prevent 'thp_pagesize' from getting a
value different from 0).  I also did not consider 'always' because I saw
some results on powerpc where issuing the madvise still helped even in
'always' mode.  I am not sure exactly why; it might be that the program
header is not sufficiently aligned (and with the tunable it would be).
But maybe for 'always' it would be better to disable the madvise() as
well.

>
>> +  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
>> +    return;
>> +  __madvise (p, size, MADV_HUGEPAGE);
>> +#endif
>> +}
>> +
>>   /* ------------------- Support for multiple arenas -------------------- */
>>   #include "arena.c"
>>
>> @@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>               if (mm != MAP_FAILED)
>>               {
>> +          sysmadvise_thp (mm, size);
>> +
>>                 /*
>>                    The offset to the start of the mmapped region is stored
>>                    in the prev_size field of the chunk.  This allows us to adjust
>> @@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>         if (size > 0)
>>           {
>>             brk = (char *) (MORECORE (size));
>> +      if (brk != (char *) (MORECORE_FAILURE))
>> +        sysmadvise_thp (brk, size);
>>             LIBC_PROBE (memory_sbrk_more, 2, brk, size);
>>           }
>>
>> @@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>                   if (mbrk != MAP_FAILED)
>>                   {
>> +          sysmadvise_thp (mbrk, size);
>> +
>>                     /* We do not need, and cannot use, another sbrk call to find end */
>>                     brk = mbrk;
>>                     snd_brk = brk + size;
>> @@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>                         correction = 0;
>>                         snd_brk = (char *) (MORECORE (0));
>>                       }
>> +          else
>> +            sysmadvise_thp (snd_brk, correction);
>>                   }
>>                   /* handle non-contiguous cases */
>> @@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
>>     if (cp == MAP_FAILED)
>>       return 0;
>>
>> +  sysmadvise_thp (cp, new_size);
>> +
>>     p = (mchunkptr) (cp + offset);
>>
>>     assert (aligned_OK (chunk2mem (p)));
>> @@ -5325,6 +5354,25 @@ do_set_mxfast (size_t value)
>>     return 0;
>>   }
>>
>> +#if HAVE_TUNABLES
>> +static __always_inline int
>> +do_set_thp_madvise (int32_t value)
>> +{
>> +  if (value > 0)
>> +    {
>> +      enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
>> +      /*
>> +     Only enables THP usage is system does support it and has at least
>> +     always or madvise mode.  Otherwise the madvise() call is wasteful.
>> +       */
>> +      if (thp_mode != malloc_thp_mode_not_supported
>> +      && thp_mode != malloc_thp_mode_never)
>> +    mp_.thp_pagesize = __malloc_default_thp_pagesize ();
>> +    }
>> +  return 0;
>> +}
>> +#endif
>> +
>>   int
>>   __libc_mallopt (int param_number, int value)
>>   {
>> diff --git a/manual/tunables.texi b/manual/tunables.texi
>> index 658547c613..93c46807f9 100644
>> --- a/manual/tunables.texi
>> +++ b/manual/tunables.texi
>> @@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
>>   passed to @code{malloc} for the largest bin size to enable.
>>   @end deftp
>>
>> +@deftp Tunable glibc.malloc.thp_madivse
>> +This tunable enable the use of @code{madvise} with @code{MADV_HUGEPAGE} after
>> +the system allocator allocated memory through @code{mmap} if the system supports
>> +Transparent Huge Page (currently only Linux).
>> +
>> +The default value of this tunable is @code{0}, which disable its usage.
>> +Setting to a positive value enable the @code{madvise} call.
>> +@end deftp
>> +
>>   @node Dynamic Linking Tunables
>>   @section Dynamic Linking Tunables
>>   @cindex dynamic linking tunables
>> diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile
>> index a209e85cc4..8eef83c94d 100644
>> --- a/sysdeps/generic/Makefile
>> +++ b/sysdeps/generic/Makefile
>> @@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe
>>   shared-only-routines += framestate unwind-pe
>>   endif
>>   endif
>> +
>> +ifeq ($(subdir),malloc)
>> +sysdep_malloc_debug_routines += malloc-hugepages
>> +endif
>> +
>> +ifeq ($(subdir),misc)
>> +sysdep_routines += malloc-hugepages
>> +endif
>> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
>> new file mode 100644
>> index 0000000000..262bcdbeb8
>> --- /dev/null
>> +++ b/sysdeps/generic/malloc-hugepages.c
>> @@ -0,0 +1,31 @@
>> +/* Huge Page support.  Generic implementation.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public License as
>> +   published by the Free Software Foundation; either version 2.1 of the
>> +   License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; see the file COPYING.LIB.  If
>> +   not, see .  */
>> +
>> +#include
>> +
>> +size_t
>> +__malloc_default_thp_pagesize (void)
>> +{
>> +  return 0;
>> +}
>> +
>> +enum malloc_thp_mode_t
>> +__malloc_thp_mode (void)
>> +{
>> +  return malloc_thp_mode_not_supported;
>> +}
>> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
>> new file mode 100644
>> index 0000000000..664cda9b67
>> --- /dev/null
>> +++ b/sysdeps/generic/malloc-hugepages.h
>> @@ -0,0 +1,37 @@
>> +/* Malloc huge page support.  Generic implementation.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public License as
>> +   published by the Free Software Foundation; either version 2.1 of the
>> +   License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; see the file COPYING.LIB.  If
>> +   not, see .  */
>> +
>> +#ifndef _MALLOC_HUGEPAGES_H
>> +#define _MALLOC_HUGEPAGES_H
>> +
>> +#include
>> +
>> +/* Return the default transparent huge page size.  */
>> +size_t __malloc_default_thp_pagesize (void) attribute_hidden;
>> +
>> +enum malloc_thp_mode_t
>> +{
>> +  malloc_thp_mode_always,
>> +  malloc_thp_mode_madvise,
>> +  malloc_thp_mode_never,
>> +  malloc_thp_mode_not_supported
>> +};
>> +
>> +enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
>> +
>> +#endif /* _MALLOC_HUGEPAGES_H */
>> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
>> new file mode 100644
>> index 0000000000..66589127cd
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
>> @@ -0,0 +1,76 @@
>> +/* Huge Page support.  Linux implementation.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public License as
>> +   published by the Free Software Foundation; either version 2.1 of the
>> +   License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; see the file COPYING.LIB.  If
>> +   not, see .  */
>> +
>> +#include
>> +#include
>> +#include
>> +
>> +size_t
>> +__malloc_default_thp_pagesize (void)
>> +{
>
> Likewise for page size; this could be cached too.
I don't think it makes much sense to cache it, since it is only used
once, at malloc initialization.

>
>> +  int fd = __open64_nocancel (
>> +    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
>> +  if (fd == -1)
>> +    return 0;
>> +
>> +
>> +  char str[INT_BUFSIZE_BOUND (size_t)];
>> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
>> +  __close_nocancel (fd);
>> +
>> +  if (s < 0)
>> +    return 0;
>> +
>> +  int r = 0;
>> +  for (ssize_t i = 0; i < s; i++)
>> +    {
>> +      if (str[i] == '\n')
>> +    break;
>> +      r *= 10;
>> +      r += str[i] - '0';
>> +    }
>> +  return r;
>> +}
>> +
>> +enum malloc_thp_mode_t
>> +__malloc_thp_mode (void)
>> +{
>> +  int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
>> +                  O_RDONLY);
>> +  if (fd == -1)
>> +    return malloc_thp_mode_not_supported;
>> +
>> +  static const char mode_always[]  = "[always] madvise never\n";
>> +  static const char mode_madvise[] = "always [madvise] never\n";
>> +  static const char mode_never[]   = "always madvise [never]\n";
>> +
>> +  char str[sizeof(mode_always)];
>> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
>> +  __close_nocancel (fd);
>> +
>> +  if (s == sizeof (mode_always) - 1)
>> +    {
>> +      if (strcmp (str, mode_always) == 0)
>> +    return malloc_thp_mode_always;
>> +      else if (strcmp (str, mode_madvise) == 0)
>> +    return malloc_thp_mode_madvise;
>> +      else if (strcmp (str, mode_never) == 0)
>> +    return malloc_thp_mode_never;
>> +    }
>> +  return malloc_thp_mode_not_supported;
>> +}
>>
>
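
To make the 'always' question above a bit more concrete, here is a rough,
standalone sketch of what skipping the madvise for 'always' could look
like.  It is only an illustration under assumptions and not part of the
patch: parse_thp_mode, thp_pagesize_for_mode and the skip_always
parameter are hypothetical names, and the enum is copied from the
patch's malloc-hugepages.h so that the snippet builds on its own.  The
first helper looks for the bracketed entry in the contents of
/sys/kernel/mm/transparent_hugepage/enabled, and the second returns the
value that would end up in mp_.thp_pagesize, where 0 makes
sysmadvise_thp return early.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from the patch's malloc-hugepages.h so the sketch is
   self-contained.  */
enum malloc_thp_mode_t
{
  malloc_thp_mode_always,
  malloc_thp_mode_madvise,
  malloc_thp_mode_never,
  malloc_thp_mode_not_supported
};

/* Hypothetical helper (not in the patch): map the contents of
   /sys/kernel/mm/transparent_hugepage/enabled to a mode by searching
   for the bracketed entry.  BUF must be NUL-terminated.  */
static enum malloc_thp_mode_t
parse_thp_mode (const char *buf)
{
  assert (buf != NULL);
  if (strstr (buf, "[always]") != NULL)
    return malloc_thp_mode_always;
  if (strstr (buf, "[madvise]") != NULL)
    return malloc_thp_mode_madvise;
  if (strstr (buf, "[never]") != NULL)
    return malloc_thp_mode_never;
  return malloc_thp_mode_not_supported;
}

/* Hypothetical policy helper (not in the patch): return the value to
   store in mp_.thp_pagesize.  A result of 0 keeps sysmadvise_thp a
   no-op.  SKIP_ALWAYS is an assumed switch selecting the stricter
   behaviour suggested in review, where 'always' also disables the
   madvise call.  */
static size_t
thp_pagesize_for_mode (enum malloc_thp_mode_t mode, size_t hpage_size,
                       int skip_always)
{
  switch (mode)
    {
    case malloc_thp_mode_madvise:
      return hpage_size;
    case malloc_thp_mode_always:
      return skip_always ? 0 : hpage_size;
    default:
      /* 'never' or no THP support: the madvise would be wasteful.  */
      return 0;
    }
}
```

With skip_always set to 0 this matches what do_set_thp_madvise does in
the patch (only 'never' and unsupported systems leave the page size at
0); with it set to 1, 'always' is treated as redundant as well.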