From: Siddhesh Poyarekar <siddhesh@sourceware.org>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	libc-alpha@sourceware.org
Cc: Norbert Manthey <nmanthey@conp-solutions.com>,
	Guillaume Morin <guillaume@morinfr.org>
Subject: Re: [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages
Date: Thu, 19 Aug 2021 00:12:03 +0530
Message-ID: <0f4f0950-a262-ac78-55f3-f566cd63e416@sourceware.org>
In-Reply-To: <20210818142000.128752-2-adhemerval.zanella@linaro.org>

On 8/18/21 7:49 PM, Adhemerval Zanella wrote:
> Linux Transparent Huge Pages (THP) currently supports three different
> states: 'never', 'madvise', and 'always'.  The 'never' state is
> self-explanatory and 'always' enables THP for all anonymous
> memory.  However, 'madvise' is still the default on some systems, and
> in that case THP is used only if the memory range is explicitly
> advised by the program through a madvise(MADV_HUGEPAGE) call.
> 
> To enable it, a new tunable is provided, 'glibc.malloc.thp_madvise',
> where setting it to a value different from 0 enables the madvise call.
> Linux currently supports only one page size for THP, even if the
> architecture supports multiple sizes.
> 
> This patch issues the madvise(MADV_HUGEPAGE) call after a successful
> mmap() call at sysmalloc() with sizes larger than the default huge
> page size.  The madvise() call is disabled if the system does not support
> THP or if it has the mode set to "never".
> 
> Checked on x86_64-linux-gnu.
> ---
>   NEWS                                       |  5 +-
>   elf/dl-tunables.list                       |  5 ++
>   elf/tst-rtld-list-tunables.exp             |  1 +
>   malloc/arena.c                             |  5 ++
>   malloc/malloc-internal.h                   |  1 +
>   malloc/malloc.c                            | 48 ++++++++++++++
>   manual/tunables.texi                       |  9 +++
>   sysdeps/generic/Makefile                   |  8 +++
>   sysdeps/generic/malloc-hugepages.c         | 31 +++++++++
>   sysdeps/generic/malloc-hugepages.h         | 37 +++++++++++
>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++
>   11 files changed, 225 insertions(+), 1 deletion(-)
>   create mode 100644 sysdeps/generic/malloc-hugepages.c
>   create mode 100644 sysdeps/generic/malloc-hugepages.h
>   create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c
> 
> diff --git a/NEWS b/NEWS
> index 79c895e382..9b2345d08c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,10 @@ Version 2.35
>   
>   Major new features:
>   
> -  [Add new features here]
> +* On Linux, a new tunable, glibc.malloc.thp_madvise, can be used to
> +  make malloc issue madvise with MADV_HUGEPAGE on mmap and sbrk calls.
> +  It might improve performance with Transparent Huge Pages madvise mode
> +  depending on the workload.
>   
>   Deprecated and removed features, and other changes affecting compatibility:
>   
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 8ddd4a2314..67df6dbc2c 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -92,6 +92,11 @@ glibc {
>         minval: 0
>         security_level: SXID_IGNORE
>       }
> +    thp_madvise {
> +      type: INT_32
> +      minval: 0
> +      maxval: 1
> +    }
>     }
>     cpu {
>       hwcap_mask {
> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
> index 9f66c52885..d8109fa31c 100644
> --- a/elf/tst-rtld-list-tunables.exp
> +++ b/elf/tst-rtld-list-tunables.exp
> @@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255)
>   glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+)
> +glibc.malloc.thp_madvise: 0 (min: 0, max: 1)
>   glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 667484630e..81bff54303 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
>   TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>   #endif
>   TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
> +TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
>   #else
>   /* Initialization routine. */
>   #include <string.h>
> @@ -331,6 +332,7 @@ ptmalloc_init (void)
>   	       TUNABLE_CALLBACK (set_tcache_unsorted_limit));
>   # endif
>     TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
> +  TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
>   #else
>     if (__glibc_likely (_environ != NULL))
>       {
> @@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
>         __munmap (p2, HEAP_MAX_SIZE);
>         return 0;
>       }
> +
> +  sysmadvise_thp (p2, size);
> +
>     h = (heap_info *) p2;
>     h->size = size;
>     h->mprotect_size = size;
> diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
> index 0c7b5a183c..7493e34d86 100644
> --- a/malloc/malloc-internal.h
> +++ b/malloc/malloc-internal.h
> @@ -22,6 +22,7 @@
>   #include <malloc-machine.h>
>   #include <malloc-sysdep.h>
>   #include <malloc-size.h>
> +#include <malloc-hugepages.h>
>   
>   /* Called in the parent process before a fork.  */
>   void __malloc_fork_lock_parent (void) attribute_hidden;
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index e065785af7..ad3eec41ac 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1881,6 +1881,11 @@ struct malloc_par
>     INTERNAL_SIZE_T arena_test;
>     INTERNAL_SIZE_T arena_max;
>   
> +#if HAVE_TUNABLES
> +  /* Transparent Huge Page support.  */
> +  INTERNAL_SIZE_T thp_pagesize;
> +#endif
> +
>     /* Memory map support */
>     int n_mmaps;
>     int n_mmaps_max;
> @@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
>   
>   #include <stap-probe.h>
>   
> +/* ----------- Routines dealing with transparent huge pages ----------- */
> +
> +static inline void
> +sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
> +{
> +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
> +  /* Do not consider areas smaller than a huge page or if the tunable is
> +     not active.  */

You also shouldn't bother issuing the madvise call if
/sys/kernel/mm/transparent_hugepage/enabled is set to "always", since
THP then already applies to all anonymous memory and the call is redundant.
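
A minimal sketch of that check, assuming the mode enum from the patch's
malloc-hugepages.h.  The helper name thp_pagesize_for_mode is hypothetical
and not part of the patch; it only illustrates leaving the page size at 0
(which turns sysmadvise_thp into a no-op) unless the mode is "madvise":

```c
#include <stddef.h>

/* Mirrors the enum declared in the patch's malloc-hugepages.h.  */
enum malloc_thp_mode_t
{
  malloc_thp_mode_always,
  malloc_thp_mode_madvise,
  malloc_thp_mode_never,
  malloc_thp_mode_not_supported
};

/* Hypothetical helper, not part of the patch: return the value to store
   in mp_.thp_pagesize.  A result of 0 disables the madvise call, which
   is what we want when the call would be useless (never, not supported)
   or redundant (always).  */
static size_t
thp_pagesize_for_mode (enum malloc_thp_mode_t mode, size_t default_pagesize)
{
  return mode == malloc_thp_mode_madvise ? default_pagesize : 0;
}
```

With something like this, do_set_thp_madvise could pass in the results of
__malloc_thp_mode () and __malloc_default_thp_pagesize () and store the
return value directly.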

> +  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
> +    return;
> +  __madvise (p, size, MADV_HUGEPAGE);
> +#endif
> +}
> +
>   /* ------------------- Support for multiple arenas -------------------- */
>   #include "arena.c"
>   
> @@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>   
>             if (mm != MAP_FAILED)
>               {
> +	      sysmadvise_thp (mm, size);
> +
>                 /*
>                    The offset to the start of the mmapped region is stored
>                    in the prev_size field of the chunk. This allows us to adjust
> @@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>         if (size > 0)
>           {
>             brk = (char *) (MORECORE (size));
> +	  if (brk != (char *) (MORECORE_FAILURE))
> +	    sysmadvise_thp (brk, size);
>             LIBC_PROBE (memory_sbrk_more, 2, brk, size);
>           }
>   
> @@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>   
>                 if (mbrk != MAP_FAILED)
>                   {
> +		  sysmadvise_thp (mbrk, size);
> +
>                     /* We do not need, and cannot use, another sbrk call to find end */
>                     brk = mbrk;
>                     snd_brk = brk + size;
> @@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>                         correction = 0;
>                         snd_brk = (char *) (MORECORE (0));
>                       }
> +		  else
> +		    sysmadvise_thp (snd_brk, correction);
>                   }
>   
>                 /* handle non-contiguous cases */
> @@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
>     if (cp == MAP_FAILED)
>       return 0;
>   
> +  sysmadvise_thp (cp, new_size);
> +
>     p = (mchunkptr) (cp + offset);
>   
>     assert (aligned_OK (chunk2mem (p)));
> @@ -5325,6 +5354,25 @@ do_set_mxfast (size_t value)
>     return 0;
>   }
>   
> +#if HAVE_TUNABLES
> +static __always_inline int
> +do_set_thp_madvise (int32_t value)
> +{
> +  if (value > 0)
> +    {
> +      enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
> +      /*
> +	 Only enable THP usage if the system supports it and is in either
> +	 always or madvise mode.  Otherwise the madvise() call is wasteful.
> +       */
> +      if (thp_mode != malloc_thp_mode_not_supported
> +	  && thp_mode != malloc_thp_mode_never)
> +	mp_.thp_pagesize = __malloc_default_thp_pagesize ();
> +    }
> +  return 0;
> +}
> +#endif
> +
>   int
>   __libc_mallopt (int param_number, int value)
>   {
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 658547c613..93c46807f9 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
>   passed to @code{malloc} for the largest bin size to enable.
>   @end deftp
>   
> +@deftp Tunable glibc.malloc.thp_madvise
> +This tunable enables the use of @code{madvise} with @code{MADV_HUGEPAGE} after
> +the system allocator allocates memory through @code{mmap}, if the system supports
> +Transparent Huge Pages (currently only Linux).
> +
> +The default value of this tunable is @code{0}, which disables its usage.
> +Setting it to a positive value enables the @code{madvise} call.
> +@end deftp
> +
>   @node Dynamic Linking Tunables
>   @section Dynamic Linking Tunables
>   @cindex dynamic linking tunables
> diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile
> index a209e85cc4..8eef83c94d 100644
> --- a/sysdeps/generic/Makefile
> +++ b/sysdeps/generic/Makefile
> @@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe
>   shared-only-routines += framestate unwind-pe
>   endif
>   endif
> +
> +ifeq ($(subdir),malloc)
> +sysdep_malloc_debug_routines += malloc-hugepages
> +endif
> +
> +ifeq ($(subdir),misc)
> +sysdep_routines += malloc-hugepages
> +endif
> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
> new file mode 100644
> index 0000000000..262bcdbeb8
> --- /dev/null
> +++ b/sysdeps/generic/malloc-hugepages.c
> @@ -0,0 +1,31 @@
> +/* Huge Page support.  Generic implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB.  If
> +   not, see <https://www.gnu.org/licenses/>.  */
> +
> +#include <malloc-hugepages.h>
> +
> +size_t
> +__malloc_default_thp_pagesize (void)
> +{
> +  return 0;
> +}
> +
> +enum malloc_thp_mode_t
> +__malloc_thp_mode (void)
> +{
> +  return malloc_thp_mode_not_supported;
> +}
> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
> new file mode 100644
> index 0000000000..664cda9b67
> --- /dev/null
> +++ b/sysdeps/generic/malloc-hugepages.h
> @@ -0,0 +1,37 @@
> +/* Malloc huge page support.  Generic implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB.  If
> +   not, see <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _MALLOC_HUGEPAGES_H
> +#define _MALLOC_HUGEPAGES_H
> +
> +#include <stddef.h>
> +
> +/* Return the default transparent huge page size.  */
> +size_t __malloc_default_thp_pagesize (void) attribute_hidden;
> +
> +enum malloc_thp_mode_t
> +{
> +  malloc_thp_mode_always,
> +  malloc_thp_mode_madvise,
> +  malloc_thp_mode_never,
> +  malloc_thp_mode_not_supported
> +};
> +
> +enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
> +
> +#endif /* _MALLOC_HUGEPAGES_H */
> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> new file mode 100644
> index 0000000000..66589127cd
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> @@ -0,0 +1,76 @@
> +/* Huge Page support.  Linux implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB.  If
> +   not, see <https://www.gnu.org/licenses/>.  */
> +
> +#include <intprops.h>
> +#include <malloc-hugepages.h>
> +#include <not-cancel.h>
> +
> +size_t
> +__malloc_default_thp_pagesize (void)
> +{

Likewise for page size; this could be cached too.
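
One possible shape for such caching, as a rough sketch only.  The struct,
the function name cached_thp_pagesize and the fake_probe stand-in are all
hypothetical; in the patch the probe would be the sysfs read in
__malloc_default_thp_pagesize.  The sketch is not thread-safe as written:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache record; the patch itself only keeps the value in
   mp_.thp_pagesize.  */
struct thp_pagesize_cache
{
  bool valid;
  size_t pagesize;
};

/* Return the THP page size, calling PROBE only on first use.  No
   locking is done here, so callers would need their own
   synchronization.  */
static size_t
cached_thp_pagesize (struct thp_pagesize_cache *cache,
                     size_t (*probe) (void))
{
  if (!cache->valid)
    {
      cache->pagesize = probe ();
      cache->valid = true;
    }
  return cache->pagesize;
}

/* Illustrative stand-in for the sysfs read; counts how often it runs.  */
static int fake_probe_calls;

static size_t
fake_probe (void)
{
  fake_probe_calls++;
  return 2 * 1024 * 1024;
}
```

Calling cached_thp_pagesize twice with the same cache runs the probe once
and returns the stored value the second time.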

> +  int fd = __open64_nocancel (
> +    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
> +  if (fd == -1)
> +    return 0;
> +
> +
> +  char str[INT_BUFSIZE_BOUND (size_t)];
> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
> +  __close_nocancel (fd);
> +
> +  if (s < 0)
> +    return 0;
> +
> +  size_t r = 0;
> +  for (ssize_t i = 0; i < s; i++)
> +    {
> +      if (str[i] == '\n')
> +	break;
> +      r *= 10;
> +      r += str[i] - '0';
> +    }
> +  return r;
> +}
> +
> +enum malloc_thp_mode_t
> +__malloc_thp_mode (void)
> +{
> +  int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
> +			      O_RDONLY);
> +  if (fd == -1)
> +    return malloc_thp_mode_not_supported;
> +
> +  static const char mode_always[]  = "[always] madvise never\n";
> +  static const char mode_madvise[] = "always [madvise] never\n";
> +  static const char mode_never[]   = "always madvise [never]\n";
> +
> +  char str[sizeof(mode_always)];
> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
> +  __close_nocancel (fd);
> +
> +  if (s == sizeof (mode_always) - 1)
> +    {
> +      if (memcmp (str, mode_always, s) == 0)
> +	return malloc_thp_mode_always;
> +      else if (memcmp (str, mode_madvise, s) == 0)
> +	return malloc_thp_mode_madvise;
> +      else if (memcmp (str, mode_never, s) == 0)
> +	return malloc_thp_mode_never;
> +    }
> +  return malloc_thp_mode_not_supported;
> +}
> 
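
The mode detection in __malloc_thp_mode compares the whole line against
three fixed strings.  A more tolerant alternative might look only at the
bracketed token, as in the sketch below.  The function name parse_thp_mode
is hypothetical, and the only assumption about the sysfs file is the one the
patch already makes: the active mode is the word enclosed in square
brackets.  The buffer is treated as length-delimited, so it does not need to
be NUL-terminated:

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the enum declared in the patch's malloc-hugepages.h.  */
enum malloc_thp_mode_t
{
  malloc_thp_mode_always,
  malloc_thp_mode_madvise,
  malloc_thp_mode_never,
  malloc_thp_mode_not_supported
};

/* Hypothetical parser, not part of the patch: find the "[token]" in the
   first LEN bytes of BUF and map it to a mode.  */
static enum malloc_thp_mode_t
parse_thp_mode (const char *buf, size_t len)
{
  const char *lb = memchr (buf, '[', len);
  if (lb == NULL)
    return malloc_thp_mode_not_supported;

  const char *tok = lb + 1;
  const char *rb = memchr (tok, ']', len - (size_t) (tok - buf));
  if (rb == NULL)
    return malloc_thp_mode_not_supported;

  size_t toklen = (size_t) (rb - tok);
  if (toklen == 6 && memcmp (tok, "always", 6) == 0)
    return malloc_thp_mode_always;
  if (toklen == 7 && memcmp (tok, "madvise", 7) == 0)
    return malloc_thp_mode_madvise;
  if (toklen == 5 && memcmp (tok, "never", 5) == 0)
    return malloc_thp_mode_never;
  return malloc_thp_mode_not_supported;
}
```

The caller would pass the buffer and the byte count returned by the read
call, after checking that the count is positive.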


