public inbox for libc-alpha@sourceware.org
* [RFC] Stack allocation, hugepages and RSS implications
       [not found] <87pm9j4azf.fsf@oracle.com>
@ 2023-03-08 14:17 ` Cupertino Miranda
  2023-03-08 14:53   ` Cristian Rodríguez
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Cupertino Miranda @ 2023-03-08 14:17 UTC (permalink / raw)
  To: libc-alpha; +Cc: Jose E. Marchesi, Elena Zannoni, Cupertino Miranda


Hi everyone,

For performance purposes, one of our in-house applications requires enabling
the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
the kernel force all sufficiently large and aligned memory allocations to
reside in hugepages.  I believe the reason behind this decision is to
have more control over data location.

For stack allocation, it seems that hugepages make the resident set size
(RSS) increase significantly, without any apparent benefit, as the
huge page will be split into small pages even before leaving the glibc stack
allocation code.

As an example, this is what happens in the case of a pthread_create with a 2MB
stack size:
 1. mmap requests the 2MB allocation with PROT_NONE;
      a huge page is "registered" by the kernel.
 2. the thread descriptor is written at the end of the stack.
      This triggers a page fault in the kernel, which performs the actual
      memory allocation of the 2MB.
 3. an mprotect changes protection on the guard (one of the small pages of the
    allocated space):
      at this point the kernel needs to break the 2MB page into many small pages
      in order to change the protection on that memory region.
      This eliminates any benefit of having huge pages for stack allocation,
      but also makes RSS increase by 2MB even though nothing was
      written to most of the small pages.
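
A minimal standalone sketch that approximates this sequence (this is not the
glibc code path; it assumes x86-64 with 4kB base pages, 2MB transparent
hugepages and THP mode 'always', and it forces alignment by over-allocating,
which glibc does not do):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_SZ (2UL * 1024 * 1024)

/* Resident set size in small pages, from /proc/self/statm.  */
static long rss_pages (void)
{
  long rss = 0;
  FILE *f = fopen ("/proc/self/statm", "r");
  if (f != NULL)
    {
      fscanf (f, "%*d %ld", &rss);
      fclose (f);
    }
  return rss;
}

int main (void)
{
  /* Over-allocate so a 2MB-aligned, 2MB-sized window can be carved out.  */
  char *raw = mmap (NULL, 2 * HUGE_SZ, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (raw == MAP_FAILED)
    return 1;
  char *stk = (char *) (((uintptr_t) raw + HUGE_SZ - 1) & ~(HUGE_SZ - 1));
  printf ("RSS after mmap:           %ld pages\n", rss_pages ());

  /* Roughly step 2: make the window accessible and touch one word near its
     end, as the thread-descriptor write does.  With THP set to 'always'
     this single write can fault in a whole 2MB huge page.  */
  mprotect (stk, HUGE_SZ, PROT_READ | PROT_WRITE);
  stk[HUGE_SZ - 64] = 1;
  printf ("RSS after first write:    %ld pages\n", rss_pages ());

  /* Step 3: turn the lowest small page into a guard page; the kernel has to
     split the huge page, and the resulting small pages stay resident, which
     is the RSS growth described above.  */
  mprotect (stk, 4096, PROT_NONE);
  printf ("RSS after guard mprotect: %ld pages\n", rss_pages ());
  return 0;
}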

As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
the __mmap in nptl/allocatestack.c. As expected, RSS was significantly reduced for
the application.

At this point I am quite confident that, in our particular use case, there is
a real benefit in enforcing that stacks never use hugepages.

This RFC is to understand whether I have missed some option in glibc that would
allow better control over stack allocation.
If not, I am tempted to propose/submit a change, in the form of a tunable, to
enforce MADV_NOHUGEPAGE for stacks.

In any case, I wonder if there is an actual use case where a hugepage would
survive glibc stack allocation and bring an actual benefit.

Looking forward to your comments.

Best regards,
Cupertino


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-08 14:17 ` [RFC] Stack allocation, hugepages and RSS implications Cupertino Miranda
@ 2023-03-08 14:53   ` Cristian Rodríguez
  2023-03-08 15:12     ` Cupertino Miranda
  2023-03-08 17:19   ` Adhemerval Zanella Netto
  2023-03-09 10:54   ` Florian Weimer
  2 siblings, 1 reply; 12+ messages in thread
From: Cristian Rodríguez @ 2023-03-08 14:53 UTC (permalink / raw)
  To: Cupertino Miranda
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda

On Wed, Mar 8, 2023 at 11:17 AM Cupertino Miranda via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
>
> Hi everyone,

> In any case, I wonder if there is an actual use case where an hugepage would
> survive glibc stack allocation and will bring an actual benefit.
>
> Looking forward for your comments.

I think you could just post a patch to disable hugepages for this allocation.
Are you sure this is the only case, though?  There are probably several
cases where using hugepages is suboptimal.


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-08 14:53   ` Cristian Rodríguez
@ 2023-03-08 15:12     ` Cupertino Miranda
  0 siblings, 0 replies; 12+ messages in thread
From: Cupertino Miranda @ 2023-03-08 15:12 UTC (permalink / raw)
  To: Cristian Rodríguez
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda


Cristian Rodríguez writes:

> On Wed, Mar 8, 2023 at 11:17 AM Cupertino Miranda via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>>
>> Hi everyone,
>
>> In any case, I wonder if there is an actual use case where an hugepage would
>> survive glibc stack allocation and will bring an actual benefit.
>>
>> Looking forward for your comments.
>
> I think you could just post a patch to disable hugepages for this allocation.
> Are you sure this is the only case though ? There are probably several
> cases where using hugepages is non-optimal.

No, pthread_create itself might not be the only case.  I am considering
making that change in the allocate_stack code, which would also fix it for
the main thread stack allocation.

If you are asking about other types of memory allocations, I do not
know.


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-08 14:17 ` [RFC] Stack allocation, hugepages and RSS implications Cupertino Miranda
  2023-03-08 14:53   ` Cristian Rodríguez
@ 2023-03-08 17:19   ` Adhemerval Zanella Netto
  2023-03-09  9:38     ` Cupertino Miranda
  2023-03-09 10:54   ` Florian Weimer
  2 siblings, 1 reply; 12+ messages in thread
From: Adhemerval Zanella Netto @ 2023-03-08 17:19 UTC (permalink / raw)
  To: Cupertino Miranda, libc-alpha
  Cc: Jose E. Marchesi, Elena Zannoni, Cupertino Miranda



On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
> 
> Hi everyone,
> 
> For performance purposes, one of our in-house applications requires enabling
> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
> the kernel force all sufficiently large and aligned memory allocations to
> reside in hugepages.  I believe the reason behind this decision is to
> have more control over data location.

We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
enables MADV_HUGEPAGE madvise for mmap-allocated pages if the mode is set as
'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).  One option would be
to use it instead of 'always' and use glibc.malloc.hugetlb=1.

The main drawback of this strategy is that the THP mode is a system-wide
setting, so it might affect other users/programs as well.

> 
> For stack allocation, it seems that hugepages make resident set size
> (RSS) increase significantly, and without any apparent benefit, as the
> huge page will be split in small pages even before leaving glibc stack
> allocation code.
> 
> As an example, this is what happens in case of a pthread_create with 2MB
> stack size:
>  1. mmap request for the 2MB allocation with PROT_NONE;
>       a huge page is "registered" by the kernel
>  2. the thread descriptor is written at the end of the stack.
>       This triggers a page fault in the kernel, which performs the actual
>       memory allocation of the 2MB.
>  3. an mprotect changes protection on the guard (one of the small pages of the
>     allocated space):
>       at this point the kernel needs to break the 2MB page into many small pages
>       in order to change the protection on that memory region.
>       This eliminates any benefit of having huge pages for stack allocation,
>       but also makes RSS increase by 2MB even though nothing was
>       written to most of the small pages.
> 
> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
> the __mmap in nptl/allocatestack.c. As expected, RSS was significantly reduced for
> the application.
> 
> At this point I am very much confident that there is a real benefit in our
> particular use case to enforce stacks not ever to use hugepages.
> 
> This RFC is to understand if I have missed some option in glibc that would
> allow to better control stack allocation.
> If not, I am tempted to propose/submit a change, in the form of a tunable, to
> enforce NOHUGEPAGES for stacks.
> 
> In any case, I wonder if there is an actual use case where an hugepage would
> survive glibc stack allocation and will bring an actual benefit.
> 
> Looking forward for your comments.

Maybe also apply a similar strategy to pthread stack allocation: if transparent
hugepages is 'always' and glibc.malloc.hugetlb is 3, we set MADV_NOHUGEPAGE on
internal mmaps.  A value of '3' meaning 'disable THP' might be confusing,
but currently we have '0' as 'use system default'.  It could also be another
tunable, like glibc.hugetlb, to decouple it from the malloc code.

Ideally it would require caching the __malloc_thp_mode result, so we avoid the
unneeded mprotect calls, similar to what we need on malloc's do_set_hugetlb
(it also assumes that once the program makes the initial malloc call, any
system-wide change to THP won't take effect).


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-08 17:19   ` Adhemerval Zanella Netto
@ 2023-03-09  9:38     ` Cupertino Miranda
  2023-03-09 17:11       ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 12+ messages in thread
From: Cupertino Miranda @ 2023-03-09  9:38 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda


Adhemerval Zanella Netto writes:

> On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>>
>> Hi everyone,
>>
>> For performance purposes, one of our in-house applications requires enabling
>> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
>> the kernel force all sufficiently large and aligned memory allocations to
>> reside in hugepages.  I believe the reason behind this decision is to
>> have more control over data location.
>
> We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
> enables MADV_HUGEPAGE madvise for mmap-allocated pages if the mode is set as
> 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).  One option would be
> to use it instead of 'always' and use glibc.malloc.hugetlb=1.
>
> The main drawback of this strategy is that the THP mode is a system-wide
> setting, so it might affect other users/programs as well.
>
>>
>> For stack allocation, it seems that hugepages make resident set size
>> (RSS) increase significantly, and without any apparent benefit, as the
>> huge page will be split in small pages even before leaving glibc stack
>> allocation code.
>>
>> As an example, this is what happens in case of a pthread_create with 2MB
>> stack size:
>>  1. mmap request for the 2MB allocation with PROT_NONE;
>>       a huge page is "registered" by the kernel
>>  2. the thread descriptor is written at the end of the stack.
>>       This triggers a page fault in the kernel, which performs the actual
>>       memory allocation of the 2MB.
>>  3. an mprotect changes protection on the guard (one of the small pages of the
>>     allocated space):
>>       at this point the kernel needs to break the 2MB page into many small pages
>>       in order to change the protection on that memory region.
>>       This eliminates any benefit of having huge pages for stack allocation,
>>       but also makes RSS increase by 2MB even though nothing was
>>       written to most of the small pages.
>>
>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
>> the __mmap in nptl/allocatestack.c. As expected, RSS was significantly reduced for
>> the application.
>>
>> At this point I am very much confident that there is a real benefit in our
>> particular use case to enforce stacks not ever to use hugepages.
>>
>> This RFC is to understand if I have missed some option in glibc that would
>> allow to better control stack allocation.
>> If not, I am tempted to propose/submit a change, in the form of a tunable, to
>> enforce NOHUGEPAGES for stacks.
>>
>> In any case, I wonder if there is an actual use case where an hugepage would
>> survive glibc stack allocation and will bring an actual benefit.
>>
>> Looking forward for your comments.
>
> Maybe also a similar strategy on pthread stack allocation, where if transparent
> hugepages is 'always' and glibc.malloc.hugetlb is 3 we set MADV_NOHUGEPAGE on
> internal mmaps.  So value of '3' means disable THP, which might be confusing
> but currently we have '0' as 'use system default'.  It can be also another
> tunable, like glibc.hugetlb to decouple from malloc code.
>
The intent would not be to disable hugepages on all internal mmaps, as I
think you said, but rather to do it just for stack allocations.
Although it is more work, I would say that if we add this as a tunable then
maybe we should move it out of the malloc namespace.
If moving it out of malloc is not OK for backward-compatibility reasons, then
I would say create a new tunable specific to this purpose, like
glibc.stack_nohugetlb?

The more I think about this, the less I feel we will ever be able to
practically use hugepages in stacks.  We can declare them as such, but
soon enough the kernel would split them into small pages.

> Ideally it will require to cache the __malloc_thp_mode, so we avoid the non
> required mprotected calls, similar to what we need on malloc do_set_hugetlb
> (it also assumes that once the programs calls the initial malloc, any system
> wide change to THP won't take effect).
Very good point. Did not think about this before.


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-08 14:17 ` [RFC] Stack allocation, hugepages and RSS implications Cupertino Miranda
  2023-03-08 14:53   ` Cristian Rodríguez
  2023-03-08 17:19   ` Adhemerval Zanella Netto
@ 2023-03-09 10:54   ` Florian Weimer
  2023-03-09 14:29     ` Cupertino Miranda
  2 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2023-03-09 10:54 UTC (permalink / raw)
  To: Cupertino Miranda via Libc-alpha
  Cc: Cupertino Miranda, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda

* Cupertino Miranda via Libc-alpha:

> Hi everyone,
>
> For performance purposes, one of our in-house applications requires enabling
> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
> the kernel force all sufficiently large and aligned memory allocations to
> reside in hugepages.  I believe the reason behind this decision is to
> have more control over data location.
>
> For stack allocation, it seems that hugepages make resident set size
> (RSS) increase significantly, and without any apparent benefit, as the
> huge page will be split in small pages even before leaving glibc stack
> allocation code.
>
> As an example, this is what happens in case of a pthread_create with 2MB
> stack size:
>  1. mmap request for the 2MB allocation with PROT_NONE;
>       a huge page is "registered" by the kernel
>  2. the thread descriptor is written at the end of the stack.
>       This triggers a page fault in the kernel, which performs the actual
>       memory allocation of the 2MB.
>  3. an mprotect changes protection on the guard (one of the small pages of the
>     allocated space):
>       at this point the kernel needs to break the 2MB page into many small pages
>       in order to change the protection on that memory region.
>       This eliminates any benefit of having huge pages for stack allocation,
>       but also makes RSS increase by 2MB even though nothing was
>       written to most of the small pages.
>
> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after the
> __mmap in nptl/allocatestack.c. As expected, RSS was significantly
> reduced for the application.

Interesting.  I did not expect to get hugepages right out of mmap.  I
would have expected subsequent coalescing by khugepaged, taking actual
stack usage into account.  But over-allocating memory might be
beneficial, see below.

(Something must be happening between step 1 & 2 to make the writes
possible.)

> In any case, I wonder if there is an actual use case where an hugepage would
> survive glibc stack allocation and will bring an actual benefit.

It can reduce TLB misses.  The first-level TLB might only have 64
entries for 4K pages, for example (a reach of just 256K).  If the working set
on the stack (including the TCB) needs more than a couple of pages, it might
be beneficial to use a 2M page and use just one TLB entry.

In your case, if your stacks are quite small, maybe you can just
allocate slightly less than 2 MiB?

The other question is whether the reported RSS is real, or if the kernel
will recover zero stack pages on memory pressure.

Thanks,
Florian



* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-09 10:54   ` Florian Weimer
@ 2023-03-09 14:29     ` Cupertino Miranda
  0 siblings, 0 replies; 12+ messages in thread
From: Cupertino Miranda @ 2023-03-09 14:29 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Cupertino Miranda via Libc-alpha, Jose E. Marchesi,
	Elena Zannoni, Cupertino Miranda

[-- Attachment #1: Type: text/plain, Size: 4737 bytes --]


Hi Florian,

>> Hi everyone,
>>
>> For performance purposes, one of our in-house applications requires enabling
>> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
>> the kernel force all sufficiently large and aligned memory allocations to
>> reside in hugepages.  I believe the reason behind this decision is to
>> have more control over data location.
>>
>> For stack allocation, it seems that hugepages make resident set size
>> (RSS) increase significantly, and without any apparent benefit, as the
>> huge page will be split in small pages even before leaving glibc stack
>> allocation code.
>>
>> As an example, this is what happens in case of a pthread_create with 2MB
>> stack size:
>>  1. mmap request for the 2MB allocation with PROT_NONE;
>>       a huge page is "registered" by the kernel
>>  2. the thread descriptor is written at the end of the stack.
>>       This triggers a page fault in the kernel, which performs the actual
>>       memory allocation of the 2MB.
>>  3. an mprotect changes protection on the guard (one of the small pages of the
>>     allocated space):
>>       at this point the kernel needs to break the 2MB page into many small pages
>>       in order to change the protection on that memory region.
>>       This eliminates any benefit of having huge pages for stack allocation,
>>       but also makes RSS increase by 2MB even though nothing was
>>       written to most of the small pages.
>>
>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after the
>> __mmap in nptl/allocatestack.c. As expected, RSS was significantly
>> reduced for the application.
>
> Interesting.  I did not expect to get hugepages right out of mmap.  I
> would have expected subsequent coalescing by khugepaged, taking actual
> stack usage into account.  But over-allocating memory might be
> beneficial, see below.
It is probably not getting the hugepages on mmap. Still the RSS is
growing as if it did.
>
> (Something must be happening between step 1 & 2 to make the writes
> possible.)
Totally right.
I could have explained it better.  There is a call to setup_stack_prot that
I believe changes the protection for the single small page holding the
stack-related values.

The write happens right after that, when the stack-related values start
being written.
This is the critical point, where RSS grows by the hugepage size.

>
>> In any case, I wonder if there is an actual use case where an hugepage would
>> survive glibc stack allocation and will bring an actual benefit.
>
> It can reduce TLB misses.  The first-level TLB might only have 64
> entries for 4K pages, for example.  If the working set on the stack
> (including the TCB) needs more than a couple of pages, it might
> beneficial to use a 2M page and use just one TLB entry.
Indeed, it might only fail to make sense if (guardsize > 0), as is the case
in the example.
I think that in this case you can never get a hugepage, since the guard
pages will be write-protected and would have a different protection
from the rest of the stack pages.
At least if you don't plan to allocate more than 2 hugepages.

I believe allocating 2M+4k was considered, but it made it hard to control
data location.

> In your case, if your stacks are quite small, maybe you can just
> allocate slightly less than 2 MiB?
>
> The other question is whether the reported RSS is real, or if the kernel
> will recover zero stack pages on memory pressure.
It's a good point.  I have no idea if the kernel is capable of recovering the
zero stack pages in this particular case.  Is there any way to trigger such a recovery?

In our example (attached), there is a significant difference in
reported RSS when we madvise the kernel.
Reported RSS is collected from /proc/self/statm.

# LD_LIBRARY_PATH=${HOME}/glibc_example/lib ./tststackalloc 1
Page size: 4 kB, 2 MB huge pages
Will attempt to align allocations to make stacks eligible for huge pages
pid: 2458323 (/proc/2458323/smaps)
Creating 128 threads...
RSS: 65888 pages (269877248 bytes = 257 MB)

After the madvise is added right before the writes to stack-related
values (patch below):

# LD_LIBRARY_PATH=${HOME}/glibc_example/lib ./tststackalloc 1
Page size: 4 kB, 2 MB huge pages
Will attempt to align allocations to make stacks eligible for huge pages
pid: 2463199 (/proc/2463199/smaps)
Creating 128 threads...
RSS: 448 pages (1835008 bytes = 1 MB)

Thanks,
Cupertino

>
> Thanks,
> Florian

@@ -397,6 +397,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
                }
            }

+         __madvise(mem, size, MADV_NOHUGEPAGE);
          /* Remember the stack-related values.  */
          pd->stackblock = mem;
          pd->stackblock_size = size;


[-- Attachment #2: tststackalloc.c --]
[-- Type: text/x-csrc, Size: 4600 bytes --]

// Compile & run:
//    gcc -Wall -g -o tststackalloc tststackalloc.c -lpthread
//    ./tststackalloc 1     # Attempt to use huge pages for stacks -> RSS bloat
//    ./tststackalloc 0     # Do not attempt to use huge pages -> No RSS bloat

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <inttypes.h>
#include <sys/mman.h>
#include <fcntl.h>

// Number of threads to create
#define NOOF_THREADS (128)

// Size of a small page (hard-coded)
#define SMALL_PAGE_SIZE (4*1024)

// Size of a huge page (hard-coded)
#define HUGE_PAGE_SIZE (2*1024*1024)

// Total size of the thread stack, including the guard page(s)
#define STACK_SIZE_TOTAL (HUGE_PAGE_SIZE)

// Size of the guard page(s)
#define GUARD_SIZE (SMALL_PAGE_SIZE)

//#define PRINT_STACK_RANGES
//#define PRINT_PROC_SMAPS

// When enabled (set to non-zero), tries to align thread stacks on
// huge page boundaries, making them eligible for huge pages
static int huge_page_align_stacks;

static volatile int exit_thread = 0;

#if defined(PRINT_STACK_RANGES)
static void print_stack_range(void) {
  pthread_attr_t attr;
  void* bottom;
  size_t size;
  int err;

  err = pthread_getattr_np(pthread_self(), &attr);
  if (err != 0) {
    fprintf(stderr, "Error looking up attr\n");
    exit(1);
  }

  err = pthread_attr_getstack(&attr, &bottom, &size);
  if (err != 0) {
    fprintf(stderr, "Cannot locate current stack attributes!\n");
    exit(1);
  }

  pthread_attr_destroy(&attr);

  fprintf(stderr, "Stack: %p-%p (0x%zx/%zd)\n", bottom, bottom + size, size, size);
}
#endif

static void* start(void* arg) {
#if defined(PRINT_STACK_RANGES)
  print_stack_range();
#endif

  while(!exit_thread) {
    sleep(1);
  }
  return NULL;
}

#if defined(PRINT_PROC_SMAPS)
static void print_proc_file(const char* file) {
  char path[128];
  snprintf(path, sizeof(path), "/proc/self/%s", file);
  int smap = open(path, O_RDONLY);
  char buf[4096];
  int x = 0;
  while ((x = read(smap, buf, sizeof(buf))) > 0) {
    write(1, buf, x);
  }
  close(smap);
}
#endif

static size_t get_rss(void) {
  // Second field of /proc/self/statm is the resident set size in pages.
  FILE* stat = fopen("/proc/self/statm", "r");
  long rss = 0;
  if (stat != NULL) {
    fscanf(stat, "%*d %ld", &rss);
    fclose(stat);
  }
  return rss;
}

uintptr_t align_down(uintptr_t value, uintptr_t alignment) {
  return value & ~(alignment - 1);
}

// Do a series of small, single page mmap calls to attempt to set
// everything up so that the next mmap call (glibc allocating the
// stack) returns a 2MB aligned range. The kernel "expands" vmas from
// higher to lower addresses (subsequent calls return ranges starting
// at lower addresses), so this function keeps calling mmap until a
// huge page aligned address is returned. The next range (the stack)
// will then end on that same address.
static void align_next_on(uintptr_t alignment) {
  uintptr_t p;
  do {
    p = (uintptr_t)mmap(NULL, SMALL_PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE, -1, 0);
  } while (p != align_down(p, alignment));
}

int main(int argc, char* argv[]) {
  pthread_t t[NOOF_THREADS];
  pthread_attr_t attr;
  int i;

  if (argc != 2) {
    printf("Usage: %s <huge page stacks>\n", argv[0]);
    printf("    huge page stacks = 1 - attempt to use huge pages for stacks\n");
    exit(1);
  }
  huge_page_align_stacks = atoi(argv[1]);

  void* dummy = malloc(1024);
  free(dummy);

  fprintf(stderr, "Page size: %d kB, %d MB huge pages\n", SMALL_PAGE_SIZE / 1024, HUGE_PAGE_SIZE / (1024 * 1024));
  if (huge_page_align_stacks) {
    fprintf(stderr, "Will attempt to align allocations to make stacks eligible for huge pages\n");
  }
  pid_t pid = getpid();
  fprintf(stderr, "pid: %d (/proc/%d/smaps)\n", pid, pid);

  size_t guard_size = GUARD_SIZE;
  size_t stack_size = STACK_SIZE_TOTAL;
  pthread_attr_init(&attr);
  pthread_attr_setstacksize(&attr, stack_size);
  pthread_attr_setguardsize(&attr, guard_size);

  fprintf(stderr, "Creating %d threads...\n", NOOF_THREADS);
  for (i = 0; i < NOOF_THREADS; i++) {
    if (huge_page_align_stacks) {
      // align (next) allocation on huge page boundary
      align_next_on(HUGE_PAGE_SIZE);
    }
    pthread_create(&t[i], &attr, start, NULL);
  }
  sleep(1);

#if defined(PRINT_PROC_SMAPS)
  print_proc_file("smaps");
#endif

  size_t rss = get_rss();
  fprintf(stderr, "RSS: %zd pages (%zd bytes = %zd MB)\n", rss, rss * SMALL_PAGE_SIZE, rss * SMALL_PAGE_SIZE / 1024 / 1024);

  fprintf(stderr, "Press enter to exit...\n");
  getchar();

  exit_thread = 1;
  for (i = 0; i < NOOF_THREADS; i++) {
    pthread_join(t[i], NULL);
  }
  return 0;
}


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-09  9:38     ` Cupertino Miranda
@ 2023-03-09 17:11       ` Adhemerval Zanella Netto
  2023-03-09 18:11         ` Cupertino Miranda
  0 siblings, 1 reply; 12+ messages in thread
From: Adhemerval Zanella Netto @ 2023-03-09 17:11 UTC (permalink / raw)
  To: Cupertino Miranda
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda



On 09/03/23 06:38, Cupertino Miranda wrote:
> 
> Adhemerval Zanella Netto writes:
> 
>> On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>>>
>>> Hi everyone,
>>>
>>> For performance purposes, one of our in-house applications requires enabling
>>> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
>>> the kernel force all sufficiently large and aligned memory allocations to
>>> reside in hugepages.  I believe the reason behind this decision is to
>>> have more control over data location.
>>
>> We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
>> enables MADV_HUGEPAGE madvise for mmap-allocated pages if the mode is set as
>> 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).  One option would be
>> to use it instead of 'always' and use glibc.malloc.hugetlb=1.
>>
>> The main drawback of this strategy is that the THP mode is a system-wide
>> setting, so it might affect other users/programs as well.
>>
>>>
>>> For stack allocation, it seems that hugepages make resident set size
>>> (RSS) increase significantly, and without any apparent benefit, as the
>>> huge page will be split in small pages even before leaving glibc stack
>>> allocation code.
>>>
>>> As an example, this is what happens in case of a pthread_create with 2MB
>>> stack size:
>>>  1. mmap request for the 2MB allocation with PROT_NONE;
>>>       a huge page is "registered" by the kernel
>>>  2. the thread descriptor is written at the end of the stack.
>>>       This triggers a page fault in the kernel, which performs the actual
>>>       memory allocation of the 2MB.
>>>  3. an mprotect changes protection on the guard (one of the small pages of the
>>>     allocated space):
>>>       at this point the kernel needs to break the 2MB page into many small pages
>>>       in order to change the protection on that memory region.
>>>       This eliminates any benefit of having huge pages for stack allocation,
>>>       but also makes RSS increase by 2MB even though nothing was
>>>       written to most of the small pages.
>>>
>>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
>>> the __mmap in nptl/allocatestack.c. As expected, RSS was significantly reduced for
>>> the application.
>>>
>>> At this point I am very much confident that there is a real benefit in our
>>> particular use case to enforce stacks not ever to use hugepages.
>>>
>>> This RFC is to understand if I have missed some option in glibc that would
>>> allow to better control stack allocation.
>>> If not, I am tempted to propose/submit a change, in the form of a tunable, to
>>> enforce NOHUGEPAGES for stacks.
>>>
>>> In any case, I wonder if there is an actual use case where an hugepage would
>>> survive glibc stack allocation and will bring an actual benefit.
>>>
>>> Looking forward for your comments.
>>
>> Maybe also a similar strategy on pthread stack allocation, where if transparent
>> hugepages is 'always' and glibc.malloc.hugetlb is 3 we set MADV_NOHUGEPAGE on
>> internal mmaps.  So value of '3' means disable THP, which might be confusing
>> but currently we have '0' as 'use system default'.  It can be also another
>> tunable, like glibc.hugetlb to decouple from malloc code.
>>
> The intent would not be to disable hugepages on all internal mmaps, as I
> think you said, but rather just do it for stack allocations.
> Although more work, I would say if we add this to a tunable then maybe
> we should move it from malloc namespace.

I was thinking of mmap allocations where internal usage might trigger this
behavior.  If I understood what is happening: since the initial stack is
aligned to the hugepage size (assuming x86 2MB hugepages and the 8MB default
stack size) and 'always' is set as the policy, the stack will always be
backed by hugepages.  And then, when the guard page is set at
setup_stack_prot, it will force the kernel to split the hugepage and move the
stack to default pages.

It seems to be a pthread-specific problem, since I think alloc_new_heap
already calls mprotect if hugepages are used.

And I agree with Florian that backing thread stacks with hugepages might
indeed reduce TLB misses.  However, if you want to optimize for RSS, maybe you
can force the total thread stack size to not be a multiple of the hugepage size:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ grep -w STACK_SIZE_TOTAL tststackalloc.c
#define STACK_SIZE_TOTAL (3 * (HUGE_PAGE_SIZE)) / 4
  size_t stack_size = STACK_SIZE_TOTAL;
$ ./testrun.sh ./tststackalloc 1
Page size: 4 kB, 2 MB huge pages
Will attempt to align allocations to make stacks eligible for huge pages
pid: 342503 (/proc/342503/smaps)
Creating 128 threads...
RSS: 537 pages (2199552 bytes = 2 MB)
Press enter to exit...

$ ./testrun.sh ./tststackalloc 0
Page size: 4 kB, 2 MB huge pages
pid: 342641 (/proc/342641/smaps)
Creating 128 threads...
RSS: 536 pages (2195456 bytes = 2 MB)
Press enter to exit...

But I think a tunable to force it for all stack sizes might be useful indeed.

> If moving it out of malloc is not Ok for backcompatibility reasons, then
> I would say create a new tunable specific for the purpose, like
> glibc.stack_nohugetlb ?

We don't enforce tunable compatibility, but we have the glibc.pthread namespace
already.  Maybe we can use glibc.pthread.stack_hugetlb, with 0 to use the default
and 1 to avoid it by calling mprotect (we might change this semantic).
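
A rough sketch of how such a gate could look right after the stack mmap (the
variable name and the 0/1 polarity below are only placeholders, since the
exact semantics are still being discussed):

#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Placeholder for whatever glibc.pthread.stack_hugetlb would provide;
   here 0 is taken to mean "disable THP for stacks".  */
static int stack_hugetlb = 1;

/* Called right after the stack mmap: only opt the stack out of THP when
   the tunable asks for it.  */
void
stack_apply_thp_policy (void *mem, size_t size)
{
  if (stack_hugetlb == 0)
    madvise (mem, size, MADV_NOHUGEPAGE);
}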

> 
> The more I think about this the less I feel we will ever be able to
> practically use hugepages in stacks. We can declare them as such, but
> soon enough the kernel would split them in small pages.
> 
>> Ideally it will require to cache the __malloc_thp_mode, so we avoid the non
>> required mprotected calls, similar to what we need on malloc do_set_hugetlb
>> (it also assumes that once the programs calls the initial malloc, any system
>> wide change to THP won't take effect).
> Very good point. Did not think about this before.


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-09 17:11       ` Adhemerval Zanella Netto
@ 2023-03-09 18:11         ` Cupertino Miranda
  2023-03-09 18:15           ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 12+ messages in thread
From: Cupertino Miranda @ 2023-03-09 18:11 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda


Adhemerval Zanella Netto writes:

> On 09/03/23 06:38, Cupertino Miranda wrote:
>>
>> Adhemerval Zanella Netto writes:
>>
>>> On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> For performance purposes, one of our in-house applications requires enabling
>>>> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
>>>> the kernel force all sufficiently large and aligned memory allocations to
>>>> reside in hugepages.  I believe the reason behind this decision is to
>>>> have more control over data location.
>>>
>>> We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
>>> enables MADV_HUGEPAGE madvise for mmap-allocated pages if the mode is set as
>>> 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).  One option would be
>>> to use it instead of 'always' and use glibc.malloc.hugetlb=1.
>>>
>>> The main drawback of this strategy is that the THP mode is a system-wide
>>> setting, so it might affect other users/programs as well.
>>>
>>>>
>>>> For stack allocation, it seems that hugepages make resident set size
>>>> (RSS) increase significantly, and without any apparent benefit, as the
>>>> huge page will be split in small pages even before leaving glibc stack
>>>> allocation code.
>>>>
>>>> As an example, this is what happens in case of a pthread_create with 2MB
>>>> stack size:
>>>>  1. mmap request for the 2MB allocation with PROT_NONE;
>>>>       a huge page is "registered" by the kernel
>>>>  2. the thread descriptor is written at the end of the stack.
>>>>       This triggers a page fault in the kernel, which performs the actual
>>>>       memory allocation of the 2MB.
>>>>  3. an mprotect changes protection on the guard (one of the small pages of the
>>>>     allocated space):
>>>>       at this point the kernel needs to break the 2MB page into many small pages
>>>>       in order to change the protection on that memory region.
>>>>       This eliminates any benefit of having huge pages for stack allocation,
>>>>       but also makes RSS increase by 2MB even though nothing was
>>>>       written to most of the small pages.
>>>>
>>>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
>>>> the __mmap in nptl/allocatestack.c. As expected, RSS was significantly reduced for
>>>> the application.
>>>>
>>>> At this point I am very much confident that there is a real benefit in our
>>>> particular use case to enforce stacks not ever to use hugepages.
>>>>
>>>> This RFC is to understand if I have missed some option in glibc that would
>>>> allow to better control stack allocation.
>>>> If not, I am tempted to propose/submit a change, in the form of a tunable, to
>>>> enforce NOHUGEPAGES for stacks.
>>>>
>>>> In any case, I wonder if there is an actual use case where an hugepage would
>>>> survive glibc stack allocation and will bring an actual benefit.
>>>>
>>>> Looking forward for your comments.
>>>
>>> Maybe also a similar strategy on pthread stack allocation, where if transparent
>>> hugepages is 'always' and glibc.malloc.hugetlb is 3 we set MADV_NOHUGEPAGE on
>>> internal mmaps.  So value of '3' means disable THP, which might be confusing
>>> but currently we have '0' as 'use system default'.  It can be also another
>>> tunable, like glibc.hugetlb to decouple from malloc code.
>>>
>> The intent would not be to disable hugepages on all internal mmaps, as I
>> think you said, but rather just do it for stack allocations.
>> Although more work, I would say if we add this to a tunable then maybe
>> we should move it from malloc namespace.
>
> I was thinking on mmap allocation where internal usage might trigger this
> behavior.  If I understood what is happening, since the initial stack is
> aligned to the hugepage size (assuming x86 2MB hugepage and 8MB default
> stack size) and 'always' is set a the policy, the stack will be always
> backed up by hugepages.  And then, when the guard page is set at
> setup_stack_prot, it will force the kernel to split and move the stack
> to default pages.
Yes, for the most part, I think.  Actually I think the kernel makes the
split at the first write.
At setup_stack_prot, it could in principle conclude that the
pages will need to be split, but it does not do it.  Only when the write
and the page fault occur does it realize that it needs to split, and it
materializes all of the pages as if the hugepage were already dirty.
In my madvise experiments, RSS only bloats when the madvise comes after
the write.

> It seems to be a pthread specific problem, since I think alloc_new_heap
> already mprotect if hugepage it is used.
>
> And I agree with Florian that backing up thread stack with hugepage it might
> indeed reduce TLB misses.  However, if you want to optimize to RSS maybe you
> can force the total thread stack size to not be multiple of hugepages:
Considering the default 8MB stack size, there is nothing to think about;
it definitely is a requirement.
>
> $ cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
> $ grep -w STACK_SIZE_TOTAL tststackalloc.c
> #define STACK_SIZE_TOTAL (3 * (HUGE_PAGE_SIZE)) / 4
>   size_t stack_size = STACK_SIZE_TOTAL;
> $ ./testrun.sh ./tststackalloc 1
> Page size: 4 kB, 2 MB huge pages
> Will attempt to align allocations to make stacks eligible for huge pages
> pid: 342503 (/proc/342503/smaps)
> Creating 128 threads...
> RSS: 537 pages (2199552 bytes = 2 MB)
> Press enter to exit...
>
> $ ./testrun.sh ./tststackalloc 0
> Page size: 4 kB, 2 MB huge pages
> pid: 342641 (/proc/342641/smaps)
> Creating 128 threads...
> RSS: 536 pages (2195456 bytes = 2 MB)
> Press enter to exit...
>
> But I think a tunable to force it for all stack sizes might be useful indeed.
>
>> If moving it out of malloc is not Ok for backcompatibility reasons, then
>> I would say create a new tunable specific for the purpose, like
>> glibc.stack_nohugetlb ?
>
> We don't enforce tunable compatibility, but we have the glibc.pthread namespace
> already.  Maybe we can use glibc.pthread.stack_hugetlb, with 0 to use the default
> and 1 to avoid by call mprotect (we might change this semantic).
I will work on the patch right away.  I would swap the 0 and the 1,
otherwise it reads as reverse logic: 0 to enable and 1 to disable.

>
>>
>> The more I think about this the less I feel we will ever be able to
>> practically use hugepages in stacks. We can declare them as such, but
>> soon enough the kernel would split them in small pages.
>>
>>> Ideally it will require to cache the __malloc_thp_mode, so we avoid the non
>>> required mprotected calls, similar to what we need on malloc do_set_hugetlb
>>> (it also assumes that once the programs calls the initial malloc, any system
>>> wide change to THP won't take effect).
>> Very good point. Did not think about this before.


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-09 18:11         ` Cupertino Miranda
@ 2023-03-09 18:15           ` Adhemerval Zanella Netto
  2023-03-09 19:01             ` Cupertino Miranda
  0 siblings, 1 reply; 12+ messages in thread
From: Adhemerval Zanella Netto @ 2023-03-09 18:15 UTC (permalink / raw)
  To: Cupertino Miranda
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda



On 09/03/23 15:11, Cupertino Miranda wrote:
> 
> Adhemerval Zanella Netto writes:
> 
>> On 09/03/23 06:38, Cupertino Miranda wrote:
>>>
>>> Adhemerval Zanella Netto writes:
>>>
>>>> On 08/03/23 11:17, Cupertino Miranda via Libc-alpha wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> For performance purposes, one of our in-house applications requires enabling
>>>>> the TRANSPARENT_HUGEPAGES_ALWAYS option in the Linux kernel, actually making
>>>>> the kernel force all sufficiently large and aligned memory allocations to
>>>>> reside in hugepages.  I believe the reason behind this decision is to
>>>>> have more control over data location.
>>>>
>>>> We have, since 2.35, the glibc.malloc.hugetlb tunable, where setting it to 1
>>>> enables MADV_HUGEPAGE madvise for mmap-allocated pages if the mode is set as
>>>> 'madvise' (/sys/kernel/mm/transparent_hugepage/enabled).  One option would be
>>>> to use it instead of 'always' and use glibc.malloc.hugetlb=1.
>>>>
>>>> The main drawback of this strategy is that the THP mode is a system-wide
>>>> setting, so it might affect other users/programs as well.
>>>>
>>>>>
>>>>> For stack allocation, it seems that hugepages make resident set size
>>>>> (RSS) increase significantly, and without any apparent benefit, as the
>>>>> huge page will be split in small pages even before leaving glibc stack
>>>>> allocation code.
>>>>>
>>>>> As an example, this is what happens in case of a pthread_create with 2MB
>>>>> stack size:
>>>>>  1. mmap request for the 2MB allocation with PROT_NONE;
>>>>>       a huge page is "registered" by the kernel
>>>>>  2. the thread descriptor is written at the end of the stack.
>>>>>       This triggers a page fault in the kernel, which performs the actual
>>>>>       memory allocation of the 2MB.
>>>>>  3. an mprotect changes protection on the guard (one of the small pages of the
>>>>>     allocated space):
>>>>>       at this point the kernel needs to break the 2MB page into many small pages
>>>>>       in order to change the protection on that memory region.
>>>>>       This eliminates any benefit of having huge pages for stack allocation,
>>>>>       but also makes RSS increase by 2MB even though nothing was
>>>>>       written to most of the small pages.
>>>>>
>>>>> As an exercise I added __madvise(..., MADV_NOHUGEPAGE) right after
>>>>> the __mmap in nptl/allocatestack.c. As expected, RSS was significantly reduced for
>>>>> the application.
>>>>>
>>>>> At this point I am very much confident that there is a real benefit in our
>>>>> particular use case to enforce stacks not ever to use hugepages.
>>>>>
>>>>> This RFC is to understand if I have missed some option in glibc that would
>>>>> allow to better control stack allocation.
>>>>> If not, I am tempted to propose/submit a change, in the form of a tunable, to
>>>>> enforce NOHUGEPAGES for stacks.
>>>>>
>>>>> In any case, I wonder if there is an actual use case where an hugepage would
>>>>> survive glibc stack allocation and will bring an actual benefit.
>>>>>
>>>>> Looking forward for your comments.
>>>>
>>>> Maybe also a similar strategy on pthread stack allocation, where if transparent
>>>> hugepages is 'always' and glibc.malloc.hugetlb is 3 we set MADV_NOHUGEPAGE on
>>>> internal mmaps.  So value of '3' means disable THP, which might be confusing
>>>> but currently we have '0' as 'use system default'.  It can be also another
>>>> tunable, like glibc.hugetlb to decouple from malloc code.
>>>>
>>> The intent would not be to disable hugepages on all internal mmaps, as I
>>> think you said, but rather just do it for stack allocations.
>>> Although more work, I would say if we add this to a tunable then maybe
>>> we should move it from malloc namespace.
>>
>> I was thinking on mmap allocation where internal usage might trigger this
>> behavior.  If I understood what is happening, since the initial stack is
>> aligned to the hugepage size (assuming x86 2MB hugepage and 8MB default
>> stack size) and 'always' is set a the policy, the stack will be always
>> backed up by hugepages.  And then, when the guard page is set at
>> setup_stack_prot, it will force the kernel to split and move the stack
>> to default pages.
> Yes for the most part I think. Actually I think the kernel makes the
> split at the the first write.
> At the setup_stack_prot, it could sort of get to the conclusion that the
> pages would need to be split, but it does not do it. Only when the write
> and page exception occurs it realizes that it needs to split, and it
> materializes all of the pages as if the hugepage was already dirty.
> In my madvise experiments, only when I madvise after the write it gets
> RSS to bloat.

Yes, I expect that the COW semantics will actually trigger the page migration.

> 
>> It seems to be a pthread specific problem, since I think alloc_new_heap
>> already mprotect if hugepage it is used.
>>
>> And I agree with Florian that backing up thread stack with hugepage it might
>> indeed reduce TLB misses.  However, if you want to optimize to RSS maybe you
>> can force the total thread stack size to not be multiple of hugepages:
> Considering the default 8MB stack size, there is nothing to think about,
> it definetely is a requirement.

The 8MB comes in fact from ulimit -s, but I agree that my suggestion was more
of a hack.


>>
>> $ cat /sys/kernel/mm/transparent_hugepage/enabled
>> [always] madvise never
>> $ grep -w STACK_SIZE_TOTAL tststackalloc.c
>> #define STACK_SIZE_TOTAL (3 * (HUGE_PAGE_SIZE)) / 4
>>   size_t stack_size = STACK_SIZE_TOTAL;
>> $ ./testrun.sh ./tststackalloc 1
>> Page size: 4 kB, 2 MB huge pages
>> Will attempt to align allocations to make stacks eligible for huge pages
>> pid: 342503 (/proc/342503/smaps)
>> Creating 128 threads...
>> RSS: 537 pages (2199552 bytes = 2 MB)
>> Press enter to exit...
>>
>> $ ./testrun.sh ./tststackalloc 0
>> Page size: 4 kB, 2 MB huge pages
>> pid: 342641 (/proc/342641/smaps)
>> Creating 128 threads...
>> RSS: 536 pages (2195456 bytes = 2 MB)
>> Press enter to exit...
>>
>> But I think a tunable to force it for all stack sizes might be useful indeed.
>>
>>> If moving it out of malloc is not Ok for backcompatibility reasons, then
>>> I would say create a new tunable specific for the purpose, like
>>> glibc.stack_nohugetlb ?
>>
>> We don't enforce tunable compatibility, but we have the glibc.pthread namespace
>> already.  Maybe we can use glibc.pthread.stack_hugetlb, with 0 to use the default
>> and 1 to avoid by call mprotect (we might change this semantic).
> Will work on the patch right away. I would swap the 0 and the 1,
> otherwise it looks in reverse logic. 0 to enable and 1 to disable.

It works as well.


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-09 18:15           ` Adhemerval Zanella Netto
@ 2023-03-09 19:01             ` Cupertino Miranda
  2023-03-09 19:11               ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 12+ messages in thread
From: Cupertino Miranda @ 2023-03-09 19:01 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda


>> Will work on the patch right away. I would swap the 0 and the 1,
>> otherwise it looks in reverse logic. 0 to enable and 1 to disable.
>
> It works as well.

I asked on IRC as well; ignore this if you replied there.
I was wondering if there is any inconvenience in making the tunable also
settable through an environment variable.
It would allow limiting the "optimization" to only particular executables.


* Re: [RFC] Stack allocation, hugepages and RSS implications
  2023-03-09 19:01             ` Cupertino Miranda
@ 2023-03-09 19:11               ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 12+ messages in thread
From: Adhemerval Zanella Netto @ 2023-03-09 19:11 UTC (permalink / raw)
  To: Cupertino Miranda
  Cc: libc-alpha, Jose E. Marchesi, Elena Zannoni, Cupertino Miranda



On 09/03/23 16:01, Cupertino Miranda wrote:
> 
>>> Will work on the patch right away. I would swap the 0 and the 1,
>>> otherwise it looks in reverse logic. 0 to enable and 1 to disable.
>>
>> It works as well.
> 
> Asked in IRC as well, ignore if you reply there.
> I was wondering if there is any incovenient to make the tunable also
> writable through an environment variable.
> It would allow to limit the "optimization" to only particular executables.

Do you mean also adding another environment variable to set it?  If so, I would
advise against it; tunables are already set through the GLIBC_TUNABLES
environment variable, which is composable (you can specify multiple options).
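
For example (illustrative only, assuming the glibc.pthread.stack_hugetlb name
proposed earlier in this thread; ./myapp is a placeholder):

  GLIBC_TUNABLES=glibc.pthread.stack_hugetlb=0:glibc.malloc.hugetlb=1 ./myapp

would set both tunables for that particular invocation only, so no separate
environment variable is needed.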

