On 03/05/23 09:42, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>> +static __always_inline int
>> +advise_thp (void *mem, size_t size, char *guard)
>> +{
>> +  enum malloc_thp_mode_t thpmode = __malloc_thp_mode ();
>> +  if (thpmode != malloc_thp_mode_always)
>> +    return 0;
>> +
>> +  unsigned long int thpsize = __malloc_default_thp_pagesize ();
>> +  if (PTR_ALIGN_DOWN (mem, thpsize) != PTR_ALIGN_DOWN (guard, thpsize))
>> +    return 0;
>> +
>> +  return __madvise (mem, size, MADV_NOHUGEPAGE);
>> +}
>
> This still doesn't make sense since if _STACK_GROWS_DOWN, mem == guard, so
> this will always execute the madvise.

Yes, if THP is set to 'always' this is exactly the idea of the patch: afaiu the kernel might still back the stack with large pages even if the requested size is smaller than the default THP size. It is only an issue if the guard page address is not aligned to the default THP size, which can potentially trigger the problem Cupertino reported (since we do not know beforehand which mapping flags were used on the pages that fulfill the allocation).

> As I mentioned, I couldn't find evidence that
> the claimed scenario of a huge page allocated, written to and then split due to the
> mprotect exists.

I adapted Cupertino's original test to allow specifying both the thread stack size and the guard size on the command line. Just:

$ gcc -std=gnu11 -O2 -g -I. -D_LARGEFILE64_SOURCE=1 -D_GNU_SOURCE -c -o tststackalloc.o tststackalloc.c
$ echo "always" | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
$ ./tststackalloc -m
[...]
[statm] RSS: 65964 pages (270188544 bytes = 257 MB)
[smaps] RSS: 270327808 bytes = 257 MB
[...]

With either the new tunable or this patch:

$ ld-linux-x86-64.so.2 --library-path . ./tststackalloc -m
[...]
[statm] RSS: 531 pages (2174976 bytes = 2 MB)
[smaps] RSS: 3002368 bytes = 2 MB
[...]

> So the real issue is that the current stack allocation code randomly (based on
> alignment from previous mmap calls) uses huge pages even for small stacks.
Keep in mind this heuristic is only enabled if THP is set to 'always', meaning the kernel will try to back *all* stacks with large pages. The issue arises when the *guard* page falls within a large page.