Re: RFC PATCH: Don't use /proc/self/maps to calculate size of initial thread stack

public inbox for libc-alpha@sourceware.org

From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Subject: Re: RFC PATCH: Don't use /proc/self/maps to calculate size of initial thread stack
Date: Wed, 21 Sep 2022 17:58:46 -0300
Message-ID: <54c6018f-3b1d-84e9-04e5-55c0eca66a4c@linaro.org>
In-Reply-To: <9d232b1b-f123-4189-bf09-dd29aab6486a@www.fastmail.com>



On 15/09/22 13:09, Zack Weinberg via Libc-alpha wrote:
> On Tue, Sep 13, 2022, at 5:52 AM, Florian Weimer wrote:
>> * Zack Weinberg via Libc-alpha:
>>> for many years, the NPTL implementation has said that
>>> the stack starts at __libc_stack_end, rounded in the opposite
>>> direction from stack growth to the nearest page boundary, and extends
>>> for getrlimit(RLIMIT_STACK).rlim_cur bytes, *minus the size of the
>>> information block*, which is beyond __libc_stack_end.  The rationale
>>> is that the resource limit is enforced against the entire memory area,
>>> so if we don't subtract the size of the information block, then the
>>> program will run out of stack a few pages before pthread_attr_getstack
>>> says it will.
>>
>> Do we actually have to subtract the size of the information block?
>> One could argue that this is just part of the arguments passed to main,
>> so sort-of-but-not-quite part of main's stack frame.
> 
> We could make that change, but we'd need to make other changes as well
> to keep everything consistent, and I'm not sure _how_ to make that
> change without having the information that pthread_getattr_np is probing for.
> 
> Suppose 'stackaddr' and 'stacksize' are the values reported by
> pthread_attr_getstack when applied to the initial thread. Then the
> invariants I think we need to preserve are:
> 
>   stacksize <= getrlimit(RLIMIT_STACK).rlim_cur
>   stackaddr % getpagesize() == 0
>   if the stack grows downward in memory, it must be OK to grow the
>      stack down to, but not necessarily beyond, stackaddr
>   conversely, if the stack grows upward, it must be OK to grow the
>      stack up to, but not necessarily beyond, stackaddr + stacksize
> 
> Now, the entire headache here is that __libc_stack_end is *not*
> necessarily page aligned and (on an architecture where the stack grows
> downward in memory)
> 
>   __libc_stack_end - getrlimit(RLIMIT_STACK).rlim_cur
> 
> will be a pointer to somewhere *beyond* the lowest address that the
> kernel will enlarge the stack to, even if you round __libc_stack_end
> up to the next page boundary before the subtraction.  The function of
> the code changed by my patch -- before and after -- is to determine
> the actual boundaries of the lazy-allocation region for the initial
> thread's stack.
> 
> If we changed __libc_stack_end to point to the "bottom" (opposite the
> direction of stack growth) of the entire stack region, then we could
> simply subtract the rlimit size from it and have stackaddr.  But
> that's exactly the challenge: how do we know where that "bottom" is?
> 
> I don't know where __libc_stack_end is set.  Early startup code should
> be able to do things that pthread_attr_t can't, like "find the
> end-most address among all the pointers in argv, envp, and auxv, then
> round end-wards to a page boundary" (where "end-most" and "end-wards"
> mean "in the direction opposite to stack growth") but that might not
> always give the right answer.  I also don't know if there's any
> existing code in libc that depends on __libc_stack_end _not_ pointing
> past the information block (of course we could always add a new
> __libc_info_block_end, or just fill in the initial thread's pthread_t
> more thoroughly).
> 
>> process_vm_readv seems quite likely to get blocked by seccomp filters.
> 
> I was worried about that too :-/
> 
>> Maybe we can get the kernel to pass the end of the stack in the
>> auxiliary vector?
> 
> Sure, but then what do we do on older kernels?  I'm reluctant to say
> "keep the old code" because we know this is breaking for people right
> now (although honestly "mount /proc earlier" isn't a terrible
> suggestion for a workaround).
> 
> zw

I wonder if we could use an in-place mremap (which should be a no-op) to get
a closer approximation of the stack size (the code below only handles
architectures where the stack grows down):

  uintptr_t pagesize = GLRO(dl_pagesize);
  /* Round __libc_stack_end up to the next page boundary.  */
  char *stack_end_page = (char *) ALIGN_UP ((uintptr_t) __libc_stack_end,
                                            pagesize);

  /* Walk down one page at a time for as long as the in-place mremap
     fails with ENOMEM.  */
  size_t stacksize = pagesize;
  while (mremap (stack_end_page - stacksize - pagesize, pagesize,
                 2 * pagesize, 0)
         == MAP_FAILED && errno == ENOMEM)
    stacksize += pagesize;

  iattr->stackaddr = (void *) stack_end_page;
  iattr->stacksize = stacksize;
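
For experimenting outside glibc, the same probe can be sketched as a
standalone helper.  This is only an illustration under assumptions:
probe_mapped_low is a hypothetical name, it starts from any address inside
the mapping (for example a local variable) rather than from
__libc_stack_end, and it assumes Linux and a stack that grows down.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for mremap() */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not part of glibc.  Given an address inside a
   mapping, return the lowest page address that the probe still sees as
   mapped below it.  Each probe asks mremap to grow one page in place
   (no MREMAP_MAYMOVE).  The page above the probed one is already
   mapped, so the call is expected to fail with ENOMEM and change
   nothing.  Any other result ends the walk.  */
static uintptr_t
probe_mapped_low (const void *inside)
{
  uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

  /* Start at the page that contains the given address.  */
  uintptr_t low = (uintptr_t) inside & ~(pagesize - 1);

  while (low > pagesize)
    {
      void *probe = (void *) (low - pagesize);
      errno = 0;
      if (mremap (probe, pagesize, 2 * pagesize, 0) != MAP_FAILED
          || errno != ENOMEM)
        break;
      low -= pagesize;
    }
  return low;
}
```

Calling probe_mapped_low (&local) from main, with local being any
automatic variable, gives an address that can be compared with the start
of the [stack] line in /proc/self/maps.  If mremap is blocked (for
instance by a seccomp filter), the helper simply returns the page that
contains the given address.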


On x86_64 it gives a value closer to what the [stack] segment reports;
for instance, with:

  7ffffffdd000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]

It returns stackaddr as 0x7ffffffdd000 with stacksize as 0x21000.  This is
not the same value the current implementation returns, but I also see that
the current implementation returns both an address and a size that are much
larger than what /proc/self/maps actually shows as mapped for the process.
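
To make that comparison repeatable, the [stack] bounds can be pulled out
of /proc/self/maps with a few lines of C.  This is a rough sketch for
checking results by hand, not something proposed for glibc;
parse_stack_line and find_stack_mapping are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one /proc/self/maps line and, if it is the
   "[stack]" entry, store its bounds.  Returns 1 on a match, 0 otherwise.  */
static int
parse_stack_line (const char *line, uintptr_t *lo, uintptr_t *hi)
{
  uint64_t start, end;

  assert (line != NULL && lo != NULL && hi != NULL);
  if (strstr (line, "[stack]") == NULL)
    return 0;
  if (sscanf (line, "%" SCNx64 "-%" SCNx64, &start, &end) != 2)
    return 0;
  *lo = (uintptr_t) start;
  *hi = (uintptr_t) end;
  return 1;
}

/* Scan /proc/self/maps for the "[stack]" entry.  Returns 0 on success
   and -1 if the file cannot be opened or has no such entry.  */
static int
find_stack_mapping (uintptr_t *lo, uintptr_t *hi)
{
  char line[512];
  int found = 0;
  FILE *fp = fopen ("/proc/self/maps", "r");

  if (fp == NULL)
    return -1;
  while (!found && fgets (line, sizeof line, fp) != NULL)
    found = parse_stack_line (line, lo, hi);
  fclose (fp);
  return found ? 0 : -1;
}
```

For the line quoted above, the parsed bounds are 0x7ffffffdd000 and
0x7ffffffff000, a span of 0x22000 bytes.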


Thread overview: 11+ messages
2022-09-09 21:03 Zack Weinberg
2022-09-13  9:52 ` Florian Weimer
2022-09-13 22:03   ` Michael Hudson-Doyle
2022-09-15 16:09   ` Zack Weinberg
2022-09-20 12:16     ` Florian Weimer
2022-09-21 12:41       ` Zack Weinberg
2022-09-21 13:01         ` Florian Weimer
2022-09-21 20:58     ` Adhemerval Zanella Netto [this message]
2022-09-23 14:59       ` Zack Weinberg
2022-09-23 15:24         ` Adhemerval Zanella Netto
2022-09-23 18:57         ` Florian Weimer