From: "Zack Weinberg" <zack@owlfolio.org>
To: "Florian Weimer" <fweimer@redhat.com>,
	"GNU libc development" <libc-alpha@sourceware.org>
Subject: Re: RFC PATCH: Don't use /proc/self/maps to calculate size of initial thread stack
Date: Thu, 15 Sep 2022 12:09:36 -0400
Message-ID: <9d232b1b-f123-4189-bf09-dd29aab6486a@www.fastmail.com>
In-Reply-To: <87fsgvvbwq.fsf@oldenburg.str.redhat.com>

On Tue, Sep 13, 2022, at 5:52 AM, Florian Weimer wrote:
> * Zack Weinberg via Libc-alpha:
>> for many years, the NPTL implementation has said that
>> the stack starts at __libc_stack_end, rounded in the opposite
>> direction from stack growth to the nearest page boundary, and extends
>> for getrlimit(RLIMIT_STACK).rlim_cur bytes, *minus the size of the
>> information block*, which is beyond __libc_stack_end.  The rationale
>> is that the resource limit is enforced against the entire memory area,
>> so if we don't subtract the size of the information block, then the
>> program will run out of stack a few pages before pthread_attr_getstack
>> says it will.
>
> Do we actually have to subtract the size of the information block?
> One could argue that this is just part of the arguments passed to main,
> so sort-of-but-not-quite part of main's stack frame.

We could make that change, but we'd need to make other changes as well
to keep everything consistent, and I'm not sure _how_ to make that
change without having the information that pthread_getattr_np is probing for.

Suppose 'stackaddr' and 'stacksize' are the values reported by
pthread_attr_getstack when applied to the initial thread. Then the
invariants I think we need to preserve are:

  stacksize <= getrlimit(RLIMIT_STACK).rlim_cur
  stackaddr % getpagesize() == 0
  if the stack grows downward in memory, it must be OK to grow the
     stack down to, but not necessarily beyond, stackaddr
  conversely, if the stack grows upward, it must be OK to grow the
     stack up to, but not necessarily beyond, stackaddr + stacksize
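
As a rough sketch (not part of the patch), the first two invariants can
be checked mechanically for the calling thread.  The helper names
stack_report_plausible and check_current_thread_stack are hypothetical,
and the sketch assumes the resource limit has not been changed since
the process started.  The third and fourth invariants cannot be checked
this way, because they depend on what the kernel will actually allow.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: pure check of the alignment and size
   invariants for a reported (stackaddr, stacksize) pair.  */
bool
stack_report_plausible (uintptr_t stackaddr, size_t stacksize,
                        rlim_t rlim_cur, size_t pagesize)
{
  assert (pagesize != 0);
  if (stackaddr % pagesize != 0)
    return false;
  /* An unlimited stack places no upper bound on stacksize.  */
  if (rlim_cur != RLIM_INFINITY && stacksize > rlim_cur)
    return false;
  return true;
}

/* Hypothetical helper: query the calling thread's stack and apply
   the check above.  Returns 1 if plausible, 0 if not, -1 on error.  */
int
check_current_thread_stack (void)
{
  pthread_attr_t attr;
  void *stackaddr;
  size_t stacksize;
  struct rlimit rl;

  if (pthread_getattr_np (pthread_self (), &attr) != 0)
    return -1;
  int err = pthread_attr_getstack (&attr, &stackaddr, &stacksize);
  pthread_attr_destroy (&attr);
  if (err != 0 || getrlimit (RLIMIT_STACK, &rl) != 0)
    return -1;

  return stack_report_plausible ((uintptr_t) stackaddr, stacksize,
                                 rl.rlim_cur,
                                 (size_t) sysconf (_SC_PAGESIZE));
}
```

Calling check_current_thread_stack from main exercises the initial
thread's code path, which is the one under discussion here.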

Now, the entire headache here is that __libc_stack_end is *not*
necessarily page aligned and (on an architecture where the stack grows
downward in memory)

  __libc_stack_end - getrlimit(RLIMIT_STACK).rlim_cur

will be a pointer to somewhere *beyond* the lowest address that the
kernel will enlarge the stack to, even if you round __libc_stack_end
up to the next page boundary before the subtraction.  The function of
the code changed by my patch -- before and after -- is to determine
the actual boundaries of the lazy-allocation region for the initial
thread's stack.
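
The arithmetic can be illustrated with a small sketch for a
downward-growing stack.  It assumes, as described above, that the
kernel enforces the limit against the whole region, measured from the
region's top (the far side of the information block).  The function
names and the addresses in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Naive estimate: round the stack end up to the next page boundary,
   then subtract the resource limit.  */
uint64_t
naive_floor (uint64_t stack_end, uint64_t rlim, uint64_t pagesize)
{
  /* The mask trick below requires a power-of-two page size.  */
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  uint64_t rounded = (stack_end + pagesize - 1) & ~(pagesize - 1);
  return rounded - rlim;
}

/* Lowest address the region may reach if the limit is measured from
   the top of the whole region, information block included.  */
uint64_t
region_floor (uint64_t region_top, uint64_t rlim)
{
  return region_top - rlim;
}
```

With a stack end of 0x7ffd00000f40, a region top of 0x7ffd00003000, an
8 MiB limit and 4 KiB pages, naive_floor gives 0x7ffcff801000 while
region_floor gives 0x7ffcff803000: the naive estimate lands two pages
below the lowest address the stack can actually reach.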

If we changed __libc_stack_end to point to the "bottom" (opposite the
direction of stack growth) of the entire stack region, then we could
simply subtract the rlimit size from it and have stackaddr.  But
that's exactly the challenge: how do we know where that "bottom" is?

I don't know where __libc_stack_end is set.  Early startup code should
be able to do things that pthread_getattr_np can't, like "find the
end-most address among all the pointers in argv, envp, and auxv, then
round end-wards to a page boundary" (where "end-most" and "end-wards"
mean "in the direction opposite to stack growth") but that might not
always give the right answer.  I also don't know if there's any
existing code in libc that depends on __libc_stack_end _not_ pointing
past the information block (of course we could always add a new
__libc_info_block_end, or just fill in the initial thread's pthread_t
more thoroughly).
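
A minimal sketch of that heuristic, for a downward-growing stack, might
look like the following.  All three function names are hypothetical.
It assumes it runs early enough that environ still points at the
kernel-supplied strings, and it only looks at the auxv entries
AT_EXECFN and AT_RANDOM, so it is subject to the same "might not
always give the right answer" caveat.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include <unistd.h>

extern char **environ;

/* Round ADDR up to a multiple of PAGESIZE (a power of two).  */
uintptr_t
round_up_to_page (uintptr_t addr, uintptr_t pagesize)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  return (addr + pagesize - 1) & ~(pagesize - 1);
}

/* Return the larger of BEST and the highest one-past-the-terminator
   address of any string in the NULL-terminated vector VEC.  */
uintptr_t
endmost_string_end (char **vec, uintptr_t best)
{
  for (; vec != NULL && *vec != NULL; vec++)
    {
      uintptr_t end = (uintptr_t) *vec + strlen (*vec) + 1;
      if (end > best)
        best = end;
    }
  return best;
}

/* Guess the page-aligned end of the information block from argv,
   the environment, and two pointer-valued auxv entries.  */
uintptr_t
guess_info_block_end (char **argv)
{
  uintptr_t best = endmost_string_end (argv, 0);
  best = endmost_string_end (environ, best);

  const char *execfn = (const char *) getauxval (AT_EXECFN);
  if (execfn != NULL)
    {
      uintptr_t end = (uintptr_t) execfn + strlen (execfn) + 1;
      if (end > best)
        best = end;
    }

  /* AT_RANDOM points at 16 random bytes supplied by the kernel.  */
  uintptr_t rnd = (uintptr_t) getauxval (AT_RANDOM);
  if (rnd != 0 && rnd + 16 > best)
    best = rnd + 16;

  return round_up_to_page (best, (uintptr_t) sysconf (_SC_PAGESIZE));
}
```

An upward-growing stack would need the mirror image: the lowest start
address among the same pointers, rounded down to a page boundary.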

> process_vm_readv seems quite likely to get blocked by seccomp filters.

I was worried about that too :-/

> Maybe we can get the kernel to pass the end of the stack in the
> auxiliary vector?

Sure, but then what do we do on older kernels?  I'm reluctant to say
"keep the old code" because we know this is breaking for people right
now (although honestly "mount /proc earlier" isn't a terrible
suggestion for a workaround).

zw
