public inbox for libc-alpha@sourceware.org
* RFC PATCH: Don't use /proc/self/maps to calculate size of initial thread stack
@ 2022-09-09 21:03 Zack Weinberg
  2022-09-13  9:52 ` Florian Weimer
  0 siblings, 1 reply; 11+ messages in thread
From: Zack Weinberg @ 2022-09-09 21:03 UTC (permalink / raw)
  To: GNU libc development

When pthread_getattr_np is applied to the initial thread, it has to figure out how big the initial thread's stack is.  Since the initial thread's stack is lazily allocated, and the kernel reuses that memory region for the "information block" (argv, environ, etc.), there are several different ways one could define "the size of the initial thread's stack".  For many years, the NPTL implementation has said that the stack starts at __libc_stack_end, rounded to the nearest page boundary in the opposite direction from stack growth, and extends for getrlimit(RLIMIT_STACK).rlim_cur bytes, *minus the size of the information block*, which lies beyond __libc_stack_end.  The rationale is that the resource limit is enforced against the entire memory region, so if we didn't subtract the size of the information block, the program would run out of stack a few pages before pthread_attr_getstack says it will.

The trouble with this definition is determining how big the information block is.  NPTL has, again for many years, done it by parsing /proc/self/maps, which is less than ideal because (a) it's a text parser deep in the guts of libpthread (and it uses _scanf_, eeew) and (b) it means programs that call pthread_getattr_np on the initial thread have a hidden dependency on /proc being mounted.  This came to my attention because the Rust runtime _always_ calls pthread_getattr_np on the initial thread during startup, in order to set up stack overflow monitoring, causing problems for use of Rust programs during early boot.

This patch changes things so that instead of parsing /proc/self/maps, we take advantage of the fact that process_vm_readv is guaranteed to return a short read when it encounters an unmapped page somewhere in the _middle_ of the "remote" iovec.  So we set up an iovec that reads one byte from each of N pages starting just before __libc_stack_end, feed it to process_vm_readv, and count how many pages were actually readable.

This is arguably just as dirty a hack -- in particular, process_vm_readv(getpid(), ...) _ought_ always to be allowed, since you can't do anything with it that you couldn't do with direct memory access instructions and possibly a SIGSEGV handler, but it's not apparent to me whether it _is_ always allowed; process_vm_readv is a spinoff of ptrace and uses ptrace's set of security checks, which are a bit of a mess.  Also, it _may_ not compute exactly the same values.  I wrote a test to compare the new algorithm to the old one, and it passes, but I'm not trying very hard to set up exotic conditions (mmap areas close to the initial stack, for instance).

Since this is a proof-of-concept, and also because the test suite is thoroughly broken for unrelated reasons right now, I haven't bothered setting up PLT bypass for process_vm_readv.

Thoughts?
zw

diff --git a/nptl/Makefile b/nptl/Makefile
index 3d2ce8af8a..fdb6011c34 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -295,6 +295,7 @@ tests = tst-attr2 tst-attr3 tst-default-attr \
 	tst-dlsym1 \
 	tst-context1 \
 	tst-sched1 \
+	tst-initial-thread-stacksize \
 	tst-initializers1 $(addprefix tst-initializers1-,\
 			    c89 gnu89 c99 gnu99 c11 gnu11) \
 	tst-thread_local1 \
diff --git a/nptl/pthread_getattr_np.c b/nptl/pthread_getattr_np.c
index 9c5b73b452..4764c885ab 100644
--- a/nptl/pthread_getattr_np.c
+++ b/nptl/pthread_getattr_np.c
@@ -18,18 +18,125 @@
 #include <assert.h>
 #include <errno.h>
 #include <inttypes.h>
-#include <stdio.h>
-#include <stdio_ext.h>
 #include <stdlib.h>
 #include <string.h>
+#include <unistd.h>
 #include <sys/resource.h>
+#include <sys/uio.h>
 #include "pthreadP.h"
 #include <lowlevellock.h>
 #include <ldsodefs.h>
+#include <shlib-compat.h>
 
+/* Number of pages to probe in each call to process_vm_readv.
+   Must be at least 2 for probe_initial_thread_stack to work
+   correctly.  Larger numbers trade off larger arrays allocated
+   in probe_initial_thread_stack's stack frame for fewer calls
+   to process_vm_readv.
+
+   The current choice of 32 allows us to probe past up to 124KiB
+   of argv, environ, etc. for the end of the stack in a single call
+   to process_vm_readv, while requiring only a little more than
+   1KiB of scratch space.  */
+#define PROBES_PER_CALL 32
+
+/* Subroutine of pthread_getattr_np:
+   Determine the size and base address of the initial thread's stack.
+
+   We assume that the initial thread's stack can grow to fill all the
+   space permitted by the RLIMIT_STACK resource limit, and we treat it
+   as having ended at __libc_stack_end, rounded to a page boundary.
+   Because __libc_stack_end is not the actual end of the memory region
+   containing the stack -- the program arguments, environment
+   variables, auxiliary vector, etc. are all "below" __libc_stack_end
+   -- but the resource limit applies to the entire memory region, we
+   must find the actual end in order to report the size accurately.
+
+   Currently, we find the actual end by taking advantage of the fact
+   that process_vm_readv is guaranteed to stop at the first "remote"
+   iovec entry that touches an unmapped page.  */
+static int
+probe_initial_thread_stack (struct pthread_attr *iattr)
+{
+  uintptr_t pagesize = GLRO(dl_pagesize);
+  uintptr_t stack_end_page =
+    _STACK_GROWS_DOWN
+    ? ((uintptr_t) __libc_stack_end + pagesize - 1) & -pagesize
+    : (uintptr_t) __libc_stack_end & -pagesize;
+
+  struct rlimit rl;
+  if (__getrlimit (RLIMIT_STACK, &rl) != 0)
+    return -1;
+
+  char local_buf[PROBES_PER_CALL];
+  struct iovec local_iov[PROBES_PER_CALL];
+  struct iovec stack_iov[PROBES_PER_CALL];
+
+  /* We're going to read one byte per page of potential stack into
+     local_buf.  The local_iov can be set up for this once in advance,
+     as can the stack_iov's iov_len fields. */
+  for (int i = 0; i < PROBES_PER_CALL; i++)
+    {
+      local_iov[i].iov_base = &local_buf[i];
+      local_iov[i].iov_len = 1;
+      stack_iov[i].iov_len = 1;
+    }
+
+  /* We're looking for the bottommost page of the stack, so the probes
+     should advance in the opposite direction from the direction the
+     stack grows.  */
+  intptr_t probe_stride = pagesize * (_STACK_GROWS_DOWN ? 1 : -1);
+  uintptr_t first_probe_point = stack_end_page;
+  pid_t me = __getpid ();
+
+  /* If necessary, keep probing until we wrap around the address space.  */
+  int ret = -1;
+  while (_STACK_GROWS_DOWN
+	 ? (first_probe_point >= stack_end_page)
+	 : (first_probe_point <= stack_end_page))
+    {
+      /* first_probe_point marks a boundary between addresses known to
+         be accessible, and addresses that might not be accessible.
+         In order to ensure that process_vm_readv gives us a partial
+         read rather than a complete failure when we hit the first
+         address that isn't accessible, start the probes half a page
+         backward (i.e. in the opposite direction from probe_stride)
+         from there.  */
+      for (int i = 0; i < PROBES_PER_CALL; i++)
+	stack_iov[i].iov_base =
+	  (void *) (first_probe_point + i * probe_stride - probe_stride / 2);
+
+      ssize_t n = process_vm_readv (me, local_iov, PROBES_PER_CALL,
+				    stack_iov, PROBES_PER_CALL, 0);
+      if (n < 0)
+	break;
+      if (n < PROBES_PER_CALL)
+	{
+	  /* The 0th probe is _before_ first_probe_point.  */
+	  first_probe_point += (n - 1) * probe_stride;
+	  ret = 0;
+	  break;
+	}
+
+      first_probe_point += PROBES_PER_CALL * probe_stride;
+    }
+
+  if (ret == -1)
+    return -1;
+
+  /* first_probe_point marks the actual boundary of the stack's VM
+     region, and is known to be page-aligned.  Calculate the apparent
+     stack size.  */
+  iattr->stackaddr = (void *) stack_end_page;
+  iattr->stacksize = rl.rlim_cur
+    - (_STACK_GROWS_DOWN
+       ? first_probe_point - stack_end_page
+       : stack_end_page - first_probe_point);
+  return 0;
+}
 
 int
-__pthread_getattr_np (pthread_t thread_id, pthread_attr_t *attr)
+__pthread_getattr_np (pthread_t thread_id, pthread_attr_t * attr)
 {
   struct pthread *thread = (struct pthread *) thread_id;
 
@@ -62,112 +169,22 @@ __pthread_getattr_np (pthread_t thread_id, pthread_attr_t *attr)
   if (__glibc_likely (thread->stackblock != NULL))
     {
       /* The stack size reported to the user should not include the
-	 guard size.  */
+         guard size.  */
       iattr->stacksize = thread->stackblock_size - thread->guardsize;
-#if _STACK_GROWS_DOWN
-      iattr->stackaddr = (char *) thread->stackblock
-			 + thread->stackblock_size;
-#else
       iattr->stackaddr = (char *) thread->stackblock;
+#if _STACK_GROWS_DOWN
+      iattr->stackaddr += thread->stackblock_size;
 #endif
     }
   else
     {
       /* No stack information available.  This must be for the initial
-	 thread.  Get the info in some magical way.  */
-
-      /* Stack size limit.  */
-      struct rlimit rl;
-
-      /* The safest way to get the top of the stack is to read
-	 /proc/self/maps and locate the line into which
-	 __libc_stack_end falls.  */
-      FILE *fp = fopen ("/proc/self/maps", "rce");
-      if (fp == NULL)
+         thread.  Get the info in some magical way.  */
+      if (probe_initial_thread_stack (iattr))
 	ret = errno;
-      /* We need the limit of the stack in any case.  */
-      else
-	{
-	  if (__getrlimit (RLIMIT_STACK, &rl) != 0)
-	    ret = errno;
-	  else
-	    {
-	      /* We consider the main process stack to have ended with
-	         the page containing __libc_stack_end.  There is stuff below
-		 it in the stack too, like the program arguments, environment
-		 variables and auxv info, but we ignore those pages when
-		 returning size so that the output is consistent when the
-		 stack is marked executable due to a loaded DSO requiring
-		 it.  */
-	      void *stack_end = (void *) ((uintptr_t) __libc_stack_end
-					  & -(uintptr_t) GLRO(dl_pagesize));
-#if _STACK_GROWS_DOWN
-	      stack_end += GLRO(dl_pagesize);
-#endif
-	      /* We need no locking.  */
-	      __fsetlocking (fp, FSETLOCKING_BYCALLER);
-
-	      /* Until we found an entry (which should always be the case)
-		 mark the result as a failure.  */
-	      ret = ENOENT;
-
-	      char *line = NULL;
-	      size_t linelen = 0;
-#if _STACK_GROWS_DOWN
-	      uintptr_t last_to = 0;
-#endif
-
-	      while (! feof_unlocked (fp))
-		{
-		  if (__getline (&line, &linelen, fp) <= 0)
-		    break;
-
-		  uintptr_t from;
-		  uintptr_t to;
-		  if (sscanf (line, "%" SCNxPTR "-%" SCNxPTR, &from, &to) != 2)
-		    continue;
-		  if (from <= (uintptr_t) __libc_stack_end
-		      && (uintptr_t) __libc_stack_end < to)
-		    {
-		      /* Found the entry.  Now we have the info we need.  */
-		      iattr->stackaddr = stack_end;
-		      iattr->stacksize =
-		        rl.rlim_cur - (size_t) (to - (uintptr_t) stack_end);
-
-		      /* Cut it down to align it to page size since otherwise we
-		         risk going beyond rlimit when the kernel rounds up the
-		         stack extension request.  */
-		      iattr->stacksize = (iattr->stacksize
-					  & -(intptr_t) GLRO(dl_pagesize));
-#if _STACK_GROWS_DOWN
-		      /* The limit might be too high.  */
-		      if ((size_t) iattr->stacksize
-			  > (size_t) iattr->stackaddr - last_to)
-			iattr->stacksize = (size_t) iattr->stackaddr - last_to;
-#else
-		      /* The limit might be too high.  */
-		      if ((size_t) iattr->stacksize
-			  > to - (size_t) iattr->stackaddr)
-			iattr->stacksize = to - (size_t) iattr->stackaddr;
-#endif
-		      /* We succeed and no need to look further.  */
-		      ret = 0;
-		      break;
-		    }
-#if _STACK_GROWS_DOWN
-		  last_to = to;
-#endif
-		}
-
-	      free (line);
-	    }
-
-	  fclose (fp);
-	}
+      iattr->flags |= ATTR_FLAG_STACKADDR;
     }
 
-  iattr->flags |= ATTR_FLAG_STACKADDR;
-
   if (ret == 0)
     {
       size_t size = 16;
@@ -205,6 +222,7 @@ __pthread_getattr_np (pthread_t thread_id, pthread_attr_t *attr)
 
   return ret;
 }
+
 versioned_symbol (libc, __pthread_getattr_np, pthread_getattr_np, GLIBC_2_32);
 
 #if SHLIB_COMPAT (libc, GLIBC_2_2_3, GLIBC_2_32)
diff --git a/nptl/tst-initial-thread-stacksize.c b/nptl/tst-initial-thread-stacksize.c
new file mode 100644
index 0000000000..3b47e32822
--- /dev/null
+++ b/nptl/tst-initial-thread-stacksize.c
@@ -0,0 +1,163 @@
+/* Copyright (C) 2002-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <inttypes.h>
+#include <pthread.h>
+#include <stackinfo.h>
+#include <stdio.h>
+#include <stdio_ext.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/resource.h>
+
+#include <support/test-driver.h>
+
+extern void *__libc_stack_end;
+
+/* The old algorithm for determining the stack size of the initial
+   thread.  It relies on parsing /proc/self/maps.
+
+   Writes the base address and size of the stack to STACKADDR_P and
+   STACKSIZE_P respectively; returns 0 on success or -1 on failure.  */
+static int
+get_stack_size_proc (void **stackaddr_p, size_t *stacksize_p)
+{
+  uintptr_t pagesize = sysconf (_SC_PAGESIZE);
+  if (pagesize == (uintptr_t) -1)
+    return -1;
+
+  /* Page-align the guess.  */
+  uintptr_t stack_end_page =
+    _STACK_GROWS_DOWN
+    ? ((uintptr_t) __libc_stack_end + pagesize - 1) & -pagesize
+    : (uintptr_t) __libc_stack_end & -pagesize;
+
+  struct rlimit rl;
+  if (getrlimit (RLIMIT_STACK, &rl))
+    return -1;
+
+  FILE *fp = fopen ("/proc/self/maps", "rce");
+  if (fp == NULL)
+    return -1;
+  /* We need no locking.  */
+  __fsetlocking (fp, FSETLOCKING_BYCALLER);
+
+  int ret = -1;
+  char *line = NULL;
+  size_t linelen = 0;
+  uintptr_t stackaddr = 0;
+  uintptr_t stacksize = 0;
+  uintptr_t from;
+  uintptr_t to;
+#if _STACK_GROWS_DOWN
+  uintptr_t last_to = 0;
+#endif
+
+  while (!feof_unlocked (fp))
+    {
+      if (getline (&line, &linelen, fp) <= 0)
+	break;
+      if (sscanf (line, "%" SCNxPTR "-%" SCNxPTR, &from, &to) != 2)
+	continue;
+      if (from <= stack_end_page && stack_end_page < to)
+	{
+	  /* Found the entry.  Now we have the info we need.  */
+	  stackaddr = stack_end_page;
+	  stacksize = rl.rlim_cur - (to - stack_end_page);
+	  /* We succeed and no need to look further.  */
+	  ret = 0;
+	  break;
+	}
+#if _STACK_GROWS_DOWN
+      last_to = to;
+#endif
+    }
+
+  free (line);
+  fclose (fp);
+  if (ret == 0)
+    {
+      /* Cut stacksize down to align it to page size since otherwise we
+         risk going beyond rlimit when the kernel rounds up the
+         stack extension request.  */
+      stacksize = stacksize & -pagesize;
+      /* The limit might be too high.  */
+#if _STACK_GROWS_DOWN
+      if (stacksize > stackaddr - last_to)
+	stacksize = stackaddr - last_to;
+#else
+      if (stacksize > to - stackaddr)
+	stacksize = to - stackaddr;
+#endif
+      *stackaddr_p =
+        (void *) (_STACK_GROWS_DOWN ? stackaddr - stacksize : stackaddr);
+      *stacksize_p = stacksize;
+    }
+  return ret;
+}
+
+static int
+do_test (void)
+{
+  void *stackaddr_old, *stackaddr_new;
+  size_t stacksize_old, stacksize_new;
+
+  int perr;
+  pthread_attr_t attr;
+  if ((perr = pthread_getattr_np (pthread_self (), &attr)) != 0)
+    {
+      fprintf (stderr, "pthread_getattr_np: %s\n", strerror (perr));
+      return 1;
+    }
+  if ((perr =
+       pthread_attr_getstack (&attr, &stackaddr_new, &stacksize_new)) != 0)
+    {
+      fprintf (stderr, "pthread_attr_getstack: %s\n", strerror (perr));
+      return 1;
+    }
+
+  if (get_stack_size_proc (&stackaddr_old, &stacksize_old))
+    {
+      perror ("get_stack_size_proc");
+      return 1;
+    }
+
+  int ret = 0;
+  if (stackaddr_old != stackaddr_new)
+    {
+      puts ("ERROR: old != new stackaddr");
+      ret = 1;
+    }
+  if (stacksize_old != stacksize_new)
+    {
+      puts ("ERROR: old != new stacksize");
+      ret = 1;
+    }
+
+  if (ret || test_verbose)
+    {
+      printf ("old: stackaddr=%p stacksize=%zu\n",
+              stackaddr_old, stacksize_old);
+      printf ("new: stackaddr=%p stacksize=%zu\n",
+              stackaddr_new, stacksize_new);
+    }
+
+  return ret;
+}
+
+#include <support/test-driver.c>

Thread overview: 11+ messages
2022-09-09 21:03 RFC PATCH: Don't use /proc/self/maps to calculate size of initial thread stack Zack Weinberg
2022-09-13  9:52 ` Florian Weimer
2022-09-13 22:03   ` Michael Hudson-Doyle
2022-09-15 16:09   ` Zack Weinberg
2022-09-20 12:16     ` Florian Weimer
2022-09-21 12:41       ` Zack Weinberg
2022-09-21 13:01         ` Florian Weimer
2022-09-21 20:58     ` Adhemerval Zanella Netto
2022-09-23 14:59       ` Zack Weinberg
2022-09-23 15:24         ` Adhemerval Zanella Netto
2022-09-23 18:57         ` Florian Weimer
