public inbox for libc-alpha@sourceware.org
* [PATCH 00/13] Linux: Move most stack management out of libpthread
@ 2021-05-06 18:08 Florian Weimer
  2021-05-06 18:08 ` [PATCH 01/13] scripts/versions.awk: Add strings and hashes to <first-versions.h> Florian Weimer
                   ` (13 more replies)
  0 siblings, 14 replies; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:08 UTC (permalink / raw)
  To: libc-alpha

This incorporates the previous “nptl: Remove delayed rtld lock
initialization” series.

Tested on i686-linux-gnu and x86_64-linux-gnu.  Built with
build-many-glibcs.py.

Thanks,
Florian

Florian Weimer (13):
  scripts/versions.awk: Add strings and hashes to <first-versions.h>
  elf, nptl: Resolve recursive lock implementation early
  nptl: Export __libc_multiple_threads from libc as an internal symbol
  Linux: Explicitly disable cancellation checking in the dynamic loader
  Linux: Simplify and fix the definition of SINGLE_THREAD_P
  nptl: Eliminate __pthread_multiple_threads
  elf: Introduce __tls_pre_init_tp
  nptl: Move more stack management variables into _rtld_global
  nptl: Simplify the change_stack_perm calling convention
  nptl: Move changing of stack permissions into ld.so
  nptl: Simplify resetting the in-flight stack in __reclaim_stacks
  nptl: Move __default_pthread_attr, __default_pthread_attr_lock into
    libc
  Linux: Move __reclaim_stacks into the fork implementation in libc

 csu/libc-tls.c                          |   2 +
 elf/Makefile                            |   3 +-
 elf/dl-load.c                           |   4 +
 elf/dl-mutex.c                          |  19 ++
 elf/dl-support.c                        |  13 +-
 elf/dl-tls_init_tp.c                    |  29 +++
 elf/rtld.c                              |  34 +---
 nptl/Makefile                           |   2 +-
 nptl/Versions                           |   4 +-
 nptl/allocatestack.c                    | 227 ++----------------------
 nptl/libc_multiple_threads.c            |   3 +-
 nptl/libc_pthread_init.c                |  11 --
 nptl/nptl-init.c                        |  24 ---
 nptl/pthreadP.h                         |  33 ++--
 nptl/pthread_cancel.c                   |   2 +-
 nptl/vars.c                             |  15 +-
 scripts/versions.awk                    |  36 ++++
 sysdeps/generic/ldsodefs.h              |  51 +++++-
 sysdeps/nptl/dl-mutex.c                 |  53 ++++++
 sysdeps/nptl/dl-tls_init_tp.c           |  26 ++-
 sysdeps/nptl/fork.c                     | 110 ++++++++++++
 sysdeps/nptl/libc-lockP.h               |  17 +-
 sysdeps/unix/sysdep.h                   |  11 +-
 sysdeps/unix/sysv/linux/Versions        |   6 +
 sysdeps/unix/sysv/linux/dl-execstack.c  |  76 +++++++-
 sysdeps/unix/sysv/linux/single-thread.h |  42 ++---
 26 files changed, 481 insertions(+), 372 deletions(-)
 create mode 100644 elf/dl-mutex.c
 create mode 100644 sysdeps/nptl/dl-mutex.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/13] scripts/versions.awk: Add strings and hashes to <first-versions.h>
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
@ 2021-05-06 18:08 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:09 ` [PATCH v2 02/13] elf, nptl: Resolve recursive lock implementation early Florian Weimer
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:08 UTC (permalink / raw)
  To: libc-alpha

This generates new macros of this form:

#define FIRST_VERSION_libc___pthread_mutex_lock_STRING "GLIBC_2.2.5"
#define FIRST_VERSION_libc___pthread_mutex_lock_HASH 0x9691a75

They are useful for symbol lookups using _dl_lookup_direct.
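
As an aside (illustration only, not part of the patch): the hash computed by
the new elf_hash awk function is the classic SysV ELF hash, so the _HASH
value in the example macro above can be reproduced with a short Python
sketch:

```python
def elf_hash(name):
    """Classic SysV ELF hash, as computed by the new awk elf_hash function."""
    h = 0
    for c in name.encode("ascii"):
        h = ((h << 4) + c) & 0xFFFFFFFF
        top = h & 0xF0000000
        # Fold the top nibble back in and clear it, as in the awk version.
        h = (h ^ (top >> 24)) & ~top & 0xFFFFFFFF
    return h

print(hex(elf_hash("GLIBC_2.2.5")))  # 0x9691a75
```

The 0xFFFFFFFF masking mirrors the and(..., 0xffffffff) calls in the awk
version, which keep intermediate results within 32 bits.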
---
 scripts/versions.awk | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/scripts/versions.awk b/scripts/versions.awk
index d56f4e712c..3291123666 100644
--- a/scripts/versions.awk
+++ b/scripts/versions.awk
@@ -32,6 +32,29 @@ BEGIN {
   sort = "sort -t. -k 1,1 -k 2n,2n -k 3 > " tmpfile;
 }
 
+# GNU awk does not implement the ord and chr functions.
+# <https://www.gnu.org/software/gawk/manual/html_node/Ordinal-Functions.html>
+# says that they are "written very nicely", using code similar to what
+# is included here.
+function chr(c) {
+    return sprintf("%c", c)
+}
+
+BEGIN {
+    for (c = 1; c < 127; c++) {
+	ord_table[chr(c)] = c;
+    }
+}
+
+function ord(c) {
+    if (ord_table[c]) {
+	return ord_table[c];
+    } else {
+	printf("Invalid character reference: '%c'\n", c) > "/dev/stderr";
+	++lossage;
+    }
+}
+
 # Remove comment lines.
 /^ *#/ {
   next;
@@ -90,6 +113,17 @@ function close_and_move(name, real_name) {
   system(move_if_change " " name " " real_name " >&2");
 }
 
+# ELF hash, for use with symbol versions.
+function elf_hash(s, i, acc) {
+  acc = 0;
+  for (i = 1; i <= length(s); ++i) {
+      acc = and(lshift(acc, 4) + ord(substr(s, i, 1)), 0xffffffff);
+      top = and(acc, 0xf0000000);
+      acc = and(xor(acc, rshift(top, 24)), compl(top));
+  }
+  return acc;
+}
+
 # Now print the accumulated information.
 END {
   close(sort);
@@ -145,6 +179,8 @@ END {
 	  && oldver ~ "^GLIBC_[0-9]" \
 	  && sym ~ "^[A-Za-z0-9_]*$") {
 	ver_val = oldver;
+	printf("#define %s_STRING \"%s\"\n", first_ver_macro, ver_val) > first_ver_header;
+	printf("#define %s_HASH 0x%x\n", first_ver_macro, elf_hash(ver_val)) > first_ver_header;
 	gsub("\\.", "_", ver_val);
 	printf("#define %s %s\n", first_ver_macro, ver_val) > first_ver_header;
 	first_ver_seen[first_ver_macro] = 1;
-- 
2.30.2




* [PATCH v2 02/13] elf, nptl: Resolve recursive lock implementation early
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
  2021-05-06 18:08 ` [PATCH 01/13] scripts/versions.awk: Add strings and hashes to <first-versions.h> Florian Weimer
@ 2021-05-06 18:09 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:10 ` [PATCH 03/13] nptl: Export __libc_multiple_threads from libc as an internal symbol Florian Weimer
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:09 UTC (permalink / raw)
  To: libc-alpha

If libpthread is included in libc, it is not necessary to delay
initialization of the lock/unlock function pointers until libpthread
is loaded.  This eliminates two unprotected function pointers
from _rtld_global and removes some initialization code from
libpthread.
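
As an aside (illustration only, not part of the patch): the hard-coded
constants 0x4f152227 and 0x7dd7aaaa passed to _dl_lookup_direct in the new
sysdeps/nptl/dl-mutex.c are outputs of dl_new_hash, the GNU symbol hash,
and can be reproduced with a short Python sketch:

```python
def dl_new_hash(name):
    """GNU symbol hash (glibc's dl_new_hash): h = h * 33 + c, seeded with 5381."""
    h = 5381
    for c in name.encode("ascii"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

print(hex(dl_new_hash("pthread_mutex_lock")))    # 0x4f152227
print(hex(dl_new_hash("pthread_mutex_unlock")))  # 0x7dd7aaaa
```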
---
v2: Rename dl-lock.c into dl-mutex.c and use a sysdeps override instead
    of a preprocessor conditional.

 elf/Makefile               |  3 ++-
 elf/dl-mutex.c             | 19 ++++++++++++++
 elf/rtld.c                 | 18 +++++++++++++
 nptl/nptl-init.c           |  9 -------
 sysdeps/generic/ldsodefs.h | 25 +++++++++++++++++-
 sysdeps/nptl/dl-mutex.c    | 53 ++++++++++++++++++++++++++++++++++++++
 sysdeps/nptl/libc-lockP.h  | 17 +++---------
 7 files changed, 120 insertions(+), 24 deletions(-)
 create mode 100644 elf/dl-mutex.c
 create mode 100644 sysdeps/nptl/dl-mutex.c

diff --git a/elf/Makefile b/elf/Makefile
index 4f99af626f..d3e909637a 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -66,7 +66,8 @@ elide-routines.os = $(all-dl-routines) dl-support enbl-secure dl-origin \
 # interpreter and operating independent of libc.
 rtld-routines	= rtld $(all-dl-routines) dl-sysdep dl-environ dl-minimal \
   dl-error-minimal dl-conflict dl-hwcaps dl-hwcaps_split dl-hwcaps-subdirs \
-  dl-usage dl-diagnostics dl-diagnostics-kernel dl-diagnostics-cpu
+  dl-usage dl-diagnostics dl-diagnostics-kernel dl-diagnostics-cpu \
+  dl-mutex
 all-rtld-routines = $(rtld-routines) $(sysdep-rtld-routines)
 
 CFLAGS-dl-runtime.c += -fexceptions -fasynchronous-unwind-tables
diff --git a/elf/dl-mutex.c b/elf/dl-mutex.c
new file mode 100644
index 0000000000..2cd9d49c2e
--- /dev/null
+++ b/elf/dl-mutex.c
@@ -0,0 +1,19 @@
+/* Recursive locking implementation for the dynamic loader.  Generic version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* The generic version initialization happens in dl_main.  */
diff --git a/elf/rtld.c b/elf/rtld.c
index ad325d4c10..a359167f8a 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -857,6 +857,14 @@ rtld_lock_default_unlock_recursive (void *lock)
   __rtld_lock_default_unlock_recursive (lock);
 }
 #endif
+#if PTHREAD_IN_LIBC
+/* Dummy implementation.  See __rtld_mutex_init.  */
+static int
+rtld_mutex_dummy (pthread_mutex_t *lock)
+{
+  return 0;
+}
+#endif
 
 
 static void
@@ -1148,6 +1156,10 @@ dl_main (const ElfW(Phdr) *phdr,
   GL(dl_rtld_lock_recursive) = rtld_lock_default_lock_recursive;
   GL(dl_rtld_unlock_recursive) = rtld_lock_default_unlock_recursive;
 #endif
+#if PTHREAD_IN_LIBC
+  ___rtld_mutex_lock = rtld_mutex_dummy;
+  ___rtld_mutex_unlock = rtld_mutex_dummy;
+#endif
 
   /* The explicit initialization here is cheaper than processing the reloc
      in the _rtld_local definition's initializer.  */
@@ -2363,6 +2375,9 @@ dl_main (const ElfW(Phdr) *phdr,
 	 loader.  */
       __rtld_malloc_init_real (main_map);
 
+      /* Likewise for the locking implementation.  */
+      __rtld_mutex_init ();
+
       /* Mark all the objects so we know they have been already relocated.  */
       for (struct link_map *l = main_map; l != NULL; l = l->l_next)
 	{
@@ -2468,6 +2483,9 @@ dl_main (const ElfW(Phdr) *phdr,
 	 at this point.  */
       __rtld_malloc_init_real (main_map);
 
+      /* Likewise for the locking implementation.  */
+      __rtld_mutex_init ();
+
       RTLD_TIMING_VAR (start);
       rtld_timer_start (&start);
 
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index fcab5a0904..2724770533 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -179,15 +179,6 @@ __pthread_initialize_minimal_internal (void)
   lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);
 
 #ifdef SHARED
-  /* Make __rtld_lock_{,un}lock_recursive use pthread_mutex_{,un}lock,
-     keep the lock count from the ld.so implementation.  */
-  GL(dl_rtld_lock_recursive) = (void *) __pthread_mutex_lock;
-  GL(dl_rtld_unlock_recursive) = (void *) __pthread_mutex_unlock;
-  unsigned int rtld_lock_count = GL(dl_load_lock).mutex.__data.__count;
-  GL(dl_load_lock).mutex.__data.__count = 0;
-  while (rtld_lock_count-- > 0)
-    __pthread_mutex_lock (&GL(dl_load_lock).mutex);
-
   GL(dl_make_stack_executable_hook) = &__make_stacks_executable;
 #endif
 
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 1b064c5894..6d590d1335 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -403,7 +403,7 @@ struct rtld_global
   struct auditstate _dl_rtld_auditstate[DL_NNS];
 #endif
 
-#if defined SHARED && defined _LIBC_REENTRANT \
+#if !PTHREAD_IN_LIBC && defined SHARED \
     && defined __rtld_lock_default_lock_recursive
   EXTERN void (*_dl_rtld_lock_recursive) (void *);
   EXTERN void (*_dl_rtld_unlock_recursive) (void *);
@@ -1318,6 +1318,29 @@ link_map_audit_state (struct link_map *l, size_t index)
 }
 #endif /* SHARED */
 
+#if PTHREAD_IN_LIBC && defined SHARED
+/* Recursive locking implementation for use within the dynamic loader.
+   Used to define the __rtld_lock_lock_recursive and
+   __rtld_lock_unlock_recursive via <libc-lock.h>.  Initialized to a
+   no-op dummy implementation early.  Similar
+   to GL (dl_rtld_lock_recursive) and GL (dl_rtld_unlock_recursive)
+   in !PTHREAD_IN_LIBC builds.  */
+extern int (*___rtld_mutex_lock) (pthread_mutex_t *) attribute_hidden;
+extern int (*___rtld_mutex_unlock) (pthread_mutex_t *lock) attribute_hidden;
+
+/* Called after libc has been loaded, but before RELRO is activated.
+   Used to initialize the function pointers to the actual
+   implementations.  */
+void __rtld_mutex_init (void) attribute_hidden;
+#else /* !PTHREAD_IN_LIBC */
+static inline void
+__rtld_mutex_init (void)
+{
+  /* The initialization happens later (!PTHREAD_IN_LIBC) or is not
+     needed at all (!SHARED).  */
+}
+#endif /* !PTHREAD_IN_LIBC */
+
 #if THREAD_GSCOPE_IN_TCB
 void __thread_gscope_wait (void) attribute_hidden;
 # define THREAD_GSCOPE_WAIT() __thread_gscope_wait ()
diff --git a/sysdeps/nptl/dl-mutex.c b/sysdeps/nptl/dl-mutex.c
new file mode 100644
index 0000000000..08b71dc21b
--- /dev/null
+++ b/sysdeps/nptl/dl-mutex.c
@@ -0,0 +1,53 @@
+/* Recursive locking implementation for the dynamic loader.  NPTL version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Use the mutex implementation in libc (assuming PTHREAD_IN_LIBC).  */
+
+#include <assert.h>
+#include <first-versions.h>
+#include <ldsodefs.h>
+
+__typeof (pthread_mutex_lock) *___rtld_mutex_lock attribute_relro;
+__typeof (pthread_mutex_unlock) *___rtld_mutex_unlock attribute_relro;
+
+void
+__rtld_mutex_init (void)
+{
+  /* There is an implicit assumption here that the lock counters are
+     zero and this function is called while nothing is locked.  For
+     early initialization of the mutex functions this is true because
+     it happens directly in dl_main in elf/rtld.c, and not some ELF
+     constructor while holding loader locks.  */
+
+  struct link_map *libc_map = GL (dl_ns)[LM_ID_BASE].libc_map;
+
+  const ElfW(Sym) *sym
+    = _dl_lookup_direct (libc_map, "pthread_mutex_lock",
+                         0x4f152227, /* dl_new_hash output.  */
+                         FIRST_VERSION_libc_pthread_mutex_lock_STRING,
+                         FIRST_VERSION_libc_pthread_mutex_lock_HASH);
+  assert (sym != NULL);
+  ___rtld_mutex_lock = DL_SYMBOL_ADDRESS (libc_map, sym);
+
+  sym = _dl_lookup_direct (libc_map, "pthread_mutex_unlock",
+                           0x7dd7aaaa, /* dl_new_hash output.  */
+                           FIRST_VERSION_libc_pthread_mutex_unlock_STRING,
+                           FIRST_VERSION_libc_pthread_mutex_unlock_HASH);
+  assert (sym != NULL);
+  ___rtld_mutex_unlock = DL_SYMBOL_ADDRESS (libc_map, sym);
+}
diff --git a/sysdeps/nptl/libc-lockP.h b/sysdeps/nptl/libc-lockP.h
index ae9691d40e..ec7b02bbdd 100644
--- a/sysdeps/nptl/libc-lockP.h
+++ b/sysdeps/nptl/libc-lockP.h
@@ -151,9 +151,6 @@ _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
   __libc_maybe_call (__pthread_mutex_trylock, (&(NAME)), 0)
 #endif
 
-#define __rtld_lock_trylock_recursive(NAME) \
-  __libc_maybe_call (__pthread_mutex_trylock, (&(NAME).mutex), 0)
-
 /* Unlock the named lock variable.  */
 #if IS_IN (libc) || IS_IN (libpthread)
 # define __libc_lock_unlock(NAME) \
@@ -163,19 +160,13 @@ _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
 #endif
 #define __libc_rwlock_unlock(NAME) __pthread_rwlock_unlock (&(NAME))
 
-#ifdef SHARED
-# define __rtld_lock_default_lock_recursive(lock) \
-  ++((pthread_mutex_t *)(lock))->__data.__count;
-
-# define __rtld_lock_default_unlock_recursive(lock) \
-  --((pthread_mutex_t *)(lock))->__data.__count;
-
+#if IS_IN (rtld)
 # define __rtld_lock_lock_recursive(NAME) \
-  GL(dl_rtld_lock_recursive) (&(NAME).mutex)
+  ___rtld_mutex_lock (&(NAME).mutex)
 
 # define __rtld_lock_unlock_recursive(NAME) \
-  GL(dl_rtld_unlock_recursive) (&(NAME).mutex)
-#else
+  ___rtld_mutex_unlock (&(NAME).mutex)
+#else /* Not in the dynamic loader.  */
 # define __rtld_lock_lock_recursive(NAME) \
   __pthread_mutex_lock (&(NAME).mutex)
 
-- 
2.30.2




* [PATCH 03/13] nptl: Export __libc_multiple_threads from libc as an internal symbol
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
  2021-05-06 18:08 ` [PATCH 01/13] scripts/versions.awk: Add strings and hashes to <first-versions.h> Florian Weimer
  2021-05-06 18:09 ` [PATCH v2 02/13] elf, nptl: Resolve recursive lock implementation early Florian Weimer
@ 2021-05-06 18:10 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:10 ` [PATCH 04/13] Linux: Explicitly disable cancellation checking in the dynamic loader Florian Weimer
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:10 UTC (permalink / raw)
  To: libc-alpha

This allows the elimination of the __libc_multiple_threads_ptr
variable in libpthread and its initialization procedure.
---
 nptl/Versions                           |  1 +
 nptl/allocatestack.c                    |  4 ++--
 nptl/libc_multiple_threads.c            |  3 ++-
 nptl/libc_pthread_init.c                | 11 -----------
 nptl/nptl-init.c                        | 10 +---------
 nptl/pthreadP.h                         |  6 +-----
 nptl/pthread_cancel.c                   |  2 +-
 sysdeps/unix/sysv/linux/single-thread.h |  6 +++++-
 8 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/nptl/Versions b/nptl/Versions
index f950b77969..fb15a7e8eb 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -305,6 +305,7 @@ libc {
     __libc_cleanup_pop_restore;
     __libc_cleanup_push_defer;
     __libc_dl_error_tsd;
+    __libc_multiple_threads;
     __libc_pthread_init;
     __lll_clocklock_elision;
     __lll_lock_elision;
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 8aaba088b1..059786192e 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -477,7 +477,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
       /* This is at least the second thread.  */
       pd->header.multiple_threads = 1;
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-      __pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
+      __pthread_multiple_threads = __libc_multiple_threads = 1;
 #endif
 
 #ifdef NEED_DL_SYSINFO
@@ -598,7 +598,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 	  /* This is at least the second thread.  */
 	  pd->header.multiple_threads = 1;
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-	  __pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
+	  __pthread_multiple_threads = __libc_multiple_threads = 1;
 #endif
 
 #ifdef NEED_DL_SYSINFO
diff --git a/nptl/libc_multiple_threads.c b/nptl/libc_multiple_threads.c
index 60328023cd..a0e7932c26 100644
--- a/nptl/libc_multiple_threads.c
+++ b/nptl/libc_multiple_threads.c
@@ -23,6 +23,7 @@
 /* Variable set to a nonzero value either if more than one thread runs or ran,
    or if a single-threaded process is trying to cancel itself.  See
    nptl/descr.h for more context on the single-threaded process case.  */
-int __libc_multiple_threads attribute_hidden;
+int __libc_multiple_threads __attribute__ ((nocommon));
+libc_hidden_data_def (__libc_multiple_threads)
 # endif
 #endif
diff --git a/nptl/libc_pthread_init.c b/nptl/libc_pthread_init.c
index 397b83beb6..75f5d28ed6 100644
--- a/nptl/libc_pthread_init.c
+++ b/nptl/libc_pthread_init.c
@@ -27,20 +27,9 @@
 #include <sysdep.h>
 #include <ldsodefs.h>
 
-
-#ifdef TLS_MULTIPLE_THREADS_IN_TCB
 void
-#else
-extern int __libc_multiple_threads attribute_hidden;
-
-int *
-#endif
 __libc_pthread_init (void (*reclaim) (void))
 {
   /* Called by a child after fork.  */
   __register_atfork (NULL, NULL, reclaim, NULL);
-
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
-  return &__libc_multiple_threads;
-#endif
 }
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index 2724770533..2fb1117f3e 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -37,11 +37,6 @@
 #include <libc-pointer-arith.h>
 #include <pthread_mutex_conf.h>
 
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
-/* Pointer to the corresponding variable in libc.  */
-int *__libc_multiple_threads_ptr attribute_hidden;
-#endif
-
 /* Size and alignment of static TLS block.  */
 size_t __static_tls_size;
 size_t __static_tls_align_m1;
@@ -183,10 +178,7 @@ __pthread_initialize_minimal_internal (void)
 #endif
 
   /* Register the fork generation counter with the libc.  */
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
-  __libc_multiple_threads_ptr =
-#endif
-    __libc_pthread_init (__reclaim_stacks);
+  __libc_pthread_init (__reclaim_stacks);
 }
 strong_alias (__pthread_initialize_minimal_internal,
 	      __pthread_initialize_minimal)
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index 6b52ca158e..dd6d6c6342 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -368,17 +368,13 @@ extern unsigned long int __fork_generation attribute_hidden;
 extern unsigned long int *__fork_generation_pointer attribute_hidden;
 
 /* Register the generation counter in the libpthread with the libc.  */
-#ifdef TLS_MULTIPLE_THREADS_IN_TCB
 extern void __libc_pthread_init (void (*reclaim) (void));
-#else
-extern int *__libc_pthread_init (void (*reclaim) (void));
 
+#ifndef TLS_MULTIPLE_THREADS_IN_TCB
 /* Variable set to a nonzero value either if more than one thread runs or ran,
    or if a single-threaded process is trying to cancel itself.  See
    nptl/descr.h for more context on the single-threaded process case.  */
 extern int __pthread_multiple_threads attribute_hidden;
-/* Pointer to the corresponding variable in libc.  */
-extern int *__libc_multiple_threads_ptr attribute_hidden;
 #endif
 
 extern size_t __pthread_get_minstack (const pthread_attr_t *attr);
diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
index 060484cdc8..2cab8f0a34 100644
--- a/nptl/pthread_cancel.c
+++ b/nptl/pthread_cancel.c
@@ -90,7 +90,7 @@ __pthread_cancel (pthread_t th)
 	   points get executed.  */
 	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-	__pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
+	__pthread_multiple_threads = __libc_multiple_threads = 1;
 #endif
     }
   /* Mark the thread as canceled.  This has to be done
diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h
index a28aaed04d..841f8c69d5 100644
--- a/sysdeps/unix/sysv/linux/single-thread.h
+++ b/sysdeps/unix/sysv/linux/single-thread.h
@@ -27,9 +27,13 @@
    The ABI might define SINGLE_THREAD_BY_GLOBAL to enable the single thread
    check to use global variables instead of the pthread_t field.  */
 
+#ifndef __ASSEMBLER__
+extern int __libc_multiple_threads;
+libc_hidden_proto (__libc_multiple_threads)
+#endif
+
 #ifdef SINGLE_THREAD_BY_GLOBAL
 # if IS_IN (libc)
-extern int __libc_multiple_threads;
 #  define SINGLE_THREAD_P \
   __glibc_likely (__libc_multiple_threads == 0)
 # elif IS_IN (libpthread)
-- 
2.30.2




* [PATCH 04/13] Linux: Explicitly disable cancellation checking in the dynamic loader
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (2 preceding siblings ...)
  2021-05-06 18:10 ` [PATCH 03/13] nptl: Export __libc_multiple_threads from libc as an internal symbol Florian Weimer
@ 2021-05-06 18:10 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:10 ` [PATCH 05/13] Linux: Simplify and fix the definition of SINGLE_THREAD_P Florian Weimer
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:10 UTC (permalink / raw)
  To: libc-alpha

Historically, SINGLE_THREAD_P is defined to 1 in the dynamic loader.
This has the side effect of disabling cancellation points.  In order
to enable future use of SINGLE_THREAD_P for single-thread
optimizations in the dynamic loader (which becomes important once
more code is moved from libpthread), introduce a new
NO_SYSCALL_CANCEL_CHECKING macro which is always 1 for IS_IN (rtld),
independently of the actual SINGLE_THREAD_P value.
---
 sysdeps/unix/sysdep.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
index 2fa6bfa135..664d093c05 100644
--- a/sysdeps/unix/sysdep.h
+++ b/sysdeps/unix/sysdep.h
@@ -88,10 +88,17 @@
 #define INLINE_SYSCALL_CALL(...) \
   __INLINE_SYSCALL_DISP (__INLINE_SYSCALL, __VA_ARGS__)
 
+#if IS_IN (rtld)
+/* All cancellation points are compiled out in the dynamic loader.  */
+# define NO_SYSCALL_CANCEL_CHECKING 1
+#else
+# define NO_SYSCALL_CANCEL_CHECKING SINGLE_THREAD_P
+#endif
+
 #define SYSCALL_CANCEL(...) \
   ({									     \
     long int sc_ret;							     \
-    if (SINGLE_THREAD_P) 						     \
+    if (NO_SYSCALL_CANCEL_CHECKING)					     \
       sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__); 			     \
     else								     \
       {									     \
@@ -107,7 +114,7 @@
 #define INTERNAL_SYSCALL_CANCEL(...) \
   ({									     \
     long int sc_ret;							     \
-    if (SINGLE_THREAD_P) 						     \
+    if (NO_SYSCALL_CANCEL_CHECKING) 					     \
       sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__); 			     \
     else								     \
       {									     \
-- 
2.30.2




* [PATCH 05/13] Linux: Simplify and fix the definition of SINGLE_THREAD_P
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (3 preceding siblings ...)
  2021-05-06 18:10 ` [PATCH 04/13] Linux: Explicitly disable cancellation checking in the dynamic loader Florian Weimer
@ 2021-05-06 18:10 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:10 ` [PATCH 06/13] nptl: Eliminate __pthread_multiple_threads Florian Weimer
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:10 UTC (permalink / raw)
  To: libc-alpha

Always use __libc_multiple_threads if beneficial, and do not assume
that the dynamic loader is single-threaded.  This assumption could
become incorrect by accident once more code is moved from libpthread
into it.  The previous commit introducing the
NO_SYSCALL_CANCEL_CHECKING macro enables this change.

Do not hint to the compiler that multi-threaded programs are unlikely
(which is not quite true anymore).
---
 sysdeps/unix/sysv/linux/single-thread.h | 36 +++++--------------------
 1 file changed, 7 insertions(+), 29 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h
index 841f8c69d5..2fac8dcc11 100644
--- a/sysdeps/unix/sysv/linux/single-thread.h
+++ b/sysdeps/unix/sysv/linux/single-thread.h
@@ -32,35 +32,13 @@ extern int __libc_multiple_threads;
 libc_hidden_proto (__libc_multiple_threads)
 #endif
 
-#ifdef SINGLE_THREAD_BY_GLOBAL
-# if IS_IN (libc)
-#  define SINGLE_THREAD_P \
-  __glibc_likely (__libc_multiple_threads == 0)
-# elif IS_IN (libpthread)
-extern int __pthread_multiple_threads;
-#  define SINGLE_THREAD_P \
-  __glibc_likely (__pthread_multiple_threads == 0)
-# elif IS_IN (librt)
-#   define SINGLE_THREAD_P					\
-  __glibc_likely (THREAD_GETMEM (THREAD_SELF,			\
-				 header.multiple_threads) == 0)
-# else
-/* For rtld, et cetera.  */
-#  define SINGLE_THREAD_P (1)
-# endif
-#else /* SINGLE_THREAD_BY_GLOBAL  */
-# if IS_IN (libc) || IS_IN (libpthread) || IS_IN (librt)
-#   define SINGLE_THREAD_P					\
-  __glibc_likely (THREAD_GETMEM (THREAD_SELF,			\
-				 header.multiple_threads) == 0)
-# else
-/* For rtld, et cetera.  */
-#  define SINGLE_THREAD_P (1)
-# endif
-#endif /* SINGLE_THREAD_BY_GLOBAL  */
+#if !defined SINGLE_THREAD_BY_GLOBAL || IS_IN (rtld)
+# define SINGLE_THREAD_P \
+  (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0)
+#else
+# define SINGLE_THREAD_P (__libc_multiple_threads == 0)
+#endif
 
-#define RTLD_SINGLE_THREAD_P \
-  __glibc_likely (THREAD_GETMEM (THREAD_SELF, \
-				 header.multiple_threads) == 0)
+#define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P
 
 #endif /* _SINGLE_THREAD_H  */
-- 
2.30.2




* [PATCH 06/13] nptl: Eliminate __pthread_multiple_threads
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (4 preceding siblings ...)
  2021-05-06 18:10 ` [PATCH 05/13] Linux: Simplify and fix the definition of SINGLE_THREAD_P Florian Weimer
@ 2021-05-06 18:10 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:10 ` [PATCH 07/13] elf: Introduce __tls_pre_init_tp Florian Weimer
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:10 UTC (permalink / raw)
  To: libc-alpha

It is no longer needed after the SINGLE_THREAD_P consolidation.
---
 nptl/allocatestack.c  | 4 ++--
 nptl/pthreadP.h       | 7 -------
 nptl/pthread_cancel.c | 2 +-
 nptl/vars.c           | 7 -------
 4 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 059786192e..88c49f8154 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -477,7 +477,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
       /* This is at least the second thread.  */
       pd->header.multiple_threads = 1;
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-      __pthread_multiple_threads = __libc_multiple_threads = 1;
+      __libc_multiple_threads = 1;
 #endif
 
 #ifdef NEED_DL_SYSINFO
@@ -598,7 +598,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 	  /* This is at least the second thread.  */
 	  pd->header.multiple_threads = 1;
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-	  __pthread_multiple_threads = __libc_multiple_threads = 1;
+	  __libc_multiple_threads = 1;
 #endif
 
 #ifdef NEED_DL_SYSINFO
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index dd6d6c6342..8ab247f977 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -370,13 +370,6 @@ extern unsigned long int *__fork_generation_pointer attribute_hidden;
 /* Register the generation counter in the libpthread with the libc.  */
 extern void __libc_pthread_init (void (*reclaim) (void));
 
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
-/* Variable set to a nonzero value either if more than one thread runs or ran,
-   or if a single-threaded process is trying to cancel itself.  See
-   nptl/descr.h for more context on the single-threaded process case.  */
-extern int __pthread_multiple_threads attribute_hidden;
-#endif
-
 extern size_t __pthread_get_minstack (const pthread_attr_t *attr);
 
 /* Namespace save aliases.  */
diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
index 2cab8f0a34..fd04bedf6c 100644
--- a/nptl/pthread_cancel.c
+++ b/nptl/pthread_cancel.c
@@ -90,7 +90,7 @@ __pthread_cancel (pthread_t th)
 	   points get executed.  */
 	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
 #ifndef TLS_MULTIPLE_THREADS_IN_TCB
-	__pthread_multiple_threads = __libc_multiple_threads = 1;
+	__libc_multiple_threads = 1;
 #endif
     }
   /* Mark the thread as canceled.  This has to be done
diff --git a/nptl/vars.c b/nptl/vars.c
index 8de30856b8..03a6cc84be 100644
--- a/nptl/vars.c
+++ b/nptl/vars.c
@@ -26,10 +26,3 @@ union pthread_attr_transparent __default_pthread_attr attribute_hidden;
 
 /* Mutex protecting __default_pthread_attr.  */
 int __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
-
-#ifndef TLS_MULTIPLE_THREADS_IN_TCB
-/* Variable set to a nonzero value either if more than one thread runs or ran,
-   or if a single-threaded process is trying to cancel itself.  See
-   nptl/descr.h for more context on the single-threaded process case.  */
-int __pthread_multiple_threads attribute_hidden;
-#endif
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 07/13] elf: Introduce __tls_pre_init_tp
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (5 preceding siblings ...)
  2021-05-06 18:10 ` [PATCH 06/13] nptl: Eliminate __pthread_multiple_threads Florian Weimer
@ 2021-05-06 18:10 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:10 ` [PATCH 08/13] nptl: Move more stack management variables into _rtld_global Florian Weimer
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:10 UTC (permalink / raw)
  To: libc-alpha

This is an early variant of __tls_init_tp, primarily for initializing
thread-related elements of _rtld_global/GL.

Some existing initialization code not needed for NPTL is moved into
the generic version of this function.
---
 csu/libc-tls.c                |  2 ++
 elf/dl-mutex.c                |  2 +-
 elf/dl-tls_init_tp.c          | 29 ++++++++++++++++++++++++++
 elf/rtld.c                    | 38 +----------------------------------
 sysdeps/generic/ldsodefs.h    |  4 ++++
 sysdeps/nptl/dl-tls_init_tp.c | 25 +++++++++++++++++++++--
 6 files changed, 60 insertions(+), 40 deletions(-)

diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index 22f8e4838d..5515204863 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -114,6 +114,8 @@ __libc_setup_tls (void)
 
   struct link_map *main_map = GL(dl_ns)[LM_ID_BASE]._ns_loaded;
 
+  __tls_pre_init_tp ();
+
   /* Look through the TLS segment if there is any.  */
   if (_dl_phdr != NULL)
     for (phdr = _dl_phdr; phdr < &_dl_phdr[_dl_phnum]; ++phdr)
diff --git a/elf/dl-mutex.c b/elf/dl-mutex.c
index 2cd9d49c2e..ae1d8a84f0 100644
--- a/elf/dl-mutex.c
+++ b/elf/dl-mutex.c
@@ -16,4 +16,4 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-/* The generic version initialization happpens in dl_main.  */
+/* Initialization happens in __tls_pre_init_tp in dl-tls_init_tp.c.  */
diff --git a/elf/dl-tls_init_tp.c b/elf/dl-tls_init_tp.c
index 728cd84c00..d84adc992c 100644
--- a/elf/dl-tls_init_tp.c
+++ b/elf/dl-tls_init_tp.c
@@ -18,6 +18,35 @@
 
 #include <ldsodefs.h>
 
+#if defined SHARED && defined _LIBC_REENTRANT \
+    && defined __rtld_lock_default_lock_recursive
+static void
+rtld_lock_default_lock_recursive (void *lock)
+{
+  __rtld_lock_default_lock_recursive (lock);
+}
+
+static void
+rtld_lock_default_unlock_recursive (void *lock)
+{
+  __rtld_lock_default_unlock_recursive (lock);
+}
+#endif
+
+void
+__tls_pre_init_tp (void)
+{
+#if !THREAD_GSCOPE_IN_TCB
+  GL(dl_init_static_tls) = &_dl_nothread_init_static_tls;
+#endif
+
+#if defined SHARED && defined _LIBC_REENTRANT \
+    && defined __rtld_lock_default_lock_recursive
+  GL(dl_rtld_lock_recursive) = rtld_lock_default_lock_recursive;
+  GL(dl_rtld_unlock_recursive) = rtld_lock_default_unlock_recursive;
+#endif
+}
+
 void
 __tls_init_tp (void)
 {
diff --git a/elf/rtld.c b/elf/rtld.c
index a359167f8a..1255d5cc7d 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -843,30 +843,6 @@ ERROR: ld.so: object '%s' from %s cannot be preloaded (%s): ignored.\n",
   return 0;
 }
 
-#if defined SHARED && defined _LIBC_REENTRANT \
-    && defined __rtld_lock_default_lock_recursive
-static void
-rtld_lock_default_lock_recursive (void *lock)
-{
-  __rtld_lock_default_lock_recursive (lock);
-}
-
-static void
-rtld_lock_default_unlock_recursive (void *lock)
-{
-  __rtld_lock_default_unlock_recursive (lock);
-}
-#endif
-#if PTHREAD_IN_LIBC
-/* Dummy implementation.  See __rtld_mutex_init.  */
-static int
-rtld_mutex_dummy (pthread_mutex_t *lock)
-{
-  return 0;
-}
-#endif
-
-
 static void
 security_init (void)
 {
@@ -1147,19 +1123,7 @@ dl_main (const ElfW(Phdr) *phdr,
   struct dl_main_state state;
   dl_main_state_init (&state);
 
-#if !THREAD_GSCOPE_IN_TCB
-  GL(dl_init_static_tls) = &_dl_nothread_init_static_tls;
-#endif
-
-#if defined SHARED && defined _LIBC_REENTRANT \
-    && defined __rtld_lock_default_lock_recursive
-  GL(dl_rtld_lock_recursive) = rtld_lock_default_lock_recursive;
-  GL(dl_rtld_unlock_recursive) = rtld_lock_default_unlock_recursive;
-#endif
-#if PTHREAD_IN_LIBC
-  ___rtld_mutex_lock = rtld_mutex_dummy;
-  ___rtld_mutex_unlock = rtld_mutex_dummy;
-#endif
+  __tls_pre_init_tp ();
 
   /* The explicit initialization here is cheaper than processing the reloc
      in the _rtld_local definition's initializer.  */
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 6d590d1335..ee851ac789 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1165,6 +1165,10 @@ extern void _dl_determine_tlsoffset (void) attribute_hidden;
    number of audit modules are loaded.  */
 void _dl_tls_static_surplus_init (size_t naudit) attribute_hidden;
 
+/* This function is called very early from dl_main to set up TLS and
+   other thread-related data structures.  */
+void __tls_pre_init_tp (void) attribute_hidden;
+
 /* This function is called after processor-specific initialization of
    the TCB and thread pointer via TLS_INIT_TP, to complete very early
    initialization of the thread library.  */
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index 05d2b6cfcc..cb29222727 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -27,12 +27,33 @@ bool __nptl_set_robust_list_avail __attribute__ ((nocommon));
 rtld_hidden_data_def (__nptl_set_robust_list_avail)
 #endif
 
+#ifdef SHARED
+/* Dummy implementation.  See __rtld_mutex_init.  */
+static int
+rtld_mutex_dummy (pthread_mutex_t *lock)
+{
+  return 0;
+}
+#endif
+
 void
-__tls_init_tp (void)
+__tls_pre_init_tp (void)
 {
-  /* Set up thread stack list management.  */
+  /* The list data structures are not consistent until
+     initialized.  */
   INIT_LIST_HEAD (&GL (dl_stack_used));
   INIT_LIST_HEAD (&GL (dl_stack_user));
+
+#ifdef SHARED
+  ___rtld_mutex_lock = rtld_mutex_dummy;
+  ___rtld_mutex_unlock = rtld_mutex_dummy;
+#endif
+}
+
+void
+__tls_init_tp (void)
+{
+  /* Set up thread stack list management.  */
   list_add (&THREAD_SELF->list, &GL (dl_stack_user));
 
    /* Early initialization of the TCB.   */
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 08/13] nptl: Move more stack management variables into _rtld_global
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (6 preceding siblings ...)
  2021-05-06 18:10 ` [PATCH 07/13] elf: Introduce __tls_pre_init_tp Florian Weimer
@ 2021-05-06 18:10 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:11 ` [PATCH 09/13] nptl: Simplify the change_stack_perm calling convention Florian Weimer
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:10 UTC (permalink / raw)
  To: libc-alpha

Permissions of the cached stacks may have to be updated if an object
is loaded that requires executable stacks, so the dynamic loader
needs to know about these cached stacks.

The move of in_flight_stack and stack_cache_actsize is a requirement for
merging __reclaim_stacks into the fork implementation in libc.
---
 elf/dl-support.c              |  3 +++
 nptl/allocatestack.c          | 51 +++++++++++++++--------------------
 sysdeps/generic/ldsodefs.h    | 11 ++++++++
 sysdeps/nptl/dl-tls_init_tp.c |  1 +
 4 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/elf/dl-support.c b/elf/dl-support.c
index f966a2e7cd..580b0202ad 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -192,6 +192,9 @@ int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
 #if THREAD_GSCOPE_IN_TCB
 list_t _dl_stack_used;
 list_t _dl_stack_user;
+list_t _dl_stack_cache;
+size_t _dl_stack_cache_actsize;
+uintptr_t _dl_in_flight_stack;
 int _dl_stack_cache_lock;
 #else
 int _dl_thread_gscope_count;
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 88c49f8154..71cfa874d1 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -103,15 +103,6 @@
 
 /* Maximum size in kB of cache.  */
 static size_t stack_cache_maxsize = 40 * 1024 * 1024; /* 40MiBi by default.  */
-static size_t stack_cache_actsize;
-
-/* List of queued stack frames.  */
-static LIST_HEAD (stack_cache);
-
-/* We need to record what list operations we are going to do so that,
-   in case of an asynchronous interruption due to a fork() call, we
-   can correct for the work.  */
-static uintptr_t in_flight_stack;
 
 /* Check whether the stack is still used or not.  */
 #define FREE_P(descr) ((descr)->tid <= 0)
@@ -120,7 +111,7 @@ static uintptr_t in_flight_stack;
 static void
 stack_list_del (list_t *elem)
 {
-  in_flight_stack = (uintptr_t) elem;
+  GL (dl_in_flight_stack) = (uintptr_t) elem;
 
   atomic_write_barrier ();
 
@@ -128,14 +119,14 @@ stack_list_del (list_t *elem)
 
   atomic_write_barrier ();
 
-  in_flight_stack = 0;
+  GL (dl_in_flight_stack) = 0;
 }
 
 
 static void
 stack_list_add (list_t *elem, list_t *list)
 {
-  in_flight_stack = (uintptr_t) elem | 1;
+  GL (dl_in_flight_stack) = (uintptr_t) elem | 1;
 
   atomic_write_barrier ();
 
@@ -143,7 +134,7 @@ stack_list_add (list_t *elem, list_t *list)
 
   atomic_write_barrier ();
 
-  in_flight_stack = 0;
+  GL (dl_in_flight_stack) = 0;
 }
 
 
@@ -168,7 +159,7 @@ get_cached_stack (size_t *sizep, void **memp)
      same.  As the very least there are only a few different sizes.
      Therefore this loop will exit early most of the time with an
      exact match.  */
-  list_for_each (entry, &stack_cache)
+  list_for_each (entry, &GL (dl_stack_cache))
     {
       struct pthread *curr;
 
@@ -208,7 +199,7 @@ get_cached_stack (size_t *sizep, void **memp)
   stack_list_add (&result->list, &GL (dl_stack_used));
 
   /* And decrease the cache size.  */
-  stack_cache_actsize -= result->stackblock_size;
+  GL (dl_stack_cache_actsize) -= result->stackblock_size;
 
   /* Release the lock early.  */
   lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
@@ -249,7 +240,7 @@ free_stacks (size_t limit)
   list_t *prev;
 
   /* Search from the end of the list.  */
-  list_for_each_prev_safe (entry, prev, &stack_cache)
+  list_for_each_prev_safe (entry, prev, &GL (dl_stack_cache))
     {
       struct pthread *curr;
 
@@ -260,7 +251,7 @@ free_stacks (size_t limit)
 	  stack_list_del (entry);
 
 	  /* Account for the freed memory.  */
-	  stack_cache_actsize -= curr->stackblock_size;
+	  GL (dl_stack_cache_actsize) -= curr->stackblock_size;
 
 	  /* Free the memory associated with the ELF TLS.  */
 	  _dl_deallocate_tls (TLS_TPADJ (curr), false);
@@ -271,7 +262,7 @@ free_stacks (size_t limit)
 	    abort ();
 
 	  /* Maybe we have freed enough.  */
-	  if (stack_cache_actsize <= limit)
+	  if (GL (dl_stack_cache_actsize) <= limit)
 	    break;
 	}
     }
@@ -293,10 +284,10 @@ queue_stack (struct pthread *stack)
   /* We unconditionally add the stack to the list.  The memory may
      still be in use but it will not be reused until the kernel marks
      the stack as not used anymore.  */
-  stack_list_add (&stack->list, &stack_cache);
+  stack_list_add (&stack->list, &GL (dl_stack_cache));
 
-  stack_cache_actsize += stack->stackblock_size;
-  if (__glibc_unlikely (stack_cache_actsize > stack_cache_maxsize))
+  GL (dl_stack_cache_actsize) += stack->stackblock_size;
+  if (__glibc_unlikely (GL (dl_stack_cache_actsize) > stack_cache_maxsize))
     free_stacks (stack_cache_maxsize);
 }
 
@@ -827,7 +818,7 @@ __make_stacks_executable (void **stack_endp)
      might be wasted time but better spend it here than adding a check
      in the fast path.  */
   if (err == 0)
-    list_for_each (runp, &stack_cache)
+    list_for_each (runp, &GL (dl_stack_cache))
       {
 	err = change_stack_perm (list_entry (runp, struct pthread, list)
 #ifdef NEED_SEPARATE_REGISTER_STACK
@@ -857,10 +848,10 @@ __reclaim_stacks (void)
      we have to be aware that we might have interrupted a list
      operation.  */
 
-  if (in_flight_stack != 0)
+  if (GL (dl_in_flight_stack) != 0)
     {
-      bool add_p = in_flight_stack & 1;
-      list_t *elem = (list_t *) (in_flight_stack & ~(uintptr_t) 1);
+      bool add_p = GL (dl_in_flight_stack) & 1;
+      list_t *elem = (list_t *) (GL (dl_in_flight_stack) & ~(uintptr_t) 1);
 
       if (add_p)
 	{
@@ -871,8 +862,8 @@ __reclaim_stacks (void)
 
 	  if (GL (dl_stack_used).next->prev != &GL (dl_stack_used))
 	    l = &GL (dl_stack_used);
-	  else if (stack_cache.next->prev != &stack_cache)
-	    l = &stack_cache;
+	  else if (GL (dl_stack_cache).next->prev != &GL (dl_stack_cache))
+	    l = &GL (dl_stack_cache);
 
 	  if (l != NULL)
 	    {
@@ -901,7 +892,7 @@ __reclaim_stacks (void)
 	  curp->tid = 0;
 
 	  /* Account for the size of the stack.  */
-	  stack_cache_actsize += curp->stackblock_size;
+	  GL (dl_stack_cache_actsize) += curp->stackblock_size;
 
 	  if (curp->specific_used)
 	    {
@@ -926,7 +917,7 @@ __reclaim_stacks (void)
     }
 
   /* Add the stack of all running threads to the cache.  */
-  list_splice (&GL (dl_stack_used), &stack_cache);
+  list_splice (&GL (dl_stack_used), &GL (dl_stack_cache));
 
   /* Remove the entry for the current thread to from the cache list
      and add it to the list of running threads.  Which of the two
@@ -945,7 +936,7 @@ __reclaim_stacks (void)
   /* There is one thread running.  */
   __nptl_nthreads = 1;
 
-  in_flight_stack = 0;
+  GL (dl_in_flight_stack) = 0;
 
   /* Initialize locks.  */
   GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ee851ac789..81cce2e4d5 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -481,6 +481,17 @@ struct rtld_global
   /* List of thread stacks that were allocated by the application.  */
   EXTERN list_t _dl_stack_user;
 
+  /* List of queued thread stacks.  */
+  EXTERN list_t _dl_stack_cache;
+
+  /* Total size of all stacks in the cache (sum over stackblock_size).  */
+  EXTERN size_t _dl_stack_cache_actsize;
+
+  /* We need to record what list operations we are going to do so
+     that, in case of an asynchronous interruption due to a fork()
+     call, we can correct for the work.  */
+  EXTERN uintptr_t _dl_in_flight_stack;
+
   /* Mutex protecting the stack lists.  */
   EXTERN int _dl_stack_cache_lock;
 #else
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index cb29222727..f1aaa5aa9d 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -43,6 +43,7 @@ __tls_pre_init_tp (void)
      initialized.  */
   INIT_LIST_HEAD (&GL (dl_stack_used));
   INIT_LIST_HEAD (&GL (dl_stack_user));
+  INIT_LIST_HEAD (&GL (dl_stack_cache));
 
 #ifdef SHARED
   ___rtld_mutex_lock = rtld_mutex_dummy;
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 09/13] nptl: Simplify the change_stack_perm calling convention
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (7 preceding siblings ...)
  2021-05-06 18:10 ` [PATCH 08/13] nptl: Move more stack management variables into _rtld_global Florian Weimer
@ 2021-05-06 18:11 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:11 ` [PATCH 10/13] nptl: Move changing of stack permissions into ld.so Florian Weimer
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:11 UTC (permalink / raw)
  To: libc-alpha

Only ia64 needs the page mask, and it is straightforward
to compute the value within the function itself.
---
 nptl/allocatestack.c | 29 +++++------------------------
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 71cfa874d1..46089163f4 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -293,13 +293,10 @@ queue_stack (struct pthread *stack)
 
 
 static int
-change_stack_perm (struct pthread *pd
-#ifdef NEED_SEPARATE_REGISTER_STACK
-		   , size_t pagemask
-#endif
-		   )
+change_stack_perm (struct pthread *pd)
 {
 #ifdef NEED_SEPARATE_REGISTER_STACK
+  size_t pagemask = __getpagesize () - 1;
   void *stack = (pd->stackblock
 		 + (((((pd->stackblock_size - pd->guardsize) / 2)
 		      & pagemask) + pd->guardsize) & pagemask));
@@ -628,11 +625,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 	  if (__builtin_expect ((GL(dl_stack_flags) & PF_X) != 0
 				&& (prot & PROT_EXEC) == 0, 0))
 	    {
-	      int err = change_stack_perm (pd
-#ifdef NEED_SEPARATE_REGISTER_STACK
-					   , ~pagesize_m1
-#endif
-					   );
+	      int err = change_stack_perm (pd);
 	      if (err != 0)
 		{
 		  /* Free the stack memory we just allocated.  */
@@ -796,20 +789,12 @@ __make_stacks_executable (void **stack_endp)
   if (err != 0)
     return err;
 
-#ifdef NEED_SEPARATE_REGISTER_STACK
-  const size_t pagemask = ~(__getpagesize () - 1);
-#endif
-
   lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
 
   list_t *runp;
   list_for_each (runp, &GL (dl_stack_used))
     {
-      err = change_stack_perm (list_entry (runp, struct pthread, list)
-#ifdef NEED_SEPARATE_REGISTER_STACK
-			       , pagemask
-#endif
-			       );
+      err = change_stack_perm (list_entry (runp, struct pthread, list));
       if (err != 0)
 	break;
     }
@@ -820,11 +805,7 @@ __make_stacks_executable (void **stack_endp)
   if (err == 0)
     list_for_each (runp, &GL (dl_stack_cache))
       {
-	err = change_stack_perm (list_entry (runp, struct pthread, list)
-#ifdef NEED_SEPARATE_REGISTER_STACK
-				 , pagemask
-#endif
-				 );
+	err = change_stack_perm (list_entry (runp, struct pthread, list));
 	if (err != 0)
 	  break;
       }
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 10/13] nptl: Move changing of stack permissions into ld.so
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (8 preceding siblings ...)
  2021-05-06 18:11 ` [PATCH 09/13] nptl: Simplify the change_stack_perm calling convention Florian Weimer
@ 2021-05-06 18:11 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:11 ` [PATCH 11/13] nptl: Simplify resetting the in-flight stack in __reclaim_stacks Florian Weimer
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:11 UTC (permalink / raw)
  To: libc-alpha

All the stack lists are now in _rtld_global, so it is possible
to change stack permissions directly from there, instead of
calling into libpthread to do the change.
---
 elf/dl-load.c                          |  4 ++
 elf/dl-support.c                       | 10 ++--
 elf/rtld.c                             |  2 +
 nptl/allocatestack.c                   | 63 +--------------------
 nptl/nptl-init.c                       |  4 --
 nptl/pthreadP.h                        |  7 ++-
 sysdeps/generic/ldsodefs.h             | 11 +++-
 sysdeps/unix/sysv/linux/Versions       |  6 ++
 sysdeps/unix/sysv/linux/dl-execstack.c | 76 +++++++++++++++++++++++---
 9 files changed, 100 insertions(+), 83 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index 2832ab3540..918ec7546c 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -1368,7 +1368,11 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
       check_consistency ();
 #endif
 
+#if PTHREAD_IN_LIBC
+      errval = _dl_make_stacks_executable (stack_endp);
+#else
       errval = (*GL(dl_make_stack_executable_hook)) (stack_endp);
+#endif
       if (errval)
 	{
 	  errstring = N_("\
diff --git a/elf/dl-support.c b/elf/dl-support.c
index 580b0202ad..dfc9ab760e 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -183,12 +183,6 @@ uint64_t _dl_hwcap_mask __attribute__ ((nocommon));
  * executable but this isn't true for all platforms.  */
 ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS;
 
-/* If loading a shared object requires that we make the stack executable
-   when it was not, we do it by calling this function.
-   It returns an errno code or zero on success.  */
-int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
-
-
 #if THREAD_GSCOPE_IN_TCB
 list_t _dl_stack_used;
 list_t _dl_stack_user;
@@ -197,6 +191,10 @@ size_t _dl_stack_cache_actsize;
 uintptr_t _dl_in_flight_stack;
 int _dl_stack_cache_lock;
 #else
+/* If loading a shared object requires that we make the stack executable
+   when it was not, we do it by calling this function.
+   It returns an errno code or zero on success.  */
+int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
 int _dl_thread_gscope_count;
 void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls;
 #endif
diff --git a/elf/rtld.c b/elf/rtld.c
index 1255d5cc7d..fbbd60b446 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -1125,9 +1125,11 @@ dl_main (const ElfW(Phdr) *phdr,
 
   __tls_pre_init_tp ();
 
+#if !PTHREAD_IN_LIBC
   /* The explicit initialization here is cheaper than processing the reloc
      in the _rtld_local definition's initializer.  */
   GL(dl_make_stack_executable_hook) = &_dl_make_stack_executable;
+#endif
 
   /* Process the environment variable which control the behaviour.  */
   process_envvars (&state);
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 46089163f4..12cd1058d4 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -291,31 +291,6 @@ queue_stack (struct pthread *stack)
     free_stacks (stack_cache_maxsize);
 }
 
-
-static int
-change_stack_perm (struct pthread *pd)
-{
-#ifdef NEED_SEPARATE_REGISTER_STACK
-  size_t pagemask = __getpagesize () - 1;
-  void *stack = (pd->stackblock
-		 + (((((pd->stackblock_size - pd->guardsize) / 2)
-		      & pagemask) + pd->guardsize) & pagemask));
-  size_t len = pd->stackblock + pd->stackblock_size - stack;
-#elif _STACK_GROWS_DOWN
-  void *stack = pd->stackblock + pd->guardsize;
-  size_t len = pd->stackblock_size - pd->guardsize;
-#elif _STACK_GROWS_UP
-  void *stack = pd->stackblock;
-  size_t len = (uintptr_t) pd - pd->guardsize - (uintptr_t) pd->stackblock;
-#else
-# error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
-#endif
-  if (__mprotect (stack, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
-    return errno;
-
-  return 0;
-}
-
 /* Return the guard page position on allocated stack.  */
 static inline char *
 __attribute ((always_inline))
@@ -625,7 +600,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 	  if (__builtin_expect ((GL(dl_stack_flags) & PF_X) != 0
 				&& (prot & PROT_EXEC) == 0, 0))
 	    {
-	      int err = change_stack_perm (pd);
+	      int err = __nptl_change_stack_perm (pd);
 	      if (err != 0)
 		{
 		  /* Free the stack memory we just allocated.  */
@@ -780,42 +755,6 @@ __deallocate_stack (struct pthread *pd)
   lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
 }
 
-
-int
-__make_stacks_executable (void **stack_endp)
-{
-  /* First the main thread's stack.  */
-  int err = _dl_make_stack_executable (stack_endp);
-  if (err != 0)
-    return err;
-
-  lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
-
-  list_t *runp;
-  list_for_each (runp, &GL (dl_stack_used))
-    {
-      err = change_stack_perm (list_entry (runp, struct pthread, list));
-      if (err != 0)
-	break;
-    }
-
-  /* Also change the permission for the currently unused stacks.  This
-     might be wasted time but better spend it here than adding a check
-     in the fast path.  */
-  if (err == 0)
-    list_for_each (runp, &GL (dl_stack_cache))
-      {
-	err = change_stack_perm (list_entry (runp, struct pthread, list));
-	if (err != 0)
-	  break;
-      }
-
-  lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
-
-  return err;
-}
-
-
 /* In case of a fork() call the memory allocation in the child will be
    the same but only one thread is running.  All stacks except that of
    the one running thread are not used anymore.  We have to recycle
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index 2fb1117f3e..4c89e7a792 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -173,10 +173,6 @@ __pthread_initialize_minimal_internal (void)
   __default_pthread_attr.internal.guardsize = GLRO (dl_pagesize);
   lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);
 
-#ifdef SHARED
-  GL(dl_make_stack_executable_hook) = &__make_stacks_executable;
-#endif
-
   /* Register the fork generation counter with the libc.  */
   __libc_pthread_init (__reclaim_stacks);
 }
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index 8ab247f977..3a6b436400 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -335,8 +335,11 @@ extern void __deallocate_stack (struct pthread *pd) attribute_hidden;
    function also re-initializes the lock for the stack cache.  */
 extern void __reclaim_stacks (void) attribute_hidden;
 
-/* Make all threads's stacks executable.  */
-extern int __make_stacks_executable (void **stack_endp) attribute_hidden;
+/* Change the permissions of a thread stack.  Called from
+   _dl_make_stacks_executable and pthread_create.  */
+int
+__nptl_change_stack_perm (struct pthread *pd);
+rtld_hidden_proto (__nptl_change_stack_perm)
 
 /* longjmp handling.  */
 extern void __pthread_cleanup_upto (__jmp_buf target, char *targetframe);
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 81cce2e4d5..8426b5cbd8 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -416,10 +416,12 @@ struct rtld_global
 #endif
 #include <dl-procruntime.c>
 
+#if !PTHREAD_IN_LIBC
   /* If loading a shared object requires that we make the stack executable
      when it was not, we do it by calling this function.
      It returns an errno code or zero on success.  */
   EXTERN int (*_dl_make_stack_executable_hook) (void **);
+#endif
 
   /* Prevailing state of the stack, PF_X indicating it's executable.  */
   EXTERN ElfW(Word) _dl_stack_flags;
@@ -717,10 +719,17 @@ extern const ElfW(Phdr) *_dl_phdr;
 extern size_t _dl_phnum;
 #endif
 
+#if PTHREAD_IN_LIBC
+/* This function changes the permissions of all stacks (not just those
+   of the main stack).  */
+int _dl_make_stacks_executable (void **stack_endp) attribute_hidden;
+#else
 /* This is the initial value of GL(dl_make_stack_executable_hook).
-   A threads library can change it.  */
+   A threads library can change it.  The ld.so implementation changes
+   the permissions of the main stack only.  */
 extern int _dl_make_stack_executable (void **stack_endp);
 rtld_hidden_proto (_dl_make_stack_executable)
+#endif
 
 /* Variable pointing to the end of the stack (or close to it).  This value
    must be constant over the runtime of the application.  Some programs
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index c35f783e2a..220bb2dffe 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -181,3 +181,9 @@ libc {
     __netlink_assert_response;
   }
 }
+
+ld {
+  GLIBC_PRIVATE {
+    __nptl_change_stack_perm;
+  }
+}
diff --git a/sysdeps/unix/sysv/linux/dl-execstack.c b/sysdeps/unix/sysv/linux/dl-execstack.c
index 3339138c42..e2449d1890 100644
--- a/sysdeps/unix/sysv/linux/dl-execstack.c
+++ b/sysdeps/unix/sysv/linux/dl-execstack.c
@@ -16,20 +16,21 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <ldsodefs.h>
-#include <sys/mman.h>
 #include <errno.h>
+#include <ldsodefs.h>
 #include <libintl.h>
-#include <stdbool.h>
+#include <list.h>
+#include <nptl/pthreadP.h>
 #include <stackinfo.h>
+#include <stdbool.h>
+#include <sys/mman.h>
 #include <sysdep.h>
-
+#include <unistd.h>
 
 extern int __stack_prot attribute_relro attribute_hidden;
 
-
-int
-_dl_make_stack_executable (void **stack_endp)
+static int
+make_main_stack_executable (void **stack_endp)
 {
   /* This gives us the highest/lowest page that needs to be changed.  */
   uintptr_t page = ((uintptr_t) *stack_endp
@@ -56,4 +57,63 @@ _dl_make_stack_executable (void **stack_endp)
 
   return result;
 }
-rtld_hidden_def (_dl_make_stack_executable)
+
+int
+_dl_make_stacks_executable (void **stack_endp)
+{
+  /* First the main thread's stack.  */
+  int err = make_main_stack_executable (stack_endp);
+  if (err != 0)
+    return err;
+
+  lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
+
+  list_t *runp;
+  list_for_each (runp, &GL (dl_stack_used))
+    {
+      err = __nptl_change_stack_perm (list_entry (runp, struct pthread, list));
+      if (err != 0)
+	break;
+    }
+
+  /* Also change the permission for the currently unused stacks.  This
+     might be wasted time but better spend it here than adding a check
+     in the fast path.  */
+  if (err == 0)
+    list_for_each (runp, &GL (dl_stack_cache))
+      {
+	err = __nptl_change_stack_perm (list_entry (runp, struct pthread,
+						    list));
+	if (err != 0)
+	  break;
+      }
+
+  lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
+
+  return err;
+}
+
+int
+__nptl_change_stack_perm (struct pthread *pd)
+{
+#ifdef NEED_SEPARATE_REGISTER_STACK
+  size_t pagemask = __getpagesize () - 1;
+  void *stack = (pd->stackblock
+		 + (((((pd->stackblock_size - pd->guardsize) / 2)
+		      & pagemask) + pd->guardsize) & pagemask));
+  size_t len = pd->stackblock + pd->stackblock_size - stack;
+#elif _STACK_GROWS_DOWN
+  void *stack = pd->stackblock + pd->guardsize;
+  size_t len = pd->stackblock_size - pd->guardsize;
+#elif _STACK_GROWS_UP
+  void *stack = pd->stackblock;
+  size_t len = (uintptr_t) pd - pd->guardsize - (uintptr_t) pd->stackblock;
+#else
+# error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
+#endif
+  if (__mprotect (stack, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
+    return errno;
+
+  return 0;
+}
+rtld_hidden_def (__nptl_change_stack_perm)
-- 
2.30.2



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 11/13] nptl: Simplify resetting the in-flight stack in __reclaim_stacks
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (9 preceding siblings ...)
  2021-05-06 18:11 ` [PATCH 10/13] nptl: Move changing of stack permissions into ld.so Florian Weimer
@ 2021-05-06 18:11 ` Florian Weimer
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:11 ` [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc Florian Weimer
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:11 UTC (permalink / raw)
  To: libc-alpha

stack_list_del overwrites the in-flight stack variable as part of its
operation, so the separate reset at the end of __reclaim_stacks is
redundant.  Clear GL (dl_in_flight_stack) immediately after repairing
the interrupted list operation instead, and use plain list_del for the
current thread's entry: after fork, only one thread is running, so no
in-flight marker is needed for that removal.
---
 nptl/allocatestack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 12cd1058d4..076cffd35b 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -799,6 +799,8 @@ __reclaim_stacks (void)
 	  elem->next->prev = elem->prev;
 	  elem->prev->next = elem->next;
 	}
+
+      GL (dl_in_flight_stack) = 0;
     }
 
   /* Mark all stacks except the still running one as free.  */
@@ -842,7 +844,7 @@ __reclaim_stacks (void)
   /* Remove the entry for the current thread to from the cache list
      and add it to the list of running threads.  Which of the two
      lists is decided by the user_stack flag.  */
-  stack_list_del (&self->list);
+  list_del (&self->list);
 
   /* Re-initialize the lists for all the threads.  */
   INIT_LIST_HEAD (&GL (dl_stack_used));
@@ -856,8 +858,6 @@ __reclaim_stacks (void)
   /* There is one thread running.  */
   __nptl_nthreads = 1;
 
-  GL (dl_in_flight_stack) = 0;
-
   /* Initialize locks.  */
   GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
   __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
-- 
2.30.2




* [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (10 preceding siblings ...)
  2021-05-06 18:11 ` [PATCH 11/13] nptl: Simplify resetting the in-flight stack in __reclaim_stacks Florian Weimer
@ 2021-05-06 18:11 ` Florian Weimer
  2021-05-09 21:41   ` Carlos O'Donell
  2021-05-09 21:42   ` Carlos O'Donell
  2021-05-06 18:11 ` [PATCH 13/13] Linux: Move __reclaim_stacks into the fork implementation in libc Florian Weimer
  2021-05-09 21:42 ` [PATCH 00/13] Linux: Move most stack management out of libpthread Carlos O'Donell
  13 siblings, 2 replies; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:11 UTC (permalink / raw)
  To: libc-alpha

The GLIBC_PRIVATE exports for these symbols are expected to be
temporary.
---
 nptl/Makefile   | 2 +-
 nptl/Versions   | 2 ++
 nptl/pthreadP.h | 8 +++++---
 nptl/vars.c     | 8 ++++++--
 4 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/nptl/Makefile b/nptl/Makefile
index b5f26c6864..f7723cb808 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -170,6 +170,7 @@ routines = \
   sem_wait \
   tpp \
   unwind \
+  vars \
 
 shared-only-routines = forward
 static-only-routines = pthread_atfork
@@ -212,7 +213,6 @@ libpthread-routines = \
   pthread_sigqueue \
   pthread_timedjoin \
   pthread_tryjoin \
-  vars \
   version \
 
 libpthread-shared-only-routines = \
diff --git a/nptl/Versions b/nptl/Versions
index fb15a7e8eb..d439a023b7 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -298,6 +298,8 @@ libc {
     tss_set;
   }
   GLIBC_PRIVATE {
+    __default_pthread_attr;
+    __default_pthread_attr_lock;
     __futex_abstimed_wait64;
     __futex_abstimed_wait_cancelable64;
     __init_sched_fifo_prio;
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index 3a6b436400..6b912f053b 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -198,9 +198,11 @@ enum
 
 
 /* Default pthread attributes.  */
-extern union pthread_attr_transparent __default_pthread_attr attribute_hidden;
-extern int __default_pthread_attr_lock attribute_hidden;
-/* Called from __libpthread_freeres to deallocate the default attribute.  */
+extern union pthread_attr_transparent __default_pthread_attr;
+libc_hidden_proto (__default_pthread_attr)
+extern int __default_pthread_attr_lock;
+libc_hidden_proto (__default_pthread_attr_lock)
+/* Called from __libc_freeres to deallocate the default attribute.  */
 extern void __default_pthread_attr_freeres (void) attribute_hidden;
 
 /* Size and alignment of static TLS block.  */
diff --git a/nptl/vars.c b/nptl/vars.c
index 03a6cc84be..989d7930e0 100644
--- a/nptl/vars.c
+++ b/nptl/vars.c
@@ -22,7 +22,11 @@
 
 /* Default thread attributes for the case when the user does not
    provide any.  */
-union pthread_attr_transparent __default_pthread_attr attribute_hidden;
+union pthread_attr_transparent __default_pthread_attr
+  __attribute__ ((nocommon));
+libc_hidden_data_def (__default_pthread_attr)
 
 /* Mutex protecting __default_pthread_attr.  */
-int __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
+int __default_pthread_attr_lock __attribute__ ((nocommon))
+  = LLL_LOCK_INITIALIZER;
+libc_hidden_data_def (__default_pthread_attr_lock)
-- 
2.30.2




* [PATCH 13/13] Linux: Move __reclaim_stacks into the fork implementation in libc
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (11 preceding siblings ...)
  2021-05-06 18:11 ` [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc Florian Weimer
@ 2021-05-06 18:11 ` Florian Weimer
  2021-05-09 21:41   ` Carlos O'Donell
  2021-05-09 21:42 ` [PATCH 00/13] Linux: Move most stack management out of libpthread Carlos O'Donell
  13 siblings, 1 reply; 30+ messages in thread
From: Florian Weimer @ 2021-05-06 18:11 UTC (permalink / raw)
  To: libc-alpha

As a result, __libc_pthread_init is no longer needed.
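The removed interface followed a callback-registration pattern. A simplified sketch of the shape of the old indirection, with hypothetical stand-in names (`libc_pthread_init`, `reclaim_callback`, `fork_child_path`) rather than the real glibc internals:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the removed indirection: libpthread handed libc a
   function pointer at initialization time, and libc's fork() called
   through it in the new child.  */
static void (*reclaim_callback) (void);

static void
libc_pthread_init (void (*reclaim) (void))
{
  reclaim_callback = reclaim;
}

static int stacks_reclaimed;

static void
reclaim_stacks (void)
{
  stacks_reclaimed = 1;
}

/* Child-side fork path: with the callback scheme, libc has to check
   whether libpthread ever registered anything (single-threaded
   programs may not have loaded it).  Once reclaim_stacks lives in
   libc itself, fork can call it directly and this indirection, along
   with the GLIBC_PRIVATE export carrying it, disappears.  */
static void
fork_child_path (void)
{
  if (reclaim_callback != NULL)
    reclaim_callback ();
}
```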
---
 nptl/Versions        |   1 -
 nptl/allocatestack.c | 108 ------------------------------------------
 nptl/nptl-init.c     |   3 --
 nptl/pthreadP.h      |   7 ---
 sysdeps/nptl/fork.c  | 110 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 110 insertions(+), 119 deletions(-)

diff --git a/nptl/Versions b/nptl/Versions
index d439a023b7..4c1c4ee0a7 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -308,7 +308,6 @@ libc {
     __libc_cleanup_push_defer;
     __libc_dl_error_tsd;
     __libc_multiple_threads;
-    __libc_pthread_init;
     __lll_clocklock_elision;
     __lll_lock_elision;
     __lll_lock_wait;
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 076cffd35b..8672e89e75 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -754,111 +754,3 @@ __deallocate_stack (struct pthread *pd)
 
   lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
 }
-
-/* In case of a fork() call the memory allocation in the child will be
-   the same but only one thread is running.  All stacks except that of
-   the one running thread are not used anymore.  We have to recycle
-   them.  */
-void
-__reclaim_stacks (void)
-{
-  struct pthread *self = (struct pthread *) THREAD_SELF;
-
-  /* No locking necessary.  The caller is the only stack in use.  But
-     we have to be aware that we might have interrupted a list
-     operation.  */
-
-  if (GL (dl_in_flight_stack) != 0)
-    {
-      bool add_p = GL (dl_in_flight_stack) & 1;
-      list_t *elem = (list_t *) (GL (dl_in_flight_stack) & ~(uintptr_t) 1);
-
-      if (add_p)
-	{
-	  /* We always add at the beginning of the list.  So in this case we
-	     only need to check the beginning of these lists to see if the
-	     pointers at the head of the list are inconsistent.  */
-	  list_t *l = NULL;
-
-	  if (GL (dl_stack_used).next->prev != &GL (dl_stack_used))
-	    l = &GL (dl_stack_used);
-	  else if (GL (dl_stack_cache).next->prev != &GL (dl_stack_cache))
-	    l = &GL (dl_stack_cache);
-
-	  if (l != NULL)
-	    {
-	      assert (l->next->prev == elem);
-	      elem->next = l->next;
-	      elem->prev = l;
-	      l->next = elem;
-	    }
-	}
-      else
-	{
-	  /* We can simply always replay the delete operation.  */
-	  elem->next->prev = elem->prev;
-	  elem->prev->next = elem->next;
-	}
-
-      GL (dl_in_flight_stack) = 0;
-    }
-
-  /* Mark all stacks except the still running one as free.  */
-  list_t *runp;
-  list_for_each (runp, &GL (dl_stack_used))
-    {
-      struct pthread *curp = list_entry (runp, struct pthread, list);
-      if (curp != self)
-	{
-	  /* This marks the stack as free.  */
-	  curp->tid = 0;
-
-	  /* Account for the size of the stack.  */
-	  GL (dl_stack_cache_actsize) += curp->stackblock_size;
-
-	  if (curp->specific_used)
-	    {
-	      /* Clear the thread-specific data.  */
-	      memset (curp->specific_1stblock, '\0',
-		      sizeof (curp->specific_1stblock));
-
-	      curp->specific_used = false;
-
-	      for (size_t cnt = 1; cnt < PTHREAD_KEY_1STLEVEL_SIZE; ++cnt)
-		if (curp->specific[cnt] != NULL)
-		  {
-		    memset (curp->specific[cnt], '\0',
-			    sizeof (curp->specific_1stblock));
-
-		    /* We have allocated the block which we do not
-		       free here so re-set the bit.  */
-		    curp->specific_used = true;
-		  }
-	    }
-	}
-    }
-
-  /* Add the stack of all running threads to the cache.  */
-  list_splice (&GL (dl_stack_used), &GL (dl_stack_cache));
-
-  /* Remove the entry for the current thread to from the cache list
-     and add it to the list of running threads.  Which of the two
-     lists is decided by the user_stack flag.  */
-  list_del (&self->list);
-
-  /* Re-initialize the lists for all the threads.  */
-  INIT_LIST_HEAD (&GL (dl_stack_used));
-  INIT_LIST_HEAD (&GL (dl_stack_user));
-
-  if (__glibc_unlikely (THREAD_GETMEM (self, user_stack)))
-    list_add (&self->list, &GL (dl_stack_user));
-  else
-    list_add (&self->list, &GL (dl_stack_used));
-
-  /* There is one thread running.  */
-  __nptl_nthreads = 1;
-
-  /* Initialize locks.  */
-  GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
-  __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
-}
diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
index 4c89e7a792..16fb66bdf5 100644
--- a/nptl/nptl-init.c
+++ b/nptl/nptl-init.c
@@ -172,9 +172,6 @@ __pthread_initialize_minimal_internal (void)
   __default_pthread_attr.internal.stacksize = limit.rlim_cur;
   __default_pthread_attr.internal.guardsize = GLRO (dl_pagesize);
   lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);
-
-  /* Register the fork generation counter with the libc.  */
-  __libc_pthread_init (__reclaim_stacks);
 }
 strong_alias (__pthread_initialize_minimal_internal,
 	      __pthread_initialize_minimal)
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index 6b912f053b..d9b97c814a 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -333,10 +333,6 @@ extern void __free_tcb (struct pthread *pd) attribute_hidden;
 /* Free allocated stack.  */
 extern void __deallocate_stack (struct pthread *pd) attribute_hidden;
 
-/* Mark all the stacks except for the current one as available.  This
-   function also re-initializes the lock for the stack cache.  */
-extern void __reclaim_stacks (void) attribute_hidden;
-
 /* Change the permissions of a thread stack.  Called from
    _dl_make_stacks_executable and pthread_create.  */
 int
@@ -372,9 +368,6 @@ extern unsigned long int __fork_generation attribute_hidden;
 /* Pointer to the fork generation counter in the thread library.  */
 extern unsigned long int *__fork_generation_pointer attribute_hidden;
 
-/* Register the generation counter in the libpthread with the libc.  */
-extern void __libc_pthread_init (void (*reclaim) (void));
-
 extern size_t __pthread_get_minstack (const pthread_attr_t *attr);
 
 /* Namespace save aliases.  */
diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
index f41c40fca0..062b01265a 100644
--- a/sysdeps/nptl/fork.c
+++ b/sysdeps/nptl/fork.c
@@ -35,6 +35,7 @@
 #include <nss/nss_database.h>
 #include <unwind-link.h>
 #include <sys/single_threaded.h>
+#include <list.h>
 
 static void
 fresetlockfiles (void)
@@ -46,6 +47,106 @@ fresetlockfiles (void)
       _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
 }
 
+/* In case of a fork() call the memory allocation in the child will be
+   the same but only one thread is running.  All stacks except that of
+   the one running thread are not used anymore.  We have to recycle
+   them.  */
+static void
+reclaim_stacks (void)
+{
+  struct pthread *self = (struct pthread *) THREAD_SELF;
+
+  /* No locking necessary.  The caller is the only stack in use.  But
+     we have to be aware that we might have interrupted a list
+     operation.  */
+
+  if (GL (dl_in_flight_stack) != 0)
+    {
+      bool add_p = GL (dl_in_flight_stack) & 1;
+      list_t *elem = (list_t *) (GL (dl_in_flight_stack) & ~(uintptr_t) 1);
+
+      if (add_p)
+	{
+	  /* We always add at the beginning of the list.  So in this case we
+	     only need to check the beginning of these lists to see if the
+	     pointers at the head of the list are inconsistent.  */
+	  list_t *l = NULL;
+
+	  if (GL (dl_stack_used).next->prev != &GL (dl_stack_used))
+	    l = &GL (dl_stack_used);
+	  else if (GL (dl_stack_cache).next->prev != &GL (dl_stack_cache))
+	    l = &GL (dl_stack_cache);
+
+	  if (l != NULL)
+	    {
+	      assert (l->next->prev == elem);
+	      elem->next = l->next;
+	      elem->prev = l;
+	      l->next = elem;
+	    }
+	}
+      else
+	{
+	  /* We can simply always replay the delete operation.  */
+	  elem->next->prev = elem->prev;
+	  elem->prev->next = elem->next;
+	}
+
+      GL (dl_in_flight_stack) = 0;
+    }
+
+  /* Mark all stacks except the still running one as free.  */
+  list_t *runp;
+  list_for_each (runp, &GL (dl_stack_used))
+    {
+      struct pthread *curp = list_entry (runp, struct pthread, list);
+      if (curp != self)
+	{
+	  /* This marks the stack as free.  */
+	  curp->tid = 0;
+
+	  /* Account for the size of the stack.  */
+	  GL (dl_stack_cache_actsize) += curp->stackblock_size;
+
+	  if (curp->specific_used)
+	    {
+	      /* Clear the thread-specific data.  */
+	      memset (curp->specific_1stblock, '\0',
+		      sizeof (curp->specific_1stblock));
+
+	      curp->specific_used = false;
+
+	      for (size_t cnt = 1; cnt < PTHREAD_KEY_1STLEVEL_SIZE; ++cnt)
+		if (curp->specific[cnt] != NULL)
+		  {
+		    memset (curp->specific[cnt], '\0',
+			    sizeof (curp->specific_1stblock));
+
+		    /* We have allocated the block which we do not
+		       free here so re-set the bit.  */
+		    curp->specific_used = true;
+		  }
+	    }
+	}
+    }
+
+  /* Add the stack of all running threads to the cache.  */
+  list_splice (&GL (dl_stack_used), &GL (dl_stack_cache));
+
+  /* Remove the entry for the current thread to from the cache list
+     and add it to the list of running threads.  Which of the two
+     lists is decided by the user_stack flag.  */
+  list_del (&self->list);
+
+  /* Re-initialize the lists for all the threads.  */
+  INIT_LIST_HEAD (&GL (dl_stack_used));
+  INIT_LIST_HEAD (&GL (dl_stack_user));
+
+  if (__glibc_unlikely (THREAD_GETMEM (self, user_stack)))
+    list_add (&self->list, &GL (dl_stack_user));
+  else
+    list_add (&self->list, &GL (dl_stack_used));
+}
 
 pid_t
 __libc_fork (void)
@@ -112,6 +213,13 @@ __libc_fork (void)
 	{
 	  __libc_unwind_link_after_fork ();
 
+	  /* There is one thread running.  */
+	  __nptl_nthreads = 1;
+
+	  /* Initialize thread library locks.  */
+	  GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
+	  __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
+
 	  /* Release malloc locks.  */
 	  call_function_static_weak (__malloc_fork_unlock_child);
 
@@ -128,6 +236,8 @@ __libc_fork (void)
       /* Reset the lock the dynamic loader uses to protect its data.  */
       __rtld_lock_initialize (GL(dl_load_lock));
 
+      reclaim_stacks ();
+
       /* Run the handlers registered for the child.  */
       __run_fork_handlers (atfork_run_child, multiple_threads);
     }
-- 
2.30.2



* Re: [PATCH 13/13] Linux: Move __reclaim_stacks into the fork implementation in libc
  2021-05-06 18:11 ` [PATCH 13/13] Linux: Move __reclaim_stacks into the fork implementation in libc Florian Weimer
@ 2021-05-09 21:41   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:41 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:11 PM, Florian Weimer via Libc-alpha wrote:
> As a result, __libc_pthread_init is no longer needed.

Yay! :-)

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>


> ---
>  nptl/Versions        |   1 -
>  nptl/allocatestack.c | 108 ------------------------------------------
>  nptl/nptl-init.c     |   3 --
>  nptl/pthreadP.h      |   7 ---
>  sysdeps/nptl/fork.c  | 110 +++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 110 insertions(+), 119 deletions(-)
> 
> diff --git a/nptl/Versions b/nptl/Versions
> index d439a023b7..4c1c4ee0a7 100644
> --- a/nptl/Versions
> +++ b/nptl/Versions
> @@ -308,7 +308,6 @@ libc {
>      __libc_cleanup_push_defer;
>      __libc_dl_error_tsd;
>      __libc_multiple_threads;
> -    __libc_pthread_init;

OK.

>      __lll_clocklock_elision;
>      __lll_lock_elision;
>      __lll_lock_wait;
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 076cffd35b..8672e89e75 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -754,111 +754,3 @@ __deallocate_stack (struct pthread *pd)
>  
>    lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
>  }
> -
> -/* In case of a fork() call the memory allocation in the child will be
> -   the same but only one thread is running.  All stacks except that of
> -   the one running thread are not used anymore.  We have to recycle
> -   them.  */
> -void
> -__reclaim_stacks (void)
> -{
> -  struct pthread *self = (struct pthread *) THREAD_SELF;
> -
> -  /* No locking necessary.  The caller is the only stack in use.  But
> -     we have to be aware that we might have interrupted a list
> -     operation.  */
> -
> -  if (GL (dl_in_flight_stack) != 0)
> -    {
> -      bool add_p = GL (dl_in_flight_stack) & 1;
> -      list_t *elem = (list_t *) (GL (dl_in_flight_stack) & ~(uintptr_t) 1);
> -
> -      if (add_p)
> -	{
> -	  /* We always add at the beginning of the list.  So in this case we
> -	     only need to check the beginning of these lists to see if the
> -	     pointers at the head of the list are inconsistent.  */
> -	  list_t *l = NULL;
> -
> -	  if (GL (dl_stack_used).next->prev != &GL (dl_stack_used))
> -	    l = &GL (dl_stack_used);
> -	  else if (GL (dl_stack_cache).next->prev != &GL (dl_stack_cache))
> -	    l = &GL (dl_stack_cache);
> -
> -	  if (l != NULL)
> -	    {
> -	      assert (l->next->prev == elem);
> -	      elem->next = l->next;
> -	      elem->prev = l;
> -	      l->next = elem;
> -	    }
> -	}
> -      else
> -	{
> -	  /* We can simply always replay the delete operation.  */
> -	  elem->next->prev = elem->prev;
> -	  elem->prev->next = elem->next;
> -	}
> -
> -      GL (dl_in_flight_stack) = 0;
> -    }
> -
> -  /* Mark all stacks except the still running one as free.  */
> -  list_t *runp;
> -  list_for_each (runp, &GL (dl_stack_used))
> -    {
> -      struct pthread *curp = list_entry (runp, struct pthread, list);
> -      if (curp != self)
> -	{
> -	  /* This marks the stack as free.  */
> -	  curp->tid = 0;
> -
> -	  /* Account for the size of the stack.  */
> -	  GL (dl_stack_cache_actsize) += curp->stackblock_size;
> -
> -	  if (curp->specific_used)
> -	    {
> -	      /* Clear the thread-specific data.  */
> -	      memset (curp->specific_1stblock, '\0',
> -		      sizeof (curp->specific_1stblock));
> -
> -	      curp->specific_used = false;
> -
> -	      for (size_t cnt = 1; cnt < PTHREAD_KEY_1STLEVEL_SIZE; ++cnt)
> -		if (curp->specific[cnt] != NULL)
> -		  {
> -		    memset (curp->specific[cnt], '\0',
> -			    sizeof (curp->specific_1stblock));
> -
> -		    /* We have allocated the block which we do not
> -		       free here so re-set the bit.  */
> -		    curp->specific_used = true;
> -		  }
> -	    }
> -	}
> -    }
> -
> -  /* Add the stack of all running threads to the cache.  */
> -  list_splice (&GL (dl_stack_used), &GL (dl_stack_cache));
> -
> -  /* Remove the entry for the current thread to from the cache list
> -     and add it to the list of running threads.  Which of the two
> -     lists is decided by the user_stack flag.  */
> -  list_del (&self->list);
> -
> -  /* Re-initialize the lists for all the threads.  */
> -  INIT_LIST_HEAD (&GL (dl_stack_used));
> -  INIT_LIST_HEAD (&GL (dl_stack_user));
> -
> -  if (__glibc_unlikely (THREAD_GETMEM (self, user_stack)))
> -    list_add (&self->list, &GL (dl_stack_user));
> -  else
> -    list_add (&self->list, &GL (dl_stack_used));
> -
> -  /* There is one thread running.  */
> -  __nptl_nthreads = 1;
> -
> -  /* Initialize locks.  */
> -  GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
> -  __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
> -}
> diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
> index 4c89e7a792..16fb66bdf5 100644
> --- a/nptl/nptl-init.c
> +++ b/nptl/nptl-init.c
> @@ -172,9 +172,6 @@ __pthread_initialize_minimal_internal (void)
>    __default_pthread_attr.internal.stacksize = limit.rlim_cur;
>    __default_pthread_attr.internal.guardsize = GLRO (dl_pagesize);
>    lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);
> -
> -  /* Register the fork generation counter with the libc.  */
> -  __libc_pthread_init (__reclaim_stacks);
>  }
>  strong_alias (__pthread_initialize_minimal_internal,
>  	      __pthread_initialize_minimal)
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index 6b912f053b..d9b97c814a 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -333,10 +333,6 @@ extern void __free_tcb (struct pthread *pd) attribute_hidden;
>  /* Free allocated stack.  */
>  extern void __deallocate_stack (struct pthread *pd) attribute_hidden;
>  
> -/* Mark all the stacks except for the current one as available.  This
> -   function also re-initializes the lock for the stack cache.  */
> -extern void __reclaim_stacks (void) attribute_hidden;
> -
>  /* Change the permissions of a thread stack.  Called from
>     _dl_make_stacks_executable and pthread_create.  */
>  int
> @@ -372,9 +368,6 @@ extern unsigned long int __fork_generation attribute_hidden;
>  /* Pointer to the fork generation counter in the thread library.  */
>  extern unsigned long int *__fork_generation_pointer attribute_hidden;
>  
> -/* Register the generation counter in the libpthread with the libc.  */
> -extern void __libc_pthread_init (void (*reclaim) (void));
> -
>  extern size_t __pthread_get_minstack (const pthread_attr_t *attr);
>  
>  /* Namespace save aliases.  */
> diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c
> index f41c40fca0..062b01265a 100644
> --- a/sysdeps/nptl/fork.c
> +++ b/sysdeps/nptl/fork.c
> @@ -35,6 +35,7 @@
>  #include <nss/nss_database.h>
>  #include <unwind-link.h>
>  #include <sys/single_threaded.h>
> +#include <list.h>
>  
>  static void
>  fresetlockfiles (void)
> @@ -46,6 +47,106 @@ fresetlockfiles (void)
>        _IO_lock_init (*((_IO_lock_t *) _IO_iter_file(i)->_lock));
>  }
>  
> +/* In case of a fork() call the memory allocation in the child will be
> +   the same but only one thread is running.  All stacks except that of
> +   the one running thread are not used anymore.  We have to recycle
> +   them.  */
> +static void
> +reclaim_stacks (void)
> +{
> +  struct pthread *self = (struct pthread *) THREAD_SELF;
> +
> +  /* No locking necessary.  The caller is the only stack in use.  But
> +     we have to be aware that we might have interrupted a list
> +     operation.  */
> +
> +  if (GL (dl_in_flight_stack) != 0)
> +    {
> +      bool add_p = GL (dl_in_flight_stack) & 1;
> +      list_t *elem = (list_t *) (GL (dl_in_flight_stack) & ~(uintptr_t) 1);
> +
> +      if (add_p)
> +	{
> +	  /* We always add at the beginning of the list.  So in this case we
> +	     only need to check the beginning of these lists to see if the
> +	     pointers at the head of the list are inconsistent.  */
> +	  list_t *l = NULL;
> +
> +	  if (GL (dl_stack_used).next->prev != &GL (dl_stack_used))
> +	    l = &GL (dl_stack_used);
> +	  else if (GL (dl_stack_cache).next->prev != &GL (dl_stack_cache))
> +	    l = &GL (dl_stack_cache);
> +
> +	  if (l != NULL)
> +	    {
> +	      assert (l->next->prev == elem);
> +	      elem->next = l->next;
> +	      elem->prev = l;
> +	      l->next = elem;
> +	    }
> +	}
> +      else
> +	{
> +	  /* We can simply always replay the delete operation.  */
> +	  elem->next->prev = elem->prev;
> +	  elem->prev->next = elem->next;
> +	}
> +
> +      GL (dl_in_flight_stack) = 0;
> +    }
> +
> +  /* Mark all stacks except the still running one as free.  */
> +  list_t *runp;
> +  list_for_each (runp, &GL (dl_stack_used))
> +    {
> +      struct pthread *curp = list_entry (runp, struct pthread, list);
> +      if (curp != self)
> +	{
> +	  /* This marks the stack as free.  */
> +	  curp->tid = 0;
> +
> +	  /* Account for the size of the stack.  */
> +	  GL (dl_stack_cache_actsize) += curp->stackblock_size;
> +
> +	  if (curp->specific_used)
> +	    {
> +	      /* Clear the thread-specific data.  */
> +	      memset (curp->specific_1stblock, '\0',
> +		      sizeof (curp->specific_1stblock));
> +
> +	      curp->specific_used = false;
> +
> +	      for (size_t cnt = 1; cnt < PTHREAD_KEY_1STLEVEL_SIZE; ++cnt)
> +		if (curp->specific[cnt] != NULL)
> +		  {
> +		    memset (curp->specific[cnt], '\0',
> +			    sizeof (curp->specific_1stblock));
> +
> +		    /* We have allocated the block which we do not
> +		       free here so re-set the bit.  */
> +		    curp->specific_used = true;
> +		  }
> +	    }
> +	}
> +    }
> +
> +  /* Add the stack of all running threads to the cache.  */
> +  list_splice (&GL (dl_stack_used), &GL (dl_stack_cache));
> +
> +  /* Remove the entry for the current thread to from the cache list
> +     and add it to the list of running threads.  Which of the two
> +     lists is decided by the user_stack flag.  */
> +  list_del (&self->list);
> +
> +  /* Re-initialize the lists for all the threads.  */
> +  INIT_LIST_HEAD (&GL (dl_stack_used));
> +  INIT_LIST_HEAD (&GL (dl_stack_user));
> +
> +  if (__glibc_unlikely (THREAD_GETMEM (self, user_stack)))
> +    list_add (&self->list, &GL (dl_stack_user));
> +  else
> +    list_add (&self->list, &GL (dl_stack_used));
> +}
>  
>  pid_t
>  __libc_fork (void)
> @@ -112,6 +213,13 @@ __libc_fork (void)
>  	{
>  	  __libc_unwind_link_after_fork ();
>  
> +	  /* There is one thread running.  */
> +	  __nptl_nthreads = 1;
> +
> +	  /* Initialize thread library locks.  */
> +	  GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
> +	  __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
> +
>  	  /* Release malloc locks.  */
>  	  call_function_static_weak (__malloc_fork_unlock_child);
>  
> @@ -128,6 +236,8 @@ __libc_fork (void)
>        /* Reset the lock the dynamic loader uses to protect its data.  */
>        __rtld_lock_initialize (GL(dl_load_lock));
>  
> +      reclaim_stacks ();
> +
>        /* Run the handlers registered for the child.  */
>        __run_fork_handlers (atfork_run_child, multiple_threads);
>      }
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc
  2021-05-06 18:11 ` [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc Florian Weimer
@ 2021-05-09 21:41   ` Carlos O'Donell
  2021-05-09 21:42   ` Carlos O'Donell
  1 sibling, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:41 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:11 PM, Florian Weimer via Libc-alpha wrote:
> The GLIBC_PRIVATE exports for these symbols are expected to be
> temporary.

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>


> ---
>  nptl/Makefile   | 2 +-
>  nptl/Versions   | 2 ++
>  nptl/pthreadP.h | 8 +++++---
>  nptl/vars.c     | 8 ++++++--
>  4 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/nptl/Makefile b/nptl/Makefile
> index b5f26c6864..f7723cb808 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -170,6 +170,7 @@ routines = \
>    sem_wait \
>    tpp \
>    unwind \
> +  vars \
>  
>  shared-only-routines = forward
>  static-only-routines = pthread_atfork
> @@ -212,7 +213,6 @@ libpthread-routines = \
>    pthread_sigqueue \
>    pthread_timedjoin \
>    pthread_tryjoin \
> -  vars \
>    version \
>  
>  libpthread-shared-only-routines = \
> diff --git a/nptl/Versions b/nptl/Versions
> index fb15a7e8eb..d439a023b7 100644
> --- a/nptl/Versions
> +++ b/nptl/Versions
> @@ -298,6 +298,8 @@ libc {
>      tss_set;
>    }
>    GLIBC_PRIVATE {
> +    __default_pthread_attr;
> +    __default_pthread_attr_lock;
>      __futex_abstimed_wait64;
>      __futex_abstimed_wait_cancelable64;
>      __init_sched_fifo_prio;
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index 3a6b436400..6b912f053b 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -198,9 +198,11 @@ enum
>  
>  
>  /* Default pthread attributes.  */
> -extern union pthread_attr_transparent __default_pthread_attr attribute_hidden;
> -extern int __default_pthread_attr_lock attribute_hidden;
> -/* Called from __libpthread_freeres to deallocate the default attribute.  */
> +extern union pthread_attr_transparent __default_pthread_attr;
> +libc_hidden_proto (__default_pthread_attr)
> +extern int __default_pthread_attr_lock;
> +libc_hidden_proto (__default_pthread_attr_lock)
> +/* Called from __libc_freeres to deallocate the default attribute.  */
>  extern void __default_pthread_attr_freeres (void) attribute_hidden;
>  
>  /* Size and alignment of static TLS block.  */
> diff --git a/nptl/vars.c b/nptl/vars.c
> index 03a6cc84be..989d7930e0 100644
> --- a/nptl/vars.c
> +++ b/nptl/vars.c
> @@ -22,7 +22,11 @@
>  
>  /* Default thread attributes for the case when the user does not
>     provide any.  */
> -union pthread_attr_transparent __default_pthread_attr attribute_hidden;
> +union pthread_attr_transparent __default_pthread_attr
> +  __attribute__ ((nocommon));
> +libc_hidden_data_def (__default_pthread_attr)
>  
>  /* Mutex protecting __default_pthread_attr.  */
> -int __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
> +int __default_pthread_attr_lock __attribute__ ((nocommon))
> +  = LLL_LOCK_INITIALIZER;
> +libc_hidden_data_def (__default_pthread_attr_lock)
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc
  2021-05-06 18:11 ` [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc Florian Weimer
  2021-05-09 21:41   ` Carlos O'Donell
@ 2021-05-09 21:42   ` Carlos O'Donell
  1 sibling, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:11 PM, Florian Weimer via Libc-alpha wrote:
> The GLIBC_PRIVATE exports for these symbols are expected to be
> temporary.
> ---
>  nptl/Makefile   | 2 +-
>  nptl/Versions   | 2 ++
>  nptl/pthreadP.h | 8 +++++---
>  nptl/vars.c     | 8 ++++++--
>  4 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/nptl/Makefile b/nptl/Makefile
> index b5f26c6864..f7723cb808 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -170,6 +170,7 @@ routines = \
>    sem_wait \
>    tpp \
>    unwind \
> +  vars \
>  
>  shared-only-routines = forward
>  static-only-routines = pthread_atfork
> @@ -212,7 +213,6 @@ libpthread-routines = \
>    pthread_sigqueue \
>    pthread_timedjoin \
>    pthread_tryjoin \
> -  vars \
>    version \
>  
>  libpthread-shared-only-routines = \
> diff --git a/nptl/Versions b/nptl/Versions
> index fb15a7e8eb..d439a023b7 100644
> --- a/nptl/Versions
> +++ b/nptl/Versions
> @@ -298,6 +298,8 @@ libc {
>      tss_set;
>    }
>    GLIBC_PRIVATE {
> +    __default_pthread_attr;
> +    __default_pthread_attr_lock;
>      __futex_abstimed_wait64;
>      __futex_abstimed_wait_cancelable64;
>      __init_sched_fifo_prio;
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index 3a6b436400..6b912f053b 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -198,9 +198,11 @@ enum
>  
>  
>  /* Default pthread attributes.  */
> -extern union pthread_attr_transparent __default_pthread_attr attribute_hidden;
> -extern int __default_pthread_attr_lock attribute_hidden;
> -/* Called from __libpthread_freeres to deallocate the default attribute.  */
> +extern union pthread_attr_transparent __default_pthread_attr;
> +libc_hidden_proto (__default_pthread_attr)
> +extern int __default_pthread_attr_lock;
> +libc_hidden_proto (__default_pthread_attr_lock)
> +/* Called from __libc_freeres to deallocate the default attribute.  */
>  extern void __default_pthread_attr_freeres (void) attribute_hidden;
>  
>  /* Size and alignment of static TLS block.  */
> diff --git a/nptl/vars.c b/nptl/vars.c
> index 03a6cc84be..989d7930e0 100644
> --- a/nptl/vars.c
> +++ b/nptl/vars.c
> @@ -22,7 +22,11 @@
>  
>  /* Default thread attributes for the case when the user does not
>     provide any.  */
> -union pthread_attr_transparent __default_pthread_attr attribute_hidden;
> +union pthread_attr_transparent __default_pthread_attr
> +  __attribute__ ((nocommon));
> +libc_hidden_data_def (__default_pthread_attr)
>  
>  /* Mutex protecting __default_pthread_attr.  */
> -int __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
> +int __default_pthread_attr_lock __attribute__ ((nocommon))
> +  = LLL_LOCK_INITIALIZER;
> +libc_hidden_data_def (__default_pthread_attr_lock)
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 11/13] nptl: Simplify resetting the in-flight stack in __reclaim_stacks
  2021-05-06 18:11 ` [PATCH 11/13] nptl: Simplify resetting the in-flight stack in __reclaim_stacks Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:11 PM, Florian Weimer via Libc-alpha wrote:
> stack_list_del overwrites the in-flight stack variable.

LGTM. Hard to tell if I want to call this a bug fix or a cleanup
that just happens naturally as part of the refactoring :-)

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  nptl/allocatestack.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 12cd1058d4..076cffd35b 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -799,6 +799,8 @@ __reclaim_stacks (void)
>  	  elem->next->prev = elem->prev;
>  	  elem->prev->next = elem->next;
>  	}
> +
> +      GL (dl_in_flight_stack) = 0;
>      }
>  
>    /* Mark all stacks except the still running one as free.  */
> @@ -842,7 +844,7 @@ __reclaim_stacks (void)
>    /* Remove the entry for the current thread to from the cache list
>       and add it to the list of running threads.  Which of the two
>       lists is decided by the user_stack flag.  */
> -  stack_list_del (&self->list);
> +  list_del (&self->list);
>  
>    /* Re-initialize the lists for all the threads.  */
>    INIT_LIST_HEAD (&GL (dl_stack_used));
> @@ -856,8 +858,6 @@ __reclaim_stacks (void)
>    /* There is one thread running.  */
>    __nptl_nthreads = 1;
>  
> -  GL (dl_in_flight_stack) = 0;
> -
>    /* Initialize locks.  */
>    GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
>    __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 10/13] nptl: Move changing of stack permissions into ld.so
  2021-05-06 18:11 ` [PATCH 10/13] nptl: Move changing of stack permissions into ld.so Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:11 PM, Florian Weimer via Libc-alpha wrote:
> All the stack lists are now in _rtld_global, so it is possible
> to change stack permissions directly from there, instead of
> calling into libpthread to do the change.

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>


> ---
>  elf/dl-load.c                          |  4 ++
>  elf/dl-support.c                       | 10 ++--
>  elf/rtld.c                             |  2 +
>  nptl/allocatestack.c                   | 63 +--------------------
>  nptl/nptl-init.c                       |  4 --
>  nptl/pthreadP.h                        |  7 ++-
>  sysdeps/generic/ldsodefs.h             | 11 +++-
>  sysdeps/unix/sysv/linux/Versions       |  6 ++
>  sysdeps/unix/sysv/linux/dl-execstack.c | 76 +++++++++++++++++++++++---
>  9 files changed, 100 insertions(+), 83 deletions(-)
> 
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index 2832ab3540..918ec7546c 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -1368,7 +1368,11 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
>        check_consistency ();
>  #endif
>  
> +#if PTHREAD_IN_LIBC
> +      errval = _dl_make_stacks_executable (stack_endp);
> +#else
>        errval = (*GL(dl_make_stack_executable_hook)) (stack_endp);
> +#endif
>        if (errval)
>  	{
>  	  errstring = N_("\
> diff --git a/elf/dl-support.c b/elf/dl-support.c
> index 580b0202ad..dfc9ab760e 100644
> --- a/elf/dl-support.c
> +++ b/elf/dl-support.c
> @@ -183,12 +183,6 @@ uint64_t _dl_hwcap_mask __attribute__ ((nocommon));
>   * executable but this isn't true for all platforms.  */
>  ElfW(Word) _dl_stack_flags = DEFAULT_STACK_PERMS;
>  
> -/* If loading a shared object requires that we make the stack executable
> -   when it was not, we do it by calling this function.
> -   It returns an errno code or zero on success.  */
> -int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
> -
> -
>  #if THREAD_GSCOPE_IN_TCB
>  list_t _dl_stack_used;
>  list_t _dl_stack_user;
> @@ -197,6 +191,10 @@ size_t _dl_stack_cache_actsize;
>  uintptr_t _dl_in_flight_stack;
>  int _dl_stack_cache_lock;
>  #else
> +/* If loading a shared object requires that we make the stack executable
> +   when it was not, we do it by calling this function.
> +   It returns an errno code or zero on success.  */
> +int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
>  int _dl_thread_gscope_count;
>  void (*_dl_init_static_tls) (struct link_map *) = &_dl_nothread_init_static_tls;
>  #endif
> diff --git a/elf/rtld.c b/elf/rtld.c
> index 1255d5cc7d..fbbd60b446 100644
> --- a/elf/rtld.c
> +++ b/elf/rtld.c
> @@ -1125,9 +1125,11 @@ dl_main (const ElfW(Phdr) *phdr,
>  
>    __tls_pre_init_tp ();
>  
> +#if !PTHREAD_IN_LIBC
>    /* The explicit initialization here is cheaper than processing the reloc
>       in the _rtld_local definition's initializer.  */
>    GL(dl_make_stack_executable_hook) = &_dl_make_stack_executable;
> +#endif
>  
>    /* Process the environment variable which control the behaviour.  */
>    process_envvars (&state);
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 46089163f4..12cd1058d4 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -291,31 +291,6 @@ queue_stack (struct pthread *stack)
>      free_stacks (stack_cache_maxsize);
>  }
>  
> -
> -static int
> -change_stack_perm (struct pthread *pd)
> -{
> -#ifdef NEED_SEPARATE_REGISTER_STACK
> -  size_t pagemask = __getpagesize () - 1;
> -  void *stack = (pd->stackblock
> -		 + (((((pd->stackblock_size - pd->guardsize) / 2)
> -		      & pagemask) + pd->guardsize) & pagemask));
> -  size_t len = pd->stackblock + pd->stackblock_size - stack;
> -#elif _STACK_GROWS_DOWN
> -  void *stack = pd->stackblock + pd->guardsize;
> -  size_t len = pd->stackblock_size - pd->guardsize;
> -#elif _STACK_GROWS_UP
> -  void *stack = pd->stackblock;
> -  size_t len = (uintptr_t) pd - pd->guardsize - (uintptr_t) pd->stackblock;
> -#else
> -# error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
> -#endif
> -  if (__mprotect (stack, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
> -    return errno;
> -
> -  return 0;
> -}
> -
>  /* Return the guard page position on allocated stack.  */
>  static inline char *
>  __attribute ((always_inline))
> @@ -625,7 +600,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>  	  if (__builtin_expect ((GL(dl_stack_flags) & PF_X) != 0
>  				&& (prot & PROT_EXEC) == 0, 0))
>  	    {
> -	      int err = change_stack_perm (pd);
> +	      int err = __nptl_change_stack_perm (pd);
>  	      if (err != 0)
>  		{
>  		  /* Free the stack memory we just allocated.  */
> @@ -780,42 +755,6 @@ __deallocate_stack (struct pthread *pd)
>    lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
>  }
>  
> -
> -int
> -__make_stacks_executable (void **stack_endp)
> -{
> -  /* First the main thread's stack.  */
> -  int err = _dl_make_stack_executable (stack_endp);
> -  if (err != 0)
> -    return err;
> -
> -  lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
> -
> -  list_t *runp;
> -  list_for_each (runp, &GL (dl_stack_used))
> -    {
> -      err = change_stack_perm (list_entry (runp, struct pthread, list));
> -      if (err != 0)
> -	break;
> -    }
> -
> -  /* Also change the permission for the currently unused stacks.  This
> -     might be wasted time but better spend it here than adding a check
> -     in the fast path.  */
> -  if (err == 0)
> -    list_for_each (runp, &GL (dl_stack_cache))
> -      {
> -	err = change_stack_perm (list_entry (runp, struct pthread, list));
> -	if (err != 0)
> -	  break;
> -      }
> -
> -  lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
> -
> -  return err;
> -}
> -
> -
>  /* In case of a fork() call the memory allocation in the child will be
>     the same but only one thread is running.  All stacks except that of
>     the one running thread are not used anymore.  We have to recycle
> diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
> index 2fb1117f3e..4c89e7a792 100644
> --- a/nptl/nptl-init.c
> +++ b/nptl/nptl-init.c
> @@ -173,10 +173,6 @@ __pthread_initialize_minimal_internal (void)
>    __default_pthread_attr.internal.guardsize = GLRO (dl_pagesize);
>    lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);
>  
> -#ifdef SHARED
> -  GL(dl_make_stack_executable_hook) = &__make_stacks_executable;
> -#endif
> -
>    /* Register the fork generation counter with the libc.  */
>    __libc_pthread_init (__reclaim_stacks);
>  }
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index 8ab247f977..3a6b436400 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -335,8 +335,11 @@ extern void __deallocate_stack (struct pthread *pd) attribute_hidden;
>     function also re-initializes the lock for the stack cache.  */
>  extern void __reclaim_stacks (void) attribute_hidden;
>  
> -/* Make all threads's stacks executable.  */
> -extern int __make_stacks_executable (void **stack_endp) attribute_hidden;
> +/* Change the permissions of a thread stack.  Called from
> +   _dl_make_stacks_executable and pthread_create.  */
> +int
> +__nptl_change_stack_perm (struct pthread *pd);
> +rtld_hidden_proto (__nptl_change_stack_perm)
>  
>  /* longjmp handling.  */
>  extern void __pthread_cleanup_upto (__jmp_buf target, char *targetframe);
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index 81cce2e4d5..8426b5cbd8 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -416,10 +416,12 @@ struct rtld_global
>  #endif
>  #include <dl-procruntime.c>
>  
> +#if !PTHREAD_IN_LIBC
>    /* If loading a shared object requires that we make the stack executable
>       when it was not, we do it by calling this function.
>       It returns an errno code or zero on success.  */
>    EXTERN int (*_dl_make_stack_executable_hook) (void **);
> +#endif
>  
>    /* Prevailing state of the stack, PF_X indicating it's executable.  */
>    EXTERN ElfW(Word) _dl_stack_flags;
> @@ -717,10 +719,17 @@ extern const ElfW(Phdr) *_dl_phdr;
>  extern size_t _dl_phnum;
>  #endif
>  
> +#if PTHREAD_IN_LIBC
> +/* This function changes the permissions of all stacks (not just those
> +   of the main stack).  */
> +int _dl_make_stacks_executable (void **stack_endp) attribute_hidden;
> +#else
>  /* This is the initial value of GL(dl_make_stack_executable_hook).
> -   A threads library can change it.  */
> +   A threads library can change it.  The ld.so implementation changes
> +   the permissions of the main stack only.  */
>  extern int _dl_make_stack_executable (void **stack_endp);
>  rtld_hidden_proto (_dl_make_stack_executable)
> +#endif
>  
>  /* Variable pointing to the end of the stack (or close to it).  This value
>     must be constant over the runtime of the application.  Some programs
> diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
> index c35f783e2a..220bb2dffe 100644
> --- a/sysdeps/unix/sysv/linux/Versions
> +++ b/sysdeps/unix/sysv/linux/Versions
> @@ -181,3 +181,9 @@ libc {
>      __netlink_assert_response;
>    }
>  }
> +
> +ld {
> +  GLIBC_PRIVATE {
> +    __nptl_change_stack_perm;
> +  }
> +}
> diff --git a/sysdeps/unix/sysv/linux/dl-execstack.c b/sysdeps/unix/sysv/linux/dl-execstack.c
> index 3339138c42..e2449d1890 100644
> --- a/sysdeps/unix/sysv/linux/dl-execstack.c
> +++ b/sysdeps/unix/sysv/linux/dl-execstack.c
> @@ -16,20 +16,21 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#include <ldsodefs.h>
> -#include <sys/mman.h>
>  #include <errno.h>
> +#include <ldsodefs.h>
>  #include <libintl.h>
> -#include <stdbool.h>
> +#include <list.h>
> +#include <nptl/pthreadP.h>
>  #include <stackinfo.h>
> +#include <stdbool.h>
> +#include <sys/mman.h>
>  #include <sysdep.h>
> -
> +#include <unistd.h>
>  
>  extern int __stack_prot attribute_relro attribute_hidden;
>  
> -
> -int
> -_dl_make_stack_executable (void **stack_endp)
> +static int
> +make_main_stack_executable (void **stack_endp)
>  {
>    /* This gives us the highest/lowest page that needs to be changed.  */
>    uintptr_t page = ((uintptr_t) *stack_endp
> @@ -56,4 +57,63 @@ _dl_make_stack_executable (void **stack_endp)
>  
>    return result;
>  }
> -rtld_hidden_def (_dl_make_stack_executable)
> +
> +int
> +_dl_make_stacks_executable (void **stack_endp)
> +{
> +  /* First the main thread's stack.  */
> +  int err = make_main_stack_executable (stack_endp);
> +  if (err != 0)
> +    return err;
> +
> +  lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
> +
> +  list_t *runp;
> +  list_for_each (runp, &GL (dl_stack_used))
> +    {
> +      err = __nptl_change_stack_perm (list_entry (runp, struct pthread, list));
> +      if (err != 0)
> +	break;
> +    }
> +
> +  /* Also change the permission for the currently unused stacks.  This
> +     might be wasted time but better spend it here than adding a check
> +     in the fast path.  */
> +  if (err == 0)
> +    list_for_each (runp, &GL (dl_stack_cache))
> +      {
> +	err = __nptl_change_stack_perm (list_entry (runp, struct pthread,
> +						    list));
> +	if (err != 0)
> +	  break;
> +      }
> +
> +  lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
> +
> +  return err;
> +}
> +
> +int
> +__nptl_change_stack_perm (struct pthread *pd)
> +{
> +#ifdef NEED_SEPARATE_REGISTER_STACK
> +  size_t pagemask = __getpagesize () - 1;
> +  void *stack = (pd->stackblock
> +		 + (((((pd->stackblock_size - pd->guardsize) / 2)
> +		      & pagemask) + pd->guardsize) & pagemask));
> +  size_t len = pd->stackblock + pd->stackblock_size - stack;
> +#elif _STACK_GROWS_DOWN
> +  void *stack = pd->stackblock + pd->guardsize;
> +  size_t len = pd->stackblock_size - pd->guardsize;
> +#elif _STACK_GROWS_UP
> +  void *stack = pd->stackblock;
> +  size_t len = (uintptr_t) pd - pd->guardsize - (uintptr_t) pd->stackblock;
> +#else
> +# error "Define either _STACK_GROWS_DOWN or _STACK_GROWS_UP"
> +#endif
> +  if (__mprotect (stack, len, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
> +    return errno;
> +
> +  return 0;
> +}
> +rtld_hidden_def (__nptl_change_stack_perm)
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 09/13] nptl: Simplify the change_stack_perm calling convention
  2021-05-06 18:11 ` [PATCH 09/13] nptl: Simplify the change_stack_perm calling convention Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:11 PM, Florian Weimer via Libc-alpha wrote:
> Only ia64 needs the page mask, and it is straightforward
> to compute the value within the function itself.

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>


> ---
>  nptl/allocatestack.c | 29 +++++------------------------
>  1 file changed, 5 insertions(+), 24 deletions(-)
> 
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 71cfa874d1..46089163f4 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -293,13 +293,10 @@ queue_stack (struct pthread *stack)
>  
>  
>  static int
> -change_stack_perm (struct pthread *pd
> -#ifdef NEED_SEPARATE_REGISTER_STACK
> -		   , size_t pagemask
> -#endif
> -		   )
> +change_stack_perm (struct pthread *pd)
>  {
>  #ifdef NEED_SEPARATE_REGISTER_STACK
> +  size_t pagemask = __getpagesize () - 1;
>    void *stack = (pd->stackblock
>  		 + (((((pd->stackblock_size - pd->guardsize) / 2)
>  		      & pagemask) + pd->guardsize) & pagemask));
> @@ -628,11 +625,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>  	  if (__builtin_expect ((GL(dl_stack_flags) & PF_X) != 0
>  				&& (prot & PROT_EXEC) == 0, 0))
>  	    {
> -	      int err = change_stack_perm (pd
> -#ifdef NEED_SEPARATE_REGISTER_STACK
> -					   , ~pagesize_m1
> -#endif
> -					   );
> +	      int err = change_stack_perm (pd);
>  	      if (err != 0)
>  		{
>  		  /* Free the stack memory we just allocated.  */
> @@ -796,20 +789,12 @@ __make_stacks_executable (void **stack_endp)
>    if (err != 0)
>      return err;
>  
> -#ifdef NEED_SEPARATE_REGISTER_STACK
> -  const size_t pagemask = ~(__getpagesize () - 1);
> -#endif
> -
>    lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
>  
>    list_t *runp;
>    list_for_each (runp, &GL (dl_stack_used))
>      {
> -      err = change_stack_perm (list_entry (runp, struct pthread, list)
> -#ifdef NEED_SEPARATE_REGISTER_STACK
> -			       , pagemask
> -#endif
> -			       );
> +      err = change_stack_perm (list_entry (runp, struct pthread, list));
>        if (err != 0)
>  	break;
>      }
> @@ -820,11 +805,7 @@ __make_stacks_executable (void **stack_endp)
>    if (err == 0)
>      list_for_each (runp, &GL (dl_stack_cache))
>        {
> -	err = change_stack_perm (list_entry (runp, struct pthread, list)
> -#ifdef NEED_SEPARATE_REGISTER_STACK
> -				 , pagemask
> -#endif
> -				 );
> +	err = change_stack_perm (list_entry (runp, struct pthread, list));
>  	if (err != 0)
>  	  break;
>        }
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 08/13] nptl: Move more stack management variables into _rtld_global
  2021-05-06 18:10 ` [PATCH 08/13] nptl: Move more stack management variables into _rtld_global Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:10 PM, Florian Weimer via Libc-alpha wrote:
> Permissions of the cached stacks may have to be updated if an object
> is loaded that requires executable stacks, so the dynamic loader
> needs to know about these cached stacks.
> 
> The move of in_flight_stack and stack_cache_actsize is a requirement for
> merging __reclaim_stacks into the fork implementation in libc.

LGTM and makes logical sense to move the data.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
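The `dl_in_flight_stack` word that this patch moves into `_rtld_global` encodes both which list node an interrupted operation was touching and what kind of operation it was. A standalone model of the trick (illustration only, not glibc's code): list nodes are at least 2-byte aligned, so the low bit of the recorded pointer is free to distinguish an add (1) from a delete (0), letting the fork child finish or undo the operation:

```c
#include <assert.h>
#include <stdint.h>

/* Model of GL (dl_in_flight_stack): one word records the node an
   in-progress list operation was manipulating when fork() happened.  */
static uintptr_t in_flight;

static void
record_del (void *elem)
{
  in_flight = (uintptr_t) elem;          /* Low bit clear: delete.  */
}

static void
record_add (void *elem)
{
  in_flight = (uintptr_t) elem | 1;      /* Low bit set: add.  */
}

static int
in_flight_is_add (void)
{
  return in_flight & 1;
}

static void *
in_flight_elem (void)
{
  /* Mask off the tag bit to recover the node pointer.  */
  return (void *) (in_flight & ~(uintptr_t) 1);
}
```

This is why `__reclaim_stacks` checks `GL (dl_in_flight_stack) & 1` before deciding whether to complete the pending insertion or the pending removal.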


> ---
>  elf/dl-support.c              |  3 +++
>  nptl/allocatestack.c          | 51 +++++++++++++++--------------------
>  sysdeps/generic/ldsodefs.h    | 11 ++++++++
>  sysdeps/nptl/dl-tls_init_tp.c |  1 +
>  4 files changed, 36 insertions(+), 30 deletions(-)
> 
> diff --git a/elf/dl-support.c b/elf/dl-support.c
> index f966a2e7cd..580b0202ad 100644
> --- a/elf/dl-support.c
> +++ b/elf/dl-support.c
> @@ -192,6 +192,9 @@ int (*_dl_make_stack_executable_hook) (void **) = _dl_make_stack_executable;
>  #if THREAD_GSCOPE_IN_TCB
>  list_t _dl_stack_used;
>  list_t _dl_stack_user;
> +list_t _dl_stack_cache;
> +size_t _dl_stack_cache_actsize;
> +uintptr_t _dl_in_flight_stack;
>  int _dl_stack_cache_lock;
>  #else
>  int _dl_thread_gscope_count;
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 88c49f8154..71cfa874d1 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -103,15 +103,6 @@
>  
>  /* Maximum size in kB of cache.  */
>  static size_t stack_cache_maxsize = 40 * 1024 * 1024; /* 40MiBi by default.  */
> -static size_t stack_cache_actsize;
> -
> -/* List of queued stack frames.  */
> -static LIST_HEAD (stack_cache);
> -
> -/* We need to record what list operations we are going to do so that,
> -   in case of an asynchronous interruption due to a fork() call, we
> -   can correct for the work.  */
> -static uintptr_t in_flight_stack;
>  
>  /* Check whether the stack is still used or not.  */
>  #define FREE_P(descr) ((descr)->tid <= 0)
> @@ -120,7 +111,7 @@ static uintptr_t in_flight_stack;
>  static void
>  stack_list_del (list_t *elem)
>  {
> -  in_flight_stack = (uintptr_t) elem;
> +  GL (dl_in_flight_stack) = (uintptr_t) elem;
>  
>    atomic_write_barrier ();
>  
> @@ -128,14 +119,14 @@ stack_list_del (list_t *elem)
>  
>    atomic_write_barrier ();
>  
> -  in_flight_stack = 0;
> +  GL (dl_in_flight_stack) = 0;
>  }
>  
>  
>  static void
>  stack_list_add (list_t *elem, list_t *list)
>  {
> -  in_flight_stack = (uintptr_t) elem | 1;
> +  GL (dl_in_flight_stack) = (uintptr_t) elem | 1;
>  
>    atomic_write_barrier ();
>  
> @@ -143,7 +134,7 @@ stack_list_add (list_t *elem, list_t *list)
>  
>    atomic_write_barrier ();
>  
> -  in_flight_stack = 0;
> +  GL (dl_in_flight_stack) = 0;
>  }
>  
>  
> @@ -168,7 +159,7 @@ get_cached_stack (size_t *sizep, void **memp)
>       same.  As the very least there are only a few different sizes.
>       Therefore this loop will exit early most of the time with an
>       exact match.  */
> -  list_for_each (entry, &stack_cache)
> +  list_for_each (entry, &GL (dl_stack_cache))
>      {
>        struct pthread *curr;
>  
> @@ -208,7 +199,7 @@ get_cached_stack (size_t *sizep, void **memp)
>    stack_list_add (&result->list, &GL (dl_stack_used));
>  
>    /* And decrease the cache size.  */
> -  stack_cache_actsize -= result->stackblock_size;
> +  GL (dl_stack_cache_actsize) -= result->stackblock_size;
>  
>    /* Release the lock early.  */
>    lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
> @@ -249,7 +240,7 @@ free_stacks (size_t limit)
>    list_t *prev;
>  
>    /* Search from the end of the list.  */
> -  list_for_each_prev_safe (entry, prev, &stack_cache)
> +  list_for_each_prev_safe (entry, prev, &GL (dl_stack_cache))
>      {
>        struct pthread *curr;
>  
> @@ -260,7 +251,7 @@ free_stacks (size_t limit)
>  	  stack_list_del (entry);
>  
>  	  /* Account for the freed memory.  */
> -	  stack_cache_actsize -= curr->stackblock_size;
> +	  GL (dl_stack_cache_actsize) -= curr->stackblock_size;
>  
>  	  /* Free the memory associated with the ELF TLS.  */
>  	  _dl_deallocate_tls (TLS_TPADJ (curr), false);
> @@ -271,7 +262,7 @@ free_stacks (size_t limit)
>  	    abort ();
>  
>  	  /* Maybe we have freed enough.  */
> -	  if (stack_cache_actsize <= limit)
> +	  if (GL (dl_stack_cache_actsize) <= limit)
>  	    break;
>  	}
>      }
> @@ -293,10 +284,10 @@ queue_stack (struct pthread *stack)
>    /* We unconditionally add the stack to the list.  The memory may
>       still be in use but it will not be reused until the kernel marks
>       the stack as not used anymore.  */
> -  stack_list_add (&stack->list, &stack_cache);
> +  stack_list_add (&stack->list, &GL (dl_stack_cache));
>  
> -  stack_cache_actsize += stack->stackblock_size;
> -  if (__glibc_unlikely (stack_cache_actsize > stack_cache_maxsize))
> +  GL (dl_stack_cache_actsize) += stack->stackblock_size;
> +  if (__glibc_unlikely (GL (dl_stack_cache_actsize) > stack_cache_maxsize))
>      free_stacks (stack_cache_maxsize);
>  }
>  
> @@ -827,7 +818,7 @@ __make_stacks_executable (void **stack_endp)
>       might be wasted time but better spend it here than adding a check
>       in the fast path.  */
>    if (err == 0)
> -    list_for_each (runp, &stack_cache)
> +    list_for_each (runp, &GL (dl_stack_cache))
>        {
>  	err = change_stack_perm (list_entry (runp, struct pthread, list)
>  #ifdef NEED_SEPARATE_REGISTER_STACK
> @@ -857,10 +848,10 @@ __reclaim_stacks (void)
>       we have to be aware that we might have interrupted a list
>       operation.  */
>  
> -  if (in_flight_stack != 0)
> +  if (GL (dl_in_flight_stack) != 0)
>      {
> -      bool add_p = in_flight_stack & 1;
> -      list_t *elem = (list_t *) (in_flight_stack & ~(uintptr_t) 1);
> +      bool add_p = GL (dl_in_flight_stack) & 1;
> +      list_t *elem = (list_t *) (GL (dl_in_flight_stack) & ~(uintptr_t) 1);
>  
>        if (add_p)
>  	{
> @@ -871,8 +862,8 @@ __reclaim_stacks (void)
>  
>  	  if (GL (dl_stack_used).next->prev != &GL (dl_stack_used))
>  	    l = &GL (dl_stack_used);
> -	  else if (stack_cache.next->prev != &stack_cache)
> -	    l = &stack_cache;
> +	  else if (GL (dl_stack_cache).next->prev != &GL (dl_stack_cache))
> +	    l = &GL (dl_stack_cache);
>  
>  	  if (l != NULL)
>  	    {
> @@ -901,7 +892,7 @@ __reclaim_stacks (void)
>  	  curp->tid = 0;
>  
>  	  /* Account for the size of the stack.  */
> -	  stack_cache_actsize += curp->stackblock_size;
> +	  GL (dl_stack_cache_actsize) += curp->stackblock_size;
>  
>  	  if (curp->specific_used)
>  	    {
> @@ -926,7 +917,7 @@ __reclaim_stacks (void)
>      }
>  
>    /* Add the stack of all running threads to the cache.  */
> -  list_splice (&GL (dl_stack_used), &stack_cache);
> +  list_splice (&GL (dl_stack_used), &GL (dl_stack_cache));
>  
>    /* Remove the entry for the current thread to from the cache list
>       and add it to the list of running threads.  Which of the two
> @@ -945,7 +936,7 @@ __reclaim_stacks (void)
>    /* There is one thread running.  */
>    __nptl_nthreads = 1;
>  
> -  in_flight_stack = 0;
> +  GL (dl_in_flight_stack) = 0;
>  
>    /* Initialize locks.  */
>    GL (dl_stack_cache_lock) = LLL_LOCK_INITIALIZER;
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index ee851ac789..81cce2e4d5 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -481,6 +481,17 @@ struct rtld_global
>    /* List of thread stacks that were allocated by the application.  */
>    EXTERN list_t _dl_stack_user;
>  
> +  /* List of queued thread stacks.  */
> +  EXTERN list_t _dl_stack_cache;
> +
> +  /* Total size of all stacks in the cache (sum over stackblock_size).  */
> +  EXTERN size_t _dl_stack_cache_actsize;
> +
> +  /* We need to record what list operations we are going to do so
> +     that, in case of an asynchronous interruption due to a fork()
> +     call, we can correct for the work.  */
> +  EXTERN uintptr_t _dl_in_flight_stack;
> +
>    /* Mutex protecting the stack lists.  */
>    EXTERN int _dl_stack_cache_lock;
>  #else
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index cb29222727..f1aaa5aa9d 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -43,6 +43,7 @@ __tls_pre_init_tp (void)
>       initialized.  */
>    INIT_LIST_HEAD (&GL (dl_stack_used));
>    INIT_LIST_HEAD (&GL (dl_stack_user));
> +  INIT_LIST_HEAD (&GL (dl_stack_cache));
>  
>  #ifdef SHARED
>    ___rtld_mutex_lock = rtld_mutex_dummy;
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 05/13] Linux: Simplify and fix the definition of SINGLE_THREAD_P
  2021-05-06 18:10 ` [PATCH 05/13] Linux: Simplify and fix the definition of SINGLE_THREAD_P Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:10 PM, Florian Weimer via Libc-alpha wrote:
> Always use __libc_multiple_threads if beneficial, and do not assume
> that the dynamic loader is single-threaded.  This assumption could
> become incorrect by accident once more code is moved from libpthread
> into it.  The previous commit introducing the
> NO_SYSCALL_CANCEL_CHECKING macro enables this change.
> 
> Do not hint to the compiler that multi-threaded programs are unlikely
> (which is not quite true anymore).

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
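The two strategies the simplified `SINGLE_THREAD_P` chooses between can be modeled in a few lines. This is a toy illustration, not glibc's code: `multiple_threads_global` stands in for `__libc_multiple_threads` and `tcb_model` for the `header.multiple_threads` field read via `THREAD_GETMEM (THREAD_SELF, ...)`; both indicators flip when a second thread is created:

```c
#include <assert.h>

struct tcb_model
{
  int multiple_threads;  /* Stand-in for header.multiple_threads.  */
};

static int multiple_threads_global;  /* Stand-in for __libc_multiple_threads.  */
static struct tcb_model self_model;  /* Stand-in for the current thread's TCB.  */

/* The global-variable flavor (SINGLE_THREAD_BY_GLOBAL).  */
#define SINGLE_THREAD_P_GLOBAL (multiple_threads_global == 0)

/* The TCB flavor, used when no global is available (and by rtld).  */
#define SINGLE_THREAD_P_TCB (self_model.multiple_threads == 0)

/* What thread creation would do: mark the process multi-threaded.  */
static void
note_second_thread (void)
{
  multiple_threads_global = 1;
  self_model.multiple_threads = 1;
}
```

Note that neither macro wraps the comparison in `__glibc_likely`, matching the patch's rationale that multi-threaded programs are no longer the unlikely case.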

> ---
>  sysdeps/unix/sysv/linux/single-thread.h | 36 +++++--------------------
>  1 file changed, 7 insertions(+), 29 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h
> index 841f8c69d5..2fac8dcc11 100644
> --- a/sysdeps/unix/sysv/linux/single-thread.h
> +++ b/sysdeps/unix/sysv/linux/single-thread.h
> @@ -32,35 +32,13 @@ extern int __libc_multiple_threads;
>  libc_hidden_proto (__libc_multiple_threads)
>  #endif
>  
> -#ifdef SINGLE_THREAD_BY_GLOBAL
> -# if IS_IN (libc)
> -#  define SINGLE_THREAD_P \
> -  __glibc_likely (__libc_multiple_threads == 0)
> -# elif IS_IN (libpthread)
> -extern int __pthread_multiple_threads;
> -#  define SINGLE_THREAD_P \
> -  __glibc_likely (__pthread_multiple_threads == 0)
> -# elif IS_IN (librt)
> -#   define SINGLE_THREAD_P					\
> -  __glibc_likely (THREAD_GETMEM (THREAD_SELF,			\
> -				 header.multiple_threads) == 0)
> -# else
> -/* For rtld, et cetera.  */
> -#  define SINGLE_THREAD_P (1)
> -# endif
> -#else /* SINGLE_THREAD_BY_GLOBAL  */
> -# if IS_IN (libc) || IS_IN (libpthread) || IS_IN (librt)
> -#   define SINGLE_THREAD_P					\
> -  __glibc_likely (THREAD_GETMEM (THREAD_SELF,			\
> -				 header.multiple_threads) == 0)
> -# else
> -/* For rtld, et cetera.  */
> -#  define SINGLE_THREAD_P (1)
> -# endif
> -#endif /* SINGLE_THREAD_BY_GLOBAL  */
> +#if !defined SINGLE_THREAD_BY_GLOBAL || IS_IN (rtld)
> +# define SINGLE_THREAD_P \
> +  (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0)
> +#else
> +# define SINGLE_THREAD_P (__libc_multiple_threads == 0)
> +#endif
>  
> -#define RTLD_SINGLE_THREAD_P \
> -  __glibc_likely (THREAD_GETMEM (THREAD_SELF, \
> -				 header.multiple_threads) == 0)
> +#define RTLD_SINGLE_THREAD_P SINGLE_THREAD_P
>  
>  #endif /* _SINGLE_THREAD_H  */
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 07/13] elf: Introduce __tls_pre_init_tp
  2021-05-06 18:10 ` [PATCH 07/13] elf: Introduce __tls_pre_init_tp Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:10 PM, Florian Weimer via Libc-alpha wrote:
> This is an early variant of __tls_init_tp, primarily for initializing
> thread-related elements of _rtld_global/GL.
> 
> Some existing initialization code not needed for NPTL is moved into
> the generic version of this function.

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
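The pattern behind `__tls_pre_init_tp` is worth spelling out: before the thread pointer exists, install default callbacks into global function pointers so that early code can call through them unconditionally, and let libpthread (or later initialization) overwrite them afterwards. A minimal standalone sketch of that pattern, with hypothetical names, not glibc's actual symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded default: with no second thread yet, the recursive
   rtld lock needs no real locking.  */
static void
default_lock_recursive (void *lock)
{
  (void) lock;
}

/* Stand-in for the GL(dl_rtld_lock_recursive) function pointer.  */
static void (*dl_rtld_lock_recursive) (void *);

/* Stand-in for __tls_pre_init_tp: runs before the TCB is set up and
   installs the default so later callers never see a null pointer.  */
static void
tls_pre_init_tp (void)
{
  dl_rtld_lock_recursive = default_lock_recursive;
}
```

Moving this out of `dl_main` and into a shared function is what lets both the static (`__libc_setup_tls`) and dynamic (`rtld.c`) startup paths perform the same early initialization.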

> ---
>  csu/libc-tls.c                |  2 ++
>  elf/dl-mutex.c                |  2 +-
>  elf/dl-tls_init_tp.c          | 29 ++++++++++++++++++++++++++
>  elf/rtld.c                    | 38 +----------------------------------
>  sysdeps/generic/ldsodefs.h    |  4 ++++
>  sysdeps/nptl/dl-tls_init_tp.c | 25 +++++++++++++++++++++--
>  6 files changed, 60 insertions(+), 40 deletions(-)
> 
> diff --git a/csu/libc-tls.c b/csu/libc-tls.c
> index 22f8e4838d..5515204863 100644
> --- a/csu/libc-tls.c
> +++ b/csu/libc-tls.c
> @@ -114,6 +114,8 @@ __libc_setup_tls (void)
>  
>    struct link_map *main_map = GL(dl_ns)[LM_ID_BASE]._ns_loaded;
>  
> +  __tls_pre_init_tp ();
> +
>    /* Look through the TLS segment if there is any.  */
>    if (_dl_phdr != NULL)
>      for (phdr = _dl_phdr; phdr < &_dl_phdr[_dl_phnum]; ++phdr)
> diff --git a/elf/dl-mutex.c b/elf/dl-mutex.c
> index 2cd9d49c2e..ae1d8a84f0 100644
> --- a/elf/dl-mutex.c
> +++ b/elf/dl-mutex.c
> @@ -16,4 +16,4 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -/* The generic version initialization happpens in dl_main.  */
> +/* Initialization happens in __tls_pre_init_tp in dl-tls_init_tp.c.  */
> diff --git a/elf/dl-tls_init_tp.c b/elf/dl-tls_init_tp.c
> index 728cd84c00..d84adc992c 100644
> --- a/elf/dl-tls_init_tp.c
> +++ b/elf/dl-tls_init_tp.c
> @@ -18,6 +18,35 @@
>  
>  #include <ldsodefs.h>
>  
> +#if defined SHARED && defined _LIBC_REENTRANT \
> +    && defined __rtld_lock_default_lock_recursive
> +static void
> +rtld_lock_default_lock_recursive (void *lock)
> +{
> +  __rtld_lock_default_lock_recursive (lock);
> +}
> +
> +static void
> +rtld_lock_default_unlock_recursive (void *lock)
> +{
> +  __rtld_lock_default_unlock_recursive (lock);
> +}
> +#endif
> +
> +void
> +__tls_pre_init_tp (void)
> +{
> +#if !THREAD_GSCOPE_IN_TCB
> +  GL(dl_init_static_tls) = &_dl_nothread_init_static_tls;

OK.

> +#endif
> +
> +#if defined SHARED && defined _LIBC_REENTRANT \
> +    && defined __rtld_lock_default_lock_recursive
> +  GL(dl_rtld_lock_recursive) = rtld_lock_default_lock_recursive;
> +  GL(dl_rtld_unlock_recursive) = rtld_lock_default_unlock_recursive;
> +#endif
> +}
> +
>  void
>  __tls_init_tp (void)
>  {
> diff --git a/elf/rtld.c b/elf/rtld.c
> index a359167f8a..1255d5cc7d 100644
> --- a/elf/rtld.c
> +++ b/elf/rtld.c
> @@ -843,30 +843,6 @@ ERROR: ld.so: object '%s' from %s cannot be preloaded (%s): ignored.\n",
>    return 0;
>  }
>  
> -#if defined SHARED && defined _LIBC_REENTRANT \
> -    && defined __rtld_lock_default_lock_recursive
> -static void
> -rtld_lock_default_lock_recursive (void *lock)
> -{
> -  __rtld_lock_default_lock_recursive (lock);
> -}
> -
> -static void
> -rtld_lock_default_unlock_recursive (void *lock)
> -{
> -  __rtld_lock_default_unlock_recursive (lock);
> -}
> -#endif
> -#if PTHREAD_IN_LIBC
> -/* Dummy implementation.  See __rtld_mutex_init.  */
> -static int
> -rtld_mutex_dummy (pthread_mutex_t *lock)
> -{
> -  return 0;
> -}
> -#endif
> -
> -
>  static void
>  security_init (void)
>  {
> @@ -1147,19 +1123,7 @@ dl_main (const ElfW(Phdr) *phdr,
>    struct dl_main_state state;
>    dl_main_state_init (&state);
>  
> -#if !THREAD_GSCOPE_IN_TCB
> -  GL(dl_init_static_tls) = &_dl_nothread_init_static_tls;
> -#endif
> -
> -#if defined SHARED && defined _LIBC_REENTRANT \
> -    && defined __rtld_lock_default_lock_recursive
> -  GL(dl_rtld_lock_recursive) = rtld_lock_default_lock_recursive;
> -  GL(dl_rtld_unlock_recursive) = rtld_lock_default_unlock_recursive;
> -#endif
> -#if PTHREAD_IN_LIBC
> -  ___rtld_mutex_lock = rtld_mutex_dummy;
> -  ___rtld_mutex_unlock = rtld_mutex_dummy;
> -#endif
> +  __tls_pre_init_tp ();
>  
>    /* The explicit initialization here is cheaper than processing the reloc
>       in the _rtld_local definition's initializer.  */
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index 6d590d1335..ee851ac789 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -1165,6 +1165,10 @@ extern void _dl_determine_tlsoffset (void) attribute_hidden;
>     number of audit modules are loaded.  */
>  void _dl_tls_static_surplus_init (size_t naudit) attribute_hidden;
>  
> +/* This function is called very early from dl_main to set up TLS and
> +   other thread-related data structures.  */
> +void __tls_pre_init_tp (void) attribute_hidden;
> +
>  /* This function is called after processor-specific initialization of
>     the TCB and thread pointer via TLS_INIT_TP, to complete very early
>     initialization of the thread library.  */
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index 05d2b6cfcc..cb29222727 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -27,12 +27,33 @@ bool __nptl_set_robust_list_avail __attribute__ ((nocommon));
>  rtld_hidden_data_def (__nptl_set_robust_list_avail)
>  #endif
>  
> +#ifdef SHARED
> +/* Dummy implementation.  See __rtld_mutex_init.  */
> +static int
> +rtld_mutex_dummy (pthread_mutex_t *lock)
> +{
> +  return 0;
> +}
> +#endif
> +
>  void
> -__tls_init_tp (void)
> +__tls_pre_init_tp (void)
>  {
> -  /* Set up thread stack list management.  */
> +  /* The list data structures are not consistent until
> +     initialized.  */
>    INIT_LIST_HEAD (&GL (dl_stack_used));
>    INIT_LIST_HEAD (&GL (dl_stack_user));
> +
> +#ifdef SHARED
> +  ___rtld_mutex_lock = rtld_mutex_dummy;
> +  ___rtld_mutex_unlock = rtld_mutex_dummy;
> +#endif
> +}
> +
> +void
> +__tls_init_tp (void)
> +{
> +  /* Set up thread stack list management.  */
>    list_add (&THREAD_SELF->list, &GL (dl_stack_user));
>  
>     /* Early initialization of the TCB.   */
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/13] Linux: Explicitly disable cancellation checking in the dynamic loader
  2021-05-06 18:10 ` [PATCH 04/13] Linux: Explicitly disable cancellation checking in the dynamic loader Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:10 PM, Florian Weimer via Libc-alpha wrote:
> Historically, SINGLE_THREAD_P is defined to 1 in the dynamic loader.
> This has the side effect of disabling cancellation points.  In order
> to enable future use of SINGLE_THREAD_P for single-thread
> optimizations in the dynamic loader (which becomes important once
> more code is moved from libpthread), introduce a new
> NO_SYSCALL_CANCEL_CHECKING macro which is always 1 for IS_IN (rtld),
> independently of the actual SINGLE_THREAD_P value.

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  sysdeps/unix/sysdep.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/sysdeps/unix/sysdep.h b/sysdeps/unix/sysdep.h
> index 2fa6bfa135..664d093c05 100644
> --- a/sysdeps/unix/sysdep.h
> +++ b/sysdeps/unix/sysdep.h
> @@ -88,10 +88,17 @@
>  #define INLINE_SYSCALL_CALL(...) \
>    __INLINE_SYSCALL_DISP (__INLINE_SYSCALL, __VA_ARGS__)
>  
> +#if IS_IN (rtld)
> +/* All cancellation points are compiled out in the dynamic loader.  */
> +# define NO_SYSCALL_CANCEL_CHECKING 1
> +#else
> +# define NO_SYSCALL_CANCEL_CHECKING SINGLE_THREAD_P
> +#endif
> +
>  #define SYSCALL_CANCEL(...) \
>    ({									     \
>      long int sc_ret;							     \
> -    if (SINGLE_THREAD_P) 						     \
> +    if (NO_SYSCALL_CANCEL_CHECKING)					     \
>        sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__); 			     \
>      else								     \
>        {									     \
> @@ -107,7 +114,7 @@
>  #define INTERNAL_SYSCALL_CANCEL(...) \
>    ({									     \
>      long int sc_ret;							     \
> -    if (SINGLE_THREAD_P) 						     \
> +    if (NO_SYSCALL_CANCEL_CHECKING) 					     \
>        sc_ret = INTERNAL_SYSCALL_CALL (__VA_ARGS__); 			     \
>      else								     \
>        {									     \
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/13] nptl: Export __libc_multiple_threads from libc as an internal symbol
  2021-05-06 18:10 ` [PATCH 03/13] nptl: Export __libc_multiple_threads from libc as an internal symbol Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:10 PM, Florian Weimer via Libc-alpha wrote:
> This allows the elimination of the __libc_multiple_threads_ptr
> variable in libpthread and its initialization procedure.

LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  nptl/Versions                           |  1 +
>  nptl/allocatestack.c                    |  4 ++--
>  nptl/libc_multiple_threads.c            |  3 ++-
>  nptl/libc_pthread_init.c                | 11 -----------
>  nptl/nptl-init.c                        | 10 +---------
>  nptl/pthreadP.h                         |  6 +-----
>  nptl/pthread_cancel.c                   |  2 +-
>  sysdeps/unix/sysv/linux/single-thread.h |  6 +++++-
>  8 files changed, 13 insertions(+), 30 deletions(-)
> 
> diff --git a/nptl/Versions b/nptl/Versions
> index f950b77969..fb15a7e8eb 100644
> --- a/nptl/Versions
> +++ b/nptl/Versions
> @@ -305,6 +305,7 @@ libc {
>      __libc_cleanup_pop_restore;
>      __libc_cleanup_push_defer;
>      __libc_dl_error_tsd;
> +    __libc_multiple_threads;
>      __libc_pthread_init;
>      __lll_clocklock_elision;
>      __lll_lock_elision;
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 8aaba088b1..059786192e 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -477,7 +477,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>        /* This is at least the second thread.  */
>        pd->header.multiple_threads = 1;
>  #ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -      __pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
> +      __pthread_multiple_threads = __libc_multiple_threads = 1;
>  #endif
>  
>  #ifdef NEED_DL_SYSINFO
> @@ -598,7 +598,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>  	  /* This is at least the second thread.  */
>  	  pd->header.multiple_threads = 1;
>  #ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -	  __pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
> +	  __pthread_multiple_threads = __libc_multiple_threads = 1;
>  #endif
>  
>  #ifdef NEED_DL_SYSINFO
> diff --git a/nptl/libc_multiple_threads.c b/nptl/libc_multiple_threads.c
> index 60328023cd..a0e7932c26 100644
> --- a/nptl/libc_multiple_threads.c
> +++ b/nptl/libc_multiple_threads.c
> @@ -23,6 +23,7 @@
>  /* Variable set to a nonzero value either if more than one thread runs or ran,
>     or if a single-threaded process is trying to cancel itself.  See
>     nptl/descr.h for more context on the single-threaded process case.  */
> -int __libc_multiple_threads attribute_hidden;
> +int __libc_multiple_threads __attribute__ ((nocommon));
> +libc_hidden_data_def (__libc_multiple_threads)
>  # endif
>  #endif
> diff --git a/nptl/libc_pthread_init.c b/nptl/libc_pthread_init.c
> index 397b83beb6..75f5d28ed6 100644
> --- a/nptl/libc_pthread_init.c
> +++ b/nptl/libc_pthread_init.c
> @@ -27,20 +27,9 @@
>  #include <sysdep.h>
>  #include <ldsodefs.h>
>  
> -
> -#ifdef TLS_MULTIPLE_THREADS_IN_TCB
>  void
> -#else
> -extern int __libc_multiple_threads attribute_hidden;
> -
> -int *
> -#endif
>  __libc_pthread_init (void (*reclaim) (void))
>  {
>    /* Called by a child after fork.  */
>    __register_atfork (NULL, NULL, reclaim, NULL);
> -
> -#ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -  return &__libc_multiple_threads;
> -#endif
>  }
> diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
> index 2724770533..2fb1117f3e 100644
> --- a/nptl/nptl-init.c
> +++ b/nptl/nptl-init.c
> @@ -37,11 +37,6 @@
>  #include <libc-pointer-arith.h>
>  #include <pthread_mutex_conf.h>
>  
> -#ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -/* Pointer to the corresponding variable in libc.  */
> -int *__libc_multiple_threads_ptr attribute_hidden;
> -#endif
> -
>  /* Size and alignment of static TLS block.  */
>  size_t __static_tls_size;
>  size_t __static_tls_align_m1;
> @@ -183,10 +178,7 @@ __pthread_initialize_minimal_internal (void)
>  #endif
>  
>    /* Register the fork generation counter with the libc.  */
> -#ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -  __libc_multiple_threads_ptr =
> -#endif
> -    __libc_pthread_init (__reclaim_stacks);
> +  __libc_pthread_init (__reclaim_stacks);
>  }
>  strong_alias (__pthread_initialize_minimal_internal,
>  	      __pthread_initialize_minimal)
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index 6b52ca158e..dd6d6c6342 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -368,17 +368,13 @@ extern unsigned long int __fork_generation attribute_hidden;
>  extern unsigned long int *__fork_generation_pointer attribute_hidden;
>  
>  /* Register the generation counter in the libpthread with the libc.  */
> -#ifdef TLS_MULTIPLE_THREADS_IN_TCB
>  extern void __libc_pthread_init (void (*reclaim) (void));
> -#else
> -extern int *__libc_pthread_init (void (*reclaim) (void));
>  
> +#ifndef TLS_MULTIPLE_THREADS_IN_TCB
>  /* Variable set to a nonzero value either if more than one thread runs or ran,
>     or if a single-threaded process is trying to cancel itself.  See
>     nptl/descr.h for more context on the single-threaded process case.  */
>  extern int __pthread_multiple_threads attribute_hidden;
> -/* Pointer to the corresponding variable in libc.  */
> -extern int *__libc_multiple_threads_ptr attribute_hidden;
>  #endif
>  
>  extern size_t __pthread_get_minstack (const pthread_attr_t *attr);
> diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
> index 060484cdc8..2cab8f0a34 100644
> --- a/nptl/pthread_cancel.c
> +++ b/nptl/pthread_cancel.c
> @@ -90,7 +90,7 @@ __pthread_cancel (pthread_t th)
>  	   points get executed.  */
>  	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
>  #ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -	__pthread_multiple_threads = *__libc_multiple_threads_ptr = 1;
> +	__pthread_multiple_threads = __libc_multiple_threads = 1;
>  #endif
>      }
>    /* Mark the thread as canceled.  This has to be done
> diff --git a/sysdeps/unix/sysv/linux/single-thread.h b/sysdeps/unix/sysv/linux/single-thread.h
> index a28aaed04d..841f8c69d5 100644
> --- a/sysdeps/unix/sysv/linux/single-thread.h
> +++ b/sysdeps/unix/sysv/linux/single-thread.h
> @@ -27,9 +27,13 @@
>     The ABI might define SINGLE_THREAD_BY_GLOBAL to enable the single thread
>     check to use global variables instead of the pthread_t field.  */
>  
> +#ifndef __ASSEMBLER__
> +extern int __libc_multiple_threads;
> +libc_hidden_proto (__libc_multiple_threads)
> +#endif
> +
>  #ifdef SINGLE_THREAD_BY_GLOBAL
>  # if IS_IN (libc)
> -extern int __libc_multiple_threads;
>  #  define SINGLE_THREAD_P \
>    __glibc_likely (__libc_multiple_threads == 0)
>  # elif IS_IN (libpthread)
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 02/13] elf, nptl: Resolve recursive lock implementation early
  2021-05-06 18:09 ` [PATCH v2 02/13] elf, nptl: Resolve recursive lock implementation early Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  2021-05-10  5:54     ` Florian Weimer
  0 siblings, 1 reply; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:09 PM, Florian Weimer via Libc-alpha wrote:
> If libpthread is included in libc, it is not necessary to delay
> initialization of the lock/unlock function pointers until libpthread
> is loaded.  This eliminates two unprotected function pointers
> from _rtld_global and removes some initialization code from
> libpthread.

This version looks good to me, and the early initialization makes it
logically easier to follow when reading the code. Despite the removal
of the unprotected function pointesr in _rtld_global, we still need
some function pointer in order to lookup the function symbols from libc.so
and remember their values, but data placement is harder to discover than
the fixed offset from a public symbol.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>


> ---
> v2: Rename dl-lock.c into dl-mutex.c and use a sysdeps override instead
>     of a preprocessor conditional.
> 
>  elf/Makefile               |  3 ++-
>  elf/dl-mutex.c             | 19 ++++++++++++++
>  elf/rtld.c                 | 18 +++++++++++++
>  nptl/nptl-init.c           |  9 -------
>  sysdeps/generic/ldsodefs.h | 25 +++++++++++++++++-
>  sysdeps/nptl/dl-mutex.c    | 53 ++++++++++++++++++++++++++++++++++++++
>  sysdeps/nptl/libc-lockP.h  | 17 +++---------
>  7 files changed, 120 insertions(+), 24 deletions(-)
>  create mode 100644 elf/dl-mutex.c
>  create mode 100644 sysdeps/nptl/dl-mutex.c
> 
> diff --git a/elf/Makefile b/elf/Makefile
> index 4f99af626f..d3e909637a 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -66,7 +66,8 @@ elide-routines.os = $(all-dl-routines) dl-support enbl-secure dl-origin \
>  # interpreter and operating independent of libc.
>  rtld-routines	= rtld $(all-dl-routines) dl-sysdep dl-environ dl-minimal \
>    dl-error-minimal dl-conflict dl-hwcaps dl-hwcaps_split dl-hwcaps-subdirs \
> -  dl-usage dl-diagnostics dl-diagnostics-kernel dl-diagnostics-cpu
> +  dl-usage dl-diagnostics dl-diagnostics-kernel dl-diagnostics-cpu \
> +  dl-mutex
>  all-rtld-routines = $(rtld-routines) $(sysdep-rtld-routines)
>  
>  CFLAGS-dl-runtime.c += -fexceptions -fasynchronous-unwind-tables
> diff --git a/elf/dl-mutex.c b/elf/dl-mutex.c
> new file mode 100644
> index 0000000000..2cd9d49c2e
> --- /dev/null
> +++ b/elf/dl-mutex.c
> @@ -0,0 +1,19 @@
> +/* Recursive locking implementation for the dynamic loader.  Generic version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* The generic version initialization happpens in dl_main.  */
> diff --git a/elf/rtld.c b/elf/rtld.c
> index ad325d4c10..a359167f8a 100644
> --- a/elf/rtld.c
> +++ b/elf/rtld.c
> @@ -857,6 +857,14 @@ rtld_lock_default_unlock_recursive (void *lock)
>    __rtld_lock_default_unlock_recursive (lock);
>  }
>  #endif
> +#if PTHREAD_IN_LIBC
> +/* Dummy implementation.  See __rtld_mutex_init.  */
> +static int
> +rtld_mutex_dummy (pthread_mutex_t *lock)
> +{
> +  return 0;
> +}
> +#endif
>  
>  
>  static void
> @@ -1148,6 +1156,10 @@ dl_main (const ElfW(Phdr) *phdr,
>    GL(dl_rtld_lock_recursive) = rtld_lock_default_lock_recursive;
>    GL(dl_rtld_unlock_recursive) = rtld_lock_default_unlock_recursive;
>  #endif
> +#if PTHREAD_IN_LIBC
> +  ___rtld_mutex_lock = rtld_mutex_dummy;
> +  ___rtld_mutex_unlock = rtld_mutex_dummy;
> +#endif
>  
>    /* The explicit initialization here is cheaper than processing the reloc
>       in the _rtld_local definition's initializer.  */
> @@ -2363,6 +2375,9 @@ dl_main (const ElfW(Phdr) *phdr,
>  	 loader.  */
>        __rtld_malloc_init_real (main_map);
>  
> +      /* Likewise for the locking implementation.  */
> +      __rtld_mutex_init ();
> +
>        /* Mark all the objects so we know they have been already relocated.  */
>        for (struct link_map *l = main_map; l != NULL; l = l->l_next)
>  	{
> @@ -2468,6 +2483,9 @@ dl_main (const ElfW(Phdr) *phdr,
>  	 at this point.  */
>        __rtld_malloc_init_real (main_map);
>  
> +      /* Likewise for the locking implementation.  */
> +      __rtld_mutex_init ();
> +
>        RTLD_TIMING_VAR (start);
>        rtld_timer_start (&start);
>  
> diff --git a/nptl/nptl-init.c b/nptl/nptl-init.c
> index fcab5a0904..2724770533 100644
> --- a/nptl/nptl-init.c
> +++ b/nptl/nptl-init.c
> @@ -179,15 +179,6 @@ __pthread_initialize_minimal_internal (void)
>    lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE);
>  
>  #ifdef SHARED
> -  /* Make __rtld_lock_{,un}lock_recursive use pthread_mutex_{,un}lock,
> -     keep the lock count from the ld.so implementation.  */
> -  GL(dl_rtld_lock_recursive) = (void *) __pthread_mutex_lock;
> -  GL(dl_rtld_unlock_recursive) = (void *) __pthread_mutex_unlock;
> -  unsigned int rtld_lock_count = GL(dl_load_lock).mutex.__data.__count;
> -  GL(dl_load_lock).mutex.__data.__count = 0;
> -  while (rtld_lock_count-- > 0)
> -    __pthread_mutex_lock (&GL(dl_load_lock).mutex);
> -
>    GL(dl_make_stack_executable_hook) = &__make_stacks_executable;
>  #endif
>  
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index 1b064c5894..6d590d1335 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -403,7 +403,7 @@ struct rtld_global
>    struct auditstate _dl_rtld_auditstate[DL_NNS];
>  #endif
>  
> -#if defined SHARED && defined _LIBC_REENTRANT \
> +#if !PTHREAD_IN_LIBC && defined SHARED \
>      && defined __rtld_lock_default_lock_recursive
>    EXTERN void (*_dl_rtld_lock_recursive) (void *);
>    EXTERN void (*_dl_rtld_unlock_recursive) (void *);
> @@ -1318,6 +1318,29 @@ link_map_audit_state (struct link_map *l, size_t index)
>  }
>  #endif /* SHARED */
>  
> +#if PTHREAD_IN_LIBC && defined SHARED
> +/* Recursive locking implementation for use within the dynamic loader.
> +   Used to define the __rtld_lock_lock_recursive and
> +   __rtld_lock_unlock_recursive via <libc-lock.h>.  Initialized to a
> +   no-op dummy implementation early.  Similar
> +   to GL (dl_rtld_lock_recursive) and GL (dl_rtld_unlock_recursive)
> +   in !PTHREAD_IN_LIBC builds.  */
> +extern int (*___rtld_mutex_lock) (pthread_mutex_t *) attribute_hidden;
> +extern int (*___rtld_mutex_unlock) (pthread_mutex_t *lock) attribute_hidden;
> +
> +/* Called after libc has been loaded, but before RELRO is activated.
> +   Used to initialize the function pointers to the actual
> +   implementations.  */
> +void __rtld_mutex_init (void) attribute_hidden;
> +#else /* !PTHREAD_IN_LIBC */
> +static inline void
> +__rtld_mutex_init (void)
> +{
> +  /* The initialization happens later (!PTHREAD_IN_LIBC) or is not
> +     needed at all (!SHARED).  */
> +}
> +#endif /* !PTHREAD_IN_LIBC */
> +
>  #if THREAD_GSCOPE_IN_TCB
>  void __thread_gscope_wait (void) attribute_hidden;
>  # define THREAD_GSCOPE_WAIT() __thread_gscope_wait ()
> diff --git a/sysdeps/nptl/dl-mutex.c b/sysdeps/nptl/dl-mutex.c
> new file mode 100644
> index 0000000000..08b71dc21b
> --- /dev/null
> +++ b/sysdeps/nptl/dl-mutex.c
> @@ -0,0 +1,53 @@
> +/* Recursive locking implementation for the dynamic loader.  NPTL version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Use the mutex implementation in libc (assuming PTHREAD_IN_LIBC).  */
> +
> +#include <assert.h>
> +#include <first-versions.h>
> +#include <ldsodefs.h>
> +
> +__typeof (pthread_mutex_lock) *___rtld_mutex_lock attribute_relro;
> +__typeof (pthread_mutex_unlock) *___rtld_mutex_unlock attribute_relro;
> +
> +void
> +__rtld_mutex_init (void)
> +{
> +  /* There is an implicit assumption here that the lock counters are
> +     zero and this function is called while nothing is locked.  For
> +     early initialization of the mutex functions this is true because
> +     it happens directly in dl_main in elf/rtld.c, and not some ELF
> +     constructor while holding loader locks.  */
> +
> +  struct link_map *libc_map = GL (dl_ns)[LM_ID_BASE].libc_map;
> +
> +  const ElfW(Sym) *sym
> +    = _dl_lookup_direct (libc_map, "pthread_mutex_lock",
> +                         0x4f152227, /* dl_new_hash output.  */
> +                         FIRST_VERSION_libc_pthread_mutex_lock_STRING,
> +                         FIRST_VERSION_libc_pthread_mutex_lock_HASH);
> +  assert (sym != NULL);
> +  ___rtld_mutex_lock = DL_SYMBOL_ADDRESS (libc_map, sym);
> +
> +  sym = _dl_lookup_direct (libc_map, "pthread_mutex_unlock",
> +                           0x7dd7aaaa, /* dl_new_hash output.  */
> +                           FIRST_VERSION_libc_pthread_mutex_unlock_STRING,
> +                           FIRST_VERSION_libc_pthread_mutex_unlock_HASH);
> +  assert (sym != NULL);
> +  ___rtld_mutex_unlock = DL_SYMBOL_ADDRESS (libc_map, sym);
> +}
> diff --git a/sysdeps/nptl/libc-lockP.h b/sysdeps/nptl/libc-lockP.h
> index ae9691d40e..ec7b02bbdd 100644
> --- a/sysdeps/nptl/libc-lockP.h
> +++ b/sysdeps/nptl/libc-lockP.h
> @@ -151,9 +151,6 @@ _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
>    __libc_maybe_call (__pthread_mutex_trylock, (&(NAME)), 0)
>  #endif
>  
> -#define __rtld_lock_trylock_recursive(NAME) \
> -  __libc_maybe_call (__pthread_mutex_trylock, (&(NAME).mutex), 0)
> -
>  /* Unlock the named lock variable.  */
>  #if IS_IN (libc) || IS_IN (libpthread)
>  # define __libc_lock_unlock(NAME) \
> @@ -163,19 +160,13 @@ _Static_assert (LLL_LOCK_INITIALIZER == 0, "LLL_LOCK_INITIALIZER != 0");
>  #endif
>  #define __libc_rwlock_unlock(NAME) __pthread_rwlock_unlock (&(NAME))
>  
> -#ifdef SHARED
> -# define __rtld_lock_default_lock_recursive(lock) \
> -  ++((pthread_mutex_t *)(lock))->__data.__count;
> -
> -# define __rtld_lock_default_unlock_recursive(lock) \
> -  --((pthread_mutex_t *)(lock))->__data.__count;
> -
> +#if IS_IN (rtld)
>  # define __rtld_lock_lock_recursive(NAME) \
> -  GL(dl_rtld_lock_recursive) (&(NAME).mutex)
> +  ___rtld_mutex_lock (&(NAME).mutex)
>  
>  # define __rtld_lock_unlock_recursive(NAME) \
> -  GL(dl_rtld_unlock_recursive) (&(NAME).mutex)
> -#else
> +  ___rtld_mutex_unlock (&(NAME).mutex)
> +#else /* Not in the dynamic loader.  */
>  # define __rtld_lock_lock_recursive(NAME) \
>    __pthread_mutex_lock (&(NAME).mutex)
>  
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/13] scripts/versions.awk: Add strings and hashes to <first-versions.h>
  2021-05-06 18:08 ` [PATCH 01/13] scripts/versions.awk: Add strings and hashes to <first-versions.h> Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:08 PM, Florian Weimer via Libc-alpha wrote:
> This generates new macros of this form:
> 
> #define FIRST_VERSION_libc___pthread_mutex_lock_STRING "GLIBC_2.2.5"
> #define FIRST_VERSION_libc___pthread_mutex_lock_HASH 0x9691a75
> 
> They are useful for symbol lookups using _dl_lookup_direct.

Straight forward, makes sense, used in the next patch.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  scripts/versions.awk | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/scripts/versions.awk b/scripts/versions.awk
> index d56f4e712c..3291123666 100644
> --- a/scripts/versions.awk
> +++ b/scripts/versions.awk
> @@ -32,6 +32,29 @@ BEGIN {
>    sort = "sort -t. -k 1,1 -k 2n,2n -k 3 > " tmpfile;
>  }
>  
> +# GNU awk does not implement the ord and chr functions.
> +# <https://www.gnu.org/software/gawk/manual/html_node/Ordinal-Functions.html>
> +# says that they are "written very nicely", using code similar to what
> +# is included here.
> +function chr(c) {
> +    return sprintf("%c", c)
> +}
> +
> +BEGIN {
> +    for (c = 1; c < 127; c++) {
> +	ord_table[chr(c)] = c;
> +    }
> +}
> +
> +function ord(c) {
> +    if (ord_table[c]) {
> +	return ord_table[c];
> +    } else {
> +	printf("Invalid character reference: '%c'\n", c) > "/dev/stderr";
> +	++lossage;
> +    }
> +}
> +
>  # Remove comment lines.
>  /^ *#/ {
>    next;
> @@ -90,6 +113,17 @@ function close_and_move(name, real_name) {
>    system(move_if_change " " name " " real_name " >&2");
>  }
>  
> +# ELF hash, for use with symbol versions.
> +function elf_hash(s, i, acc) {
> +  acc = 0;
> +  for (i = 1; i <= length(s); ++i) {
> +      acc = and(lshift(acc, 4) + ord(substr(s, i, 1)), 0xffffffff);
> +      top = and(acc, 0xf0000000);
> +      acc = and(xor(acc, rshift(top, 24)), compl(top));
> +  }
> +  return acc;
> +}
> +
>  # Now print the accumulated information.
>  END {
>    close(sort);
> @@ -145,6 +179,8 @@ END {
>  	  && oldver ~ "^GLIBC_[0-9]" \
>  	  && sym ~ "^[A-Za-z0-9_]*$") {
>  	ver_val = oldver;
> +	printf("#define %s_STRING \"%s\"\n", first_ver_macro, ver_val) > first_ver_header;
> +	printf("#define %s_HASH 0x%x\n", first_ver_macro, elf_hash(ver_val)) > first_ver_header;
>  	gsub("\\.", "_", ver_val);
>  	printf("#define %s %s\n", first_ver_macro, ver_val) > first_ver_header;
>  	first_ver_seen[first_ver_macro] = 1;
> 


-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 30+ messages in thread
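
The two hash functions involved in this patch — the classic SysV ELF hash
that the new awk code computes for version strings, and the GNU dl_new_hash
whose precomputed outputs appear later in sysdeps/nptl/dl-mutex.c — can be
cross-checked with a short Python sketch (function names are illustrative):

```python
def elf_hash(name: str) -> int:
    """Classic SysV ELF hash, as added to scripts/versions.awk."""
    h = 0
    for c in name.encode("ascii"):
        h = (h << 4) + c
        g = h & 0xF0000000
        # Fold the top nibble back in and clear it.
        h = (h ^ (g >> 24)) & ~g & 0xFFFFFFFF
    return h


def dl_new_hash(name: str) -> int:
    """GNU symbol hash (dl_new_hash): h = h * 33 + c, seeded with 5381."""
    h = 5381
    for c in name.encode("ascii"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


# Cross-check against the constants quoted in this thread.
assert elf_hash("GLIBC_2.2.5") == 0x9691A75
assert dl_new_hash("pthread_mutex_lock") == 0x4F152227
assert dl_new_hash("pthread_mutex_unlock") == 0x7DD7AAAA
```

These reproduce the FIRST_VERSION_libc___pthread_mutex_lock_HASH value from
the commit message and the two dl_new_hash constants passed to
_dl_lookup_direct in patch 02.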

* Re: [PATCH 00/13] Linux: Move most stack management out of libpthread
  2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
                   ` (12 preceding siblings ...)
  2021-05-06 18:11 ` [PATCH 13/13] Linux: Move __reclaim_stacks into the fork implementation in libc Florian Weimer
@ 2021-05-09 21:42 ` Carlos O'Donell
  13 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:08 PM, Florian Weimer via Libc-alpha wrote:
> This incorporates the previous “nptl: Remove delayed rtld lock
> initialization” series.
> 
> Tested on i686-linux-gnu and x86_64-linux-gnu.  Built with
> build-many-glibcs.py.

This is certainly more complicated than your other series of patches
and involves some reorganizing that is needed for the libpthread merge.

Overall this looks good to me. Thanks for continuing the cleanup.

I tested on x86_64 and i686 with the recent compiler warning changes
reverted to allow the build to succeed.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> Florian Weimer (13):
>   scripts/versions.awk: Add strings and hashes to <first-versions.h>
>   elf, nptl: Resolve recursive lock implementation early
>   nptl: Export __libc_multiple_threads from libc as an internal symbol
>   Linux: Explicitly disable cancellation checking in the dynamic loader
>   Linux: Simplify and fix the definition of SINGLE_THREAD_P
>   nptl: Eliminate __pthread_multiple_threads
>   elf: Introduce __tls_pre_init_tp
>   nptl: Move more stack management variables into _rtld_global
>   nptl: Simplify the change_stack_perm calling convention
>   nptl: Move changing of stack permissions into ld.so
>   nptl: Simplify resetting the in-flight stack in __reclaim_stacks
>   nptl: Move __default_pthread_attr, __default_pthread_attr_lock into
>     libc
>   Linux: Move __reclaim_stacks into the fork implementation in libc
> 
>  csu/libc-tls.c                          |   2 +
>  elf/Makefile                            |   3 +-
>  elf/dl-load.c                           |   4 +
>  elf/dl-mutex.c                          |  19 ++
>  elf/dl-support.c                        |  13 +-
>  elf/dl-tls_init_tp.c                    |  29 +++
>  elf/rtld.c                              |  34 +---
>  nptl/Makefile                           |   2 +-
>  nptl/Versions                           |   4 +-
>  nptl/allocatestack.c                    | 227 ++----------------------
>  nptl/libc_multiple_threads.c            |   3 +-
>  nptl/libc_pthread_init.c                |  11 --
>  nptl/nptl-init.c                        |  24 ---
>  nptl/pthreadP.h                         |  33 ++--
>  nptl/pthread_cancel.c                   |   2 +-
>  nptl/vars.c                             |  15 +-
>  scripts/versions.awk                    |  36 ++++
>  sysdeps/generic/ldsodefs.h              |  51 +++++-
>  sysdeps/nptl/dl-mutex.c                 |  53 ++++++
>  sysdeps/nptl/dl-tls_init_tp.c           |  26 ++-
>  sysdeps/nptl/fork.c                     | 110 ++++++++++++
>  sysdeps/nptl/libc-lockP.h               |  17 +-
>  sysdeps/unix/sysdep.h                   |  11 +-
>  sysdeps/unix/sysv/linux/Versions        |   6 +
>  sysdeps/unix/sysv/linux/dl-execstack.c  |  76 +++++++-
>  sysdeps/unix/sysv/linux/single-thread.h |  42 ++---
>  26 files changed, 481 insertions(+), 372 deletions(-)
>  create mode 100644 elf/dl-mutex.c
>  create mode 100644 sysdeps/nptl/dl-mutex.c
> 


-- 
Cheers,
Carlos.



* Re: [PATCH 06/13] nptl: Eliminate __pthread_multiple_threads
  2021-05-06 18:10 ` [PATCH 06/13] nptl: Eliminate __pthread_multiple_threads Florian Weimer
@ 2021-05-09 21:42   ` Carlos O'Donell
  0 siblings, 0 replies; 30+ messages in thread
From: Carlos O'Donell @ 2021-05-09 21:42 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 5/6/21 2:10 PM, Florian Weimer via Libc-alpha wrote:
> It is no longer needed after the SINGLE_THREAD_P consolidation.

Yup. LGTM.

Tested on x86_64 and i686 without regression.

Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

> ---
>  nptl/allocatestack.c  | 4 ++--
>  nptl/pthreadP.h       | 7 -------
>  nptl/pthread_cancel.c | 2 +-
>  nptl/vars.c           | 7 -------
>  4 files changed, 3 insertions(+), 17 deletions(-)
> 
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 059786192e..88c49f8154 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -477,7 +477,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>        /* This is at least the second thread.  */
>        pd->header.multiple_threads = 1;
>  #ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -      __pthread_multiple_threads = __libc_multiple_threads = 1;
> +      __libc_multiple_threads = 1;
>  #endif
>  
>  #ifdef NEED_DL_SYSINFO
> @@ -598,7 +598,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
>  	  /* This is at least the second thread.  */
>  	  pd->header.multiple_threads = 1;
>  #ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -	  __pthread_multiple_threads = __libc_multiple_threads = 1;
> +	  __libc_multiple_threads = 1;
>  #endif
>  
>  #ifdef NEED_DL_SYSINFO
> diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
> index dd6d6c6342..8ab247f977 100644
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> @@ -370,13 +370,6 @@ extern unsigned long int *__fork_generation_pointer attribute_hidden;
>  /* Register the generation counter in the libpthread with the libc.  */
>  extern void __libc_pthread_init (void (*reclaim) (void));
>  
> -#ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -/* Variable set to a nonzero value either if more than one thread runs or ran,
> -   or if a single-threaded process is trying to cancel itself.  See
> -   nptl/descr.h for more context on the single-threaded process case.  */
> -extern int __pthread_multiple_threads attribute_hidden;
> -#endif
> -
>  extern size_t __pthread_get_minstack (const pthread_attr_t *attr);
>  
>  /* Namespace save aliases.  */
> diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
> index 2cab8f0a34..fd04bedf6c 100644
> --- a/nptl/pthread_cancel.c
> +++ b/nptl/pthread_cancel.c
> @@ -90,7 +90,7 @@ __pthread_cancel (pthread_t th)
>  	   points get executed.  */
>  	THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
>  #ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -	__pthread_multiple_threads = __libc_multiple_threads = 1;
> +	__libc_multiple_threads = 1;
>  #endif
>      }
>    /* Mark the thread as canceled.  This has to be done
> diff --git a/nptl/vars.c b/nptl/vars.c
> index 8de30856b8..03a6cc84be 100644
> --- a/nptl/vars.c
> +++ b/nptl/vars.c
> @@ -26,10 +26,3 @@ union pthread_attr_transparent __default_pthread_attr attribute_hidden;
>  
>  /* Mutex protecting __default_pthread_attr.  */
>  int __default_pthread_attr_lock = LLL_LOCK_INITIALIZER;
> -
> -#ifndef TLS_MULTIPLE_THREADS_IN_TCB
> -/* Variable set to a nonzero value either if more than one thread runs or ran,
> -   or if a single-threaded process is trying to cancel itself.  See
> -   nptl/descr.h for more context on the single-threaded process case.  */
> -int __pthread_multiple_threads attribute_hidden;
> -#endif
> 


-- 
Cheers,
Carlos.



* Re: [PATCH v2 02/13] elf, nptl: Resolve recursive lock implementation early
  2021-05-09 21:42   ` Carlos O'Donell
@ 2021-05-10  5:54     ` Florian Weimer
  0 siblings, 0 replies; 30+ messages in thread
From: Florian Weimer @ 2021-05-10  5:54 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

* Carlos O'Donell:

> On 5/6/21 2:09 PM, Florian Weimer via Libc-alpha wrote:
>> If libpthread is included in libc, it is not necessary to delay
>> initialization of the lock/unlock function pointers until libpthread
>> is loaded.  This eliminates two unprotected function pointers
>> from _rtld_global and removes some initialization code from
>> libpthread.
>
> This version looks good to me, and the early initialization makes it
> logically easier to follow when reading the code. Despite the removal
> of the unprotected function pointers in _rtld_global, we still need
> some function pointer to look up the function symbols from libc.so
> and remember their values, but data placement is harder to discover than
> the fixed offset from a public symbol.

Eh, the lookup happens before any user code runs, so it really ought to
be safe. 8-)

Thanks,
Florian



end of thread, other threads:[~2021-05-10  5:53 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 18:08 [PATCH 00/13] Linux: Move most stack management out of libpthread Florian Weimer
2021-05-06 18:08 ` [PATCH 01/13] scripts/versions.awk: Add strings and hashes to <first-versions.h> Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:09 ` [PATCH v2 02/13] elf, nptl: Resolve recursive lock implementation early Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-10  5:54     ` Florian Weimer
2021-05-06 18:10 ` [PATCH 03/13] nptl: Export __libc_multiple_threads from libc as an internal symbol Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:10 ` [PATCH 04/13] Linux: Explicitly disable cancellation checking in the dynamic loader Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:10 ` [PATCH 05/13] Linux: Simplify and fix the definition of SINGLE_THREAD_P Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:10 ` [PATCH 06/13] nptl: Eliminate __pthread_multiple_threads Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:10 ` [PATCH 07/13] elf: Introduce __tls_pre_init_tp Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:10 ` [PATCH 08/13] nptl: Move more stack management variables into _rtld_global Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:11 ` [PATCH 09/13] nptl: Simplify the change_stack_perm calling convention Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:11 ` [PATCH 10/13] nptl: Move changing of stack permissions into ld.so Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:11 ` [PATCH 11/13] nptl: Simplify resetting the in-flight stack in __reclaim_stacks Florian Weimer
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:11 ` [PATCH 12/13] nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc Florian Weimer
2021-05-09 21:41   ` Carlos O'Donell
2021-05-09 21:42   ` Carlos O'Donell
2021-05-06 18:11 ` [PATCH 13/13] Linux: Move __reclaim_stacks into the fork implementation in libc Florian Weimer
2021-05-09 21:41   ` Carlos O'Donell
2021-05-09 21:42 ` [PATCH 00/13] Linux: Move most stack management out of libpthread Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).