public inbox for libc-alpha@sourceware.org
* [PATCH v2 00/14] Dynamic TLS related data race fixes
@ 2021-04-13  8:17 Szabolcs Nagy
  2021-04-13  8:18 ` [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136] Szabolcs Nagy
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:17 UTC (permalink / raw)
  To: libc-alpha

This series addresses some long-standing dynamic
linker races:

19329 - race between tls access, pthread_create and dlopen
27111 - race between tls access, pthread_create and dlclose
27137 - x86: lazy tlsdesc relocation is racy

and it fixes a number of related issues:

27135 - dtv gaps are not reused
27136 - dtv off-by-1 error
19924 - slow tls access after dlopen
27721 - x86: ld audit does not work with bind now tlsdesc

There are related issues that are not fixed here, but the
series tries not to make them worse:

16133 - tls access is not as-safe
16134 - tls allocation aborts on oom
27112 - generation count may overflow on 32 bit targets

Some of the patches can be handled independently.

The set is also available in the nsz/bug19329-v2 branch.

A common theme between the races is that GL(dl_load_lock)
protects various pieces of shared dynamic linker global state
that are written under that lock, but in various situations (lazy
binding, TLS access, thread creation) they are read without
the lock.

There seems to be consensus not to require relaxed atomic loads
for non-racy reads of objects that may be accessed atomically
elsewhere.  The wiki page with the concurrency rules has not been
updated yet.  In this series relaxed atomics are used if an access
is racy, and synchronization is only missing when it is not relied
on (e.g. TLS access must be synchronized with dlopen by the user,
so unsynchronized dtv updates are not relied on).

I ran the glibc tests on x86_64, i386 and aarch64 linux (but
this may not cover the x86 tlsdesc changes completely).

v2:
- committed the fix for bug 27403 (aarch64 memleak on dlclose)
- committed the fix for bug 21349 (a lazy symbol binding race)
- added an RFC fix for bug 19924 (slow tls after dlopen)
- reordered test patches after bug fixes.
- addressed previous review comments (see individual patches).

Szabolcs Nagy (14):
  elf: Fix a DTV setup issue [BZ #27136]
  elf: Add a DTV setup test [BZ #27136]
  elf: Fix comments and logic in _dl_add_to_slotinfo
  elf: Refactor _dl_update_slotinfo to avoid use after free
  elf: Fix data races in pthread_create and TLS access [BZ #19329]
  elf: Use relaxed atomics for racy accesses [BZ #19329]
  elf: Add test case for [BZ #19329]
  elf: Fix DTV gap reuse logic [BZ #27135]
  x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
  i386: Avoid lazy relocation of tlsdesc [BZ #27137]
  x86_64: Remove lazy tlsdesc relocation related code
  i386: Remove lazy tlsdesc relocation related code
  elf: Remove lazy tlsdesc relocation related code
  RFC elf: Fix slow tls access after dlopen [BZ #19924]

 elf/Makefile                |  15 ++-
 elf/dl-close.c              |  26 ++--
 elf/dl-open.c               |  21 ++--
 elf/dl-reloc.c              |   5 +-
 elf/dl-tls.c                | 161 ++++++++++++++-----------
 elf/tlsdeschtab.h           |  53 +--------
 elf/tst-tls20.c             |  98 +++++++++++++++
 elf/tst-tls20mod-bad.c      |   2 +
 elf/tst-tls21.c             |  68 +++++++++++
 elf/tst-tls21mod.c          |   1 +
 sysdeps/aarch64/tlsdesc.c   |   1 -
 sysdeps/arm/tlsdesc.c       |   1 -
 sysdeps/generic/ldsodefs.h  |   3 +-
 sysdeps/i386/dl-machine.h   |  76 ++++++------
 sysdeps/i386/dl-tlsdesc.S   | 156 ------------------------
 sysdeps/i386/dl-tlsdesc.h   |   6 +-
 sysdeps/i386/tlsdesc.c      | 230 ------------------------------------
 sysdeps/x86_64/dl-machine.h |  23 ++--
 sysdeps/x86_64/dl-tls.c     |   5 +-
 sysdeps/x86_64/dl-tlsdesc.S | 104 ----------------
 sysdeps/x86_64/dl-tlsdesc.h |   4 +-
 sysdeps/x86_64/tlsdesc.c    | 108 -----------------
 22 files changed, 361 insertions(+), 806 deletions(-)
 create mode 100644 elf/tst-tls20.c
 create mode 100644 elf/tst-tls20mod-bad.c
 create mode 100644 elf/tst-tls21.c
 create mode 100644 elf/tst-tls21mod.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
@ 2021-04-13  8:18 ` Szabolcs Nagy
  2021-04-13  8:36   ` Andreas Schwab
  2021-04-13  8:18 ` [PATCH v2 02/14] elf: Add a DTV setup test " Szabolcs Nagy
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:18 UTC (permalink / raw)
  To: libc-alpha

The max modid is a valid index in the dtv; it should not be skipped.

The bug is observable if the last module has modid == 64 and its
generation is the same as or less than the max generation of the
previous modules.  Then dtv[0].counter implies dtv[64] is initialized
but it isn't.  Fixes bug 27136.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
---
 elf/dl-tls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index dd76829e74..79b93ad91b 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -590,7 +590,7 @@ _dl_allocate_tls_init (void *result)
 	}
 
       total += cnt;
-      if (total >= GL(dl_tls_max_dtv_idx))
+      if (total > GL(dl_tls_max_dtv_idx))
 	break;
 
       listp = listp->next;
-- 
2.17.1



* [PATCH v2 02/14] elf: Add a DTV setup test [BZ #27136]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
  2021-04-13  8:18 ` [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136] Szabolcs Nagy
@ 2021-04-13  8:18 ` Szabolcs Nagy
  2021-04-14 18:06   ` Adhemerval Zanella
  2021-04-13  8:18 ` [PATCH v2 03/14] elf: Fix comments and logic in _dl_add_to_slotinfo Szabolcs Nagy
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:18 UTC (permalink / raw)
  To: libc-alpha

The test dlopens a large number of modules with TLS; they are reused
from an existing test.

The test relies on the reuse of slotinfo entries after dlclose; without
the fix for bug 27135 this requires a failing dlopen.  With a slotinfo
list that has non-monotonically increasing generation counters, bug
27136 can trigger.

--
v2:
- moved from nptl/ to elf/
- dlopen a different set of existing modules.
- code style fixes (!= NULL etc).
- moved after bug fix patch
---
 elf/Makefile           | 10 ++++-
 elf/tst-tls20.c        | 98 ++++++++++++++++++++++++++++++++++++++++++
 elf/tst-tls20mod-bad.c |  2 +
 3 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 elf/tst-tls20.c
 create mode 100644 elf/tst-tls20mod-bad.c

diff --git a/elf/Makefile b/elf/Makefile
index 0bef49e53d..b1344f1133 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -222,7 +222,8 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-audit14 tst-audit15 tst-audit16 \
 	 tst-single_threaded tst-single_threaded-pthread \
 	 tst-tls-ie tst-tls-ie-dlmopen argv0test \
-	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask
+	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask \
+	 tst-tls20
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
@@ -344,6 +345,7 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		libmarkermod2-1 libmarkermod2-2 \
 		libmarkermod3-1 libmarkermod3-2 libmarkermod3-3 \
 		libmarkermod4-1 libmarkermod4-2 libmarkermod4-3 libmarkermod4-4 \
+		tst-tls20mod-bad
 
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
@@ -1924,3 +1926,9 @@ $(objpfx)tst-rtld-help.out: $(objpfx)ld.so
 	fi; \
 	(exit $$status); \
 	$(evaluate-test)
+
+# Reuses tst-tls-many-dynamic-modules
+tst-tls20mod-bad.so-no-z-defs = yes
+$(objpfx)tst-tls20: $(libdl) $(shared-thread-library)
+$(objpfx)tst-tls20.out: $(objpfx)tst-tls20mod-bad.so \
+			$(tst-tls-many-dynamic-modules:%=$(objpfx)%.so)
diff --git a/elf/tst-tls20.c b/elf/tst-tls20.c
new file mode 100644
index 0000000000..a378c1fa08
--- /dev/null
+++ b/elf/tst-tls20.c
@@ -0,0 +1,98 @@
+/* Test dtv setup if entries don't have monotone increasing generation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/xdlfcn.h>
+#include <support/xthread.h>
+
+#define NMOD 100
+static void *mod[NMOD];
+
+static void
+load_fail (void)
+{
+  /* Expected to fail because of a missing symbol.  */
+  void *m = dlopen ("tst-tls20mod-bad.so", RTLD_NOW);
+  if (m != NULL)
+    FAIL_EXIT1 ("dlopen of tst-tls20mod-bad.so succeeded\n");
+}
+
+static void
+load_mod (int i)
+{
+  char buf[100];
+  snprintf (buf, sizeof buf, "tst-tls-manydynamic%02dmod.so", i);
+  mod[i] = xdlopen (buf, RTLD_LAZY);
+}
+
+static void
+unload_mod (int i)
+{
+  if (mod[i] != NULL)
+    xdlclose (mod[i]);
+  mod[i] = NULL;
+}
+
+static void
+access (int i)
+{
+  char buf[100];
+  snprintf (buf, sizeof buf, "tls_global_%02d", i);
+  dlerror ();
+  int *p = dlsym (mod[i], buf);
+  printf ("mod[%d]: &tls = %p\n", i, p);
+  if (p == NULL)
+    FAIL_EXIT1 ("dlsym failed: %s\n", dlerror ());
+  ++*p;
+}
+
+static void *
+start (void *a)
+{
+  for (int i = 0; i < NMOD; i++)
+    if (mod[i] != NULL)
+      access (i);
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  int i;
+
+  for (i = 0; i < NMOD; i++)
+    {
+      load_mod (i);
+      /* Bump the generation of mod[0] without using new dtv slot.  */
+      unload_mod (0);
+      load_fail (); /* Ensure GL(dl_tls_dtv_gaps) is true: see bug 27135.  */
+      load_mod (0);
+      /* Access TLS in all loaded modules.  */
+      pthread_t t = xpthread_create (0, start, 0);
+      xpthread_join (t);
+    }
+  for (i = 0; i < NMOD; i++)
+    unload_mod (i);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-tls20mod-bad.c b/elf/tst-tls20mod-bad.c
new file mode 100644
index 0000000000..c1aed8ea7d
--- /dev/null
+++ b/elf/tst-tls20mod-bad.c
@@ -0,0 +1,2 @@
+void missing_symbol (void);
+void f (void) {missing_symbol ();}
-- 
2.17.1



* [PATCH v2 03/14] elf: Fix comments and logic in _dl_add_to_slotinfo
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
  2021-04-13  8:18 ` [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136] Szabolcs Nagy
  2021-04-13  8:18 ` [PATCH v2 02/14] elf: Add a DTV setup test " Szabolcs Nagy
@ 2021-04-13  8:18 ` Szabolcs Nagy
  2021-04-14 18:12   ` Adhemerval Zanella
  2021-04-13  8:18 ` [PATCH v2 04/14] elf: Refactor _dl_update_slotinfo to avoid use after free Szabolcs Nagy
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:18 UTC (permalink / raw)
  To: libc-alpha

Since

  commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
  Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]

the generation counter update is not needed in the failure path.
That commit ensures the allocation in _dl_add_to_slotinfo happens
before the demarcation point in dlopen (it is called twice: the first
call only allocates, while dlopen can still be reverted on failure;
the second call does the actual dtv updates, which cannot fail).

--
v2:
- expanded the commit message a bit.
---
 elf/dl-tls.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 79b93ad91b..24d00c14ef 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
 		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
       if (listp == NULL)
 	{
-	  /* We ran out of memory.  We will simply fail this
-	     call but don't undo anything we did so far.  The
-	     application will crash or be terminated anyway very
-	     soon.  */
-
-	  /* We have to do this since some entries in the dtv
-	     slotinfo array might already point to this
-	     generation.  */
-	  ++GL(dl_tls_generation);
-
+	  /* We ran out of memory while resizing the dtv slotinfo list.  */
 	  _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\
 cannot create TLS data structures"));
 	}
-- 
2.17.1



* [PATCH v2 04/14] elf: Refactor _dl_update_slotinfo to avoid use after free
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (2 preceding siblings ...)
  2021-04-13  8:18 ` [PATCH v2 03/14] elf: Fix comments and logic in _dl_add_to_slotinfo Szabolcs Nagy
@ 2021-04-13  8:18 ` Szabolcs Nagy
  2021-04-14 18:20   ` Adhemerval Zanella
  2021-04-13  8:19 ` [PATCH v2 05/14] elf: Fix data races in pthread_create and TLS access [BZ #19329] Szabolcs Nagy
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:18 UTC (permalink / raw)
  To: libc-alpha

map is not valid to access here because it can be freed by a
concurrent dlclose: during TLS access (via __tls_get_addr)
_dl_update_slotinfo is called without holding dlopen locks.  So don't
check the modid of map.

The map == NULL and map != NULL code paths can be shared (avoiding the
dtv resize in the map == NULL case is just an optimization: a larger
dtv than necessary would be fine too).

--
v2:
- commit message update
---
 elf/dl-tls.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 24d00c14ef..f8b32b3ecb 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -743,6 +743,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
 	{
 	  for (size_t cnt = total == 0 ? 1 : 0; cnt < listp->len; ++cnt)
 	    {
+	      size_t modid = total + cnt;
+
 	      size_t gen = listp->slotinfo[cnt].gen;
 
 	      if (gen > new_gen)
@@ -758,25 +760,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
 
 	      /* If there is no map this means the entry is empty.  */
 	      struct link_map *map = listp->slotinfo[cnt].map;
-	      if (map == NULL)
-		{
-		  if (dtv[-1].counter >= total + cnt)
-		    {
-		      /* If this modid was used at some point the memory
-			 might still be allocated.  */
-		      free (dtv[total + cnt].pointer.to_free);
-		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
-		      dtv[total + cnt].pointer.to_free = NULL;
-		    }
-
-		  continue;
-		}
-
 	      /* Check whether the current dtv array is large enough.  */
-	      size_t modid = map->l_tls_modid;
-	      assert (total + cnt == modid);
 	      if (dtv[-1].counter < modid)
 		{
+		  if (map == NULL)
+		    continue;
+
 		  /* Resize the dtv.  */
 		  dtv = _dl_resize_dtv (dtv);
 
-- 
2.17.1



* [PATCH v2 05/14] elf: Fix data races in pthread_create and TLS access [BZ #19329]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (3 preceding siblings ...)
  2021-04-13  8:18 ` [PATCH v2 04/14] elf: Refactor _dl_update_slotinfo to avoid use after free Szabolcs Nagy
@ 2021-04-13  8:19 ` Szabolcs Nagy
  2021-04-15 17:44   ` Adhemerval Zanella
  2021-04-13  8:19 ` [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses " Szabolcs Nagy
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:19 UTC (permalink / raw)
  To: libc-alpha

DTV setup at thread creation (_dl_allocate_tls_init) is changed
to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
here without locks would require design changes: the map that is
accessed for static TLS initialization here may be concurrently
freed by dlclose.  That use after free may be solved by only
locking around static TLS setup or by ensuring dlclose does not
free modules with static TLS, however currently every link map
with TLS has to be accessed at least to see if it needs static
TLS.  And even if that's solved, still a lot of atomics would be
needed to synchronize DTV related globals without a lock. So fix
both bug 19329 and bug 27111 with a lock that prevents DTV setup
running concurrently with dlopen or dlclose.

_dl_update_slotinfo at TLS access still does not use any locks
so CONCURRENCY NOTES are added to explain the synchronization.
The early exit from the slotinfo walk when max_modid is reached
is not strictly necessary, but does not hurt either.

An incorrect acquire load was removed from _dl_resize_dtv: it
did not synchronize with any release store or fence and
synchronization is now handled separately at thread creation
and TLS access time.

There are still a number of racy read accesses to globals that
will be changed to relaxed MO atomics in a followup patch. This
should not introduce regressions compared to existing behaviour
and avoid cluttering the main part of the fix.

Not all TLS access related data races are fixed here: there are
additional races at lazy tlsdesc relocations; see bug 27137.
---
 elf/dl-tls.c | 63 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 16 deletions(-)

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index f8b32b3ecb..33c06782b1 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -471,14 +471,11 @@ extern dtv_t _dl_static_dtv[];
 #endif
 
 static dtv_t *
-_dl_resize_dtv (dtv_t *dtv)
+_dl_resize_dtv (dtv_t *dtv, size_t max_modid)
 {
   /* Resize the dtv.  */
   dtv_t *newp;
-  /* Load GL(dl_tls_max_dtv_idx) atomically since it may be written to by
-     other threads concurrently.  */
-  size_t newsize
-    = atomic_load_acquire (&GL(dl_tls_max_dtv_idx)) + DTV_SURPLUS;
+  size_t newsize = max_modid + DTV_SURPLUS;
   size_t oldsize = dtv[-1].counter;
 
   if (dtv == GL(dl_initial_dtv))
@@ -524,11 +521,14 @@ _dl_allocate_tls_init (void *result)
   size_t total = 0;
   size_t maxgen = 0;
 
+  /* Protects global dynamic TLS related state.  */
+  __rtld_lock_lock_recursive (GL(dl_load_lock));
+
   /* Check if the current dtv is big enough.   */
   if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
     {
       /* Resize the dtv.  */
-      dtv = _dl_resize_dtv (dtv);
+      dtv = _dl_resize_dtv (dtv, GL(dl_tls_max_dtv_idx));
 
       /* Install this new dtv in the thread data structures.  */
       INSTALL_DTV (result, &dtv[-1]);
@@ -596,6 +596,7 @@ _dl_allocate_tls_init (void *result)
       listp = listp->next;
       assert (listp != NULL);
     }
+  __rtld_lock_unlock_recursive (GL(dl_load_lock));
 
   /* The DTV version is up-to-date now.  */
   dtv[0].counter = maxgen;
@@ -730,12 +731,29 @@ _dl_update_slotinfo (unsigned long int req_modid)
 
   if (dtv[0].counter < listp->slotinfo[idx].gen)
     {
-      /* The generation counter for the slot is higher than what the
-	 current dtv implements.  We have to update the whole dtv but
-	 only those entries with a generation counter <= the one for
-	 the entry we need.  */
+      /* CONCURRENCY NOTES:
+
+	 Here the dtv needs to be updated to new_gen generation count.
+
+	 This code may be called during TLS access when GL(dl_load_lock)
+	 is not held.  In that case the user code has to synchronize with
+	 dlopen and dlclose calls of relevant modules.  A module m is
+	 relevant if the generation of m <= new_gen and dlclose of m is
+	 synchronized: a memory access here happens after the dlopen and
+	 before the dlclose of relevant modules.  The dtv entries for
+	 relevant modules need to be updated, other entries can be
+	 arbitrary.
+
+	 This e.g. means that the first part of the slotinfo list can be
+	 accessed race free, but the tail may be concurrently extended.
+	 Similarly relevant slotinfo entries can be read race free, but
+	 other entries are racy.  However updating a non-relevant dtv
+	 entry does not affect correctness.  For a relevant module m,
+	 max_modid >= modid of m.  */
       size_t new_gen = listp->slotinfo[idx].gen;
       size_t total = 0;
+      size_t max_modid  = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
+      assert (max_modid >= req_modid);
 
       /* We have to look through the entire dtv slotinfo list.  */
       listp =  GL(dl_tls_dtv_slotinfo_list);
@@ -745,12 +763,14 @@ _dl_update_slotinfo (unsigned long int req_modid)
 	    {
 	      size_t modid = total + cnt;
 
+	      /* Later entries are not relevant.  */
+	      if (modid > max_modid)
+		break;
+
 	      size_t gen = listp->slotinfo[cnt].gen;
 
 	      if (gen > new_gen)
-		/* This is a slot for a generation younger than the
-		   one we are handling now.  It might be incompletely
-		   set up so ignore it.  */
+		/* Not relevant.  */
 		continue;
 
 	      /* If the entry is older than the current dtv layout we
@@ -767,7 +787,7 @@ _dl_update_slotinfo (unsigned long int req_modid)
 		    continue;
 
 		  /* Resize the dtv.  */
-		  dtv = _dl_resize_dtv (dtv);
+		  dtv = _dl_resize_dtv (dtv, max_modid);
 
 		  assert (modid <= dtv[-1].counter);
 
@@ -789,8 +809,17 @@ _dl_update_slotinfo (unsigned long int req_modid)
 	    }
 
 	  total += listp->len;
+	  if (total > max_modid)
+	    break;
+
+	  /* Synchronize with _dl_add_to_slotinfo. Ideally this would
+	     be consume MO since we only need to order the accesses to
+	     the next node after the read of the address and on most
+	     hardware (other than alpha) a normal load would do that
+	     because of the address dependency.  */
+	  listp = atomic_load_acquire (&listp->next);
 	}
-      while ((listp = listp->next) != NULL);
+      while (listp != NULL);
 
       /* This will be the new maximum generation counter.  */
       dtv[0].counter = new_gen;
@@ -982,7 +1011,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
 	 the first slot.  */
       assert (idx == 0);
 
-      listp = prevp->next = (struct dtv_slotinfo_list *)
+      listp = (struct dtv_slotinfo_list *)
 	malloc (sizeof (struct dtv_slotinfo_list)
 		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
       if (listp == NULL)
@@ -996,6 +1025,8 @@ cannot create TLS data structures"));
       listp->next = NULL;
       memset (listp->slotinfo, '\0',
 	      TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
+      /* Synchronize with _dl_update_slotinfo.  */
+      atomic_store_release (&prevp->next, listp);
     }
 
   /* Add the information into the slotinfo data structure.  */
-- 
2.17.1



* [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses [BZ #19329]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (4 preceding siblings ...)
  2021-04-13  8:19 ` [PATCH v2 05/14] elf: Fix data races in pthread_create and TLS access [BZ #19329] Szabolcs Nagy
@ 2021-04-13  8:19 ` Szabolcs Nagy
  2021-04-15 18:21   ` Adhemerval Zanella
  2021-04-13  8:19 ` [PATCH v2 07/14] elf: Add test case for " Szabolcs Nagy
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:19 UTC (permalink / raw)
  To: libc-alpha

This is a follow up patch to the fix for bug 19329.  This adds
relaxed MO atomics to accesses that are racy, but relaxed MO is
enough.

--
v2:
- handle x86_64 dl-tls.c too
---
 elf/dl-close.c          | 20 +++++++++++++-------
 elf/dl-open.c           |  5 ++++-
 elf/dl-tls.c            | 31 +++++++++++++++++++++++--------
 sysdeps/x86_64/dl-tls.c |  3 ++-
 4 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/elf/dl-close.c b/elf/dl-close.c
index c51becd06b..3720e47dd1 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -79,9 +79,10 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
 	{
 	  assert (old_map->l_tls_modid == idx);
 
-	  /* Mark the entry as unused. */
-	  listp->slotinfo[idx - disp].gen = GL(dl_tls_generation) + 1;
-	  listp->slotinfo[idx - disp].map = NULL;
+	  /* Mark the entry as unused.  These can be read concurrently.  */
+	  atomic_store_relaxed (&listp->slotinfo[idx - disp].gen,
+				GL(dl_tls_generation) + 1);
+	  atomic_store_relaxed (&listp->slotinfo[idx - disp].map, NULL);
 	}
 
       /* If this is not the last currently used entry no need to look
@@ -96,8 +97,8 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
 
       if (listp->slotinfo[idx - disp].map != NULL)
 	{
-	  /* Found a new last used index.  */
-	  GL(dl_tls_max_dtv_idx) = idx;
+	  /* Found a new last used index.  This can be read concurrently.  */
+	  atomic_store_relaxed (&GL(dl_tls_max_dtv_idx), idx);
 	  return true;
 	}
     }
@@ -571,7 +572,9 @@ _dl_close_worker (struct link_map *map, bool force)
 					GL(dl_tls_dtv_slotinfo_list), 0,
 					imap->l_init_called))
 		/* All dynamically loaded modules with TLS are unloaded.  */
-		GL(dl_tls_max_dtv_idx) = GL(dl_tls_static_nelem);
+		/* Can be read concurrently.  */
+		atomic_store_relaxed (&GL(dl_tls_max_dtv_idx),
+				      GL(dl_tls_static_nelem));
 
 	      if (imap->l_tls_offset != NO_TLS_OFFSET
 		  && imap->l_tls_offset != FORCED_DYNAMIC_TLS_OFFSET)
@@ -769,8 +772,11 @@ _dl_close_worker (struct link_map *map, bool force)
   /* If we removed any object which uses TLS bump the generation counter.  */
   if (any_tls)
     {
-      if (__glibc_unlikely (++GL(dl_tls_generation) == 0))
+      size_t newgen = GL(dl_tls_generation) + 1;
+      if (__glibc_unlikely (newgen == 0))
 	_dl_fatal_printf ("TLS generation counter wrapped!  Please report as described in "REPORT_BUGS_TO".\n");
+      /* Can be read concurrently.  */
+      atomic_store_relaxed (&GL(dl_tls_generation), newgen);
 
       if (tls_free_end == GL(dl_tls_static_used))
 	GL(dl_tls_static_used) = tls_free_start;
diff --git a/elf/dl-open.c b/elf/dl-open.c
index ab7aaa345e..83b8e96a5c 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -395,9 +395,12 @@ update_tls_slotinfo (struct link_map *new)
 	}
     }
 
-  if (__builtin_expect (++GL(dl_tls_generation) == 0, 0))
+  size_t newgen = GL(dl_tls_generation) + 1;
+  if (__builtin_expect (newgen == 0, 0))
     _dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
+  /* Can be read concurrently.  */
+  atomic_store_relaxed (&GL(dl_tls_generation), newgen);
 
   /* We need a second pass for static tls data, because
      _dl_update_slotinfo must not be run while calls to
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 33c06782b1..c4466bd9fc 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -175,7 +175,9 @@ _dl_next_tls_modid (void)
       /* No gaps, allocate a new entry.  */
     nogaps:
 
-      result = ++GL(dl_tls_max_dtv_idx);
+      result = GL(dl_tls_max_dtv_idx) + 1;
+      /* Can be read concurrently.  */
+      atomic_store_relaxed (&GL(dl_tls_max_dtv_idx), result);
     }
 
   return result;
@@ -359,10 +361,12 @@ allocate_dtv (void *result)
   dtv_t *dtv;
   size_t dtv_length;
 
+  /* Relaxed MO, because the dtv size is later rechecked, not relied on.  */
+  size_t max_modid = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
   /* We allocate a few more elements in the dtv than are needed for the
      initial set of modules.  This should avoid in most cases expansions
      of the dtv.  */
-  dtv_length = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
+  dtv_length = max_modid + DTV_SURPLUS;
   dtv = calloc (dtv_length + 2, sizeof (dtv_t));
   if (dtv != NULL)
     {
@@ -767,7 +771,7 @@ _dl_update_slotinfo (unsigned long int req_modid)
 	      if (modid > max_modid)
 		break;
 
-	      size_t gen = listp->slotinfo[cnt].gen;
+	      size_t gen = atomic_load_relaxed (&listp->slotinfo[cnt].gen);
 
 	      if (gen > new_gen)
 		/* Not relevant.  */
@@ -779,7 +783,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
 		continue;
 
 	      /* If there is no map this means the entry is empty.  */
-	      struct link_map *map = listp->slotinfo[cnt].map;
+	      struct link_map *map
+		= atomic_load_relaxed (&listp->slotinfo[cnt].map);
 	      /* Check whether the current dtv array is large enough.  */
 	      if (dtv[-1].counter < modid)
 		{
@@ -923,7 +928,12 @@ __tls_get_addr (GET_ADDR_ARGS)
 {
   dtv_t *dtv = THREAD_DTV ();
 
-  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
+  /* Update is needed if dtv[0].counter < the generation of the accessed
+     module.  The global generation counter is used here as it is easier
+     to check.  Synchronization for the relaxed MO access is guaranteed
+     by user code, see CONCURRENCY NOTES in _dl_update_slotinfo.  */
+  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+  if (__glibc_unlikely (dtv[0].counter != gen))
     return update_get_addr (GET_ADDR_PARAM);
 
   void *p = dtv[GET_ADDR_MODULE].pointer.val;
@@ -946,7 +956,10 @@ _dl_tls_get_addr_soft (struct link_map *l)
     return NULL;
 
   dtv_t *dtv = THREAD_DTV ();
-  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
+  /* This may be called without holding the GL(dl_load_lock).  Reading
+     arbitrary gen value is fine since this is best effort code.  */
+  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+  if (__glibc_unlikely (dtv[0].counter != gen))
     {
       /* This thread's DTV is not completely current,
 	 but it might already cover this module.  */
@@ -1032,7 +1045,9 @@ cannot create TLS data structures"));
   /* Add the information into the slotinfo data structure.  */
   if (do_add)
     {
-      listp->slotinfo[idx].map = l;
-      listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+      /* Can be read concurrently.  See _dl_update_slotinfo.  */
+      atomic_store_relaxed (&listp->slotinfo[idx].map, l);
+      atomic_store_relaxed (&listp->slotinfo[idx].gen,
+			    GL(dl_tls_generation) + 1);
     }
 }
diff --git a/sysdeps/x86_64/dl-tls.c b/sysdeps/x86_64/dl-tls.c
index 6595f6615b..24ef560b71 100644
--- a/sysdeps/x86_64/dl-tls.c
+++ b/sysdeps/x86_64/dl-tls.c
@@ -40,7 +40,8 @@ __tls_get_addr_slow (GET_ADDR_ARGS)
 {
   dtv_t *dtv = THREAD_DTV ();
 
-  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
+  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+  if (__glibc_unlikely (dtv[0].counter != gen))
     return update_get_addr (GET_ADDR_PARAM);
 
   return tls_get_addr_tail (GET_ADDR_PARAM, dtv, NULL);
-- 
2.17.1



* [PATCH v2 07/14] elf: Add test case for [BZ #19329]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (5 preceding siblings ...)
  2021-04-13  8:19 ` [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses " Szabolcs Nagy
@ 2021-04-13  8:19 ` Szabolcs Nagy
  2021-04-15 19:21   ` Adhemerval Zanella
  2021-04-13  8:20 ` [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135] Szabolcs Nagy
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:19 UTC (permalink / raw)
  To: libc-alpha

Test concurrent dlopen and pthread_create when the loaded modules have
TLS.  This triggers dl-tls assertion failures more reliably than the
nptl/tst-stack4 test.

The dlopened module has 100 DT_NEEDED dependencies with TLS; they are
reused from an existing TLS test.  The number of threads created during
dlopen depends on filesystem speed and hardware, but at most 3 threads
are alive at a time to limit resource usage.

---
v5:
- moved from nptl/ to elf/
- use many tls modules from another test (affects the timing a bit as
  these need more relocation processing, but still detects the race)
- use xdlopen
- use stdatomic.h
- moved from tests-internal back to tests
- use acq/rel atomics instead of relaxed; this works too and is cleaner.
- moved after bug fix patch.

v4:
- rebased, updated copyright year.
- moved to tests-internal because of <atomic.h>

v3:
- use the new test support code.
- better Makefile usage so modules are cleaned properly.

v2:
- undef NDEBUG.
- join nop threads so at most 3 threads exist at a time.
- remove stack size setting (resource usage is no longer a concern).
- stop creating threads after dlopen observably finished.
- print the number of threads created for debugging.
---
 elf/Makefile       |  9 ++++--
 elf/tst-tls21.c    | 68 ++++++++++++++++++++++++++++++++++++++++++++++
 elf/tst-tls21mod.c |  1 +
 3 files changed, 76 insertions(+), 2 deletions(-)
 create mode 100644 elf/tst-tls21.c
 create mode 100644 elf/tst-tls21mod.c

diff --git a/elf/Makefile b/elf/Makefile
index b1344f1133..e9377608eb 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -223,7 +223,7 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-single_threaded tst-single_threaded-pthread \
 	 tst-tls-ie tst-tls-ie-dlmopen argv0test \
 	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask \
-	 tst-tls20
+	 tst-tls20 tst-tls21
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
@@ -345,7 +345,7 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		libmarkermod2-1 libmarkermod2-2 \
 		libmarkermod3-1 libmarkermod3-2 libmarkermod3-3 \
 		libmarkermod4-1 libmarkermod4-2 libmarkermod4-3 libmarkermod4-4 \
-		tst-tls20mod-bad
+		tst-tls20mod-bad tst-tls21mod
 
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
@@ -1932,3 +1932,8 @@ tst-tls20mod-bad.so-no-z-defs = yes
 $(objpfx)tst-tls20: $(libdl) $(shared-thread-library)
 $(objpfx)tst-tls20.out: $(objpfx)tst-tls20mod-bad.so \
 			$(tst-tls-many-dynamic-modules:%=$(objpfx)%.so)
+
+# Reuses tst-tls-many-dynamic-modules
+$(objpfx)tst-tls21: $(libdl) $(shared-thread-library)
+$(objpfx)tst-tls21.out: $(objpfx)tst-tls21mod.so
+$(objpfx)tst-tls21mod.so: $(tst-tls-many-dynamic-modules:%=$(objpfx)%.so)
diff --git a/elf/tst-tls21.c b/elf/tst-tls21.c
new file mode 100644
index 0000000000..560bf5813a
--- /dev/null
+++ b/elf/tst-tls21.c
@@ -0,0 +1,68 @@
+/* Test concurrent dlopen and pthread_create: BZ 19329.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdatomic.h>
+#include <support/xdlfcn.h>
+#include <support/xthread.h>
+
+#define THREADS 10000
+
+static atomic_int done;
+
+static void *
+start (void *a)
+{
+  /* Load a module with many dependencies that each have TLS.  */
+  xdlopen ("tst-tls21mod.so", RTLD_LAZY);
+  atomic_store_explicit (&done, 1, memory_order_release);
+  return 0;
+}
+
+static void *
+nop (void *a)
+{
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  pthread_t t1, t2;
+  int i;
+
+  /* Load a module with lots of dependencies and TLS.  */
+  t1 = xpthread_create (0, start, 0);
+
+  /* Concurrently create lots of threads until dlopen is observably done.  */
+  for (i = 0; i < THREADS; i++)
+    {
+      if (atomic_load_explicit (&done, memory_order_acquire) != 0)
+	break;
+      t2 = xpthread_create (0, nop, 0);
+      xpthread_join (t2);
+    }
+
+  xpthread_join (t1);
+  printf ("threads created during dlopen: %d\n", i);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-tls21mod.c b/elf/tst-tls21mod.c
new file mode 100644
index 0000000000..206ece4fb3
--- /dev/null
+++ b/elf/tst-tls21mod.c
@@ -0,0 +1 @@
+int __thread x;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (6 preceding siblings ...)
  2021-04-13  8:19 ` [PATCH v2 07/14] elf: Add test case for " Szabolcs Nagy
@ 2021-04-13  8:20 ` Szabolcs Nagy
  2021-04-15 19:45   ` Adhemerval Zanella
  2021-06-24  9:48   ` Florian Weimer
  2021-04-13  8:20 ` [PATCH v2 09/14] x86_64: Avoid lazy relocation of tlsdesc [BZ #27137] Szabolcs Nagy
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:20 UTC (permalink / raw)
  To: libc-alpha

For some reason only dlopen failure caused dtv gaps to be reused.

It is possible that the intent was to never reuse modids for a
different module, but after a dlopen failure all gaps are reused,
not just the ones caused by the unfinished dlopen.

So the code already has to handle reused modids, which seems to
work; however, the data races at thread creation and TLS access
(see bug 19329 and bug 27111) may be more severe if slots are
reused, so this is scheduled after those fixes.  Fixing the races
is not simpler if reuse is disallowed, and reuse has other
benefits, so set GL(dl_tls_dtv_gaps) whenever entries are removed
from the middle of the slotinfo list.  The value does not have to
be correct: an incorrect true value causes the next modid query to
do a slotinfo walk; an incorrect false value leaves gaps and new
entries are added at the end.

Fixes bug 27135.
---
 elf/dl-close.c |  6 +++++-
 elf/dl-open.c  | 10 ----------
 elf/dl-tls.c   |  5 +----
 3 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/elf/dl-close.c b/elf/dl-close.c
index 3720e47dd1..9f31532f41 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -88,7 +88,11 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
       /* If this is not the last currently used entry no need to look
 	 further.  */
       if (idx != GL(dl_tls_max_dtv_idx))
-	return true;
+	{
+	  /* There is an unused dtv entry in the middle.  */
+	  GL(dl_tls_dtv_gaps) = true;
+	  return true;
+	}
     }
 
   while (idx - disp > (disp == 0 ? 1 + GL(dl_tls_static_nelem) : 0))
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 83b8e96a5c..661f26977e 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -890,16 +890,6 @@ no more namespaces available for dlmopen()"));
 	 state if relocation failed, for example.  */
       if (args.map)
 	{
-	  /* Maybe some of the modules which were loaded use TLS.
-	     Since it will be removed in the following _dl_close call
-	     we have to mark the dtv array as having gaps to fill the
-	     holes.  This is a pessimistic assumption which won't hurt
-	     if not true.  There is no need to do this when we are
-	     loading the auditing DSOs since TLS has not yet been set
-	     up.  */
-	  if ((mode & __RTLD_AUDIT) == 0)
-	    GL(dl_tls_dtv_gaps) = true;
-
 	  _dl_close_worker (args.map, true);
 
 	  /* All l_nodelete_pending objects should have been deleted
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index c4466bd9fc..b0257185e9 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -187,10 +187,7 @@ _dl_next_tls_modid (void)
 size_t
 _dl_count_modids (void)
 {
-  /* It is rare that we have gaps; see elf/dl-open.c (_dl_open) where
-     we fail to load a module and unload it leaving a gap.  If we don't
-     have gaps then the number of modids is the current maximum so
-     return that.  */
+  /* The count is the max unless dlclose or failed dlopen created gaps.  */
   if (__glibc_likely (!GL(dl_tls_dtv_gaps)))
     return GL(dl_tls_max_dtv_idx);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 09/14] x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (7 preceding siblings ...)
  2021-04-13  8:20 ` [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135] Szabolcs Nagy
@ 2021-04-13  8:20 ` Szabolcs Nagy
  2021-04-13 14:02   ` H.J. Lu
  2021-04-13  8:20 ` [PATCH v2 10/14] i386: " Szabolcs Nagy
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:20 UTC (permalink / raw)
  To: libc-alpha

Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This is similar to commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
for aarch64, but it fixes a different race: bug 27137.

Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
relocate tlsdesc lazily, but that does not work in a BIND_NOW module
due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
load time fixes bug 27721 too.

--
v2:
- mention the ldaudit issue with bindnow and tlsdesc.
---
 sysdeps/x86_64/dl-machine.h | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index 103eee6c3f..9a876a371e 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -570,12 +570,21 @@ elf_machine_lazy_rel (struct link_map *map,
     }
   else if (__glibc_likely (r_type == R_X86_64_TLSDESC))
     {
-      struct tlsdesc volatile * __attribute__((__unused__)) td =
-	(struct tlsdesc volatile *)reloc_addr;
+      const Elf_Symndx symndx = ELFW (R_SYM) (reloc->r_info);
+      const ElfW (Sym) *symtab = (const void *)D_PTR (map, l_info[DT_SYMTAB]);
+      const ElfW (Sym) *sym = &symtab[symndx];
+      const struct r_found_version *version = NULL;
 
-      td->arg = (void*)reloc;
-      td->entry = (void*)(D_PTR (map, l_info[ADDRIDX (DT_TLSDESC_PLT)])
-			  + map->l_addr);
+      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW (Half) *vernum =
+	    (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
+	  version = &map->l_versions[vernum[symndx] & 0x7fff];
+	}
+
+      /* Always initialize TLS descriptors completely at load time, in
+	 case static TLS is allocated for it that requires locking.  */
+      elf_machine_rela (map, reloc, sym, version, reloc_addr, skip_ifunc);
     }
   else if (__glibc_unlikely (r_type == R_X86_64_IRELATIVE))
     {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 10/14] i386: Avoid lazy relocation of tlsdesc [BZ #27137]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (8 preceding siblings ...)
  2021-04-13  8:20 ` [PATCH v2 09/14] x86_64: Avoid lazy relocation of tlsdesc [BZ #27137] Szabolcs Nagy
@ 2021-04-13  8:20 ` Szabolcs Nagy
  2021-04-13 14:02   ` H.J. Lu
  2021-04-13  8:21 ` [PATCH v2 11/14] x86_64: Remove lazy tlsdesc relocation related code Szabolcs Nagy
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:20 UTC (permalink / raw)
  To: libc-alpha

Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This is similar to commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
for aarch64, but it fixes a different race: bug 27137.

On i386 the code is a bit more complicated than on x86_64 because both
rel and rela relocs are supported.
---
 sysdeps/i386/dl-machine.h | 76 ++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 23e9cc3bfb..590b41d8d7 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -688,50 +688,32 @@ elf_machine_lazy_rel (struct link_map *map,
     }
   else if (__glibc_likely (r_type == R_386_TLS_DESC))
     {
-      struct tlsdesc volatile * __attribute__((__unused__)) td =
-	(struct tlsdesc volatile *)reloc_addr;
-
-      /* Handle relocations that reference the local *ABS* in a simple
-	 way, so as to preserve a potential addend.  */
-      if (ELF32_R_SYM (reloc->r_info) == 0)
-	td->entry = _dl_tlsdesc_resolve_abs_plus_addend;
-      /* Given a known-zero addend, we can store a pointer to the
-	 reloc in the arg position.  */
-      else if (td->arg == 0)
-	{
-	  td->arg = (void*)reloc;
-	  td->entry = _dl_tlsdesc_resolve_rel;
-	}
-      else
-	{
-	  /* We could handle non-*ABS* relocations with non-zero addends
-	     by allocating dynamically an arg to hold a pointer to the
-	     reloc, but that sounds pointless.  */
-	  const Elf32_Rel *const r = reloc;
-	  /* The code below was borrowed from elf_dynamic_do_rel().  */
-	  const ElfW(Sym) *const symtab =
-	    (const void *) D_PTR (map, l_info[DT_SYMTAB]);
+      const Elf32_Rel *const r = reloc;
+      /* The code below was borrowed from elf_dynamic_do_rel().  */
+      const ElfW(Sym) *const symtab =
+	(const void *) D_PTR (map, l_info[DT_SYMTAB]);
 
+      /* Always initialize TLS descriptors completely at load time, in
+	 case static TLS is allocated for it that requires locking.  */
 # ifdef RTLD_BOOTSTRAP
-	  /* The dynamic linker always uses versioning.  */
-	  assert (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL);
+      /* The dynamic linker always uses versioning.  */
+      assert (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL);
 # else
-	  if (map->l_info[VERSYMIDX (DT_VERSYM)])
+      if (map->l_info[VERSYMIDX (DT_VERSYM)])
 # endif
-	    {
-	      const ElfW(Half) *const version =
-		(const void *) D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
-	      ElfW(Half) ndx = version[ELFW(R_SYM) (r->r_info)] & 0x7fff;
-	      elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)],
-			       &map->l_versions[ndx],
-			       (void *) (l_addr + r->r_offset), skip_ifunc);
-	    }
+	{
+	  const ElfW(Half) *const version =
+	    (const void *) D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
+	  ElfW(Half) ndx = version[ELFW(R_SYM) (r->r_info)] & 0x7fff;
+	  elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)],
+			   &map->l_versions[ndx],
+			   (void *) (l_addr + r->r_offset), skip_ifunc);
+	}
 # ifndef RTLD_BOOTSTRAP
-	  else
-	    elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)], NULL,
-			     (void *) (l_addr + r->r_offset), skip_ifunc);
+      else
+	elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)], NULL,
+			 (void *) (l_addr + r->r_offset), skip_ifunc);
 # endif
-	}
     }
   else if (__glibc_unlikely (r_type == R_386_IRELATIVE))
     {
@@ -758,11 +740,21 @@ elf_machine_lazy_rela (struct link_map *map,
     ;
   else if (__glibc_likely (r_type == R_386_TLS_DESC))
     {
-      struct tlsdesc volatile * __attribute__((__unused__)) td =
-	(struct tlsdesc volatile *)reloc_addr;
+      const Elf_Symndx symndx = ELFW (R_SYM) (reloc->r_info);
+      const ElfW (Sym) *symtab = (const void *)D_PTR (map, l_info[DT_SYMTAB]);
+      const ElfW (Sym) *sym = &symtab[symndx];
+      const struct r_found_version *version = NULL;
+
+      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
+	{
+	  const ElfW (Half) *vernum =
+	    (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
+	  version = &map->l_versions[vernum[symndx] & 0x7fff];
+	}
 
-      td->arg = (void*)reloc;
-      td->entry = _dl_tlsdesc_resolve_rela;
+      /* Always initialize TLS descriptors completely at load time, in
+	 case static TLS is allocated for it that requires locking.  */
+      elf_machine_rela (map, reloc, sym, version, reloc_addr, skip_ifunc);
     }
   else if (__glibc_unlikely (r_type == R_386_IRELATIVE))
     {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 11/14] x86_64: Remove lazy tlsdesc relocation related code
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (9 preceding siblings ...)
  2021-04-13  8:20 ` [PATCH v2 10/14] i386: " Szabolcs Nagy
@ 2021-04-13  8:21 ` Szabolcs Nagy
  2021-04-13 14:03   ` H.J. Lu
  2021-04-13  8:21 ` [PATCH v2 12/14] i386: " Szabolcs Nagy
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:21 UTC (permalink / raw)
  To: libc-alpha

_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for
lazy tlsdesc relocation processing, which is no longer supported.

--
v2:
- fix elf_machine_runtime_setup: remove tlsdesc got setup.
---
 sysdeps/x86_64/dl-machine.h |   4 --
 sysdeps/x86_64/dl-tlsdesc.S | 104 ----------------------------------
 sysdeps/x86_64/dl-tlsdesc.h |   4 +-
 sysdeps/x86_64/tlsdesc.c    | 109 +-----------------------------------
 4 files changed, 2 insertions(+), 219 deletions(-)

diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index 9a876a371e..a8596aa3fa 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -127,10 +127,6 @@ elf_machine_runtime_setup (struct link_map *l, int lazy, int profile)
 	}
     }
 
-  if (l->l_info[ADDRIDX (DT_TLSDESC_GOT)] && lazy)
-    *(ElfW(Addr)*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_GOT)]) + l->l_addr)
-      = (ElfW(Addr)) &_dl_tlsdesc_resolve_rela;
-
   return lazy;
 }
 
diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
index 1d055adadd..ca9236bed8 100644
--- a/sysdeps/x86_64/dl-tlsdesc.S
+++ b/sysdeps/x86_64/dl-tlsdesc.S
@@ -144,107 +144,3 @@ _dl_tlsdesc_dynamic:
 	cfi_endproc
 	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
 #endif /* SHARED */
-
-     /* This function is a wrapper for a lazy resolver for TLS_DESC
-	RELA relocations.  The incoming 0(%rsp) points to the caller's
-	link map, pushed by the dynamic object's internal lazy TLS
-	resolver front-end before tail-calling us.  We need to pop it
-	ourselves.  %rax points to a TLS descriptor, such that 0(%rax)
-	holds the address of the internal resolver front-end (unless
-	some other thread beat us to resolving it) and 8(%rax) holds a
-	pointer to the relocation.
-
-	When the actual resolver returns, it will have adjusted the
-	TLS descriptor such that we can tail-call it for it to return
-	the TP offset of the symbol.  */
-
-	.hidden _dl_tlsdesc_resolve_rela
-	.global	_dl_tlsdesc_resolve_rela
-	.type	_dl_tlsdesc_resolve_rela,@function
-	cfi_startproc
-	.align 16
-	/* The PLT entry will have pushed the link_map pointer.  */
-_dl_tlsdesc_resolve_rela:
-	_CET_ENDBR
-	cfi_adjust_cfa_offset (8)
-	/* Save all call-clobbered registers.  Add 8 bytes for push in
-	   the PLT entry to align the stack.  */
-	subq	$80, %rsp
-	cfi_adjust_cfa_offset (80)
-	movq	%rax, (%rsp)
-	movq	%rdi, 8(%rsp)
-	movq	%rax, %rdi	/* Pass tlsdesc* in %rdi.  */
-	movq	%rsi, 16(%rsp)
-	movq	80(%rsp), %rsi	/* Pass link_map* in %rsi.  */
-	movq	%r8, 24(%rsp)
-	movq	%r9, 32(%rsp)
-	movq	%r10, 40(%rsp)
-	movq	%r11, 48(%rsp)
-	movq	%rdx, 56(%rsp)
-	movq	%rcx, 64(%rsp)
-	call	_dl_tlsdesc_resolve_rela_fixup
-	movq	(%rsp), %rax
-	movq	8(%rsp), %rdi
-	movq	16(%rsp), %rsi
-	movq	24(%rsp), %r8
-	movq	32(%rsp), %r9
-	movq	40(%rsp), %r10
-	movq	48(%rsp), %r11
-	movq	56(%rsp), %rdx
-	movq	64(%rsp), %rcx
-	addq	$88, %rsp
-	cfi_adjust_cfa_offset (-88)
-	jmp	*(%rax)
-	cfi_endproc
-	.size	_dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
-
-     /* This function is a placeholder for lazy resolving of TLS
-	relocations.  Once some thread starts resolving a TLS
-	relocation, it sets up the TLS descriptor to use this
-	resolver, such that other threads that would attempt to
-	resolve it concurrently may skip the call to the original lazy
-	resolver and go straight to a condition wait.
-
-	When the actual resolver returns, it will have adjusted the
-	TLS descriptor such that we can tail-call it for it to return
-	the TP offset of the symbol.  */
-
-	.hidden _dl_tlsdesc_resolve_hold
-	.global	_dl_tlsdesc_resolve_hold
-	.type	_dl_tlsdesc_resolve_hold,@function
-	cfi_startproc
-	.align 16
-_dl_tlsdesc_resolve_hold:
-0:
-	_CET_ENDBR
-	/* Save all call-clobbered registers.  */
-	subq	$72, %rsp
-	cfi_adjust_cfa_offset (72)
-	movq	%rax, (%rsp)
-	movq	%rdi, 8(%rsp)
-	movq	%rax, %rdi	/* Pass tlsdesc* in %rdi.  */
-	movq	%rsi, 16(%rsp)
-	/* Pass _dl_tlsdesc_resolve_hold's address in %rsi.  */
-	leaq	. - _dl_tlsdesc_resolve_hold(%rip), %rsi
-	movq	%r8, 24(%rsp)
-	movq	%r9, 32(%rsp)
-	movq	%r10, 40(%rsp)
-	movq	%r11, 48(%rsp)
-	movq	%rdx, 56(%rsp)
-	movq	%rcx, 64(%rsp)
-	call	_dl_tlsdesc_resolve_hold_fixup
-1:
-	movq	(%rsp), %rax
-	movq	8(%rsp), %rdi
-	movq	16(%rsp), %rsi
-	movq	24(%rsp), %r8
-	movq	32(%rsp), %r9
-	movq	40(%rsp), %r10
-	movq	48(%rsp), %r11
-	movq	56(%rsp), %rdx
-	movq	64(%rsp), %rcx
-	addq	$72, %rsp
-	cfi_adjust_cfa_offset (-72)
-	jmp	*(%rax)
-	cfi_endproc
-	.size	_dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
diff --git a/sysdeps/x86_64/dl-tlsdesc.h b/sysdeps/x86_64/dl-tlsdesc.h
index d134b3f4db..03d5ac7a54 100644
--- a/sysdeps/x86_64/dl-tlsdesc.h
+++ b/sysdeps/x86_64/dl-tlsdesc.h
@@ -55,9 +55,7 @@ struct tlsdesc_dynamic_arg
 
 extern ptrdiff_t attribute_hidden
   _dl_tlsdesc_return(struct tlsdesc *on_rax),
-  _dl_tlsdesc_undefweak(struct tlsdesc *on_rax),
-  _dl_tlsdesc_resolve_rela(struct tlsdesc *on_rax),
-  _dl_tlsdesc_resolve_hold(struct tlsdesc *on_rax);
+  _dl_tlsdesc_undefweak(struct tlsdesc *on_rax);
 
 # ifdef SHARED
 extern void *_dl_make_tlsdesc_dynamic (struct link_map *map,
diff --git a/sysdeps/x86_64/tlsdesc.c b/sysdeps/x86_64/tlsdesc.c
index 4083849f22..ecf864d6ee 100644
--- a/sysdeps/x86_64/tlsdesc.c
+++ b/sysdeps/x86_64/tlsdesc.c
@@ -16,120 +16,13 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <link.h>
 #include <ldsodefs.h>
-#include <elf/dynamic-link.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-unmap-segments.h>
+#define _dl_tlsdesc_resolve_hold 0
 #include <tlsdeschtab.h>
 
-/* The following 2 functions take a caller argument, that contains the
-   address expected to be in the TLS descriptor.  If it's changed, we
-   want to return immediately.  */
-
-/* This function is used to lazily resolve TLS_DESC RELA relocations.
-   The argument location is used to hold a pointer to the relocation.  */
-
-void
-attribute_hidden
-_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
-				struct link_map *l)
-{
-  const ElfW(Rela) *reloc = td->arg;
-
-  if (_dl_tlsdesc_resolve_early_return_p
-      (td, (void*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_PLT)]) + l->l_addr)))
-    return;
-
-  /* The code below was borrowed from _dl_fixup().  */
-  const ElfW(Sym) *const symtab
-    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
-  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
-  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
-  lookup_t result;
-
-   /* Look up the target symbol.  If the normal lookup rules are not
-      used don't look in the global scope.  */
-  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
-      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
-    {
-      const struct r_found_version *version = NULL;
-
-      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
-	{
-	  const ElfW(Half) *vernum =
-	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
-	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
-	  version = &l->l_versions[ndx];
-	  if (version->hash == 0)
-	    version = NULL;
-	}
-
-      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
-				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
-				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
-    }
-  else
-    {
-      /* We already found the symbol.  The module (and therefore its load
-	 address) is also known.  */
-      result = l;
-    }
-
-  if (! sym)
-    {
-      td->arg = (void*)reloc->r_addend;
-      td->entry = _dl_tlsdesc_undefweak;
-    }
-  else
-    {
-#  ifndef SHARED
-      CHECK_STATIC_TLS (l, result);
-#  else
-      if (!TRY_STATIC_TLS (l, result))
-	{
-	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
-					      + reloc->r_addend);
-	  td->entry = _dl_tlsdesc_dynamic;
-	}
-      else
-#  endif
-	{
-	  td->arg = (void*)(sym->st_value - result->l_tls_offset
-			    + reloc->r_addend);
-	  td->entry = _dl_tlsdesc_return;
-	}
-    }
-
-  _dl_tlsdesc_wake_up_held_fixups ();
-}
-
-/* This function is used to avoid busy waiting for other threads to
-   complete the lazy relocation.  Once another thread wins the race to
-   relocate a TLS descriptor, it sets the descriptor up such that this
-   function is called to wait until the resolver releases the
-   lock.  */
-
-void
-attribute_hidden
-_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
-				void *caller)
-{
-  /* Maybe we're lucky and can return early.  */
-  if (caller != td->entry)
-    return;
-
-  /* Locking here will stop execution until the running resolver runs
-     _dl_tlsdesc_wake_up_held_fixups(), releasing the lock.
-
-     FIXME: We'd be better off waiting on a condition variable, such
-     that we didn't have to hold the lock throughout the relocation
-     processing.  */
-  __rtld_lock_lock_recursive (GL(dl_load_lock));
-  __rtld_lock_unlock_recursive (GL(dl_load_lock));
-}
-
 /* Unmap the dynamic object, but also release its TLS descriptor table
    if there is one.  */
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 12/14] i386: Remove lazy tlsdesc relocation related code
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (10 preceding siblings ...)
  2021-04-13  8:21 ` [PATCH v2 11/14] x86_64: Remove lazy tlsdesc relocation related code Szabolcs Nagy
@ 2021-04-13  8:21 ` Szabolcs Nagy
  2021-04-13 14:04   ` H.J. Lu
  2021-04-13  8:21 ` [PATCH v2 13/14] elf: " Szabolcs Nagy
  2021-04-13  8:21 ` [PATCH v2 14/14] RFC elf: Fix slow tls access after dlopen [BZ #19924] Szabolcs Nagy
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:21 UTC (permalink / raw)
  To: libc-alpha

Like in commit e75711ebfa976d5468ec292282566a18b07e4d67 for x86_64,
remove unused lazy tlsdesc relocation processing code:

  _dl_tlsdesc_resolve_abs_plus_addend
  _dl_tlsdesc_resolve_rel
  _dl_tlsdesc_resolve_rela
  _dl_tlsdesc_resolve_hold
---
 sysdeps/i386/dl-tlsdesc.S | 156 -------------------------
 sysdeps/i386/dl-tlsdesc.h |   6 +-
 sysdeps/i386/tlsdesc.c    | 231 +-------------------------------------
 3 files changed, 2 insertions(+), 391 deletions(-)

diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
index e781d973b7..255fe88651 100644
--- a/sysdeps/i386/dl-tlsdesc.S
+++ b/sysdeps/i386/dl-tlsdesc.S
@@ -134,159 +134,3 @@ _dl_tlsdesc_dynamic:
 	cfi_endproc
 	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
 #endif /* SHARED */
-
-     /* This function is a wrapper for a lazy resolver for TLS_DESC
-	REL relocations that reference the *ABS* segment in their own
-	link maps.  %ebx points to the caller's GOT.  %eax points to a
-	TLS descriptor, such that 0(%eax) holds the address of the
-	resolver wrapper itself (unless some other thread beat us to
-	it) and 4(%eax) holds the addend in the relocation.
-
-	When the actual resolver returns, it will have adjusted the
-	TLS descriptor such that we can tail-call it for it to return
-	the TP offset of the symbol.  */
-
-	.hidden _dl_tlsdesc_resolve_abs_plus_addend
-	.global	_dl_tlsdesc_resolve_abs_plus_addend
-	.type	_dl_tlsdesc_resolve_abs_plus_addend,@function
-	cfi_startproc
-	.align 16
-_dl_tlsdesc_resolve_abs_plus_addend:
-0:
-	_CET_ENDBR
-	pushl	%eax
-	cfi_adjust_cfa_offset (4)
-	pushl	%ecx
-	cfi_adjust_cfa_offset (4)
-	pushl	%edx
-	cfi_adjust_cfa_offset (4)
-	movl	$1f - 0b, %ecx
-	movl	4(%ebx), %edx
-	call	_dl_tlsdesc_resolve_abs_plus_addend_fixup
-1:
-	popl	%edx
-	cfi_adjust_cfa_offset (-4)
-	popl	%ecx
-	cfi_adjust_cfa_offset (-4)
-	popl	%eax
-	cfi_adjust_cfa_offset (-4)
-	jmp	*(%eax)
-	cfi_endproc
-	.size	_dl_tlsdesc_resolve_abs_plus_addend, .-_dl_tlsdesc_resolve_abs_plus_addend
-
-     /* This function is a wrapper for a lazy resolver for TLS_DESC
-	REL relocations that had zero addends.  %ebx points to the
-	caller's GOT.  %eax points to a TLS descriptor, such that
-	0(%eax) holds the address of the resolver wrapper itself
-	(unless some other thread beat us to it) and 4(%eax) holds a
-	pointer to the relocation.
-
-	When the actual resolver returns, it will have adjusted the
-	TLS descriptor such that we can tail-call it for it to return
-	the TP offset of the symbol.  */
-
-	.hidden _dl_tlsdesc_resolve_rel
-	.global	_dl_tlsdesc_resolve_rel
-	.type	_dl_tlsdesc_resolve_rel,@function
-	cfi_startproc
-	.align 16
-_dl_tlsdesc_resolve_rel:
-0:
-	_CET_ENDBR
-	pushl	%eax
-	cfi_adjust_cfa_offset (4)
-	pushl	%ecx
-	cfi_adjust_cfa_offset (4)
-	pushl	%edx
-	cfi_adjust_cfa_offset (4)
-	movl	$1f - 0b, %ecx
-	movl	4(%ebx), %edx
-	call	_dl_tlsdesc_resolve_rel_fixup
-1:
-	popl	%edx
-	cfi_adjust_cfa_offset (-4)
-	popl	%ecx
-	cfi_adjust_cfa_offset (-4)
-	popl	%eax
-	cfi_adjust_cfa_offset (-4)
-	jmp	*(%eax)
-	cfi_endproc
-	.size	_dl_tlsdesc_resolve_rel, .-_dl_tlsdesc_resolve_rel
-
-     /* This function is a wrapper for a lazy resolver for TLS_DESC
-	RELA relocations.  %ebx points to the caller's GOT.  %eax
-	points to a TLS descriptor, such that 0(%eax) holds the
-	address of the resolver wrapper itself (unless some other
-	thread beat us to it) and 4(%eax) holds a pointer to the
-	relocation.
-
-	When the actual resolver returns, it will have adjusted the
-	TLS descriptor such that we can tail-call it for it to return
-	the TP offset of the symbol.  */
-
-	.hidden _dl_tlsdesc_resolve_rela
-	.global	_dl_tlsdesc_resolve_rela
-	.type	_dl_tlsdesc_resolve_rela,@function
-	cfi_startproc
-	.align 16
-_dl_tlsdesc_resolve_rela:
-0:
-	_CET_ENDBR
-	pushl	%eax
-	cfi_adjust_cfa_offset (4)
-	pushl	%ecx
-	cfi_adjust_cfa_offset (4)
-	pushl	%edx
-	cfi_adjust_cfa_offset (4)
-	movl	$1f - 0b, %ecx
-	movl	4(%ebx), %edx
-	call	_dl_tlsdesc_resolve_rela_fixup
-1:
-	popl	%edx
-	cfi_adjust_cfa_offset (-4)
-	popl	%ecx
-	cfi_adjust_cfa_offset (-4)
-	popl	%eax
-	cfi_adjust_cfa_offset (-4)
-	jmp	*(%eax)
-	cfi_endproc
-	.size	_dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
-
-     /* This function is a placeholder for lazy resolving of TLS
-	relocations.  Once some thread starts resolving a TLS
-	relocation, it sets up the TLS descriptor to use this
-	resolver, such that other threads that would attempt to
-	resolve it concurrently may skip the call to the original lazy
-	resolver and go straight to a condition wait.
-
-	When the actual resolver returns, it will have adjusted the
-	TLS descriptor such that we can tail-call it for it to return
-	the TP offset of the symbol.  */
-
-	.hidden _dl_tlsdesc_resolve_hold
-	.global	_dl_tlsdesc_resolve_hold
-	.type	_dl_tlsdesc_resolve_hold,@function
-	cfi_startproc
-	.align 16
-_dl_tlsdesc_resolve_hold:
-0:
-	_CET_ENDBR
-	pushl	%eax
-	cfi_adjust_cfa_offset (4)
-	pushl	%ecx
-	cfi_adjust_cfa_offset (4)
-	pushl	%edx
-	cfi_adjust_cfa_offset (4)
-	movl	$1f - 0b, %ecx
-	movl	4(%ebx), %edx
-	call	_dl_tlsdesc_resolve_hold_fixup
-1:
-	popl	%edx
-	cfi_adjust_cfa_offset (-4)
-	popl	%ecx
-	cfi_adjust_cfa_offset (-4)
-	popl	%eax
-	cfi_adjust_cfa_offset (-4)
-	jmp	*(%eax)
-	cfi_endproc
-	.size	_dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
diff --git a/sysdeps/i386/dl-tlsdesc.h b/sysdeps/i386/dl-tlsdesc.h
index 753c03e79c..12e90da3a8 100644
--- a/sysdeps/i386/dl-tlsdesc.h
+++ b/sysdeps/i386/dl-tlsdesc.h
@@ -43,11 +43,7 @@ struct tlsdesc_dynamic_arg
 
 extern ptrdiff_t attribute_hidden __attribute__ ((regparm (1)))
   _dl_tlsdesc_return (struct tlsdesc *),
-  _dl_tlsdesc_undefweak (struct tlsdesc *),
-  _dl_tlsdesc_resolve_abs_plus_addend (struct tlsdesc *),
-  _dl_tlsdesc_resolve_rel (struct tlsdesc *),
-  _dl_tlsdesc_resolve_rela (struct tlsdesc *),
-  _dl_tlsdesc_resolve_hold (struct tlsdesc *);
+  _dl_tlsdesc_undefweak (struct tlsdesc *);
 
 # ifdef SHARED
 extern void *_dl_make_tlsdesc_dynamic (struct link_map *map,
diff --git a/sysdeps/i386/tlsdesc.c b/sysdeps/i386/tlsdesc.c
index 0bc646541f..436a21f66b 100644
--- a/sysdeps/i386/tlsdesc.c
+++ b/sysdeps/i386/tlsdesc.c
@@ -16,242 +16,13 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <link.h>
 #include <ldsodefs.h>
-#include <elf/dynamic-link.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-unmap-segments.h>
+#define _dl_tlsdesc_resolve_hold 0
 #include <tlsdeschtab.h>
 
-/* The following 4 functions take an entry_check_offset argument.
-   It's computed by the caller as an offset between its entry point
-   and the call site, such that by adding the built-in return address
-   that is implicitly passed to the function with this offset, we can
-   easily obtain the caller's entry point to compare with the entry
-   point given in the TLS descriptor.  If it's changed, we want to
-   return immediately.  */
-
-/* This function is used to lazily resolve TLS_DESC REL relocations
-   that reference the *ABS* segment in their own link maps.  The
-   argument is the addend originally stored there.  */
-
-void
-__attribute__ ((regparm (3))) attribute_hidden
-_dl_tlsdesc_resolve_abs_plus_addend_fixup (struct tlsdesc volatile *td,
-					   struct link_map *l,
-					   ptrdiff_t entry_check_offset)
-{
-  ptrdiff_t addend = (ptrdiff_t) td->arg;
-
-  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
-					  - entry_check_offset))
-    return;
-
-#ifndef SHARED
-  CHECK_STATIC_TLS (l, l);
-#else
-  if (!TRY_STATIC_TLS (l, l))
-    {
-      td->arg = _dl_make_tlsdesc_dynamic (l, addend);
-      td->entry = _dl_tlsdesc_dynamic;
-    }
-  else
-#endif
-    {
-      td->arg = (void*) (addend - l->l_tls_offset);
-      td->entry = _dl_tlsdesc_return;
-    }
-
-  _dl_tlsdesc_wake_up_held_fixups ();
-}
-
-/* This function is used to lazily resolve TLS_DESC REL relocations
-   that originally had zero addends.  The argument location, that
-   originally held the addend, is used to hold a pointer to the
-   relocation, but it has to be restored before we call the function
-   that applies relocations.  */
-
-void
-__attribute__ ((regparm (3))) attribute_hidden
-_dl_tlsdesc_resolve_rel_fixup (struct tlsdesc volatile *td,
-			       struct link_map *l,
-			       ptrdiff_t entry_check_offset)
-{
-  const ElfW(Rel) *reloc = td->arg;
-
-  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
-					  - entry_check_offset))
-    return;
-
-  /* The code below was borrowed from _dl_fixup(),
-     except for checking for STB_LOCAL.  */
-  const ElfW(Sym) *const symtab
-    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
-  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
-  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
-  lookup_t result;
-
-   /* Look up the target symbol.  If the normal lookup rules are not
-      used don't look in the global scope.  */
-  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
-      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
-    {
-      const struct r_found_version *version = NULL;
-
-      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
-	{
-	  const ElfW(Half) *vernum =
-	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
-	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
-	  version = &l->l_versions[ndx];
-	  if (version->hash == 0)
-	    version = NULL;
-	}
-
-      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
-				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
-				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
-    }
-  else
-    {
-      /* We already found the symbol.  The module (and therefore its load
-	 address) is also known.  */
-      result = l;
-    }
-
-  if (!sym)
-    {
-      td->arg = 0;
-      td->entry = _dl_tlsdesc_undefweak;
-    }
-  else
-    {
-#  ifndef SHARED
-      CHECK_STATIC_TLS (l, result);
-#  else
-      if (!TRY_STATIC_TLS (l, result))
-	{
-	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value);
-	  td->entry = _dl_tlsdesc_dynamic;
-	}
-      else
-#  endif
-	{
-	  td->arg = (void*)(sym->st_value - result->l_tls_offset);
-	  td->entry = _dl_tlsdesc_return;
-	}
-    }
-
-  _dl_tlsdesc_wake_up_held_fixups ();
-}
-
-/* This function is used to lazily resolve TLS_DESC RELA relocations.
-   The argument location is used to hold a pointer to the relocation.  */
-
-void
-__attribute__ ((regparm (3))) attribute_hidden
-_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
-				struct link_map *l,
-				ptrdiff_t entry_check_offset)
-{
-  const ElfW(Rela) *reloc = td->arg;
-
-  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
-					  - entry_check_offset))
-    return;
-
-  /* The code below was borrowed from _dl_fixup(),
-     except for checking for STB_LOCAL.  */
-  const ElfW(Sym) *const symtab
-    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
-  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
-  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
-  lookup_t result;
-
-   /* Look up the target symbol.  If the normal lookup rules are not
-      used don't look in the global scope.  */
-  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
-      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
-    {
-      const struct r_found_version *version = NULL;
-
-      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
-	{
-	  const ElfW(Half) *vernum =
-	    (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
-	  ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
-	  version = &l->l_versions[ndx];
-	  if (version->hash == 0)
-	    version = NULL;
-	}
-
-      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
-				    l->l_scope, version, ELF_RTYPE_CLASS_PLT,
-				    DL_LOOKUP_ADD_DEPENDENCY, NULL);
-    }
-  else
-    {
-      /* We already found the symbol.  The module (and therefore its load
-	 address) is also known.  */
-      result = l;
-    }
-
-  if (!sym)
-    {
-      td->arg = (void*) reloc->r_addend;
-      td->entry = _dl_tlsdesc_undefweak;
-    }
-  else
-    {
-#  ifndef SHARED
-      CHECK_STATIC_TLS (l, result);
-#  else
-      if (!TRY_STATIC_TLS (l, result))
-	{
-	  td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
-					      + reloc->r_addend);
-	  td->entry = _dl_tlsdesc_dynamic;
-	}
-      else
-#  endif
-	{
-	  td->arg = (void*) (sym->st_value - result->l_tls_offset
-			     + reloc->r_addend);
-	  td->entry = _dl_tlsdesc_return;
-	}
-    }
-
-  _dl_tlsdesc_wake_up_held_fixups ();
-}
-
-/* This function is used to avoid busy waiting for other threads to
-   complete the lazy relocation.  Once another thread wins the race to
-   relocate a TLS descriptor, it sets the descriptor up such that this
-   function is called to wait until the resolver releases the
-   lock.  */
-
-void
-__attribute__ ((regparm (3))) attribute_hidden
-_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
-				struct link_map *l __attribute__((__unused__)),
-				ptrdiff_t entry_check_offset)
-{
-  /* Maybe we're lucky and can return early.  */
-  if (__builtin_return_address (0) - entry_check_offset != td->entry)
-    return;
-
-  /* Locking here will stop execution until the running resolver runs
-     _dl_tlsdesc_wake_up_held_fixups(), releasing the lock.
-
-     FIXME: We'd be better off waiting on a condition variable, such
-     that we didn't have to hold the lock throughout the relocation
-     processing.  */
-  __rtld_lock_lock_recursive (GL(dl_load_lock));
-  __rtld_lock_unlock_recursive (GL(dl_load_lock));
-}
-
-
 /* Unmap the dynamic object, but also release its TLS descriptor table
    if there is one.  */
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 13/14] elf: Remove lazy tlsdesc relocation related code
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (11 preceding siblings ...)
  2021-04-13  8:21 ` [PATCH v2 12/14] i386: " Szabolcs Nagy
@ 2021-04-13  8:21 ` Szabolcs Nagy
  2021-04-15 19:52   ` Adhemerval Zanella
  2021-04-13  8:21 ` [PATCH v2 14/14] RFC elf: Fix slow tls access after dlopen [BZ #19924] Szabolcs Nagy
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:21 UTC (permalink / raw)
  To: libc-alpha

Remove generic tlsdesc code related to lazy tlsdesc processing since
lazy tlsdesc relocation is no longer supported.  This includes removing
GL(dl_load_lock) from _dl_make_tlsdesc_dynamic, which is only called
at load time when that lock is already held.

Added a documentation comment too.
---
 elf/tlsdeschtab.h         | 53 +++++----------------------------------
 sysdeps/aarch64/tlsdesc.c |  1 -
 sysdeps/arm/tlsdesc.c     |  1 -
 sysdeps/i386/tlsdesc.c    |  1 -
 sysdeps/x86_64/tlsdesc.c  |  1 -
 5 files changed, 6 insertions(+), 51 deletions(-)

diff --git a/elf/tlsdeschtab.h b/elf/tlsdeschtab.h
index 21695fd1e9..85bd0415e1 100644
--- a/elf/tlsdeschtab.h
+++ b/elf/tlsdeschtab.h
@@ -78,6 +78,10 @@ map_generation (struct link_map *map)
   return GL(dl_tls_generation) + 1;
 }
 
+/* Returns the data pointer for a given map and tls offset that is used
+   to fill in one of the GOT entries referenced by a TLSDESC relocation
+   when using dynamic TLS.  This requires allocation, returns NULL on
+   allocation failure.  */
 void *
 _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
 {
@@ -85,18 +89,12 @@ _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
   void **entry;
   struct tlsdesc_dynamic_arg *td, test;
 
-  /* FIXME: We could use a per-map lock here, but is it worth it?  */
-  __rtld_lock_lock_recursive (GL(dl_load_lock));
-
   ht = map->l_mach.tlsdesc_table;
   if (! ht)
     {
       ht = htab_create ();
       if (! ht)
-	{
-	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
-	  return 0;
-	}
+	return 0;
       map->l_mach.tlsdesc_table = ht;
     }
 
@@ -104,15 +102,11 @@ _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
   test.tlsinfo.ti_offset = ti_offset;
   entry = htab_find_slot (ht, &test, 1, hash_tlsdesc, eq_tlsdesc);
   if (! entry)
-    {
-      __rtld_lock_unlock_recursive (GL(dl_load_lock));
-      return 0;
-    }
+    return 0;
 
   if (*entry)
     {
       td = *entry;
-      __rtld_lock_unlock_recursive (GL(dl_load_lock));
       return td;
     }
 
@@ -122,44 +116,9 @@ _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
      thread.  */
   td->gen_count = map_generation (map);
   td->tlsinfo = test.tlsinfo;
-
-  __rtld_lock_unlock_recursive (GL(dl_load_lock));
   return td;
 }
 
 # endif /* SHARED */
 
-/* The idea of the following two functions is to stop multiple threads
-   from attempting to resolve the same TLS descriptor without busy
-   waiting.  Ideally, we should be able to release the lock right
-   after changing td->entry, and then using say a condition variable
-   or a futex wake to wake up any waiting threads, but let's try to
-   avoid introducing such dependencies.  */
-
-static int
-__attribute__ ((unused))
-_dl_tlsdesc_resolve_early_return_p (struct tlsdesc volatile *td, void *caller)
-{
-  if (caller != atomic_load_relaxed (&td->entry))
-    return 1;
-
-  __rtld_lock_lock_recursive (GL(dl_load_lock));
-  if (caller != atomic_load_relaxed (&td->entry))
-    {
-      __rtld_lock_unlock_recursive (GL(dl_load_lock));
-      return 1;
-    }
-
-  atomic_store_relaxed (&td->entry, _dl_tlsdesc_resolve_hold);
-
-  return 0;
-}
-
-static void
-__attribute__ ((unused))
-_dl_tlsdesc_wake_up_held_fixups (void)
-{
-  __rtld_lock_unlock_recursive (GL(dl_load_lock));
-}
-
 #endif
diff --git a/sysdeps/aarch64/tlsdesc.c b/sysdeps/aarch64/tlsdesc.c
index 9b6a8b0ec4..3c1bcd8fe6 100644
--- a/sysdeps/aarch64/tlsdesc.c
+++ b/sysdeps/aarch64/tlsdesc.c
@@ -22,7 +22,6 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-unmap-segments.h>
-#define _dl_tlsdesc_resolve_hold 0
 #include <tlsdeschtab.h>
 
 /* Unmap the dynamic object, but also release its TLS descriptor table
diff --git a/sysdeps/arm/tlsdesc.c b/sysdeps/arm/tlsdesc.c
index 2f70adef5c..b3d1735af4 100644
--- a/sysdeps/arm/tlsdesc.c
+++ b/sysdeps/arm/tlsdesc.c
@@ -20,7 +20,6 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-unmap-segments.h>
-#define _dl_tlsdesc_resolve_hold 0
 #include <tlsdeschtab.h>
 
 /* Unmap the dynamic object, but also release its TLS descriptor table
diff --git a/sysdeps/i386/tlsdesc.c b/sysdeps/i386/tlsdesc.c
index 436a21f66b..364c7769ce 100644
--- a/sysdeps/i386/tlsdesc.c
+++ b/sysdeps/i386/tlsdesc.c
@@ -20,7 +20,6 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-unmap-segments.h>
-#define _dl_tlsdesc_resolve_hold 0
 #include <tlsdeschtab.h>
 
 /* Unmap the dynamic object, but also release its TLS descriptor table
diff --git a/sysdeps/x86_64/tlsdesc.c b/sysdeps/x86_64/tlsdesc.c
index ecf864d6ee..3bb80517a3 100644
--- a/sysdeps/x86_64/tlsdesc.c
+++ b/sysdeps/x86_64/tlsdesc.c
@@ -20,7 +20,6 @@
 #include <tls.h>
 #include <dl-tlsdesc.h>
 #include <dl-unmap-segments.h>
-#define _dl_tlsdesc_resolve_hold 0
 #include <tlsdeschtab.h>
 
 /* Unmap the dynamic object, but also release its TLS descriptor table
-- 
2.17.1



* [PATCH v2 14/14] RFC elf: Fix slow tls access after dlopen [BZ #19924]
  2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
                   ` (12 preceding siblings ...)
  2021-04-13  8:21 ` [PATCH v2 13/14] elf: " Szabolcs Nagy
@ 2021-04-13  8:21 ` Szabolcs Nagy
  2022-09-16  9:54   ` Carlos O'Donell
  13 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  8:21 UTC (permalink / raw)
  To: libc-alpha

In short: __tls_get_addr checks the global generation counter,
_dl_update_slotinfo updates up to the generation of the accessed
module. If the global generation is newer than the generation of the
module, then __tls_get_addr keeps hitting the slow path that updates
the dtv.

Possible approaches I can see:

1. update to global generation instead of module,
2. check the module generation in the fast path.

This patch implements approach 1: it needs additional synchronization
(an acquire load) so the slotinfo list is up to date with the observed
global generation.

Approach 2. would require walking the slotinfo list at all times.
I don't know how to make that fast with many modules.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since a relaxed MO load is not faster than an acquire MO load.

I have not benchmarked this yet.
---
 elf/dl-close.c             |  2 +-
 elf/dl-open.c              |  8 ++++----
 elf/dl-reloc.c             |  5 ++---
 elf/dl-tls.c               | 28 ++++++++++++----------------
 sysdeps/generic/ldsodefs.h |  3 ++-
 sysdeps/x86_64/dl-tls.c    |  4 ++--
 6 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/elf/dl-close.c b/elf/dl-close.c
index 9f31532f41..45f8a7fe31 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -780,7 +780,7 @@ _dl_close_worker (struct link_map *map, bool force)
       if (__glibc_unlikely (newgen == 0))
 	_dl_fatal_printf ("TLS generation counter wrapped!  Please report as described in "REPORT_BUGS_TO".\n");
       /* Can be read concurrently.  */
-      atomic_store_relaxed (&GL(dl_tls_generation), newgen);
+      atomic_store_release (&GL(dl_tls_generation), newgen);
 
       if (tls_free_end == GL(dl_tls_static_used))
 	GL(dl_tls_static_used) = tls_free_start;
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 661f26977e..5b9816e4e8 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -400,7 +400,7 @@ update_tls_slotinfo (struct link_map *new)
     _dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
   /* Can be read concurrently.  */
-  atomic_store_relaxed (&GL(dl_tls_generation), newgen);
+  atomic_store_release (&GL(dl_tls_generation), newgen);
 
   /* We need a second pass for static tls data, because
      _dl_update_slotinfo must not be run while calls to
@@ -417,8 +417,8 @@ TLS generation counter wrapped!  Please report this."));
 	     now, but we can delay updating the DTV.  */
 	  imap->l_need_tls_init = 0;
 #ifdef SHARED
-	  /* Update the slot information data for at least the
-	     generation of the DSO we are allocating data for.  */
+	  /* Update the slot information data for the current
+	     generation.  */
 
 	  /* FIXME: This can terminate the process on memory
 	     allocation failure.  It is not possible to raise
@@ -426,7 +426,7 @@ TLS generation counter wrapped!  Please report this."));
 	     _dl_update_slotinfo would have to be split into two
 	     operations, similar to resize_scopes and update_scopes
 	     above.  This is related to bug 16134.  */
-	  _dl_update_slotinfo (imap->l_tls_modid);
+	  _dl_update_slotinfo (imap->l_tls_modid, newgen);
 #endif
 
 	  GL(dl_init_static_tls) (imap);
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index c2df26deea..427669d769 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -111,11 +111,10 @@ _dl_try_allocate_static_tls (struct link_map *map, bool optional)
   if (map->l_real->l_relocated)
     {
 #ifdef SHARED
+// TODO: it is not clear why we need to update the DTV here, add comment
       if (__builtin_expect (THREAD_DTV()[0].counter != GL(dl_tls_generation),
 			    0))
-	/* Update the slot information data for at least the generation of
-	   the DSO we are allocating data for.  */
-	(void) _dl_update_slotinfo (map->l_tls_modid);
+	(void) _dl_update_slotinfo (map->l_tls_modid, GL(dl_tls_generation));
 #endif
 
       GL(dl_init_static_tls) (map);
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index b0257185e9..b51a4f3a19 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -701,7 +701,7 @@ allocate_and_init (struct link_map *map)
 
 
 struct link_map *
-_dl_update_slotinfo (unsigned long int req_modid)
+_dl_update_slotinfo (unsigned long int req_modid, size_t new_gen)
 {
   struct link_map *the_map = NULL;
   dtv_t *dtv = THREAD_DTV ();
@@ -718,19 +718,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
      code and therefore add to the slotinfo list.  This is a problem
      since we must not pick up any information about incomplete work.
      The solution to this is to ignore all dtv slots which were
-     created after the one we are currently interested.  We know that
-     dynamic loading for this module is completed and this is the last
-     load operation we know finished.  */
-  unsigned long int idx = req_modid;
+     created after the generation we are interested in.  We know that
+     dynamic loading for this generation is completed and this is the
+     last load operation we know finished.  */
   struct dtv_slotinfo_list *listp = GL(dl_tls_dtv_slotinfo_list);
 
-  while (idx >= listp->len)
-    {
-      idx -= listp->len;
-      listp = listp->next;
-    }
-
-  if (dtv[0].counter < listp->slotinfo[idx].gen)
+  if (dtv[0].counter < new_gen)
     {
       /* CONCURRENCY NOTES:
 
@@ -751,7 +744,6 @@ _dl_update_slotinfo (unsigned long int req_modid)
 	 other entries are racy.  However updating a non-relevant dtv
 	 entry does not affect correctness.  For a relevant module m,
 	 max_modid >= modid of m.  */
-      size_t new_gen = listp->slotinfo[idx].gen;
       size_t total = 0;
       size_t max_modid  = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
       assert (max_modid >= req_modid);
@@ -894,9 +886,9 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map)
 
 static struct link_map *
 __attribute_noinline__
-update_get_addr (GET_ADDR_ARGS)
+update_get_addr (GET_ADDR_ARGS, size_t gen)
 {
-  struct link_map *the_map = _dl_update_slotinfo (GET_ADDR_MODULE);
+  struct link_map *the_map = _dl_update_slotinfo (GET_ADDR_MODULE, gen);
   dtv_t *dtv = THREAD_DTV ();
 
   void *p = dtv[GET_ADDR_MODULE].pointer.val;
@@ -931,7 +923,11 @@ __tls_get_addr (GET_ADDR_ARGS)
      by user code, see CONCURRENCY NOTES in _dl_update_slotinfo.  */
   size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
   if (__glibc_unlikely (dtv[0].counter != gen))
-    return update_get_addr (GET_ADDR_PARAM);
+    {
+// TODO: needs comment update if we rely on consistent generation with slotinfo
+      gen = atomic_load_acquire (&GL(dl_tls_generation));
+      return update_get_addr (GET_ADDR_PARAM, gen);
+    }
 
   void *p = dtv[GET_ADDR_MODULE].pointer.val;
 
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ea3f7a69d0..614463f016 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1224,7 +1224,8 @@ extern void _dl_add_to_slotinfo (struct link_map *l, bool do_add)
 
 /* Update slot information data for at least the generation of the
    module with the given index.  */
-extern struct link_map *_dl_update_slotinfo (unsigned long int req_modid)
+extern struct link_map *_dl_update_slotinfo (unsigned long int req_modid,
+					     size_t gen)
      attribute_hidden;
 
 /* Look up the module's TLS block as for __tls_get_addr,
diff --git a/sysdeps/x86_64/dl-tls.c b/sysdeps/x86_64/dl-tls.c
index 24ef560b71..4ded8dd6b9 100644
--- a/sysdeps/x86_64/dl-tls.c
+++ b/sysdeps/x86_64/dl-tls.c
@@ -40,9 +40,9 @@ __tls_get_addr_slow (GET_ADDR_ARGS)
 {
   dtv_t *dtv = THREAD_DTV ();
 
-  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+  size_t gen = atomic_load_acquire (&GL(dl_tls_generation));
   if (__glibc_unlikely (dtv[0].counter != gen))
-    return update_get_addr (GET_ADDR_PARAM);
+    return update_get_addr (GET_ADDR_PARAM, gen);
 
   return tls_get_addr_tail (GET_ADDR_PARAM, dtv, NULL);
 }
-- 
2.17.1



* Re: [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136]
  2021-04-13  8:18 ` [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136] Szabolcs Nagy
@ 2021-04-13  8:36   ` Andreas Schwab
  2021-04-13  9:35     ` Szabolcs Nagy
  0 siblings, 1 reply; 44+ messages in thread
From: Andreas Schwab @ 2021-04-13  8:36 UTC (permalink / raw)
  To: Szabolcs Nagy via Libc-alpha; +Cc: Szabolcs Nagy

On Apr 13 2021, Szabolcs Nagy via Libc-alpha wrote:

> The max modid is a valid index in the dtv, it should not be skipped.

Does this check in _dl_allocate_tls_init need to be adjusted as well?

  /* Check if the current dtv is big enough.   */
  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136]
  2021-04-13  8:36   ` Andreas Schwab
@ 2021-04-13  9:35     ` Szabolcs Nagy
  2021-04-13 10:22       ` Andreas Schwab
  0 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13  9:35 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Szabolcs Nagy via Libc-alpha

The 04/13/2021 10:36, Andreas Schwab wrote:
> On Apr 13 2021, Szabolcs Nagy via Libc-alpha wrote:
> 
> > The max modid is a valid index in the dtv, it should not be skipped.
> 
> Does this check in _dl_allocate_tls_init need to be adjusted as well?
> 
>   /* Check if the current dtv is big enough.   */
>   if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))

no, that seems fine because counter is not the dtv length but
the maximum valid index (dtv[dtv[-1].counter] is valid, the
dtv array size is counter+2 to accommodate dtv[-1] and [0])


* Re: [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136]
  2021-04-13  9:35     ` Szabolcs Nagy
@ 2021-04-13 10:22       ` Andreas Schwab
  2021-04-13 10:34         ` Szabolcs Nagy
  0 siblings, 1 reply; 44+ messages in thread
From: Andreas Schwab @ 2021-04-13 10:22 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Szabolcs Nagy via Libc-alpha

On Apr 13 2021, Szabolcs Nagy wrote:

> The 04/13/2021 10:36, Andreas Schwab wrote:
>> On Apr 13 2021, Szabolcs Nagy via Libc-alpha wrote:
>> 
>> > The max modid is a valid index in the dtv, it should not be skipped.
>> 
>> Does this check in _dl_allocate_tls_init need to be adjusted as well?
>> 
>>   /* Check if the current dtv is big enough.   */
>>   if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
>
> no, that seems fine because counter is not the dtv length but
> the maximum valid index (dtv[dtv[-1].counter] is valid, the
> dtv array size is counter+2 to accommodate dtv[-1] and [0])

Since both dtv[-1].counter and GL(dl_tls_max_dtv_idx) are indexes, not
lengths, I would expect them to be compared with <=.

Andreas.



* Re: [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136]
  2021-04-13 10:22       ` Andreas Schwab
@ 2021-04-13 10:34         ` Szabolcs Nagy
  2021-04-13 10:51           ` Andreas Schwab
  0 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-13 10:34 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Szabolcs Nagy via Libc-alpha

The 04/13/2021 12:22, Andreas Schwab wrote:
> On Apr 13 2021, Szabolcs Nagy wrote:
> 
> > The 04/13/2021 10:36, Andreas Schwab wrote:
> >> On Apr 13 2021, Szabolcs Nagy via Libc-alpha wrote:
> >> 
> >> > The max modid is a valid index in the dtv, it should not be skipped.
> >> 
> >> Does this check in _dl_allocate_tls_init need to be adjusted as well?
> >> 
> >>   /* Check if the current dtv is big enough.   */
> >>   if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
> >
> > no, that seems fine because counter is not the dtv length but
> > the maximum valid index (dtv[dtv[-1].counter] is valid, the
> > dtv array size is counter+2 to accomodate for dtv[-1] and [0])
> 
> Since both dtv[-1].counter and GL(dl_tls_max_dtv_idx) are indexes, not
> lengths, I would expect them to be compared with <=.

but the code is

  /* Check if the current dtv is big enough.   */
  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
    {
      /* Resize the dtv.  */
      dtv = _dl_resize_dtv (dtv, GL(dl_tls_max_dtv_idx));

in case of == there is no need to resize: the dtv array is
large enough for the observed GL(dl_tls_max_dtv_idx).


* Re: [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136]
  2021-04-13 10:34         ` Szabolcs Nagy
@ 2021-04-13 10:51           ` Andreas Schwab
  0 siblings, 0 replies; 44+ messages in thread
From: Andreas Schwab @ 2021-04-13 10:51 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Szabolcs Nagy via Libc-alpha

On Apr 13 2021, Szabolcs Nagy wrote:

> The 04/13/2021 12:22, Andreas Schwab wrote:
>> On Apr 13 2021, Szabolcs Nagy wrote:
>> 
>> > The 04/13/2021 10:36, Andreas Schwab wrote:
>> >> On Apr 13 2021, Szabolcs Nagy via Libc-alpha wrote:
>> >> 
>> >> > The max modid is a valid index in the dtv, it should not be skipped.
>> >> 
>> >> Does this check in _dl_allocate_tls_init need to be adjusted as well?
>> >> 
>> >>   /* Check if the current dtv is big enough.   */
>> >>   if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
>> >
>> > no, that seems fine because counter is not the dtv length but
>> > the maximum valid index (dtv[dtv[-1].counter] is valid, the
>> > dtv array size is counter+2 to accommodate dtv[-1] and [0])
>> 
>> Since both dtv[-1].counter and GL(dl_tls_max_dtv_idx) are indexes, not
>> lengths, I would expect them to be compared with <=.
>
> but the code is
>
>   /* Check if the current dtv is big enough.   */
>   if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
>     {
>       /* Resize the dtv.  */
>       dtv = _dl_resize_dtv (dtv, GL(dl_tls_max_dtv_idx));
>
> in case of == there is no need to resize: the dtv array is
> large enough for the observed GL(dl_tls_max_dtv_idx).

I think it would be easier to understand if the condition were
written as (GL(dl_tls_max_dtv_idx) > dtv[-1].counter).

Andreas.



* Re: [PATCH v2 09/14] x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
  2021-04-13  8:20 ` [PATCH v2 09/14] x86_64: Avoid lazy relocation of tlsdesc [BZ #27137] Szabolcs Nagy
@ 2021-04-13 14:02   ` H.J. Lu
  0 siblings, 0 replies; 44+ messages in thread
From: H.J. Lu @ 2021-04-13 14:02 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: GNU C Library

On Tue, Apr 13, 2021 at 2:31 AM Szabolcs Nagy via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Lazy tlsdesc relocation is racy because the static tls optimization and
> tlsdesc management operations are done without holding the dlopen lock.
>
> This is similar to the commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
> for aarch64, but it fixes a different race: bug 27137.
>
> Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
> relocate tlsdesc lazily, but that does not work in a BIND_NOW module
> due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
> load time fixes this bug 27721 too.
>
> --
> v2:
> - mention the ldaudit issue with bindnow and tlsdesc.
> ---
>  sysdeps/x86_64/dl-machine.h | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> index 103eee6c3f..9a876a371e 100644
> --- a/sysdeps/x86_64/dl-machine.h
> +++ b/sysdeps/x86_64/dl-machine.h
> @@ -570,12 +570,21 @@ elf_machine_lazy_rel (struct link_map *map,
>      }
>    else if (__glibc_likely (r_type == R_X86_64_TLSDESC))
>      {
> -      struct tlsdesc volatile * __attribute__((__unused__)) td =
> -       (struct tlsdesc volatile *)reloc_addr;
> +      const Elf_Symndx symndx = ELFW (R_SYM) (reloc->r_info);
> +      const ElfW (Sym) *symtab = (const void *)D_PTR (map, l_info[DT_SYMTAB]);
> +      const ElfW (Sym) *sym = &symtab[symndx];
> +      const struct r_found_version *version = NULL;
>
> -      td->arg = (void*)reloc;
> -      td->entry = (void*)(D_PTR (map, l_info[ADDRIDX (DT_TLSDESC_PLT)])
> -                         + map->l_addr);
> +      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
> +       {
> +         const ElfW (Half) *vernum =
> +           (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
> +         version = &map->l_versions[vernum[symndx] & 0x7fff];
> +       }
> +
> +      /* Always initialize TLS descriptors completely at load time, in
> +        case static TLS is allocated for it that requires locking.  */
> +      elf_machine_rela (map, reloc, sym, version, reloc_addr, skip_ifunc);
>      }
>    else if (__glibc_unlikely (r_type == R_X86_64_IRELATIVE))
>      {
> --
> 2.17.1
>

LGTM.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/14] i386: Avoid lazy relocation of tlsdesc [BZ #27137]
  2021-04-13  8:20 ` [PATCH v2 10/14] i386: " Szabolcs Nagy
@ 2021-04-13 14:02   ` H.J. Lu
  0 siblings, 0 replies; 44+ messages in thread
From: H.J. Lu @ 2021-04-13 14:02 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: GNU C Library

On Tue, Apr 13, 2021 at 2:32 AM Szabolcs Nagy via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Lazy tlsdesc relocation is racy because the static tls optimization and
> tlsdesc management operations are done without holding the dlopen lock.
>
> This is similar to commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
> for aarch64, but it fixes a different race: bug 27137.
>
> On i386 the code is a bit more complicated than on x86_64 because both
> rel and rela relocs are supported.
> ---
>  sysdeps/i386/dl-machine.h | 76 ++++++++++++++++++---------------------
>  1 file changed, 34 insertions(+), 42 deletions(-)
>
> diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> index 23e9cc3bfb..590b41d8d7 100644
> --- a/sysdeps/i386/dl-machine.h
> +++ b/sysdeps/i386/dl-machine.h
> @@ -688,50 +688,32 @@ elf_machine_lazy_rel (struct link_map *map,
>      }
>    else if (__glibc_likely (r_type == R_386_TLS_DESC))
>      {
> -      struct tlsdesc volatile * __attribute__((__unused__)) td =
> -       (struct tlsdesc volatile *)reloc_addr;
> -
> -      /* Handle relocations that reference the local *ABS* in a simple
> -        way, so as to preserve a potential addend.  */
> -      if (ELF32_R_SYM (reloc->r_info) == 0)
> -       td->entry = _dl_tlsdesc_resolve_abs_plus_addend;
> -      /* Given a known-zero addend, we can store a pointer to the
> -        reloc in the arg position.  */
> -      else if (td->arg == 0)
> -       {
> -         td->arg = (void*)reloc;
> -         td->entry = _dl_tlsdesc_resolve_rel;
> -       }
> -      else
> -       {
> -         /* We could handle non-*ABS* relocations with non-zero addends
> -            by allocating dynamically an arg to hold a pointer to the
> -            reloc, but that sounds pointless.  */
> -         const Elf32_Rel *const r = reloc;
> -         /* The code below was borrowed from elf_dynamic_do_rel().  */
> -         const ElfW(Sym) *const symtab =
> -           (const void *) D_PTR (map, l_info[DT_SYMTAB]);
> +      const Elf32_Rel *const r = reloc;
> +      /* The code below was borrowed from elf_dynamic_do_rel().  */
> +      const ElfW(Sym) *const symtab =
> +       (const void *) D_PTR (map, l_info[DT_SYMTAB]);
>
> +      /* Always initialize TLS descriptors completely at load time, in
> +        case static TLS is allocated for it that requires locking.  */
>  # ifdef RTLD_BOOTSTRAP
> -         /* The dynamic linker always uses versioning.  */
> -         assert (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL);
> +      /* The dynamic linker always uses versioning.  */
> +      assert (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL);
>  # else
> -         if (map->l_info[VERSYMIDX (DT_VERSYM)])
> +      if (map->l_info[VERSYMIDX (DT_VERSYM)])
>  # endif
> -           {
> -             const ElfW(Half) *const version =
> -               (const void *) D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
> -             ElfW(Half) ndx = version[ELFW(R_SYM) (r->r_info)] & 0x7fff;
> -             elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)],
> -                              &map->l_versions[ndx],
> -                              (void *) (l_addr + r->r_offset), skip_ifunc);
> -           }
> +       {
> +         const ElfW(Half) *const version =
> +           (const void *) D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
> +         ElfW(Half) ndx = version[ELFW(R_SYM) (r->r_info)] & 0x7fff;
> +         elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)],
> +                          &map->l_versions[ndx],
> +                          (void *) (l_addr + r->r_offset), skip_ifunc);
> +       }
>  # ifndef RTLD_BOOTSTRAP
> -         else
> -           elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)], NULL,
> -                            (void *) (l_addr + r->r_offset), skip_ifunc);
> +      else
> +       elf_machine_rel (map, r, &symtab[ELFW(R_SYM) (r->r_info)], NULL,
> +                        (void *) (l_addr + r->r_offset), skip_ifunc);
>  # endif
> -       }
>      }
>    else if (__glibc_unlikely (r_type == R_386_IRELATIVE))
>      {
> @@ -758,11 +740,21 @@ elf_machine_lazy_rela (struct link_map *map,
>      ;
>    else if (__glibc_likely (r_type == R_386_TLS_DESC))
>      {
> -      struct tlsdesc volatile * __attribute__((__unused__)) td =
> -       (struct tlsdesc volatile *)reloc_addr;
> +      const Elf_Symndx symndx = ELFW (R_SYM) (reloc->r_info);
> +      const ElfW (Sym) *symtab = (const void *)D_PTR (map, l_info[DT_SYMTAB]);
> +      const ElfW (Sym) *sym = &symtab[symndx];
> +      const struct r_found_version *version = NULL;
> +
> +      if (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
> +       {
> +         const ElfW (Half) *vernum =
> +           (const void *)D_PTR (map, l_info[VERSYMIDX (DT_VERSYM)]);
> +         version = &map->l_versions[vernum[symndx] & 0x7fff];
> +       }
>
> -      td->arg = (void*)reloc;
> -      td->entry = _dl_tlsdesc_resolve_rela;
> +      /* Always initialize TLS descriptors completely at load time, in
> +        case static TLS is allocated for it that requires locking.  */
> +      elf_machine_rela (map, reloc, sym, version, reloc_addr, skip_ifunc);
>      }
>    else if (__glibc_unlikely (r_type == R_386_IRELATIVE))
>      {
> --
> 2.17.1
>

LGTM.

Thanks.

-- 
H.J.


* Re: [PATCH v2 11/14] x86_64: Remove lazy tlsdesc relocation related code
  2021-04-13  8:21 ` [PATCH v2 11/14] x86_64: Remove lazy tlsdesc relocation related code Szabolcs Nagy
@ 2021-04-13 14:03   ` H.J. Lu
  0 siblings, 0 replies; 44+ messages in thread
From: H.J. Lu @ 2021-04-13 14:03 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: GNU C Library

On Tue, Apr 13, 2021 at 2:33 AM Szabolcs Nagy via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> _dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for
> lazy tlsdesc relocation processing, which is no longer supported.
>
> --
> v2:
> - fix elf_machine_runtime_setup: remove tlsdesc got setup.
> ---
>  sysdeps/x86_64/dl-machine.h |   4 --
>  sysdeps/x86_64/dl-tlsdesc.S | 104 ----------------------------------
>  sysdeps/x86_64/dl-tlsdesc.h |   4 +-
>  sysdeps/x86_64/tlsdesc.c    | 109 +-----------------------------------
>  4 files changed, 2 insertions(+), 219 deletions(-)
>
> diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> index 9a876a371e..a8596aa3fa 100644
> --- a/sysdeps/x86_64/dl-machine.h
> +++ b/sysdeps/x86_64/dl-machine.h
> @@ -127,10 +127,6 @@ elf_machine_runtime_setup (struct link_map *l, int lazy, int profile)
>         }
>      }
>
> -  if (l->l_info[ADDRIDX (DT_TLSDESC_GOT)] && lazy)
> -    *(ElfW(Addr)*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_GOT)]) + l->l_addr)
> -      = (ElfW(Addr)) &_dl_tlsdesc_resolve_rela;
> -
>    return lazy;
>  }
>
> diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
> index 1d055adadd..ca9236bed8 100644
> --- a/sysdeps/x86_64/dl-tlsdesc.S
> +++ b/sysdeps/x86_64/dl-tlsdesc.S
> @@ -144,107 +144,3 @@ _dl_tlsdesc_dynamic:
>         cfi_endproc
>         .size   _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
>  #endif /* SHARED */
> -
> -     /* This function is a wrapper for a lazy resolver for TLS_DESC
> -       RELA relocations.  The incoming 0(%rsp) points to the caller's
> -       link map, pushed by the dynamic object's internal lazy TLS
> -       resolver front-end before tail-calling us.  We need to pop it
> -       ourselves.  %rax points to a TLS descriptor, such that 0(%rax)
> -       holds the address of the internal resolver front-end (unless
> -       some other thread beat us to resolving it) and 8(%rax) holds a
> -       pointer to the relocation.
> -
> -       When the actual resolver returns, it will have adjusted the
> -       TLS descriptor such that we can tail-call it for it to return
> -       the TP offset of the symbol.  */
> -
> -       .hidden _dl_tlsdesc_resolve_rela
> -       .global _dl_tlsdesc_resolve_rela
> -       .type   _dl_tlsdesc_resolve_rela,@function
> -       cfi_startproc
> -       .align 16
> -       /* The PLT entry will have pushed the link_map pointer.  */
> -_dl_tlsdesc_resolve_rela:
> -       _CET_ENDBR
> -       cfi_adjust_cfa_offset (8)
> -       /* Save all call-clobbered registers.  Add 8 bytes for push in
> -          the PLT entry to align the stack.  */
> -       subq    $80, %rsp
> -       cfi_adjust_cfa_offset (80)
> -       movq    %rax, (%rsp)
> -       movq    %rdi, 8(%rsp)
> -       movq    %rax, %rdi      /* Pass tlsdesc* in %rdi.  */
> -       movq    %rsi, 16(%rsp)
> -       movq    80(%rsp), %rsi  /* Pass link_map* in %rsi.  */
> -       movq    %r8, 24(%rsp)
> -       movq    %r9, 32(%rsp)
> -       movq    %r10, 40(%rsp)
> -       movq    %r11, 48(%rsp)
> -       movq    %rdx, 56(%rsp)
> -       movq    %rcx, 64(%rsp)
> -       call    _dl_tlsdesc_resolve_rela_fixup
> -       movq    (%rsp), %rax
> -       movq    8(%rsp), %rdi
> -       movq    16(%rsp), %rsi
> -       movq    24(%rsp), %r8
> -       movq    32(%rsp), %r9
> -       movq    40(%rsp), %r10
> -       movq    48(%rsp), %r11
> -       movq    56(%rsp), %rdx
> -       movq    64(%rsp), %rcx
> -       addq    $88, %rsp
> -       cfi_adjust_cfa_offset (-88)
> -       jmp     *(%rax)
> -       cfi_endproc
> -       .size   _dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
> -
> -     /* This function is a placeholder for lazy resolving of TLS
> -       relocations.  Once some thread starts resolving a TLS
> -       relocation, it sets up the TLS descriptor to use this
> -       resolver, such that other threads that would attempt to
> -       resolve it concurrently may skip the call to the original lazy
> -       resolver and go straight to a condition wait.
> -
> -       When the actual resolver returns, it will have adjusted the
> -       TLS descriptor such that we can tail-call it for it to return
> -       the TP offset of the symbol.  */
> -
> -       .hidden _dl_tlsdesc_resolve_hold
> -       .global _dl_tlsdesc_resolve_hold
> -       .type   _dl_tlsdesc_resolve_hold,@function
> -       cfi_startproc
> -       .align 16
> -_dl_tlsdesc_resolve_hold:
> -0:
> -       _CET_ENDBR
> -       /* Save all call-clobbered registers.  */
> -       subq    $72, %rsp
> -       cfi_adjust_cfa_offset (72)
> -       movq    %rax, (%rsp)
> -       movq    %rdi, 8(%rsp)
> -       movq    %rax, %rdi      /* Pass tlsdesc* in %rdi.  */
> -       movq    %rsi, 16(%rsp)
> -       /* Pass _dl_tlsdesc_resolve_hold's address in %rsi.  */
> -       leaq    . - _dl_tlsdesc_resolve_hold(%rip), %rsi
> -       movq    %r8, 24(%rsp)
> -       movq    %r9, 32(%rsp)
> -       movq    %r10, 40(%rsp)
> -       movq    %r11, 48(%rsp)
> -       movq    %rdx, 56(%rsp)
> -       movq    %rcx, 64(%rsp)
> -       call    _dl_tlsdesc_resolve_hold_fixup
> -1:
> -       movq    (%rsp), %rax
> -       movq    8(%rsp), %rdi
> -       movq    16(%rsp), %rsi
> -       movq    24(%rsp), %r8
> -       movq    32(%rsp), %r9
> -       movq    40(%rsp), %r10
> -       movq    48(%rsp), %r11
> -       movq    56(%rsp), %rdx
> -       movq    64(%rsp), %rcx
> -       addq    $72, %rsp
> -       cfi_adjust_cfa_offset (-72)
> -       jmp     *(%rax)
> -       cfi_endproc
> -       .size   _dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
> diff --git a/sysdeps/x86_64/dl-tlsdesc.h b/sysdeps/x86_64/dl-tlsdesc.h
> index d134b3f4db..03d5ac7a54 100644
> --- a/sysdeps/x86_64/dl-tlsdesc.h
> +++ b/sysdeps/x86_64/dl-tlsdesc.h
> @@ -55,9 +55,7 @@ struct tlsdesc_dynamic_arg
>
>  extern ptrdiff_t attribute_hidden
>    _dl_tlsdesc_return(struct tlsdesc *on_rax),
> -  _dl_tlsdesc_undefweak(struct tlsdesc *on_rax),
> -  _dl_tlsdesc_resolve_rela(struct tlsdesc *on_rax),
> -  _dl_tlsdesc_resolve_hold(struct tlsdesc *on_rax);
> +  _dl_tlsdesc_undefweak(struct tlsdesc *on_rax);
>
>  # ifdef SHARED
>  extern void *_dl_make_tlsdesc_dynamic (struct link_map *map,
> diff --git a/sysdeps/x86_64/tlsdesc.c b/sysdeps/x86_64/tlsdesc.c
> index 4083849f22..ecf864d6ee 100644
> --- a/sysdeps/x86_64/tlsdesc.c
> +++ b/sysdeps/x86_64/tlsdesc.c
> @@ -16,120 +16,13 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>
> -#include <link.h>
>  #include <ldsodefs.h>
> -#include <elf/dynamic-link.h>
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-unmap-segments.h>
> +#define _dl_tlsdesc_resolve_hold 0
>  #include <tlsdeschtab.h>
>
> -/* The following 2 functions take a caller argument, that contains the
> -   address expected to be in the TLS descriptor.  If it's changed, we
> -   want to return immediately.  */
> -
> -/* This function is used to lazily resolve TLS_DESC RELA relocations.
> -   The argument location is used to hold a pointer to the relocation.  */
> -
> -void
> -attribute_hidden
> -_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
> -                               struct link_map *l)
> -{
> -  const ElfW(Rela) *reloc = td->arg;
> -
> -  if (_dl_tlsdesc_resolve_early_return_p
> -      (td, (void*)(D_PTR (l, l_info[ADDRIDX (DT_TLSDESC_PLT)]) + l->l_addr)))
> -    return;
> -
> -  /* The code below was borrowed from _dl_fixup().  */
> -  const ElfW(Sym) *const symtab
> -    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
> -  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
> -  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
> -  lookup_t result;
> -
> -   /* Look up the target symbol.  If the normal lookup rules are not
> -      used don't look in the global scope.  */
> -  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
> -      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
> -    {
> -      const struct r_found_version *version = NULL;
> -
> -      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
> -       {
> -         const ElfW(Half) *vernum =
> -           (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
> -         ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
> -         version = &l->l_versions[ndx];
> -         if (version->hash == 0)
> -           version = NULL;
> -       }
> -
> -      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
> -                                   l->l_scope, version, ELF_RTYPE_CLASS_PLT,
> -                                   DL_LOOKUP_ADD_DEPENDENCY, NULL);
> -    }
> -  else
> -    {
> -      /* We already found the symbol.  The module (and therefore its load
> -        address) is also known.  */
> -      result = l;
> -    }
> -
> -  if (! sym)
> -    {
> -      td->arg = (void*)reloc->r_addend;
> -      td->entry = _dl_tlsdesc_undefweak;
> -    }
> -  else
> -    {
> -#  ifndef SHARED
> -      CHECK_STATIC_TLS (l, result);
> -#  else
> -      if (!TRY_STATIC_TLS (l, result))
> -       {
> -         td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
> -                                             + reloc->r_addend);
> -         td->entry = _dl_tlsdesc_dynamic;
> -       }
> -      else
> -#  endif
> -       {
> -         td->arg = (void*)(sym->st_value - result->l_tls_offset
> -                           + reloc->r_addend);
> -         td->entry = _dl_tlsdesc_return;
> -       }
> -    }
> -
> -  _dl_tlsdesc_wake_up_held_fixups ();
> -}
> -
> -/* This function is used to avoid busy waiting for other threads to
> -   complete the lazy relocation.  Once another thread wins the race to
> -   relocate a TLS descriptor, it sets the descriptor up such that this
> -   function is called to wait until the resolver releases the
> -   lock.  */
> -
> -void
> -attribute_hidden
> -_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
> -                               void *caller)
> -{
> -  /* Maybe we're lucky and can return early.  */
> -  if (caller != td->entry)
> -    return;
> -
> -  /* Locking here will stop execution until the running resolver runs
> -     _dl_tlsdesc_wake_up_held_fixups(), releasing the lock.
> -
> -     FIXME: We'd be better off waiting on a condition variable, such
> -     that we didn't have to hold the lock throughout the relocation
> -     processing.  */
> -  __rtld_lock_lock_recursive (GL(dl_load_lock));
> -  __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -}
> -
>  /* Unmap the dynamic object, but also release its TLS descriptor table
>     if there is one.  */
>
> --
> 2.17.1
>

LGTM.

Thanks.

-- 
H.J.


* Re: [PATCH v2 12/14] i386: Remove lazy tlsdesc relocation related code
  2021-04-13  8:21 ` [PATCH v2 12/14] i386: " Szabolcs Nagy
@ 2021-04-13 14:04   ` H.J. Lu
  0 siblings, 0 replies; 44+ messages in thread
From: H.J. Lu @ 2021-04-13 14:04 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: GNU C Library

On Tue, Apr 13, 2021 at 2:34 AM Szabolcs Nagy via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Like in commit e75711ebfa976d5468ec292282566a18b07e4d67 for x86_64,
> remove unused lazy tlsdesc relocation processing code:
>
>   _dl_tlsdesc_resolve_abs_plus_addend
>   _dl_tlsdesc_resolve_rel
>   _dl_tlsdesc_resolve_rela
>   _dl_tlsdesc_resolve_hold
> ---
>  sysdeps/i386/dl-tlsdesc.S | 156 -------------------------
>  sysdeps/i386/dl-tlsdesc.h |   6 +-
>  sysdeps/i386/tlsdesc.c    | 231 +-------------------------------------
>  3 files changed, 2 insertions(+), 391 deletions(-)
>
> diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
> index e781d973b7..255fe88651 100644
> --- a/sysdeps/i386/dl-tlsdesc.S
> +++ b/sysdeps/i386/dl-tlsdesc.S
> @@ -134,159 +134,3 @@ _dl_tlsdesc_dynamic:
>         cfi_endproc
>         .size   _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
>  #endif /* SHARED */
> -
> -     /* This function is a wrapper for a lazy resolver for TLS_DESC
> -       REL relocations that reference the *ABS* segment in their own
> -       link maps.  %ebx points to the caller's GOT.  %eax points to a
> -       TLS descriptor, such that 0(%eax) holds the address of the
> -       resolver wrapper itself (unless some other thread beat us to
> -       it) and 4(%eax) holds the addend in the relocation.
> -
> -       When the actual resolver returns, it will have adjusted the
> -       TLS descriptor such that we can tail-call it for it to return
> -       the TP offset of the symbol.  */
> -
> -       .hidden _dl_tlsdesc_resolve_abs_plus_addend
> -       .global _dl_tlsdesc_resolve_abs_plus_addend
> -       .type   _dl_tlsdesc_resolve_abs_plus_addend,@function
> -       cfi_startproc
> -       .align 16
> -_dl_tlsdesc_resolve_abs_plus_addend:
> -0:
> -       _CET_ENDBR
> -       pushl   %eax
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %ecx
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %edx
> -       cfi_adjust_cfa_offset (4)
> -       movl    $1f - 0b, %ecx
> -       movl    4(%ebx), %edx
> -       call    _dl_tlsdesc_resolve_abs_plus_addend_fixup
> -1:
> -       popl    %edx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %ecx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %eax
> -       cfi_adjust_cfa_offset (-4)
> -       jmp     *(%eax)
> -       cfi_endproc
> -       .size   _dl_tlsdesc_resolve_abs_plus_addend, .-_dl_tlsdesc_resolve_abs_plus_addend
> -
> -     /* This function is a wrapper for a lazy resolver for TLS_DESC
> -       REL relocations that had zero addends.  %ebx points to the
> -       caller's GOT.  %eax points to a TLS descriptor, such that
> -       0(%eax) holds the address of the resolver wrapper itself
> -       (unless some other thread beat us to it) and 4(%eax) holds a
> -       pointer to the relocation.
> -
> -       When the actual resolver returns, it will have adjusted the
> -       TLS descriptor such that we can tail-call it for it to return
> -       the TP offset of the symbol.  */
> -
> -       .hidden _dl_tlsdesc_resolve_rel
> -       .global _dl_tlsdesc_resolve_rel
> -       .type   _dl_tlsdesc_resolve_rel,@function
> -       cfi_startproc
> -       .align 16
> -_dl_tlsdesc_resolve_rel:
> -0:
> -       _CET_ENDBR
> -       pushl   %eax
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %ecx
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %edx
> -       cfi_adjust_cfa_offset (4)
> -       movl    $1f - 0b, %ecx
> -       movl    4(%ebx), %edx
> -       call    _dl_tlsdesc_resolve_rel_fixup
> -1:
> -       popl    %edx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %ecx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %eax
> -       cfi_adjust_cfa_offset (-4)
> -       jmp     *(%eax)
> -       cfi_endproc
> -       .size   _dl_tlsdesc_resolve_rel, .-_dl_tlsdesc_resolve_rel
> -
> -     /* This function is a wrapper for a lazy resolver for TLS_DESC
> -       RELA relocations.  %ebx points to the caller's GOT.  %eax
> -       points to a TLS descriptor, such that 0(%eax) holds the
> -       address of the resolver wrapper itself (unless some other
> -       thread beat us to it) and 4(%eax) holds a pointer to the
> -       relocation.
> -
> -       When the actual resolver returns, it will have adjusted the
> -       TLS descriptor such that we can tail-call it for it to return
> -       the TP offset of the symbol.  */
> -
> -       .hidden _dl_tlsdesc_resolve_rela
> -       .global _dl_tlsdesc_resolve_rela
> -       .type   _dl_tlsdesc_resolve_rela,@function
> -       cfi_startproc
> -       .align 16
> -_dl_tlsdesc_resolve_rela:
> -0:
> -       _CET_ENDBR
> -       pushl   %eax
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %ecx
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %edx
> -       cfi_adjust_cfa_offset (4)
> -       movl    $1f - 0b, %ecx
> -       movl    4(%ebx), %edx
> -       call    _dl_tlsdesc_resolve_rela_fixup
> -1:
> -       popl    %edx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %ecx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %eax
> -       cfi_adjust_cfa_offset (-4)
> -       jmp     *(%eax)
> -       cfi_endproc
> -       .size   _dl_tlsdesc_resolve_rela, .-_dl_tlsdesc_resolve_rela
> -
> -     /* This function is a placeholder for lazy resolving of TLS
> -       relocations.  Once some thread starts resolving a TLS
> -       relocation, it sets up the TLS descriptor to use this
> -       resolver, such that other threads that would attempt to
> -       resolve it concurrently may skip the call to the original lazy
> -       resolver and go straight to a condition wait.
> -
> -       When the actual resolver returns, it will have adjusted the
> -       TLS descriptor such that we can tail-call it for it to return
> -       the TP offset of the symbol.  */
> -
> -       .hidden _dl_tlsdesc_resolve_hold
> -       .global _dl_tlsdesc_resolve_hold
> -       .type   _dl_tlsdesc_resolve_hold,@function
> -       cfi_startproc
> -       .align 16
> -_dl_tlsdesc_resolve_hold:
> -0:
> -       _CET_ENDBR
> -       pushl   %eax
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %ecx
> -       cfi_adjust_cfa_offset (4)
> -       pushl   %edx
> -       cfi_adjust_cfa_offset (4)
> -       movl    $1f - 0b, %ecx
> -       movl    4(%ebx), %edx
> -       call    _dl_tlsdesc_resolve_hold_fixup
> -1:
> -       popl    %edx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %ecx
> -       cfi_adjust_cfa_offset (-4)
> -       popl    %eax
> -       cfi_adjust_cfa_offset (-4)
> -       jmp     *(%eax)
> -       cfi_endproc
> -       .size   _dl_tlsdesc_resolve_hold, .-_dl_tlsdesc_resolve_hold
> diff --git a/sysdeps/i386/dl-tlsdesc.h b/sysdeps/i386/dl-tlsdesc.h
> index 753c03e79c..12e90da3a8 100644
> --- a/sysdeps/i386/dl-tlsdesc.h
> +++ b/sysdeps/i386/dl-tlsdesc.h
> @@ -43,11 +43,7 @@ struct tlsdesc_dynamic_arg
>
>  extern ptrdiff_t attribute_hidden __attribute__ ((regparm (1)))
>    _dl_tlsdesc_return (struct tlsdesc *),
> -  _dl_tlsdesc_undefweak (struct tlsdesc *),
> -  _dl_tlsdesc_resolve_abs_plus_addend (struct tlsdesc *),
> -  _dl_tlsdesc_resolve_rel (struct tlsdesc *),
> -  _dl_tlsdesc_resolve_rela (struct tlsdesc *),
> -  _dl_tlsdesc_resolve_hold (struct tlsdesc *);
> +  _dl_tlsdesc_undefweak (struct tlsdesc *);
>
>  # ifdef SHARED
>  extern void *_dl_make_tlsdesc_dynamic (struct link_map *map,
> diff --git a/sysdeps/i386/tlsdesc.c b/sysdeps/i386/tlsdesc.c
> index 0bc646541f..436a21f66b 100644
> --- a/sysdeps/i386/tlsdesc.c
> +++ b/sysdeps/i386/tlsdesc.c
> @@ -16,242 +16,13 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>
> -#include <link.h>
>  #include <ldsodefs.h>
> -#include <elf/dynamic-link.h>
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-unmap-segments.h>
> +#define _dl_tlsdesc_resolve_hold 0
>  #include <tlsdeschtab.h>
>
> -/* The following 4 functions take an entry_check_offset argument.
> -   It's computed by the caller as an offset between its entry point
> -   and the call site, such that by adding the built-in return address
> -   that is implicitly passed to the function with this offset, we can
> -   easily obtain the caller's entry point to compare with the entry
> -   point given in the TLS descriptor.  If it's changed, we want to
> -   return immediately.  */
> -
> -/* This function is used to lazily resolve TLS_DESC REL relocations
> -   that reference the *ABS* segment in their own link maps.  The
> -   argument is the addend originally stored there.  */
> -
> -void
> -__attribute__ ((regparm (3))) attribute_hidden
> -_dl_tlsdesc_resolve_abs_plus_addend_fixup (struct tlsdesc volatile *td,
> -                                          struct link_map *l,
> -                                          ptrdiff_t entry_check_offset)
> -{
> -  ptrdiff_t addend = (ptrdiff_t) td->arg;
> -
> -  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
> -                                         - entry_check_offset))
> -    return;
> -
> -#ifndef SHARED
> -  CHECK_STATIC_TLS (l, l);
> -#else
> -  if (!TRY_STATIC_TLS (l, l))
> -    {
> -      td->arg = _dl_make_tlsdesc_dynamic (l, addend);
> -      td->entry = _dl_tlsdesc_dynamic;
> -    }
> -  else
> -#endif
> -    {
> -      td->arg = (void*) (addend - l->l_tls_offset);
> -      td->entry = _dl_tlsdesc_return;
> -    }
> -
> -  _dl_tlsdesc_wake_up_held_fixups ();
> -}
> -
> -/* This function is used to lazily resolve TLS_DESC REL relocations
> -   that originally had zero addends.  The argument location, that
> -   originally held the addend, is used to hold a pointer to the
> -   relocation, but it has to be restored before we call the function
> -   that applies relocations.  */
> -
> -void
> -__attribute__ ((regparm (3))) attribute_hidden
> -_dl_tlsdesc_resolve_rel_fixup (struct tlsdesc volatile *td,
> -                              struct link_map *l,
> -                              ptrdiff_t entry_check_offset)
> -{
> -  const ElfW(Rel) *reloc = td->arg;
> -
> -  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
> -                                         - entry_check_offset))
> -    return;
> -
> -  /* The code below was borrowed from _dl_fixup(),
> -     except for checking for STB_LOCAL.  */
> -  const ElfW(Sym) *const symtab
> -    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
> -  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
> -  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
> -  lookup_t result;
> -
> -   /* Look up the target symbol.  If the normal lookup rules are not
> -      used don't look in the global scope.  */
> -  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
> -      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
> -    {
> -      const struct r_found_version *version = NULL;
> -
> -      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
> -       {
> -         const ElfW(Half) *vernum =
> -           (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
> -         ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
> -         version = &l->l_versions[ndx];
> -         if (version->hash == 0)
> -           version = NULL;
> -       }
> -
> -      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
> -                                   l->l_scope, version, ELF_RTYPE_CLASS_PLT,
> -                                   DL_LOOKUP_ADD_DEPENDENCY, NULL);
> -    }
> -  else
> -    {
> -      /* We already found the symbol.  The module (and therefore its load
> -        address) is also known.  */
> -      result = l;
> -    }
> -
> -  if (!sym)
> -    {
> -      td->arg = 0;
> -      td->entry = _dl_tlsdesc_undefweak;
> -    }
> -  else
> -    {
> -#  ifndef SHARED
> -      CHECK_STATIC_TLS (l, result);
> -#  else
> -      if (!TRY_STATIC_TLS (l, result))
> -       {
> -         td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value);
> -         td->entry = _dl_tlsdesc_dynamic;
> -       }
> -      else
> -#  endif
> -       {
> -         td->arg = (void*)(sym->st_value - result->l_tls_offset);
> -         td->entry = _dl_tlsdesc_return;
> -       }
> -    }
> -
> -  _dl_tlsdesc_wake_up_held_fixups ();
> -}
> -
> -/* This function is used to lazily resolve TLS_DESC RELA relocations.
> -   The argument location is used to hold a pointer to the relocation.  */
> -
> -void
> -__attribute__ ((regparm (3))) attribute_hidden
> -_dl_tlsdesc_resolve_rela_fixup (struct tlsdesc volatile *td,
> -                               struct link_map *l,
> -                               ptrdiff_t entry_check_offset)
> -{
> -  const ElfW(Rela) *reloc = td->arg;
> -
> -  if (_dl_tlsdesc_resolve_early_return_p (td, __builtin_return_address (0)
> -                                         - entry_check_offset))
> -    return;
> -
> -  /* The code below was borrowed from _dl_fixup(),
> -     except for checking for STB_LOCAL.  */
> -  const ElfW(Sym) *const symtab
> -    = (const void *) D_PTR (l, l_info[DT_SYMTAB]);
> -  const char *strtab = (const void *) D_PTR (l, l_info[DT_STRTAB]);
> -  const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
> -  lookup_t result;
> -
> -   /* Look up the target symbol.  If the normal lookup rules are not
> -      used don't look in the global scope.  */
> -  if (ELFW(ST_BIND) (sym->st_info) != STB_LOCAL
> -      && __builtin_expect (ELFW(ST_VISIBILITY) (sym->st_other), 0) == 0)
> -    {
> -      const struct r_found_version *version = NULL;
> -
> -      if (l->l_info[VERSYMIDX (DT_VERSYM)] != NULL)
> -       {
> -         const ElfW(Half) *vernum =
> -           (const void *) D_PTR (l, l_info[VERSYMIDX (DT_VERSYM)]);
> -         ElfW(Half) ndx = vernum[ELFW(R_SYM) (reloc->r_info)] & 0x7fff;
> -         version = &l->l_versions[ndx];
> -         if (version->hash == 0)
> -           version = NULL;
> -       }
> -
> -      result = _dl_lookup_symbol_x (strtab + sym->st_name, l, &sym,
> -                                   l->l_scope, version, ELF_RTYPE_CLASS_PLT,
> -                                   DL_LOOKUP_ADD_DEPENDENCY, NULL);
> -    }
> -  else
> -    {
> -      /* We already found the symbol.  The module (and therefore its load
> -        address) is also known.  */
> -      result = l;
> -    }
> -
> -  if (!sym)
> -    {
> -      td->arg = (void*) reloc->r_addend;
> -      td->entry = _dl_tlsdesc_undefweak;
> -    }
> -  else
> -    {
> -#  ifndef SHARED
> -      CHECK_STATIC_TLS (l, result);
> -#  else
> -      if (!TRY_STATIC_TLS (l, result))
> -       {
> -         td->arg = _dl_make_tlsdesc_dynamic (result, sym->st_value
> -                                             + reloc->r_addend);
> -         td->entry = _dl_tlsdesc_dynamic;
> -       }
> -      else
> -#  endif
> -       {
> -         td->arg = (void*) (sym->st_value - result->l_tls_offset
> -                            + reloc->r_addend);
> -         td->entry = _dl_tlsdesc_return;
> -       }
> -    }
> -
> -  _dl_tlsdesc_wake_up_held_fixups ();
> -}
> -
> -/* This function is used to avoid busy waiting for other threads to
> -   complete the lazy relocation.  Once another thread wins the race to
> -   relocate a TLS descriptor, it sets the descriptor up such that this
> -   function is called to wait until the resolver releases the
> -   lock.  */
> -
> -void
> -__attribute__ ((regparm (3))) attribute_hidden
> -_dl_tlsdesc_resolve_hold_fixup (struct tlsdesc volatile *td,
> -                               struct link_map *l __attribute__((__unused__)),
> -                               ptrdiff_t entry_check_offset)
> -{
> -  /* Maybe we're lucky and can return early.  */
> -  if (__builtin_return_address (0) - entry_check_offset != td->entry)
> -    return;
> -
> -  /* Locking here will stop execution until the running resolver runs
> -     _dl_tlsdesc_wake_up_held_fixups(), releasing the lock.
> -
> -     FIXME: We'd be better off waiting on a condition variable, such
> -     that we didn't have to hold the lock throughout the relocation
> -     processing.  */
> -  __rtld_lock_lock_recursive (GL(dl_load_lock));
> -  __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -}
> -
> -
>  /* Unmap the dynamic object, but also release its TLS descriptor table
>     if there is one.  */
>
> --
> 2.17.1
>

LGTM.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/14] elf: Add a DTV setup test [BZ #27136]
  2021-04-13  8:18 ` [PATCH v2 02/14] elf: Add a DTV setup test " Szabolcs Nagy
@ 2021-04-14 18:06   ` Adhemerval Zanella
  2021-04-15  9:53     ` Szabolcs Nagy
  0 siblings, 1 reply; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-14 18:06 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:18, Szabolcs Nagy via Libc-alpha wrote:
> The test dlopens a large number of modules with TLS, they are reused
> from an existing test.
> 
> The test relies on the reuse of slotinfo entries after dlclose, without
> bug 27135 fixed this needs a failing dlopen. With a slotinfo list that
> has non-monotone increasing generation counters, bug 27136 can trigger.
> 
> --
> v2:
> - moved from nptl/ to elf/
> - dlopen a different set of existing modules.
> - code style fixes (!= NULL etc).
> - moved after bug fix patch

LGTM, with just a small suggestion below.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  elf/Makefile           | 10 ++++-
>  elf/tst-tls20.c        | 98 ++++++++++++++++++++++++++++++++++++++++++
>  elf/tst-tls20mod-bad.c |  2 +
>  3 files changed, 109 insertions(+), 1 deletion(-)
>  create mode 100644 elf/tst-tls20.c
>  create mode 100644 elf/tst-tls20mod-bad.c
> 
> diff --git a/elf/Makefile b/elf/Makefile
> index 0bef49e53d..b1344f1133 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -222,7 +222,8 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
>  	 tst-audit14 tst-audit15 tst-audit16 \
>  	 tst-single_threaded tst-single_threaded-pthread \
>  	 tst-tls-ie tst-tls-ie-dlmopen argv0test \
> -	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask
> +	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask \
> +	 tst-tls20
>  #	 reldep9
>  tests-internal += loadtest unload unload2 circleload1 \
>  	 neededtest neededtest2 neededtest3 neededtest4 \

Ok.

> @@ -344,6 +345,7 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
>  		libmarkermod2-1 libmarkermod2-2 \
>  		libmarkermod3-1 libmarkermod3-2 libmarkermod3-3 \
>  		libmarkermod4-1 libmarkermod4-2 libmarkermod4-3 libmarkermod4-4 \
> +		tst-tls20mod-bad
>  
>  # Most modules build with _ISOMAC defined, but those filtered out
>  # depend on internal headers.

Ok.

> @@ -1924,3 +1926,9 @@ $(objpfx)tst-rtld-help.out: $(objpfx)ld.so
>  	fi; \
>  	(exit $$status); \
>  	$(evaluate-test)
> +
> +# Reuses tst-tls-many-dynamic-modules
> +tst-tls20mod-bad.so-no-z-defs = yes
> +$(objpfx)tst-tls20: $(libdl) $(shared-thread-library)
> +$(objpfx)tst-tls20.out: $(objpfx)tst-tls20mod-bad.so \
> +			$(tst-tls-many-dynamic-modules:%=$(objpfx)%.so)
> diff --git a/elf/tst-tls20.c b/elf/tst-tls20.c

Ok.

> new file mode 100644
> index 0000000000..a378c1fa08
> --- /dev/null
> +++ b/elf/tst-tls20.c
> @@ -0,0 +1,98 @@
> +/* Test dtv setup if entries don't have monotone increasing generation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <dlfcn.h>
> +#include <pthread.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <support/check.h>
> +#include <support/xdlfcn.h>
> +#include <support/xthread.h>
> +
> +#define NMOD 100
> +static void *mod[NMOD];
> +
> +static void
> +load_fail (void)
> +{
> +  /* Expected to fail because of a missing symbol.  */
> +  void *m = dlopen ("tst-tls20mod-bad.so", RTLD_NOW);
> +  if (m != NULL)
> +    FAIL_EXIT1 ("dlopen of tst-tls20mod-bad.so succeeded\n");
> +}
> +
> +static void
> +load_mod (int i)
> +{
> +  char buf[100];
> +  snprintf (buf, sizeof buf, "tst-tls-manydynamic%02dmod.so", i);
> +  mod[i] = xdlopen (buf, RTLD_LAZY);
> +}

Maybe xasprintf here?  

> +
> +static void
> +unload_mod (int i)
> +{
> +  if (mod[i] != NULL)
> +    xdlclose (mod[i]);
> +  mod[i] = NULL;
> +}
> +
> +static void
> +access (int i)
> +{
> +  char buf[100];
> +  snprintf (buf, sizeof buf, "tls_global_%02d", i);

Same as before.

> +  dlerror ();
> +  int *p = dlsym (mod[i], buf);
> +  printf ("mod[%d]: &tls = %p\n", i, p);
> +  if (p == NULL)
> +    FAIL_EXIT1 ("dlsym failed: %s\n", dlerror ());
> +  ++*p;
> +}
> +
> +static void *
> +start (void *a)
> +{
> +  for (int i = 0; i < NMOD; i++)
> +    if (mod[i] != NULL)
> +      access (i);
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < NMOD; i++)
> +    {
> +      load_mod (i);
> +      /* Bump the generation of mod[0] without using new dtv slot.  */
> +      unload_mod (0);
> +      load_fail (); /* Ensure GL(dl_tls_dtv_gaps) is true: see bug 27135.  */
> +      load_mod (0);
> +      /* Access TLS in all loaded modules.  */
> +      pthread_t t = xpthread_create (0, start, 0);
> +      xpthread_join (t);
> +    }
> +  for (i = 0; i < NMOD; i++)
> +    unload_mod (i);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.

> diff --git a/elf/tst-tls20mod-bad.c b/elf/tst-tls20mod-bad.c
> new file mode 100644
> index 0000000000..c1aed8ea7d
> --- /dev/null
> +++ b/elf/tst-tls20mod-bad.c
> @@ -0,0 +1,2 @@
> +void missing_symbol (void);
> +void f (void) {missing_symbol ();}
> 

Ok.


* Re: [PATCH v2 03/14] elf: Fix comments and logic in _dl_add_to_slotinfo
  2021-04-13  8:18 ` [PATCH v2 03/14] elf: Fix comments and logic in _dl_add_to_slotinfo Szabolcs Nagy
@ 2021-04-14 18:12   ` Adhemerval Zanella
  0 siblings, 0 replies; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-14 18:12 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:18, Szabolcs Nagy via Libc-alpha wrote:
> Since
> 
>   commit a509eb117fac1d764b15eba64993f4bdb63d7f3c
>   Avoid late dlopen failure due to scope, TLS slotinfo updates [BZ #25112]
> 
> the generation counter update is not needed in the failure path.
> That commit ensures allocation in _dl_add_to_slotinfo happens before
> the demarcation point in dlopen (it is called twice, first time is for
> allocation only where dlopen can still be reverted on failure, then
> second time actual dtv updates are done which then cannot fail).


LGTM, the commit message is clearer now.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> 
> --
> v2:
> - expanded the commit message a bit.
> ---
>  elf/dl-tls.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 79b93ad91b..24d00c14ef 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -998,16 +998,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
>  		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
>        if (listp == NULL)
>  	{
> -	  /* We ran out of memory.  We will simply fail this
> -	     call but don't undo anything we did so far.  The
> -	     application will crash or be terminated anyway very
> -	     soon.  */
> -
> -	  /* We have to do this since some entries in the dtv
> -	     slotinfo array might already point to this
> -	     generation.  */
> -	  ++GL(dl_tls_generation);
> -
> +	  /* We ran out of memory while resizing the dtv slotinfo list.  */
>  	  _dl_signal_error (ENOMEM, "dlopen", NULL, N_("\
>  cannot create TLS data structures"));
>  	}
> 


* Re: [PATCH v2 04/14] elf: Refactor _dl_update_slotinfo to avoid use after free
  2021-04-13  8:18 ` [PATCH v2 04/14] elf: Refactor _dl_update_slotinfo to avoid use after free Szabolcs Nagy
@ 2021-04-14 18:20   ` Adhemerval Zanella
  0 siblings, 0 replies; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-14 18:20 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:18, Szabolcs Nagy via Libc-alpha wrote:
> map is not valid to access here because it can be freed by a concurrent
> dlclose: during tls access (via __tls_get_addr) _dl_update_slotinfo is
> called without holding dlopen locks. So don't check the modid of map.
> 
> The map == 0 and map != 0 code paths can be shared (avoiding the dtv
> resize in case of map == 0 is just an optimization: larger dtv than
> necessary would be fine too).
> 
> --
> v2:
> - commit message update

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  elf/dl-tls.c | 21 +++++----------------
>  1 file changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 24d00c14ef..f8b32b3ecb 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -743,6 +743,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	{
>  	  for (size_t cnt = total == 0 ? 1 : 0; cnt < listp->len; ++cnt)
>  	    {
> +	      size_t modid = total + cnt;
> +
>  	      size_t gen = listp->slotinfo[cnt].gen;
>  
>  	      if (gen > new_gen)
> @@ -758,25 +760,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  
>  	      /* If there is no map this means the entry is empty.  */
>  	      struct link_map *map = listp->slotinfo[cnt].map;
> -	      if (map == NULL)
> -		{
> -		  if (dtv[-1].counter >= total + cnt)
> -		    {
> -		      /* If this modid was used at some point the memory
> -			 might still be allocated.  */
> -		      free (dtv[total + cnt].pointer.to_free);
> -		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
> -		      dtv[total + cnt].pointer.to_free = NULL;
> -		    }
> -
> -		  continue;
> -		}
> -
>  	      /* Check whether the current dtv array is large enough.  */
> -	      size_t modid = map->l_tls_modid;
> -	      assert (total + cnt == modid);
>  	      if (dtv[-1].counter < modid)
>  		{
> +		  if (map == NULL)
> +		    continue;
> +
>  		  /* Resize the dtv.  */
>  		  dtv = _dl_resize_dtv (dtv);
>  
> 


* Re: [PATCH v2 02/14] elf: Add a DTV setup test [BZ #27136]
  2021-04-14 18:06   ` Adhemerval Zanella
@ 2021-04-15  9:53     ` Szabolcs Nagy
  0 siblings, 0 replies; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-15  9:53 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 995 bytes --]

The 04/14/2021 15:06, Adhemerval Zanella wrote:
> On 13/04/2021 05:18, Szabolcs Nagy via Libc-alpha wrote:
> > The test dlopens a large number of modules with TLS, they are reused
> > from an existing test.
> > 
> > The test relies on the reuse of slotinfo entries after dlclose, without
> > bug 27135 fixed this needs a failing dlopen. With a slotinfo list that
> > has non-monotone increasing generation counters, bug 27136 can trigger.
> > 
> > --
> > v2:
> > - moved from nptl/ to elf/
> > - dlopen a different set of existing modules.
> > - code style fixes (!= NULL etc).
> > - moved after bug fix patch
> 
> LGTM, with just a small suggestion below?
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

thanks

> > +{
> > +  char buf[100];
> > +  snprintf (buf, sizeof buf, "tst-tls-manydynamic%02dmod.so", i);
> > +  mod[i] = xdlopen (buf, RTLD_LAZY);
> > +}
> 
> Maybe xasprintf here?  

committed with this change, but made a mistake: the xasprintf change
also needs <support/support.h>, which is fixed in a follow-up commit.

committed patches are attached.

[-- Attachment #2: 0001-elf-Fix-missing-include-in-test-case-BZ-27136.patch --]
[-- Type: text/x-diff, Size: 719 bytes --]

From 52290d8c04569615fb011ee286d52dc5147afbd7 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Thu, 15 Apr 2021 09:57:10 +0100
Subject: [PATCH] elf: Fix missing include in test case [BZ #27136]

Broken test was introduced in

  commit 8f85075a2e9c26ff7486d4bbaf358999807d215c
  elf: Add a DTV setup test [BZ #27136]
---
 elf/tst-tls20.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/elf/tst-tls20.c b/elf/tst-tls20.c
index ac5f8c8d39..9977ec8032 100644
--- a/elf/tst-tls20.c
+++ b/elf/tst-tls20.c
@@ -21,6 +21,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <support/check.h>
+#include <support/support.h>
 #include <support/xdlfcn.h>
 #include <support/xthread.h>
 
-- 
2.17.1


[-- Attachment #3: 0001-elf-Add-a-DTV-setup-test-BZ-27136.patch --]
[-- Type: text/x-diff, Size: 5109 bytes --]

From 8f85075a2e9c26ff7486d4bbaf358999807d215c Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Thu, 31 Dec 2020 12:24:38 +0000
Subject: [PATCH] elf: Add a DTV setup test [BZ #27136]

The test dlopens a large number of modules with TLS, they are reused
from an existing test.

The test relies on the reuse of slotinfo entries after dlclose, without
bug 27135 fixed this needs a failing dlopen. With a slotinfo list that
has non-monotone increasing generation counters, bug 27136 can trigger.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
---
 elf/Makefile           | 10 ++++-
 elf/tst-tls20.c        | 98 ++++++++++++++++++++++++++++++++++++++++++
 elf/tst-tls20mod-bad.c |  2 +
 3 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 elf/tst-tls20.c
 create mode 100644 elf/tst-tls20mod-bad.c

diff --git a/elf/Makefile b/elf/Makefile
index de659655dc..c531470ede 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -222,7 +222,8 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-audit14 tst-audit15 tst-audit16 \
 	 tst-single_threaded tst-single_threaded-pthread \
 	 tst-tls-ie tst-tls-ie-dlmopen argv0test \
-	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask
+	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask \
+	 tst-tls20
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
@@ -344,6 +345,7 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		libmarkermod2-1 libmarkermod2-2 \
 		libmarkermod3-1 libmarkermod3-2 libmarkermod3-3 \
 		libmarkermod4-1 libmarkermod4-2 libmarkermod4-3 libmarkermod4-4 \
+		tst-tls20mod-bad
 
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
@@ -1922,3 +1924,9 @@ $(objpfx)tst-rtld-help.out: $(objpfx)ld.so
 	fi; \
 	(exit $$status); \
 	$(evaluate-test)
+
+# Reuses tst-tls-many-dynamic-modules
+tst-tls20mod-bad.so-no-z-defs = yes
+$(objpfx)tst-tls20: $(libdl) $(shared-thread-library)
+$(objpfx)tst-tls20.out: $(objpfx)tst-tls20mod-bad.so \
+			$(tst-tls-many-dynamic-modules:%=$(objpfx)%.so)
diff --git a/elf/tst-tls20.c b/elf/tst-tls20.c
new file mode 100644
index 0000000000..ac5f8c8d39
--- /dev/null
+++ b/elf/tst-tls20.c
@@ -0,0 +1,98 @@
+/* Test dtv setup if entries don't have monotone increasing generation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <dlfcn.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <support/check.h>
+#include <support/xdlfcn.h>
+#include <support/xthread.h>
+
+#define NMOD 100
+static void *mod[NMOD];
+
+static void
+load_fail (void)
+{
+  /* Expected to fail because of a missing symbol.  */
+  void *m = dlopen ("tst-tls20mod-bad.so", RTLD_NOW);
+  if (m != NULL)
+    FAIL_EXIT1 ("dlopen of tst-tls20mod-bad.so succeeded\n");
+}
+
+static void
+load_mod (int i)
+{
+  char *buf = xasprintf ("tst-tls-manydynamic%02dmod.so", i);
+  mod[i] = xdlopen (buf, RTLD_LAZY);
+  free (buf);
+}
+
+static void
+unload_mod (int i)
+{
+  if (mod[i] != NULL)
+    xdlclose (mod[i]);
+  mod[i] = NULL;
+}
+
+static void
+access (int i)
+{
+  char *buf = xasprintf ("tls_global_%02d", i);
+  dlerror ();
+  int *p = dlsym (mod[i], buf);
+  printf ("mod[%d]: &tls = %p\n", i, p);
+  if (p == NULL)
+    FAIL_EXIT1 ("dlsym failed: %s\n", dlerror ());
+  ++*p;
+  free (buf);
+}
+
+static void *
+start (void *a)
+{
+  for (int i = 0; i < NMOD; i++)
+    if (mod[i] != NULL)
+      access (i);
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  int i;
+
+  for (i = 0; i < NMOD; i++)
+    {
+      load_mod (i);
+      /* Bump the generation of mod[0] without using new dtv slot.  */
+      unload_mod (0);
+      load_fail (); /* Ensure GL(dl_tls_dtv_gaps) is true: see bug 27135.  */
+      load_mod (0);
+      /* Access TLS in all loaded modules.  */
+      pthread_t t = xpthread_create (0, start, 0);
+      xpthread_join (t);
+    }
+  for (i = 0; i < NMOD; i++)
+    unload_mod (i);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-tls20mod-bad.c b/elf/tst-tls20mod-bad.c
new file mode 100644
index 0000000000..c1aed8ea7d
--- /dev/null
+++ b/elf/tst-tls20mod-bad.c
@@ -0,0 +1,2 @@
+void missing_symbol (void);
+void f (void) {missing_symbol ();}
-- 
2.17.1



* Re: [PATCH v2 05/14] elf: Fix data races in pthread_create and TLS access [BZ #19329]
  2021-04-13  8:19 ` [PATCH v2 05/14] elf: Fix data races in pthread_create and TLS access [BZ #19329] Szabolcs Nagy
@ 2021-04-15 17:44   ` Adhemerval Zanella
  0 siblings, 0 replies; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-15 17:44 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote:
> DTV setup at thread creation (_dl_allocate_tls_init) is changed
> to take the dlopen lock, GL(dl_load_lock).  Avoiding data races
> here without locks would require design changes: the map that is
> accessed for static TLS initialization here may be concurrently
> freed by dlclose.  That use after free may be solved by only
> locking around static TLS setup or by ensuring dlclose does not
> free modules with static TLS, however currently every link map
> with TLS has to be accessed at least to see if it needs static
> TLS.  And even if that's solved, still a lot of atomics would be
> needed to synchronize DTV related globals without a lock. So fix
> both bug 19329 and bug 27111 with a lock that prevents DTV setup
> running concurrently with dlopen or dlclose.

It sounds reasonable, and I think the extra locks should be ok
performance-wise (concurrent dlopen/dlclose should not be a
hotspot operation in the majority of cases).

> 
> _dl_update_slotinfo at TLS access still does not use any locks
> so CONCURRENCY NOTES are added to explain the synchronization.
> The early exit from the slotinfo walk when max_modid is reached
> is not strictly necessary, but does not hurt either.
> 
> An incorrect acquire load was removed from _dl_resize_dtv: it
> did not synchronize with any release store or fence and
> synchronization is now handled separately at thread creation
> and TLS access time.

Reading the fix that added the acquire load (d8dd00805b8), it seems
the idea was to use a sequentially consistent ordering, since
previously GL(dl_tls_max_dtv_idx) was not accessed using atomic
operations.
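
The change under discussion can be sketched in isolation: the caller
samples the maximum module id once with a relaxed load and passes the
same value down to the resize, instead of the resize doing an acquire
load that paired with no release store.  A minimal C11 sketch with
illustrative names only (tls_max_dtv_idx, DTV_SURPLUS and the dtv
layout are stand-ins, not glibc's definitions):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define DTV_SURPLUS 14          /* illustrative surplus, as in the sketch */

/* Stand-in for GL(dl_tls_max_dtv_idx): written under the load lock,
   read concurrently elsewhere.  */
static _Atomic size_t tls_max_dtv_idx;

/* Relaxed load: no ordering with other memory is relied on here; a
   stale value only makes the dtv smaller than ideal, and the size is
   rechecked (and the dtv regrown) later under proper synchronization.  */
static size_t
sample_max_modid (void)
{
  return atomic_load_explicit (&tls_max_dtv_idx, memory_order_relaxed);
}

/* The resize now takes the sampled value instead of re-reading the
   global, so caller and callee agree on the size they sized for.
   +2 mirrors the dtv header slots (the [-1] length and [0] generation).  */
static size_t *
resize_dtv_sketch (size_t max_modid)
{
  return calloc (max_modid + DTV_SURPLUS + 2, sizeof (size_t));
}
```

The point of the sketch is only the calling convention: one relaxed
sample, threaded through explicitly, rather than a second (acquire)
read inside the resize.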

> 
> There are still a number of racy read accesses to globals that
> will be changed to relaxed MO atomics in a followup patch. This
> should not introduce regressions compared to existing behaviour
> and avoid cluttering the main part of the fix.
> 
> Not all TLS access related data races got fixed here: there are
> additional races at lazy tlsdesc relocations see bug 27137.

LGTM, thanks.  There is a misspelled word and a minor style issue
in the comments below.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>


> ---
>  elf/dl-tls.c | 63 +++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 47 insertions(+), 16 deletions(-)
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index f8b32b3ecb..33c06782b1 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -471,14 +471,11 @@ extern dtv_t _dl_static_dtv[];
>  #endif
>  
>  static dtv_t *
> -_dl_resize_dtv (dtv_t *dtv)
> +_dl_resize_dtv (dtv_t *dtv, size_t max_modid)
>  {
>    /* Resize the dtv.  */
>    dtv_t *newp;
> -  /* Load GL(dl_tls_max_dtv_idx) atomically since it may be written to by
> -     other threads concurrently.  */
> -  size_t newsize
> -    = atomic_load_acquire (&GL(dl_tls_max_dtv_idx)) + DTV_SURPLUS;
> +  size_t newsize = max_modid + DTV_SURPLUS;
>    size_t oldsize = dtv[-1].counter;
>  
>    if (dtv == GL(dl_initial_dtv))

Ok.

> @@ -524,11 +521,14 @@ _dl_allocate_tls_init (void *result)
>    size_t total = 0;
>    size_t maxgen = 0;
>  
> +  /* Protects global dynamic TLS related state.  */
> +  __rtld_lock_lock_recursive (GL(dl_load_lock));
> +
>    /* Check if the current dtv is big enough.   */
>    if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
>      {
>        /* Resize the dtv.  */
> -      dtv = _dl_resize_dtv (dtv);
> +      dtv = _dl_resize_dtv (dtv, GL(dl_tls_max_dtv_idx));
>  
>        /* Install this new dtv in the thread data structures.  */
>        INSTALL_DTV (result, &dtv[-1]);

Ok.

> @@ -596,6 +596,7 @@ _dl_allocate_tls_init (void *result)
>        listp = listp->next;
>        assert (listp != NULL);
>      }
> +  __rtld_lock_unlock_recursive (GL(dl_load_lock));
>  
>    /* The DTV version is up-to-date now.  */
>    dtv[0].counter = maxgen;

Ok.

> @@ -730,12 +731,29 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  
>    if (dtv[0].counter < listp->slotinfo[idx].gen)
>      {
> -      /* The generation counter for the slot is higher than what the
> -	 current dtv implements.  We have to update the whole dtv but
> -	 only those entries with a generation counter <= the one for
> -	 the entry we need.  */
> +      /* CONCURRENCY NOTES:
> +
> +	 Here the dtv needs to be updated to new_gen generation count.
> +
> +	 This code may be called during TLS access when GL(dl_load_lock)
> +	 is not held.  In that case the user code has to synchrnize with

s/synchrnize/synchronize

> +	 dlopen and dlclose calls of relevant modules.  A module m is
> +	 relevant if the generation of m <= new_gen and dlclose of m is
> +	 synchronized: a memory access here happens after the dlopen and
> +	 before the dlclose of relevant modules.  The dtv entries for
> +	 relevant modules need to be updated, other entries can be
> +	 arbitrary.
> +
> +	 This e.g. means that the first part of the slotinfo list can be
> +	 accessed race free, but the tail may be concurrently extended.
> +	 Similarly relevant slotinfo entries can be read race free, but
> +	 other entries are racy.  However updating a non-relevant dtv
> +	 entry does not affect correctness.  For a relevant module m,
> +	 max_modid >= modid of m.  */
>        size_t new_gen = listp->slotinfo[idx].gen;
>        size_t total = 0;
> +      size_t max_modid  = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
> +      assert (max_modid >= req_modid);
>  
>        /* We have to look through the entire dtv slotinfo list.  */
>        listp =  GL(dl_tls_dtv_slotinfo_list);

Ok.

> @@ -745,12 +763,14 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	    {
>  	      size_t modid = total + cnt;
>  
> +	      /* Later entries are not relevant.  */
> +	      if (modid > max_modid)
> +		break;
> +
>  	      size_t gen = listp->slotinfo[cnt].gen;
>  
>  	      if (gen > new_gen)
> -		/* This is a slot for a generation younger than the
> -		   one we are handling now.  It might be incompletely
> -		   set up so ignore it.  */
> +		/* Not relevant.  */
>  		continue;
>  
>  	      /* If the entry is older than the current dtv layout we

Ok.

> @@ -767,7 +787,7 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  		    continue;
>  
>  		  /* Resize the dtv.  */
> -		  dtv = _dl_resize_dtv (dtv);
> +		  dtv = _dl_resize_dtv (dtv, max_modid);
>  
>  		  assert (modid <= dtv[-1].counter);
>  

Ok.

> @@ -789,8 +809,17 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	    }
>  
>  	  total += listp->len;
> +	  if (total > max_modid)
> +	    break;
> +
> +	  /* Synchronize with _dl_add_to_slotinfo. Ideally this would

Double space after period.

> +	     be consume MO since we only need to order the accesses to
> +	     the next node after the read of the address and on most
> +	     hardware (other than alpha) a normal load would do that
> +	     because of the address dependency.  */
> +	  listp = atomic_load_acquire (&listp->next);
>  	}
> -      while ((listp = listp->next) != NULL);
> +      while (listp != NULL);
>  
>        /* This will be the new maximum generation counter.  */
>        dtv[0].counter = new_gen;

Ok.

> @@ -982,7 +1011,7 @@ _dl_add_to_slotinfo (struct link_map *l, bool do_add)
>  	 the first slot.  */
>        assert (idx == 0);
>  
> -      listp = prevp->next = (struct dtv_slotinfo_list *)
> +      listp = (struct dtv_slotinfo_list *)
>  	malloc (sizeof (struct dtv_slotinfo_list)
>  		+ TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
>        if (listp == NULL)
> @@ -996,6 +1025,8 @@ cannot create TLS data structures"));
>        listp->next = NULL;
>        memset (listp->slotinfo, '\0',
>  	      TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
> +      /* Synchronize with _dl_update_slotinfo.  */
> +      atomic_store_release (&prevp->next, listp);
>      }
>  
>    /* Add the information into the slotinfo data structure.  */
> 
 Ok.
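
The release/acquire pairing added in the last two hunks — publish a
fully initialised list node with a release store, walk the list with
acquire loads — can be sketched on its own.  Illustrative C11 only;
struct node stands in for struct dtv_slotinfo_list and the names are
not glibc's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* A singly linked list extended by a writer holding a lock and walked
   by lock-free readers.  */
struct node
{
  int slot[4];
  struct node *_Atomic next;
};

/* Writer side: initialise the node completely, then publish it.  The
   release store synchronizes with the acquire load in walk_last, so a
   reader that observes the new node also observes its contents.  */
static void
append_node (struct node *prev, struct node *n)
{
  memset (n->slot, 0, sizeof n->slot);
  atomic_store_explicit (&n->next, NULL, memory_order_relaxed);
  atomic_store_explicit (&prev->next, n, memory_order_release);
}

/* Reader side: the acquire load orders accesses to each node after the
   load of its address (consume MO would be enough on most hardware,
   as the patch's comment notes, because of the address dependency).  */
static struct node *
walk_last (struct node *head)
{
  struct node *p = head;
  for (struct node *nx;
       (nx = atomic_load_explicit (&p->next, memory_order_acquire)) != NULL;
       p = nx)
    ;
  return p;
}
```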


* Re: [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses [BZ #19329]
  2021-04-13  8:19 ` [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses " Szabolcs Nagy
@ 2021-04-15 18:21   ` Adhemerval Zanella
  2021-04-16  9:12     ` Szabolcs Nagy
  0 siblings, 1 reply; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-15 18:21 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote:
> This is a follow up patch to the fix for bug 19329.  This adds
> relaxed MO atomics to accesses that are racy, but relaxed MO is
> enough.

Could you expand a bit on why relaxed MO suffices?

Patch looks good, just a small request below.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
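
For context, the writer side of these racy accesses — the generation
counter is only written while GL(dl_load_lock) is held, but readers
such as __tls_get_addr load it without the lock — follows the pattern
below.  Hypothetical C11 stand-ins, not glibc internals:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for GL(dl_tls_generation).  */
static _Atomic size_t tls_generation;

/* Caller holds the (conceptual) load lock, so the read-modify-write is
   not itself racy; the relaxed store only makes the concurrent relaxed
   reads well defined.  No ordering is conveyed through the counter:
   dtv/slotinfo updates are synchronized separately.  */
static void
bump_generation (void)
{
  size_t newgen = atomic_load_explicit (&tls_generation,
                                        memory_order_relaxed) + 1;
  if (newgen == 0)
    {
      fputs ("TLS generation counter wrapped!\n", stderr);
      abort ();
    }
  /* Can be read concurrently with relaxed MO.  */
  atomic_store_explicit (&tls_generation, newgen, memory_order_relaxed);
}
```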

> 
> --
> v2:
> - handle x86_64 dl-tls.c too
> ---
>  elf/dl-close.c          | 20 +++++++++++++-------
>  elf/dl-open.c           |  5 ++++-
>  elf/dl-tls.c            | 31 +++++++++++++++++++++++--------
>  sysdeps/x86_64/dl-tls.c |  3 ++-
>  4 files changed, 42 insertions(+), 17 deletions(-)
> 
> diff --git a/elf/dl-close.c b/elf/dl-close.c
> index c51becd06b..3720e47dd1 100644
> --- a/elf/dl-close.c
> +++ b/elf/dl-close.c
> @@ -79,9 +79,10 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
>  	{
>  	  assert (old_map->l_tls_modid == idx);
>  
> -	  /* Mark the entry as unused. */
> -	  listp->slotinfo[idx - disp].gen = GL(dl_tls_generation) + 1;
> -	  listp->slotinfo[idx - disp].map = NULL;
> +	  /* Mark the entry as unused.  These can be read concurrently.  */
> +	  atomic_store_relaxed (&listp->slotinfo[idx - disp].gen,
> +				GL(dl_tls_generation) + 1);
> +	  atomic_store_relaxed (&listp->slotinfo[idx - disp].map, NULL);
>  	}
>  
>        /* If this is not the last currently used entry no need to look

Ok.

> @@ -96,8 +97,8 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
>  
>        if (listp->slotinfo[idx - disp].map != NULL)
>  	{
> -	  /* Found a new last used index.  */
> -	  GL(dl_tls_max_dtv_idx) = idx;
> +	  /* Found a new last used index.  This can be read concurrently.  */
> +	  atomic_store_relaxed (&GL(dl_tls_max_dtv_idx), idx);
>  	  return true;
>  	}
>      }

Ok.

> @@ -571,7 +572,9 @@ _dl_close_worker (struct link_map *map, bool force)
>  					GL(dl_tls_dtv_slotinfo_list), 0,
>  					imap->l_init_called))
>  		/* All dynamically loaded modules with TLS are unloaded.  */
> -		GL(dl_tls_max_dtv_idx) = GL(dl_tls_static_nelem);
> +		/* Can be read concurrently.  */
> +		atomic_store_relaxed (&GL(dl_tls_max_dtv_idx),
> +				      GL(dl_tls_static_nelem));
>  
>  	      if (imap->l_tls_offset != NO_TLS_OFFSET
>  		  && imap->l_tls_offset != FORCED_DYNAMIC_TLS_OFFSET)

Ok.

> @@ -769,8 +772,11 @@ _dl_close_worker (struct link_map *map, bool force)
>    /* If we removed any object which uses TLS bump the generation counter.  */
>    if (any_tls)
>      {
> -      if (__glibc_unlikely (++GL(dl_tls_generation) == 0))
> +      size_t newgen = GL(dl_tls_generation) + 1;
> +      if (__glibc_unlikely (newgen == 0))
>  	_dl_fatal_printf ("TLS generation counter wrapped!  Please report as described in "REPORT_BUGS_TO".\n");
> +      /* Can be read concurrently.  */
> +      atomic_store_relaxed (&GL(dl_tls_generation), newgen);
>  
>        if (tls_free_end == GL(dl_tls_static_used))
>  	GL(dl_tls_static_used) = tls_free_start;

Ok.

> diff --git a/elf/dl-open.c b/elf/dl-open.c
> index ab7aaa345e..83b8e96a5c 100644
> --- a/elf/dl-open.c
> +++ b/elf/dl-open.c
> @@ -395,9 +395,12 @@ update_tls_slotinfo (struct link_map *new)
>  	}
>      }
>  
> -  if (__builtin_expect (++GL(dl_tls_generation) == 0, 0))
> +  size_t newgen = GL(dl_tls_generation) + 1;
> +  if (__builtin_expect (newgen == 0, 0))
>      _dl_fatal_printf (N_("\

Use __glibc_unlikely since you are modifying it.

>  TLS generation counter wrapped!  Please report this."));
> +  /* Can be read concurrently.  */
> +  atomic_store_relaxed (&GL(dl_tls_generation), newgen);
>  
>    /* We need a second pass for static tls data, because
>       _dl_update_slotinfo must not be run while calls to

Ok.

> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 33c06782b1..c4466bd9fc 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -175,7 +175,9 @@ _dl_next_tls_modid (void)
>        /* No gaps, allocate a new entry.  */
>      nogaps:
>  
> -      result = ++GL(dl_tls_max_dtv_idx);
> +      result = GL(dl_tls_max_dtv_idx) + 1;
> +      /* Can be read concurrently.  */
> +      atomic_store_relaxed (&GL(dl_tls_max_dtv_idx), result);
>      }
>  
>    return result;

Ok.

> @@ -359,10 +361,12 @@ allocate_dtv (void *result)
>    dtv_t *dtv;
>    size_t dtv_length;
>  
> +  /* Relaxed MO, because the dtv size is later rechecked, not relied on.  */
> +  size_t max_modid = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
>    /* We allocate a few more elements in the dtv than are needed for the
>       initial set of modules.  This should avoid in most cases expansions
>       of the dtv.  */
> -  dtv_length = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
> +  dtv_length = max_modid + DTV_SURPLUS;
>    dtv = calloc (dtv_length + 2, sizeof (dtv_t));
>    if (dtv != NULL)
>      {

Ok.

> @@ -767,7 +771,7 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	      if (modid > max_modid)
>  		break;
>  
> -	      size_t gen = listp->slotinfo[cnt].gen;
> +	      size_t gen = atomic_load_relaxed (&listp->slotinfo[cnt].gen);
>  
>  	      if (gen > new_gen)
>  		/* Not relevant.  */

Ok.

> @@ -779,7 +783,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  		continue;
>  
>  	      /* If there is no map this means the entry is empty.  */
> -	      struct link_map *map = listp->slotinfo[cnt].map;
> +	      struct link_map *map
> +		= atomic_load_relaxed (&listp->slotinfo[cnt].map);
>  	      /* Check whether the current dtv array is large enough.  */
>  	      if (dtv[-1].counter < modid)
>  		{

Ok.

> @@ -923,7 +928,12 @@ __tls_get_addr (GET_ADDR_ARGS)
>  {
>    dtv_t *dtv = THREAD_DTV ();
>  
> -  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
> +  /* Update is needed if dtv[0].counter < the generation of the accessed
> +     module.  The global generation counter is used here as it is easier
> +     to check.  Synchronization for the relaxed MO access is guaranteed
> +     by user code, see CONCURRENCY NOTES in _dl_update_slotinfo.  */
> +  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
> +  if (__glibc_unlikely (dtv[0].counter != gen))
>      return update_get_addr (GET_ADDR_PARAM);
>  
>    void *p = dtv[GET_ADDR_MODULE].pointer.val;

Ok.

> @@ -946,7 +956,10 @@ _dl_tls_get_addr_soft (struct link_map *l)
>      return NULL;
>  
>    dtv_t *dtv = THREAD_DTV ();
> -  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
> +  /* This may be called without holding the GL(dl_load_lock).  Reading
> +     arbitrary gen value is fine since this is best effort code.  */
> +  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
> +  if (__glibc_unlikely (dtv[0].counter != gen))
>      {
>        /* This thread's DTV is not completely current,
>  	 but it might already cover this module.  */

Ok.

> @@ -1032,7 +1045,9 @@ cannot create TLS data structures"));
>    /* Add the information into the slotinfo data structure.  */
>    if (do_add)
>      {
> -      listp->slotinfo[idx].map = l;
> -      listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
> +      /* Can be read concurrently.  See _dl_update_slotinfo.  */
> +      atomic_store_relaxed (&listp->slotinfo[idx].map, l);
> +      atomic_store_relaxed (&listp->slotinfo[idx].gen,
> +			    GL(dl_tls_generation) + 1);
>      }
>  }

Ok.

> diff --git a/sysdeps/x86_64/dl-tls.c b/sysdeps/x86_64/dl-tls.c
> index 6595f6615b..24ef560b71 100644
> --- a/sysdeps/x86_64/dl-tls.c
> +++ b/sysdeps/x86_64/dl-tls.c
> @@ -40,7 +40,8 @@ __tls_get_addr_slow (GET_ADDR_ARGS)
>  {
>    dtv_t *dtv = THREAD_DTV ();
>  
> -  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
> +  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
> +  if (__glibc_unlikely (dtv[0].counter != gen))
>      return update_get_addr (GET_ADDR_PARAM);
>  
>    return tls_get_addr_tail (GET_ADDR_PARAM, dtv, NULL);
> 

Ok.

x86_64 also accesses dl_tls_generation in sysdeps/x86_64/tls_get_addr.S,
but AFAIK the default memory ordering for x86_64 already guarantees
relaxed MO.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 07/14] elf: Add test case for [BZ #19329]
  2021-04-13  8:19 ` [PATCH v2 07/14] elf: Add test case for " Szabolcs Nagy
@ 2021-04-15 19:21   ` Adhemerval Zanella
  0 siblings, 0 replies; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-15 19:21 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote:
> Test concurrent dlopen and pthread_create when the loaded modules have
> TLS.  This triggers dl-tls assertion failures more reliably than the
> nptl/tst-stack4 test.
> 
> The dlopened module has 100 DT_NEEDED dependencies with TLS, they were
> reused from an existing TLS test. The number of created threads during
> dlopen depends on filesystem speed and hardware, but at most 3 threads
> are alive at a time to limit resource usage.
> 
> ---
> v5:
> - moved from nptl/ to elf/
> - use many tls modules from another test (affects the timing a bit as
>   these need more relocation processing, but still detects the race)
> - use xdlopen
> - use stdatomic.h
> - moved from tests-internal back to tests
> - use acq/rel atomics instead of relaxed; this works too and is cleaner.
> - moved after bug fix patch.
> 
> v4:
> - rebased, updated copyright year.
> - moved to tests-internal because of <atomic.h>
> 
> v3:
> - use the new test support code.
> - better Makefile usage so modules are cleaned properly.
> 
> v2:
> - undef NDEBUG.
> - join nop threads so at most 3 threads exist at a time.
> - remove stack size setting (resource usage is no longer a concern).
> - stop creating threads after dlopen observably finished.
> - print the number of threads created for debugging.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  elf/Makefile       |  9 ++++--
>  elf/tst-tls21.c    | 68 ++++++++++++++++++++++++++++++++++++++++++++++
>  elf/tst-tls21mod.c |  1 +
>  3 files changed, 76 insertions(+), 2 deletions(-)
>  create mode 100644 elf/tst-tls21.c
>  create mode 100644 elf/tst-tls21mod.c
> 
> diff --git a/elf/Makefile b/elf/Makefile
> index b1344f1133..e9377608eb 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -223,7 +223,7 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
>  	 tst-single_threaded tst-single_threaded-pthread \
>  	 tst-tls-ie tst-tls-ie-dlmopen argv0test \
>  	 tst-glibc-hwcaps tst-glibc-hwcaps-prepend tst-glibc-hwcaps-mask \
> -	 tst-tls20
> +	 tst-tls20 tst-tls21
>  #	 reldep9
>  tests-internal += loadtest unload unload2 circleload1 \
>  	 neededtest neededtest2 neededtest3 neededtest4 \

Ok.

> @@ -345,7 +345,7 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
>  		libmarkermod2-1 libmarkermod2-2 \
>  		libmarkermod3-1 libmarkermod3-2 libmarkermod3-3 \
>  		libmarkermod4-1 libmarkermod4-2 libmarkermod4-3 libmarkermod4-4 \
> -		tst-tls20mod-bad
> +		tst-tls20mod-bad tst-tls21mod
>  
>  # Most modules build with _ISOMAC defined, but those filtered out
>  # depend on internal headers.

Ok.

> @@ -1932,3 +1932,8 @@ tst-tls20mod-bad.so-no-z-defs = yes
>  $(objpfx)tst-tls20: $(libdl) $(shared-thread-library)
>  $(objpfx)tst-tls20.out: $(objpfx)tst-tls20mod-bad.so \
>  			$(tst-tls-many-dynamic-modules:%=$(objpfx)%.so)
> +
> +# Reuses tst-tls-many-dynamic-modules
> +$(objpfx)tst-tls21: $(libdl) $(shared-thread-library)
> +$(objpfx)tst-tls21.out: $(objpfx)tst-tls21mod.so
> +$(objpfx)tst-tls21mod.so: $(tst-tls-many-dynamic-modules:%=$(objpfx)%.so)

Ok.

> diff --git a/elf/tst-tls21.c b/elf/tst-tls21.c
> new file mode 100644
> index 0000000000..560bf5813a
> --- /dev/null
> +++ b/elf/tst-tls21.c
> @@ -0,0 +1,68 @@
> +/* Test concurrent dlopen and pthread_create: BZ 19329.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <dlfcn.h>
> +#include <pthread.h>
> +#include <stdio.h>
> +#include <stdatomic.h>
> +#include <support/xdlfcn.h>
> +#include <support/xthread.h>
> +
> +#define THREADS 10000
> +
> +static atomic_int done;
> +
> +static void *
> +start (void *a)
> +{
> +  /* Load a module with many dependencies that each have TLS.  */
> +  xdlopen ("tst-tls21mod.so", RTLD_LAZY);
> +  atomic_store_explicit (&done, 1, memory_order_release);
> +  return 0;
> +}
> +
> +static void *
> +nop (void *a)
> +{
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  pthread_t t1, t2;
> +  int i;
> +
> +  /* Load a module with lots of dependencies and TLS.  */
> +  t1 = xpthread_create (0, start, 0);
> +
> +  /* Concurrently create lots of threads until dlopen is observably done.  */
> +  for (i = 0; i < THREADS; i++)
> +    {
> +      if (atomic_load_explicit (&done, memory_order_acquire) != 0)
> +	break;
> +      t2 = xpthread_create (0, nop, 0);
> +      xpthread_join (t2);
> +    }
> +
> +  xpthread_join (t1);
> +  printf ("threads created during dlopen: %d\n", i);
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.

> diff --git a/elf/tst-tls21mod.c b/elf/tst-tls21mod.c
> new file mode 100644
> index 0000000000..206ece4fb3
> --- /dev/null
> +++ b/elf/tst-tls21mod.c
> @@ -0,0 +1 @@
> +int __thread x;
> 

Ok.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135]
  2021-04-13  8:20 ` [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135] Szabolcs Nagy
@ 2021-04-15 19:45   ` Adhemerval Zanella
  2021-06-24  9:48   ` Florian Weimer
  1 sibling, 0 replies; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-15 19:45 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:20, Szabolcs Nagy via Libc-alpha wrote:
> For some reason only dlopen failure caused dtv gaps to be reused.
> 
> It is possible that the intent was to never reuse modids for a
> different module, but after dlopen failure all gaps are reused
> not just the ones caused by the unfinished dlopened.
> 
> So the code has to handle reused modids already, which seems to
> work; however, the data races at thread creation and tls access
> (see bug 19329 and bug 27111) may be more severe if slots are
> reused, so this is scheduled after those fixes.  I think fixing
> the races is not simpler if reuse is disallowed, and reuse has
> other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
> removed from the middle of the slotinfo list.  The value does
> not have to be correct: an incorrect true value causes the next
> modid query to do a slotinfo walk, while an incorrect false value
> leaves gaps and new entries are added at the end.
> 
> Fixes bug 27135.
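As an illustration only, the conservative gaps-flag semantics described in the commit message can be sketched with a toy modid allocator.  All names here are hypothetical; the real code walks GL(dl_tls_dtv_slotinfo_list) and handles the static TLS prefix, which this sketch omits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: modids are 1-based, `have_gaps` may be conservatively
   wrong in either direction without breaking correctness.  */
#define NMOD 8
static bool used[NMOD + 1];
static size_t max_idx;   /* analogue of GL(dl_tls_max_dtv_idx) */
static bool have_gaps;   /* analogue of GL(dl_tls_dtv_gaps) */

static size_t
next_modid (void)
{
  if (have_gaps)
    {
      /* An incorrect true flag only costs this futile walk.  */
      for (size_t i = 1; i <= max_idx; i++)
        if (!used[i])
          {
            used[i] = true;
            return i;
          }
      have_gaps = false;
    }
  /* An incorrect false flag leaves gaps; new entries go at the end.  */
  assert (max_idx < NMOD);
  used[++max_idx] = true;
  return max_idx;
}

static void
remove_modid (size_t idx)
{
  used[idx] = false;
  if (idx != max_idx)
    have_gaps = true;   /* freed an entry in the middle */
  else
    while (max_idx > 0 && !used[max_idx])
      max_idx--;        /* find the new last used index */
}
```

The point mirrored from the patch: setting the flag whenever a middle entry is removed is safe because both "wrong" flag states degrade gracefully.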

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  elf/dl-close.c |  6 +++++-
>  elf/dl-open.c  | 10 ----------
>  elf/dl-tls.c   |  5 +----
>  3 files changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/elf/dl-close.c b/elf/dl-close.c
> index 3720e47dd1..9f31532f41 100644
> --- a/elf/dl-close.c
> +++ b/elf/dl-close.c
> @@ -88,7 +88,11 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
>        /* If this is not the last currently used entry no need to look
>  	 further.  */
>        if (idx != GL(dl_tls_max_dtv_idx))
> -	return true;
> +	{
> +	  /* There is an unused dtv entry in the middle.  */
> +	  GL(dl_tls_dtv_gaps) = true;
> +	  return true;
> +	}
>      }
>  
>    while (idx - disp > (disp == 0 ? 1 + GL(dl_tls_static_nelem) : 0))

Ok.

> diff --git a/elf/dl-open.c b/elf/dl-open.c
> index 83b8e96a5c..661f26977e 100644
> --- a/elf/dl-open.c
> +++ b/elf/dl-open.c
> @@ -890,16 +890,6 @@ no more namespaces available for dlmopen()"));
>  	 state if relocation failed, for example.  */
>        if (args.map)
>  	{
> -	  /* Maybe some of the modules which were loaded use TLS.
> -	     Since it will be removed in the following _dl_close call
> -	     we have to mark the dtv array as having gaps to fill the
> -	     holes.  This is a pessimistic assumption which won't hurt
> -	     if not true.  There is no need to do this when we are
> -	     loading the auditing DSOs since TLS has not yet been set
> -	     up.  */
> -	  if ((mode & __RTLD_AUDIT) == 0)
> -	    GL(dl_tls_dtv_gaps) = true;
> -
>  	  _dl_close_worker (args.map, true);
>  
>  	  /* All l_nodelete_pending objects should have been deleted

Ok.

> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index c4466bd9fc..b0257185e9 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -187,10 +187,7 @@ _dl_next_tls_modid (void)
>  size_t
>  _dl_count_modids (void)
>  {
> -  /* It is rare that we have gaps; see elf/dl-open.c (_dl_open) where
> -     we fail to load a module and unload it leaving a gap.  If we don't
> -     have gaps then the number of modids is the current maximum so
> -     return that.  */
> +  /* The count is the max unless dlclose or failed dlopen created gaps.  */
>    if (__glibc_likely (!GL(dl_tls_dtv_gaps)))
>      return GL(dl_tls_max_dtv_idx);
>  
> 

Ok.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/14] elf: Remove lazy tlsdesc relocation related code
  2021-04-13  8:21 ` [PATCH v2 13/14] elf: " Szabolcs Nagy
@ 2021-04-15 19:52   ` Adhemerval Zanella
  0 siblings, 0 replies; 44+ messages in thread
From: Adhemerval Zanella @ 2021-04-15 19:52 UTC (permalink / raw)
  To: libc-alpha, Szabolcs Nagy



On 13/04/2021 05:21, Szabolcs Nagy via Libc-alpha wrote:
> Remove generic tlsdesc code related to lazy tlsdesc processing since
> lazy tlsdesc relocation is no longer supported.  This includes removing
> GL(dl_load_lock) from _dl_make_tlsdesc_dynamic which is only called at
> load time when that lock is already held.
> 
> Added a documentation comment too.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  elf/tlsdeschtab.h         | 53 +++++----------------------------------
>  sysdeps/aarch64/tlsdesc.c |  1 -
>  sysdeps/arm/tlsdesc.c     |  1 -
>  sysdeps/i386/tlsdesc.c    |  1 -
>  sysdeps/x86_64/tlsdesc.c  |  1 -
>  5 files changed, 6 insertions(+), 51 deletions(-)
> 
> diff --git a/elf/tlsdeschtab.h b/elf/tlsdeschtab.h
> index 21695fd1e9..85bd0415e1 100644
> --- a/elf/tlsdeschtab.h
> +++ b/elf/tlsdeschtab.h
> @@ -78,6 +78,10 @@ map_generation (struct link_map *map)
>    return GL(dl_tls_generation) + 1;
>  }
>  
> +/* Returns the data pointer for a given map and tls offset that is used
> +   to fill in one of the GOT entries referenced by a TLSDESC relocation
> +   when using dynamic TLS.  This requires allocation, returns NULL on
> +   allocation failure.  */
>  void *
>  _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
>  {
> @@ -85,18 +89,12 @@ _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
>    void **entry;
>    struct tlsdesc_dynamic_arg *td, test;
>  
> -  /* FIXME: We could use a per-map lock here, but is it worth it?  */
> -  __rtld_lock_lock_recursive (GL(dl_load_lock));
> -
>    ht = map->l_mach.tlsdesc_table;
>    if (! ht)
>      {
>        ht = htab_create ();
>        if (! ht)
> -	{
> -	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -	  return 0;
> -	}
> +	return 0;
>        map->l_mach.tlsdesc_table = ht;
>      }
>  

Ok.

> @@ -104,15 +102,11 @@ _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
>    test.tlsinfo.ti_offset = ti_offset;
>    entry = htab_find_slot (ht, &test, 1, hash_tlsdesc, eq_tlsdesc);
>    if (! entry)
> -    {
> -      __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -      return 0;
> -    }
> +    return 0;
>  
>    if (*entry)
>      {
>        td = *entry;
> -      __rtld_lock_unlock_recursive (GL(dl_load_lock));
>        return td;
>      }
>  
Ok.

> @@ -122,44 +116,9 @@ _dl_make_tlsdesc_dynamic (struct link_map *map, size_t ti_offset)
>       thread.  */
>    td->gen_count = map_generation (map);
>    td->tlsinfo = test.tlsinfo;
> -
> -  __rtld_lock_unlock_recursive (GL(dl_load_lock));
>    return td;
>  }
>  
>  # endif /* SHARED */
>  
> -/* The idea of the following two functions is to stop multiple threads
> -   from attempting to resolve the same TLS descriptor without busy
> -   waiting.  Ideally, we should be able to release the lock right
> -   after changing td->entry, and then using say a condition variable
> -   or a futex wake to wake up any waiting threads, but let's try to
> -   avoid introducing such dependencies.  */
> -
> -static int
> -__attribute__ ((unused))
> -_dl_tlsdesc_resolve_early_return_p (struct tlsdesc volatile *td, void *caller)
> -{
> -  if (caller != atomic_load_relaxed (&td->entry))
> -    return 1;
> -
> -  __rtld_lock_lock_recursive (GL(dl_load_lock));
> -  if (caller != atomic_load_relaxed (&td->entry))
> -    {
> -      __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -      return 1;
> -    }
> -
> -  atomic_store_relaxed (&td->entry, _dl_tlsdesc_resolve_hold);
> -
> -  return 0;
> -}
> -
> -static void
> -__attribute__ ((unused))
> -_dl_tlsdesc_wake_up_held_fixups (void)
> -{
> -  __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -}
> -
>  #endif

Ok.

> diff --git a/sysdeps/aarch64/tlsdesc.c b/sysdeps/aarch64/tlsdesc.c
> index 9b6a8b0ec4..3c1bcd8fe6 100644
> --- a/sysdeps/aarch64/tlsdesc.c
> +++ b/sysdeps/aarch64/tlsdesc.c
> @@ -22,7 +22,6 @@
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-unmap-segments.h>
> -#define _dl_tlsdesc_resolve_hold 0
>  #include <tlsdeschtab.h>
>  
>  /* Unmap the dynamic object, but also release its TLS descriptor table

Ok.

> diff --git a/sysdeps/arm/tlsdesc.c b/sysdeps/arm/tlsdesc.c
> index 2f70adef5c..b3d1735af4 100644
> --- a/sysdeps/arm/tlsdesc.c
> +++ b/sysdeps/arm/tlsdesc.c
> @@ -20,7 +20,6 @@
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-unmap-segments.h>
> -#define _dl_tlsdesc_resolve_hold 0
>  #include <tlsdeschtab.h>
>  
>  /* Unmap the dynamic object, but also release its TLS descriptor table

Ok.

> diff --git a/sysdeps/i386/tlsdesc.c b/sysdeps/i386/tlsdesc.c
> index 436a21f66b..364c7769ce 100644
> --- a/sysdeps/i386/tlsdesc.c
> +++ b/sysdeps/i386/tlsdesc.c
> @@ -20,7 +20,6 @@
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-unmap-segments.h>
> -#define _dl_tlsdesc_resolve_hold 0
>  #include <tlsdeschtab.h>
>  
>  /* Unmap the dynamic object, but also release its TLS descriptor table

Ok.

> diff --git a/sysdeps/x86_64/tlsdesc.c b/sysdeps/x86_64/tlsdesc.c
> index ecf864d6ee..3bb80517a3 100644
> --- a/sysdeps/x86_64/tlsdesc.c
> +++ b/sysdeps/x86_64/tlsdesc.c
> @@ -20,7 +20,6 @@
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-unmap-segments.h>
> -#define _dl_tlsdesc_resolve_hold 0
>  #include <tlsdeschtab.h>
>  
>  /* Unmap the dynamic object, but also release its TLS descriptor table
> 

Ok.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses [BZ #19329]
  2021-04-15 18:21   ` Adhemerval Zanella
@ 2021-04-16  9:12     ` Szabolcs Nagy
  2021-05-11  2:56       ` Carlos O'Donell
  0 siblings, 1 reply; 44+ messages in thread
From: Szabolcs Nagy @ 2021-04-16  9:12 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

The 04/15/2021 15:21, Adhemerval Zanella wrote:
> On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote:
> > This is a follow up patch to the fix for bug 19329.  This adds
> > relaxed MO atomics to accesses that are racy, but relaxed MO is
> > enough.
> 
> Could you extend a bit on why relaxed MO should suffice?

is it ok to change the commit message to:

This is a follow up patch to the fix for bug 19329.  This adds relaxed
MO atomics to accesses that are racy, but relaxed MO is enough.

The racy accesses all follow the pattern that the write is behind the
dlopen lock, but a read can happen concurrently (e.g. during tls access)
without holding the lock.  For slotinfo entries the read value only
matters if it reads from a synchronized write in dlopen or dlclose,
otherwise the related dtv entry is not valid to access so it is fine to
leave it in inconsistent state. Same for GL(dl_tls_max_dtv_idx) and
GL(dl_tls_generation), but there we rely on the read value being larger
than the last written value that was synchronized.

> 
> Patch looks good, just a small request below.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses [BZ #19329]
  2021-04-16  9:12     ` Szabolcs Nagy
@ 2021-05-11  2:56       ` Carlos O'Donell
  2021-05-11  9:31         ` Szabolcs Nagy
  0 siblings, 1 reply; 44+ messages in thread
From: Carlos O'Donell @ 2021-05-11  2:56 UTC (permalink / raw)
  To: Szabolcs Nagy, Adhemerval Zanella; +Cc: libc-alpha

On 4/16/21 5:12 AM, Szabolcs Nagy via Libc-alpha wrote:
> The 04/15/2021 15:21, Adhemerval Zanella wrote:
>> On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote:
>>> This is a follow up patch to the fix for bug 19329.  This adds
>>> relaxed MO atomics to accesses that are racy, but relaxed MO is
>>> enough.
>>
> >> Could you extend a bit on why relaxed MO should suffice?
> 
> is it ok to change the commit message to:
> 
> This is a follow up patch to the fix for bug 19329.  This adds relaxed
> MO atomics to accesses that are racy, but relaxed MO is enough.

Suggest:

This is a follow up patch to the fix for bug 19329.  This adds relaxed
MO atomics to accesses that were previously data races but are now
race conditions, and where relaxed MO is sufficient.
 
> The racy accesses all follow the pattern that the write is behind the

s/racy accesses/race conditions/g

> dlopen lock, but a read can happen concurrently (e.g. during tls access)
> without holding the lock.  For slotinfo entries the read value only
> matters if it reads from a synchronized write in dlopen or dlclose,
> otherwise the related dtv entry is not valid to access so it is fine to
> leave it in inconsistent state. Same for GL(dl_tls_max_dtv_idx) and

s/it in/it in an/g

s/Same/The same applies for/g

> GL(dl_tls_generation), but there we rely on the read value being larger
> than the last written value that was synchronized.

Do you mean to imply that the synchronized writes all increase the generation
counter, and so any out of order reads rely on the value to be increasing?

Suggested:
The same applies for GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but
there the algorithm relies on the fact that the read of the last synchronized
write is an increasing value.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses [BZ #19329]
  2021-05-11  2:56       ` Carlos O'Donell
@ 2021-05-11  9:31         ` Szabolcs Nagy
  2021-05-11 16:19           ` Szabolcs Nagy
  2021-05-12 20:33           ` Carlos O'Donell
  0 siblings, 2 replies; 44+ messages in thread
From: Szabolcs Nagy @ 2021-05-11  9:31 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Adhemerval Zanella, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

The 05/10/2021 22:56, Carlos O'Donell wrote:
> On 4/16/21 5:12 AM, Szabolcs Nagy via Libc-alpha wrote:
> > The 04/15/2021 15:21, Adhemerval Zanella wrote:
> >> On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote:
> >>> This is a follow up patch to the fix for bug 19329.  This adds
> >>> relaxed MO atomics to accesses that are racy, but relaxed MO is
> >>> enough.
> >>
> >> Could you extend a bit on why relaxed MO should suffice?
> > 
> > is it ok to change the commit message to:
> > 
> > This is a follow up patch to the fix for bug 19329.  This adds relaxed
> > MO atomics to accesses that are racy, but relaxed MO is enough.
> 
> Suggest:
> 
> This is a follow up patch to the fix for bug 19329.  This adds relaxed
> MO atomics to accesses that were previously data races but are now
> race conditions, and where relaxed MO is sufficient.
>  
> > The racy accesses all follow the pattern that the write is behind the
> 
> s/racy accesses/race conditions/g
> 
> > dlopen lock, but a read can happen concurrently (e.g. during tls access)
> > without holding the lock.  For slotinfo entries the read value only
> > matters if it reads from a synchronized write in dlopen or dlclose,
> > otherwise the related dtv entry is not valid to access so it is fine to
> > leave it in inconsistent state. Same for GL(dl_tls_max_dtv_idx) and
> 
> s/it in/it in an/g
> 
> s/Same/The same applies for/g
> 
> > GL(dl_tls_generation), but there we rely on the read value being larger
> > than the last written value that was synchronized.
> 
> Do you mean to imply that the synchronized writes all increase the generation
> counter, and so any out of order reads rely on the value to be increasing?

yes, if the current thread is synchronized with a dlopen and reads
GL(dl_tls_generation) with relaxed MO then either the gen of the
dlopened module is read or a larger value.  So we can use the relaxed
MO value to see if the dtv needs an update at tls access.

(This is the difficult bit in fixing bug 19924: we can use relaxed
MO because we update the dtv up to the gen of the module.  Slotinfo
entries with a larger gen are ignored, but if we want to update up to
the global gen then we need additional synchronization.)
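As a minimal single-threaded sketch of this handshake (hypothetical names, not the actual glibc code; in glibc the writer runs under GL(dl_load_lock) and the counter lives in the GL state):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy model of the generation counter.  The writer's read-modify-write
   is serialized by the load lock, so no atomic RMW is needed; only the
   store is atomic because readers load the counter without the lock.  */
static atomic_size_t tls_generation;  /* GL(dl_tls_generation) analogue */
static size_t dtv0_counter;           /* per-thread dtv[0].counter analogue */

/* Called with the load lock held (dlopen/dlclose path).  */
static void
bump_generation (void)
{
  size_t newgen
    = atomic_load_explicit (&tls_generation, memory_order_relaxed) + 1;
  /* Can be read concurrently.  */
  atomic_store_explicit (&tls_generation, newgen, memory_order_relaxed);
}

/* Fast path analogue of __tls_get_addr: a stale (smaller) value can
   only miss generations the caller did not synchronize with anyway,
   and any mismatch just takes the slow update path.  */
static int
needs_update (void)
{
  size_t gen
    = atomic_load_explicit (&tls_generation, memory_order_relaxed);
  return dtv0_counter != gen;
}
```

Because every synchronized write only ever increases the counter, a relaxed reader observes either the generation it must see or something newer, which is the monotonicity argument above.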

> Suggested:
> The same applies for GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but
> there the algorithm relies on the fact that the read of the last synchronized
> write is an increasing value.

Thanks for the review, i attached the fixed patch i plan to commit
later today if there are no further comments.


[-- Attachment #2: 0001-elf-Use-relaxed-atomics-for-racy-accesses-BZ-19329.patch --]
[-- Type: text/x-diff, Size: 8063 bytes --]

From 2a65e592a753310334c1ded2841215fb9c6024f8 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Wed, 30 Dec 2020 19:19:37 +0000
Subject: [PATCH] elf: Use relaxed atomics for racy accesses [BZ #19329]

This is a follow up patch to the fix for bug 19329.  This adds relaxed
MO atomics to accesses that were previously data races but are now
race conditions, and where relaxed MO is sufficient.

The race conditions all follow the pattern that the write is behind the
dlopen lock, but a read can happen concurrently (e.g. during tls access)
without holding the lock.  For slotinfo entries the read value only
matters if it reads from a synchronized write in dlopen or dlclose,
otherwise the related dtv entry is not valid to access so it is fine
to leave it in an inconsistent state.  The same applies for
GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but there the
algorithm relies on the fact that the read of the last synchronized
write is an increasing value.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
---
 elf/dl-close.c          | 20 +++++++++++++-------
 elf/dl-open.c           |  5 ++++-
 elf/dl-tls.c            | 31 +++++++++++++++++++++++--------
 sysdeps/x86_64/dl-tls.c |  3 ++-
 4 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/elf/dl-close.c b/elf/dl-close.c
index c51becd06b..3720e47dd1 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -79,9 +79,10 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
 	{
 	  assert (old_map->l_tls_modid == idx);
 
-	  /* Mark the entry as unused. */
-	  listp->slotinfo[idx - disp].gen = GL(dl_tls_generation) + 1;
-	  listp->slotinfo[idx - disp].map = NULL;
+	  /* Mark the entry as unused.  These can be read concurrently.  */
+	  atomic_store_relaxed (&listp->slotinfo[idx - disp].gen,
+				GL(dl_tls_generation) + 1);
+	  atomic_store_relaxed (&listp->slotinfo[idx - disp].map, NULL);
 	}
 
       /* If this is not the last currently used entry no need to look
@@ -96,8 +97,8 @@ remove_slotinfo (size_t idx, struct dtv_slotinfo_list *listp, size_t disp,
 
       if (listp->slotinfo[idx - disp].map != NULL)
 	{
-	  /* Found a new last used index.  */
-	  GL(dl_tls_max_dtv_idx) = idx;
+	  /* Found a new last used index.  This can be read concurrently.  */
+	  atomic_store_relaxed (&GL(dl_tls_max_dtv_idx), idx);
 	  return true;
 	}
     }
@@ -571,7 +572,9 @@ _dl_close_worker (struct link_map *map, bool force)
 					GL(dl_tls_dtv_slotinfo_list), 0,
 					imap->l_init_called))
 		/* All dynamically loaded modules with TLS are unloaded.  */
-		GL(dl_tls_max_dtv_idx) = GL(dl_tls_static_nelem);
+		/* Can be read concurrently.  */
+		atomic_store_relaxed (&GL(dl_tls_max_dtv_idx),
+				      GL(dl_tls_static_nelem));
 
 	      if (imap->l_tls_offset != NO_TLS_OFFSET
 		  && imap->l_tls_offset != FORCED_DYNAMIC_TLS_OFFSET)
@@ -769,8 +772,11 @@ _dl_close_worker (struct link_map *map, bool force)
   /* If we removed any object which uses TLS bump the generation counter.  */
   if (any_tls)
     {
-      if (__glibc_unlikely (++GL(dl_tls_generation) == 0))
+      size_t newgen = GL(dl_tls_generation) + 1;
+      if (__glibc_unlikely (newgen == 0))
 	_dl_fatal_printf ("TLS generation counter wrapped!  Please report as described in "REPORT_BUGS_TO".\n");
+      /* Can be read concurrently.  */
+      atomic_store_relaxed (&GL(dl_tls_generation), newgen);
 
       if (tls_free_end == GL(dl_tls_static_used))
 	GL(dl_tls_static_used) = tls_free_start;
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 09f0df7d38..bb79ef00f1 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -395,9 +395,12 @@ update_tls_slotinfo (struct link_map *new)
 	}
     }
 
-  if (__builtin_expect (++GL(dl_tls_generation) == 0, 0))
+  size_t newgen = GL(dl_tls_generation) + 1;
+  if (__glibc_unlikely (newgen == 0))
     _dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
+  /* Can be read concurrently.  */
+  atomic_store_relaxed (&GL(dl_tls_generation), newgen);
 
   /* We need a second pass for static tls data, because
      _dl_update_slotinfo must not be run while calls to
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 94f3cdbae0..dc69cd984e 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -179,7 +179,9 @@ _dl_next_tls_modid (void)
       /* No gaps, allocate a new entry.  */
     nogaps:
 
-      result = ++GL(dl_tls_max_dtv_idx);
+      result = GL(dl_tls_max_dtv_idx) + 1;
+      /* Can be read concurrently.  */
+      atomic_store_relaxed (&GL(dl_tls_max_dtv_idx), result);
     }
 
   return result;
@@ -363,10 +365,12 @@ allocate_dtv (void *result)
   dtv_t *dtv;
   size_t dtv_length;
 
+  /* Relaxed MO, because the dtv size is later rechecked, not relied on.  */
+  size_t max_modid = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
   /* We allocate a few more elements in the dtv than are needed for the
      initial set of modules.  This should avoid in most cases expansions
      of the dtv.  */
-  dtv_length = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
+  dtv_length = max_modid + DTV_SURPLUS;
   dtv = calloc (dtv_length + 2, sizeof (dtv_t));
   if (dtv != NULL)
     {
@@ -771,7 +775,7 @@ _dl_update_slotinfo (unsigned long int req_modid)
 	      if (modid > max_modid)
 		break;
 
-	      size_t gen = listp->slotinfo[cnt].gen;
+	      size_t gen = atomic_load_relaxed (&listp->slotinfo[cnt].gen);
 
 	      if (gen > new_gen)
 		/* Not relevant.  */
@@ -783,7 +787,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
 		continue;
 
 	      /* If there is no map this means the entry is empty.  */
-	      struct link_map *map = listp->slotinfo[cnt].map;
+	      struct link_map *map
+		= atomic_load_relaxed (&listp->slotinfo[cnt].map);
 	      /* Check whether the current dtv array is large enough.  */
 	      if (dtv[-1].counter < modid)
 		{
@@ -927,7 +932,12 @@ __tls_get_addr (GET_ADDR_ARGS)
 {
   dtv_t *dtv = THREAD_DTV ();
 
-  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
+  /* Update is needed if dtv[0].counter < the generation of the accessed
+     module.  The global generation counter is used here as it is easier
+     to check.  Synchronization for the relaxed MO access is guaranteed
+     by user code, see CONCURRENCY NOTES in _dl_update_slotinfo.  */
+  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+  if (__glibc_unlikely (dtv[0].counter != gen))
     return update_get_addr (GET_ADDR_PARAM);
 
   void *p = dtv[GET_ADDR_MODULE].pointer.val;
@@ -950,7 +960,10 @@ _dl_tls_get_addr_soft (struct link_map *l)
     return NULL;
 
   dtv_t *dtv = THREAD_DTV ();
-  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
+  /* This may be called without holding the GL(dl_load_lock).  Reading
+     arbitrary gen value is fine since this is best effort code.  */
+  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+  if (__glibc_unlikely (dtv[0].counter != gen))
     {
       /* This thread's DTV is not completely current,
 	 but it might already cover this module.  */
@@ -1036,8 +1049,10 @@ cannot create TLS data structures"));
   /* Add the information into the slotinfo data structure.  */
   if (do_add)
     {
-      listp->slotinfo[idx].map = l;
-      listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+      /* Can be read concurrently.  See _dl_update_slotinfo.  */
+      atomic_store_relaxed (&listp->slotinfo[idx].map, l);
+      atomic_store_relaxed (&listp->slotinfo[idx].gen,
+			    GL(dl_tls_generation) + 1);
     }
 }
 
diff --git a/sysdeps/x86_64/dl-tls.c b/sysdeps/x86_64/dl-tls.c
index 6595f6615b..24ef560b71 100644
--- a/sysdeps/x86_64/dl-tls.c
+++ b/sysdeps/x86_64/dl-tls.c
@@ -40,7 +40,8 @@ __tls_get_addr_slow (GET_ADDR_ARGS)
 {
   dtv_t *dtv = THREAD_DTV ();
 
-  if (__glibc_unlikely (dtv[0].counter != GL(dl_tls_generation)))
+  size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+  if (__glibc_unlikely (dtv[0].counter != gen))
     return update_get_addr (GET_ADDR_PARAM);
 
   return tls_get_addr_tail (GET_ADDR_PARAM, dtv, NULL);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses [BZ #19329]
  2021-05-11  9:31         ` Szabolcs Nagy
@ 2021-05-11 16:19           ` Szabolcs Nagy
  2021-05-12 20:33           ` Carlos O'Donell
  1 sibling, 0 replies; 44+ messages in thread
From: Szabolcs Nagy @ 2021-05-11 16:19 UTC (permalink / raw)
  To: Carlos O'Donell, libc-alpha

The 05/11/2021 10:31, Szabolcs Nagy via Libc-alpha wrote:
> Thanks for the review, i attached the fixed patch i plan to commit
> later today if there are no further comments.

committed as f4f8f4d4e0f92488431b268c8cd9555730b9afe9

and committed the other reviewed patches from the series too.

> From 2a65e592a753310334c1ded2841215fb9c6024f8 Mon Sep 17 00:00:00 2001
> From: Szabolcs Nagy <szabolcs.nagy@arm.com>
> Date: Wed, 30 Dec 2020 19:19:37 +0000
> Subject: [PATCH] elf: Use relaxed atomics for racy accesses [BZ #19329]
> 
> This is a follow up patch to the fix for bug 19329.  This adds relaxed
> MO atomics to accesses that were previously data races but are now
> race conditions, and where relaxed MO is sufficient.
> 
> The race conditions all follow the pattern that the write is behind the
> dlopen lock, but a read can happen concurrently (e.g. during tls access)
> without holding the lock.  For slotinfo entries the read value only
> matters if it reads from a synchronized write in dlopen or dlclose,
> otherwise the related dtv entry is not valid to access so it is fine
> to leave it in an inconsistent state.  The same applies for
> GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but there the
> algorithm relies on the fact that the read of the last synchronized
> write is an increasing value.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
...


* Re: [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses [BZ #19329]
  2021-05-11  9:31         ` Szabolcs Nagy
  2021-05-11 16:19           ` Szabolcs Nagy
@ 2021-05-12 20:33           ` Carlos O'Donell
  1 sibling, 0 replies; 44+ messages in thread
From: Carlos O'Donell @ 2021-05-12 20:33 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Adhemerval Zanella, libc-alpha

On 5/11/21 5:31 AM, Szabolcs Nagy wrote:
> The 05/10/2021 22:56, Carlos O'Donell wrote:
>> On 4/16/21 5:12 AM, Szabolcs Nagy via Libc-alpha wrote:
>>> The 04/15/2021 15:21, Adhemerval Zanella wrote:
>>>> On 13/04/2021 05:19, Szabolcs Nagy via Libc-alpha wrote:
>>>>> This is a follow up patch to the fix for bug 19329.  This adds
>>>>> relaxed MO atomics to accesses that are racy, but relaxed MO is
>>>>> enough.
>>>>
>>>> Could you extend a bit why relaxed MO should be suffice?
>>>
>>> is it ok to change the commit message to:
>>>
>>> This is a follow up patch to the fix for bug 19329.  This adds relaxed
>>> MO atomics to accesses that are racy, but relaxed MO is enough.
>>
>> Suggest:
>>
>> This is a follow up patch to the fix for bug 19329.  This adds relaxed
>> MO atomics to accesses that were previously data races but are now
>> race conditions, and where relaxed MO is sufficient.
>>  
>>> The racy accesses all follow the pattern that the write is behind the
>>
>> s/racy accesses/race conditions/g
>>
>>> dlopen lock, but a read can happen concurrently (e.g. during tls access)
>>> without holding the lock.  For slotinfo entries the read value only
>>> matters if it reads from a synchronized write in dlopen or dlclose,
>>> otherwise the related dtv entry is not valid to access so it is fine to
>>> leave it in inconsistent state. Same for GL(dl_tls_max_dtv_idx) and
>>
>> s/it in/it in an/g
>>
>> s/Same/The same applies for/g
>>
>>> GL(dl_tls_generation), but there we rely on the read value being larger
>>> than the last written value that was synchronized.
>>
>> Do you mean to imply that the synchronized writes all increase the generation
>> counter, and so any out of order reads rely on the value to be increasing?
> 
> yes, if the current thread is synchronized with a dlopen and reads
> GL(dl_tls_generation) with relaxed MO then either the gen of the
> dlopened module is read or a larger value. So we can use relaxed MO
> value to see if the dtv needs update at tls access.
> 
> (This is the difficult bit in fixing bug 19924: we can use relaxed
> MO because we update the dtv up to the gen of the module. slotinfo
> entries with larger gen are ignored, but if we want to update up to
> the global gen then we need additional synchronization.)
> 
>> Suggested:
>> The same applies for GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but
>> there the algorithm relies on the fact that the read of the last synchronized
>> write is an increasing value.
> 
> Thanks for the review, i attached the fixed patch i plan to commit
> later today if there are no further comments.
 
Thanks. I saw you committed this and it looked good to me!

Thanks for working through these issues!

-- 
Cheers,
Carlos.



* Re: [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135]
  2021-04-13  8:20 ` [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135] Szabolcs Nagy
  2021-04-15 19:45   ` Adhemerval Zanella
@ 2021-06-24  9:48   ` Florian Weimer
  2021-06-24 12:27     ` Florian Weimer
  1 sibling, 1 reply; 44+ messages in thread
From: Florian Weimer @ 2021-06-24  9:48 UTC (permalink / raw)
  To: Szabolcs Nagy via Libc-alpha; +Cc: Szabolcs Nagy

* Szabolcs Nagy via Libc-alpha:

> For some reason only dlopen failure caused dtv gaps to be reused.
>
> It is possible that the intent was to never reuse modids for a
> different module, but after dlopen failure all gaps are reused
> not just the ones caused by the unfinished dlopen.
>
> So the code has to handle reused modids already which seems to
> work, however the data races at thread creation and tls access
> (see bug 19329 and bug 27111) may be more severe if slots are
> reused so this is scheduled after those fixes. I think fixing
> the races are not simpler if reuse is disallowed and reuse has
> other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
> removed from the middle of the slotinfo list. The value does
> not have to be correct: incorrect true value causes the next
> modid query to do a slotinfo walk, incorrect false will leave
> gaps and new entries are added at the end.
>
> Fixes bug 27135.
> ---
>  elf/dl-close.c |  6 +++++-
>  elf/dl-open.c  | 10 ----------
>  elf/dl-tls.c   |  5 +----
>  3 files changed, 6 insertions(+), 15 deletions(-)

Apparently this broke GNOME Shell:

  <https://bugzilla.redhat.com/show_bug.cgi?id=1974970>

I'm trying to figure out why.

Thanks,
Florian



* Re: [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135]
  2021-06-24  9:48   ` Florian Weimer
@ 2021-06-24 12:27     ` Florian Weimer
  2021-06-24 12:57       ` Adhemerval Zanella
  2021-06-24 18:58       ` Szabolcs Nagy
  0 siblings, 2 replies; 44+ messages in thread
From: Florian Weimer @ 2021-06-24 12:27 UTC (permalink / raw)
  To: Szabolcs Nagy via Libc-alpha; +Cc: Szabolcs Nagy

* Florian Weimer:

> * Szabolcs Nagy via Libc-alpha:
>
>> For some reason only dlopen failure caused dtv gaps to be reused.
>>
>> It is possible that the intent was to never reuse modids for a
>> different module, but after dlopen failure all gaps are reused
>> not just the ones caused by the unfinished dlopen.
>>
>> So the code has to handle reused modids already which seems to
>> work, however the data races at thread creation and tls access
>> (see bug 19329 and bug 27111) may be more severe if slots are
>> reused so this is scheduled after those fixes. I think fixing
>> the races is not simpler if reuse is disallowed and reuse has
>> other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
>> removed from the middle of the slotinfo list. The value does
>> not have to be correct: incorrect true value causes the next
>> modid query to do a slotinfo walk, incorrect false will leave
>> gaps and new entries are added at the end.
>>
>> Fixes bug 27135.
>> ---
>>  elf/dl-close.c |  6 +++++-
>>  elf/dl-open.c  | 10 ----------
>>  elf/dl-tls.c   |  5 +----
>>  3 files changed, 6 insertions(+), 15 deletions(-)
>
> Apparently this broke GNOME Shell:
>
>   <https://bugzilla.redhat.com/show_bug.cgi?id=1974970>
>
> I'm trying to figure out why.

The bug is that if there is a gap, _dl_next_tls_modid does not update
the slotinfo list to mark the modid to be returned as reserved, so
multiple calls in a single dlopen operation keep returning the same
modid.

I'm not yet sure what the proper fix is for that.

Thanks,
Florian



* Re: [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135]
  2021-06-24 12:27     ` Florian Weimer
@ 2021-06-24 12:57       ` Adhemerval Zanella
  2021-06-24 14:20         ` Florian Weimer
  2021-06-24 18:58       ` Szabolcs Nagy
  1 sibling, 1 reply; 44+ messages in thread
From: Adhemerval Zanella @ 2021-06-24 12:57 UTC (permalink / raw)
  To: libc-alpha



On 24/06/2021 09:27, Florian Weimer via Libc-alpha wrote:
> * Florian Weimer:
> 
>> * Szabolcs Nagy via Libc-alpha:
>>
>>> For some reason only dlopen failure caused dtv gaps to be reused.
>>>
>>> It is possible that the intent was to never reuse modids for a
>>> different module, but after dlopen failure all gaps are reused
>>> not just the ones caused by the unfinished dlopen.
>>>
>>> So the code has to handle reused modids already which seems to
>>> work, however the data races at thread creation and tls access
>>> (see bug 19329 and bug 27111) may be more severe if slots are
>>> reused so this is scheduled after those fixes. I think fixing
>>> the races is not simpler if reuse is disallowed and reuse has
>>> other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
>>> removed from the middle of the slotinfo list. The value does
>>> not have to be correct: incorrect true value causes the next
>>> modid query to do a slotinfo walk, incorrect false will leave
>>> gaps and new entries are added at the end.
>>>
>>> Fixes bug 27135.
>>> ---
>>>  elf/dl-close.c |  6 +++++-
>>>  elf/dl-open.c  | 10 ----------
>>>  elf/dl-tls.c   |  5 +----
>>>  3 files changed, 6 insertions(+), 15 deletions(-)
>>
>> Apparently this broke GNOME Shell:
>>
>>   <https://bugzilla.redhat.com/show_bug.cgi?id=1974970>
>>
>> I'm trying to figure out why.
> 
> The bug is that if there is a gap, _dl_next_tls_modid does not update
> the slotinfo list to mark the modid to be returned as reserved, so
> multiple calls in a single dlopen operation keep returning the same
> modid.
> 
> I'm not yet sure what the proper fix is for that.

How hard would it be to create a testcase for this?


* Re: [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135]
  2021-06-24 12:57       ` Adhemerval Zanella
@ 2021-06-24 14:20         ` Florian Weimer
  0 siblings, 0 replies; 44+ messages in thread
From: Florian Weimer @ 2021-06-24 14:20 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha

* Adhemerval Zanella via Libc-alpha:

> On 24/06/2021 09:27, Florian Weimer via Libc-alpha wrote:
>> * Florian Weimer:
>> 
>>> * Szabolcs Nagy via Libc-alpha:
>>>
>>>> For some reason only dlopen failure caused dtv gaps to be reused.
>>>>
>>>> It is possible that the intent was to never reuse modids for a
>>>> different module, but after dlopen failure all gaps are reused
>>>> not just the ones caused by the unfinished dlopen.
>>>>
>>>> So the code has to handle reused modids already which seems to
>>>> work, however the data races at thread creation and tls access
>>>> (see bug 19329 and bug 27111) may be more severe if slots are
>>>> reused so this is scheduled after those fixes. I think fixing
>>>> the races is not simpler if reuse is disallowed and reuse has
>>>> other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
>>>> removed from the middle of the slotinfo list. The value does
>>>> not have to be correct: incorrect true value causes the next
>>>> modid query to do a slotinfo walk, incorrect false will leave
>>>> gaps and new entries are added at the end.
>>>>
>>>> Fixes bug 27135.
>>>> ---
>>>>  elf/dl-close.c |  6 +++++-
>>>>  elf/dl-open.c  | 10 ----------
>>>>  elf/dl-tls.c   |  5 +----
>>>>  3 files changed, 6 insertions(+), 15 deletions(-)
>>>
>>> Apparently this broke GNOME Shell:
>>>
>>>   <https://bugzilla.redhat.com/show_bug.cgi?id=1974970>
>>>
>>> I'm trying to figure out why.
>> 
>> The bug is that if there is a gap, _dl_next_tls_modid does not update
>> the slotinfo list to mark the modid to be returned as reserved, so
>> multiple calls in a single dlopen operation keep returning the same
>> modid.
>> 
>> I'm not yet sure what the proper fix is for that.
>
> How hard would be to create a testcase for this?

Not particularly hard, I think.

We need six modules (mod1 to mod6), all using dynamic TLS with different
symbols (sym1 to sym6).  mod4 depends on mod5 and mod6, and has no
other dependencies.

dlopen mod1
dlopen mod2
dlopen mod3
dlclose mod2 # create modid gap
dlclose mod1 # more modid gap
dlopen mod4

Then check that all six TLS variables have different addresses.  If the
bug is present, sym4 to sym6 should all have the same address because
the modid is the same.

I have not written a test yet and won't get to it today.

Thanks,
Florian



* Re: [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135]
  2021-06-24 12:27     ` Florian Weimer
  2021-06-24 12:57       ` Adhemerval Zanella
@ 2021-06-24 18:58       ` Szabolcs Nagy
  1 sibling, 0 replies; 44+ messages in thread
From: Szabolcs Nagy @ 2021-06-24 18:58 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Szabolcs Nagy via Libc-alpha

The 06/24/2021 14:27, Florian Weimer wrote:
> * Florian Weimer:
> 
> > * Szabolcs Nagy via Libc-alpha:
> >
> >> For some reason only dlopen failure caused dtv gaps to be reused.
> >>
> >> It is possible that the intent was to never reuse modids for a
> >> different module, but after dlopen failure all gaps are reused
> >> not just the ones caused by the unfinished dlopen.
> >>
> >> So the code has to handle reused modids already which seems to
> >> work, however the data races at thread creation and tls access
> >> (see bug 19329 and bug 27111) may be more severe if slots are
> >> reused so this is scheduled after those fixes. I think fixing
> >> the races is not simpler if reuse is disallowed and reuse has
> >> other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
> >> removed from the middle of the slotinfo list. The value does
> >> not have to be correct: incorrect true value causes the next
> >> modid query to do a slotinfo walk, incorrect false will leave
> >> gaps and new entries are added at the end.
> >>
> >> Fixes bug 27135.
> >> ---
> >>  elf/dl-close.c |  6 +++++-
> >>  elf/dl-open.c  | 10 ----------
> >>  elf/dl-tls.c   |  5 +----
> >>  3 files changed, 6 insertions(+), 15 deletions(-)
> >
> > Apparently this broke GNOME Shell:
> >
> >   <https://bugzilla.redhat.com/show_bug.cgi?id=1974970>
> >
> > I'm trying to figure out why.
> 
> The bug is that if there is a gap, _dl_next_tls_modid does not update
> the slotinfo list to mark the modid to be returned as reserved, so
> multiple calls in a single dlopen operation keep returning the same
> modid.
> 
> I'm not yet sure what the proper fix is for that.

this patch is not critical for the other tls issues i fixed
for 2.34, so it should be safe to revert.

this might have been the reason why gap reuse was not enabled.

but if _dl_next_tls_modid is broken then likely a failed
dlopen can trigger that on old glibc too, so a fix would be
nice.

thanks for debugging this.


* Re: [PATCH v2 14/14] RFC elf: Fix slow tls access after dlopen [BZ #19924]
  2021-04-13  8:21 ` [PATCH v2 14/14] RFC elf: Fix slow tls access after dlopen [BZ #19924] Szabolcs Nagy
@ 2022-09-16  9:54   ` Carlos O'Donell
  0 siblings, 0 replies; 44+ messages in thread
From: Carlos O'Donell @ 2022-09-16  9:54 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

On Tue, Apr 13, 2021 at 09:21:58AM +0100, Szabolcs Nagy via Libc-alpha wrote:
> In short: __tls_get_addr checks the global generation counter,
> _dl_update_slotinfo updates up to the generation of the accessed
> module. If the global generation is newer than the generation of the
> module then __tls_get_addr keeps hitting the slow path that updates
> the dtv.
> 
> Possible approaches i can see:
> 
> 1. update to global generation instead of module,
> 2. check the module generation in the fast path.
> 
> This patch is 1.: it needs additional sync (load acquire) so the
> slotinfo list is up to date with the observed global generation.
> 
> Approach 2. would require walking the slotinfo list at all times.
> I don't know how to make that fast with many modules.
> 
> Note: in the x86_64 version of dl-tls.c the generation is only loaded
> once, since relaxed mo is not faster than acquire mo load.
> 
> I have not benchmarked this yet.

Szabolcs,

Could you please repost this as a distinct patch? I'd like to track
this as a possible solution that we can discuss and improve. It's the
only patch that we haven't applied from this series.

Cheers,
Carlos.



end of thread, other threads:[~2022-09-16  9:54 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13  8:17 [PATCH v2 00/14] Dynamic TLS related data race fixes Szabolcs Nagy
2021-04-13  8:18 ` [PATCH v2 01/14] elf: Fix a DTV setup issue [BZ #27136] Szabolcs Nagy
2021-04-13  8:36   ` Andreas Schwab
2021-04-13  9:35     ` Szabolcs Nagy
2021-04-13 10:22       ` Andreas Schwab
2021-04-13 10:34         ` Szabolcs Nagy
2021-04-13 10:51           ` Andreas Schwab
2021-04-13  8:18 ` [PATCH v2 02/14] elf: Add a DTV setup test " Szabolcs Nagy
2021-04-14 18:06   ` Adhemerval Zanella
2021-04-15  9:53     ` Szabolcs Nagy
2021-04-13  8:18 ` [PATCH v2 03/14] elf: Fix comments and logic in _dl_add_to_slotinfo Szabolcs Nagy
2021-04-14 18:12   ` Adhemerval Zanella
2021-04-13  8:18 ` [PATCH v2 04/14] elf: Refactor _dl_update_slotinfo to avoid use after free Szabolcs Nagy
2021-04-14 18:20   ` Adhemerval Zanella
2021-04-13  8:19 ` [PATCH v2 05/14] elf: Fix data races in pthread_create and TLS access [BZ #19329] Szabolcs Nagy
2021-04-15 17:44   ` Adhemerval Zanella
2021-04-13  8:19 ` [PATCH v2 06/14] elf: Use relaxed atomics for racy accesses " Szabolcs Nagy
2021-04-15 18:21   ` Adhemerval Zanella
2021-04-16  9:12     ` Szabolcs Nagy
2021-05-11  2:56       ` Carlos O'Donell
2021-05-11  9:31         ` Szabolcs Nagy
2021-05-11 16:19           ` Szabolcs Nagy
2021-05-12 20:33           ` Carlos O'Donell
2021-04-13  8:19 ` [PATCH v2 07/14] elf: Add test case for " Szabolcs Nagy
2021-04-15 19:21   ` Adhemerval Zanella
2021-04-13  8:20 ` [PATCH v2 08/14] elf: Fix DTV gap reuse logic [BZ #27135] Szabolcs Nagy
2021-04-15 19:45   ` Adhemerval Zanella
2021-06-24  9:48   ` Florian Weimer
2021-06-24 12:27     ` Florian Weimer
2021-06-24 12:57       ` Adhemerval Zanella
2021-06-24 14:20         ` Florian Weimer
2021-06-24 18:58       ` Szabolcs Nagy
2021-04-13  8:20 ` [PATCH v2 09/14] x86_64: Avoid lazy relocation of tlsdesc [BZ #27137] Szabolcs Nagy
2021-04-13 14:02   ` H.J. Lu
2021-04-13  8:20 ` [PATCH v2 10/14] i386: " Szabolcs Nagy
2021-04-13 14:02   ` H.J. Lu
2021-04-13  8:21 ` [PATCH v2 11/14] x86_64: Remove lazy tlsdesc relocation related code Szabolcs Nagy
2021-04-13 14:03   ` H.J. Lu
2021-04-13  8:21 ` [PATCH v2 12/14] i386: " Szabolcs Nagy
2021-04-13 14:04   ` H.J. Lu
2021-04-13  8:21 ` [PATCH v2 13/14] elf: " Szabolcs Nagy
2021-04-15 19:52   ` Adhemerval Zanella
2021-04-13  8:21 ` [PATCH v2 14/14] RFC elf: Fix slow tls access after dlopen [BZ #19924] Szabolcs Nagy
2022-09-16  9:54   ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).