From: Szabolcs Nagy <szabolcs.nagy@arm.com>
To: libc-alpha@sourceware.org
Subject: [PATCH v2 14/14] RFC elf: Fix slow tls access after dlopen [BZ #19924]
Date: Tue, 13 Apr 2021 09:21:58 +0100
Message-ID: <b116855de71098ef7dd2875dd3237f8f3ecc12c2.1618301209.git.szabolcs.nagy@arm.com>
In-Reply-To: <cover.1618301209.git.szabolcs.nagy@arm.com>
In short: __tls_get_addr checks the global generation counter, but
_dl_update_slotinfo only updates the dtv up to the generation of the
accessed module. If the global generation is newer than the generation
of the module, then __tls_get_addr keeps hitting the slow path that
updates the dtv.
Possible approaches I can see:
1. update to the global generation instead of the module generation,
2. check the module generation in the fast path.
This patch implements 1. It needs additional synchronization (a load
acquire) so that the slotinfo list is up to date with the observed
global generation.
Approach 2 would require walking the slotinfo list on every access.
I don't know how to make that fast with many modules.
Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since a relaxed MO load is not faster than an acquire MO load there.
I have not benchmarked this yet.
---
elf/dl-close.c | 2 +-
elf/dl-open.c | 8 ++++----
elf/dl-reloc.c | 5 ++---
elf/dl-tls.c | 28 ++++++++++++----------------
sysdeps/generic/ldsodefs.h | 3 ++-
sysdeps/x86_64/dl-tls.c | 4 ++--
6 files changed, 23 insertions(+), 27 deletions(-)
diff --git a/elf/dl-close.c b/elf/dl-close.c
index 9f31532f41..45f8a7fe31 100644
--- a/elf/dl-close.c
+++ b/elf/dl-close.c
@@ -780,7 +780,7 @@ _dl_close_worker (struct link_map *map, bool force)
if (__glibc_unlikely (newgen == 0))
_dl_fatal_printf ("TLS generation counter wrapped! Please report as described in "REPORT_BUGS_TO".\n");
/* Can be read concurrently. */
- atomic_store_relaxed (&GL(dl_tls_generation), newgen);
+ atomic_store_release (&GL(dl_tls_generation), newgen);
if (tls_free_end == GL(dl_tls_static_used))
GL(dl_tls_static_used) = tls_free_start;
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 661f26977e..5b9816e4e8 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -400,7 +400,7 @@ update_tls_slotinfo (struct link_map *new)
_dl_fatal_printf (N_("\
TLS generation counter wrapped! Please report this."));
/* Can be read concurrently. */
- atomic_store_relaxed (&GL(dl_tls_generation), newgen);
+ atomic_store_release (&GL(dl_tls_generation), newgen);
/* We need a second pass for static tls data, because
_dl_update_slotinfo must not be run while calls to
@@ -417,8 +417,8 @@ TLS generation counter wrapped! Please report this."));
now, but we can delay updating the DTV. */
imap->l_need_tls_init = 0;
#ifdef SHARED
- /* Update the slot information data for at least the
- generation of the DSO we are allocating data for. */
+ /* Update the slot information data for the current
+ generation. */
/* FIXME: This can terminate the process on memory
allocation failure. It is not possible to raise
@@ -426,7 +426,7 @@ TLS generation counter wrapped! Please report this."));
_dl_update_slotinfo would have to be split into two
operations, similar to resize_scopes and update_scopes
above. This is related to bug 16134. */
- _dl_update_slotinfo (imap->l_tls_modid);
+ _dl_update_slotinfo (imap->l_tls_modid, newgen);
#endif
GL(dl_init_static_tls) (imap);
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index c2df26deea..427669d769 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -111,11 +111,10 @@ _dl_try_allocate_static_tls (struct link_map *map, bool optional)
if (map->l_real->l_relocated)
{
#ifdef SHARED
+// TODO: it is not clear why we need to update the DTV here, add comment
if (__builtin_expect (THREAD_DTV()[0].counter != GL(dl_tls_generation),
0))
- /* Update the slot information data for at least the generation of
- the DSO we are allocating data for. */
- (void) _dl_update_slotinfo (map->l_tls_modid);
+ (void) _dl_update_slotinfo (map->l_tls_modid, GL(dl_tls_generation));
#endif
GL(dl_init_static_tls) (map);
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index b0257185e9..b51a4f3a19 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -701,7 +701,7 @@ allocate_and_init (struct link_map *map)
struct link_map *
-_dl_update_slotinfo (unsigned long int req_modid)
+_dl_update_slotinfo (unsigned long int req_modid, size_t new_gen)
{
struct link_map *the_map = NULL;
dtv_t *dtv = THREAD_DTV ();
@@ -718,19 +718,12 @@ _dl_update_slotinfo (unsigned long int req_modid)
code and therefore add to the slotinfo list. This is a problem
since we must not pick up any information about incomplete work.
The solution to this is to ignore all dtv slots which were
- created after the one we are currently interested. We know that
- dynamic loading for this module is completed and this is the last
- load operation we know finished. */
- unsigned long int idx = req_modid;
+ created after the generation we are interested in. We know that
+ dynamic loading for this generation is completed and this is the
+ last load operation we know finished. */
struct dtv_slotinfo_list *listp = GL(dl_tls_dtv_slotinfo_list);
- while (idx >= listp->len)
- {
- idx -= listp->len;
- listp = listp->next;
- }
-
- if (dtv[0].counter < listp->slotinfo[idx].gen)
+ if (dtv[0].counter < new_gen)
{
/* CONCURRENCY NOTES:
@@ -751,7 +744,6 @@ _dl_update_slotinfo (unsigned long int req_modid)
other entries are racy. However updating a non-relevant dtv
entry does not affect correctness. For a relevant module m,
max_modid >= modid of m. */
- size_t new_gen = listp->slotinfo[idx].gen;
size_t total = 0;
size_t max_modid = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
assert (max_modid >= req_modid);
@@ -894,9 +886,9 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map)
static struct link_map *
__attribute_noinline__
-update_get_addr (GET_ADDR_ARGS)
+update_get_addr (GET_ADDR_ARGS, size_t gen)
{
- struct link_map *the_map = _dl_update_slotinfo (GET_ADDR_MODULE);
+ struct link_map *the_map = _dl_update_slotinfo (GET_ADDR_MODULE, gen);
dtv_t *dtv = THREAD_DTV ();
void *p = dtv[GET_ADDR_MODULE].pointer.val;
@@ -931,7 +923,11 @@ __tls_get_addr (GET_ADDR_ARGS)
by user code, see CONCURRENCY NOTES in _dl_update_slotinfo. */
size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
if (__glibc_unlikely (dtv[0].counter != gen))
- return update_get_addr (GET_ADDR_PARAM);
+ {
+// TODO: needs comment update if we rely on consistent generation with slotinfo
+ gen = atomic_load_acquire (&GL(dl_tls_generation));
+ return update_get_addr (GET_ADDR_PARAM, gen);
+ }
void *p = dtv[GET_ADDR_MODULE].pointer.val;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ea3f7a69d0..614463f016 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -1224,7 +1224,8 @@ extern void _dl_add_to_slotinfo (struct link_map *l, bool do_add)
/* Update slot information data for at least the generation of the
module with the given index. */
-extern struct link_map *_dl_update_slotinfo (unsigned long int req_modid)
+extern struct link_map *_dl_update_slotinfo (unsigned long int req_modid,
+ size_t gen)
attribute_hidden;
/* Look up the module's TLS block as for __tls_get_addr,
diff --git a/sysdeps/x86_64/dl-tls.c b/sysdeps/x86_64/dl-tls.c
index 24ef560b71..4ded8dd6b9 100644
--- a/sysdeps/x86_64/dl-tls.c
+++ b/sysdeps/x86_64/dl-tls.c
@@ -40,9 +40,9 @@ __tls_get_addr_slow (GET_ADDR_ARGS)
{
dtv_t *dtv = THREAD_DTV ();
- size_t gen = atomic_load_relaxed (&GL(dl_tls_generation));
+ size_t gen = atomic_load_acquire (&GL(dl_tls_generation));
if (__glibc_unlikely (dtv[0].counter != gen))
- return update_get_addr (GET_ADDR_PARAM);
+ return update_get_addr (GET_ADDR_PARAM, gen);
return tls_get_addr_tail (GET_ADDR_PARAM, dtv, NULL);
}
--
2.17.1