* [PATCH v10 0/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
@ 2024-02-26 14:37 H.J. Lu
2024-02-26 14:37 ` [PATCH v10 1/1] " H.J. Lu
0 siblings, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2024-02-26 14:37 UTC (permalink / raw)
To: libc-alpha; +Cc: goldstein.w.n, adhemerval.zanella
Changes in v10:
1. Change the test to xfail.
Changes in v9:
1. Drop the first patch which has been merged into master branch.
2. Correct BEFORE_TLSDESC_CALL placement in test.
3. Define MOD only if undefined in test.
Changes in v8:
1. Remove malloc-for-test.c and move malloc to tst-gnu2-tls2.c.
2. Add malloc_counter to verify malloc in tst-gnu2-tls2.c is called for
TLSDESC call.
3. Add BEFORE_TLSDESC_CALL and AFTER_TLSDESC_CALL.
4. Use /* ... */ in assembly code comments.
Changes in v7:
1. Generate malloc-for-test.map at build time to get the correct version
map for malloc.
Changes in v6:
1. Drop Tile registers.
Changes in v5:
1. Also preserve Tile registers.
2. Add an error check in i386 dl-tlsdesc-dynamic.h.
Changes in v4:
1. Add APX registers to STATE_SAVE_MASK so that APX registers are saved
in ld.so trampoline.
2. Also save x87 FPU stack registers for TLSDESC_CALL and TLS_DESC_CALL.
3. Change i386 _dl_tlsdesc_dynamic to IFUNC.
4. Rename GLRO(dl_x86_64_tlsdesc_dynamic) to GLRO(dl_x86_tlsdesc_dynamic)
for both i386 and x86-64.
5. Update the testcase for i386 with a simple malloc interceptor.
Changes in v3:
1. Don't add GLRO(dl_x86_64_tlsdesc_dynamic) to libc.a.
Changes in v2:
1. Add GLRO(dl_x86_64_runtime_resolve) to optimize
elf_machine_runtime_setup.
---
Add APX registers to STATE_SAVE_MASK so that APX registers are saved in
ld.so trampoline. This fixes BZ #31371.
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:
leaq tls_var@TLSDESC(%rip), %rax
call *tls_var@TLSCALL(%rax)
or
leal tls_var@TLSDESC(%ebx), %eax
call *tls_var@TLSCALL(%eax)
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL. When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path. __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers. This
fixes BZ #31372.
Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
H.J. Lu (1):
x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
elf/Makefile | 18 ++
elf/tst-gnu2-tls2.c | 122 ++++++++++++
elf/tst-gnu2-tls2.h | 36 ++++
elf/tst-gnu2-tls2mod0.c | 31 +++
elf/tst-gnu2-tls2mod1.c | 31 +++
elf/tst-gnu2-tls2mod2.c | 31 +++
sysdeps/i386/dl-machine.h | 2 +-
sysdeps/i386/dl-tlsdesc-dynamic.h | 190 +++++++++++++++++++
sysdeps/i386/dl-tlsdesc.S | 115 +++++------
sysdeps/x86/Makefile | 7 +-
sysdeps/x86/cpu-features.c | 56 +++++-
sysdeps/x86/dl-procinfo.c | 16 ++
sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
sysdeps/x86/sysdep.h | 6 +
sysdeps/x86/tst-gnu2-tls2.c | 20 ++
sysdeps/x86_64/Makefile | 2 +-
sysdeps/x86_64/dl-machine.h | 19 +-
sysdeps/x86_64/dl-procinfo.c | 16 ++
sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
sysdeps/x86_64/dl-trampoline.S | 20 +-
sysdeps/x86_64/dl-trampoline.h | 34 +---
24 files changed, 920 insertions(+), 213 deletions(-)
create mode 100644 elf/tst-gnu2-tls2.c
create mode 100644 elf/tst-gnu2-tls2.h
create mode 100644 elf/tst-gnu2-tls2mod0.c
create mode 100644 elf/tst-gnu2-tls2mod1.c
create mode 100644 elf/tst-gnu2-tls2mod2.c
create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
create mode 100644 sysdeps/x86/tst-gnu2-tls2.c
create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
--
2.43.2
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v10 1/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
2024-02-26 14:37 [PATCH v10 0/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers H.J. Lu
@ 2024-02-26 14:37 ` H.J. Lu
2024-02-26 19:02 ` Noah Goldstein
0 siblings, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2024-02-26 14:37 UTC (permalink / raw)
To: libc-alpha; +Cc: goldstein.w.n, adhemerval.zanella
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:
leaq tls_var@TLSDESC(%rip), %rax
call *tls_var@TLSCALL(%rax)
or
leal tls_var@TLSDESC(%ebx), %eax
call *tls_var@TLSCALL(%eax)
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL. When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path. __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers. This
fixes BZ #31372.
Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
---
elf/Makefile | 18 ++
elf/tst-gnu2-tls2.c | 122 ++++++++++++
elf/tst-gnu2-tls2.h | 36 ++++
elf/tst-gnu2-tls2mod0.c | 31 +++
elf/tst-gnu2-tls2mod1.c | 31 +++
elf/tst-gnu2-tls2mod2.c | 31 +++
sysdeps/i386/dl-machine.h | 2 +-
sysdeps/i386/dl-tlsdesc-dynamic.h | 190 +++++++++++++++++++
sysdeps/i386/dl-tlsdesc.S | 115 +++++------
sysdeps/x86/Makefile | 7 +-
sysdeps/x86/cpu-features.c | 56 +++++-
sysdeps/x86/dl-procinfo.c | 16 ++
sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
sysdeps/x86/sysdep.h | 6 +
sysdeps/x86/tst-gnu2-tls2.c | 20 ++
sysdeps/x86_64/Makefile | 2 +-
sysdeps/x86_64/dl-machine.h | 19 +-
sysdeps/x86_64/dl-procinfo.c | 16 ++
sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
sysdeps/x86_64/dl-trampoline.S | 20 +-
sysdeps/x86_64/dl-trampoline.h | 34 +---
24 files changed, 920 insertions(+), 213 deletions(-)
create mode 100644 elf/tst-gnu2-tls2.c
create mode 100644 elf/tst-gnu2-tls2.h
create mode 100644 elf/tst-gnu2-tls2mod0.c
create mode 100644 elf/tst-gnu2-tls2mod1.c
create mode 100644 elf/tst-gnu2-tls2mod2.c
create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
create mode 100644 sysdeps/x86/tst-gnu2-tls2.c
create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
diff --git a/elf/Makefile b/elf/Makefile
index 36c04baf02..02dc476e27 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -424,6 +424,7 @@ tests += \
tst-glibc-hwcaps-prepend \
tst-global1 \
tst-global2 \
+ tst-gnu2-tls2 \
tst-initfinilazyfail \
tst-initorder \
tst-initorder2 \
@@ -846,6 +847,9 @@ modules-names += \
tst-filterobj-flt \
tst-finilazyfailmod \
tst-globalmod2 \
+ tst-gnu2-tls2mod0 \
+ tst-gnu2-tls2mod1 \
+ tst-gnu2-tls2mod2 \
tst-initlazyfailmod \
tst-initorder2a \
tst-initorder2b \
@@ -3044,8 +3048,22 @@ $(objpfx)tst-tlsgap.out: \
$(objpfx)tst-tlsgap-mod0.so \
$(objpfx)tst-tlsgap-mod1.so \
$(objpfx)tst-tlsgap-mod2.so
+
+$(objpfx)tst-gnu2-tls2: $(shared-thread-library)
+$(objpfx)tst-gnu2-tls2.out: \
+ $(objpfx)tst-gnu2-tls2mod0.so \
+ $(objpfx)tst-gnu2-tls2mod1.so \
+ $(objpfx)tst-gnu2-tls2mod2.so
+
ifeq (yes,$(have-mtls-dialect-gnu2))
+# This test fails if dl_tlsdesc_dynamic doesn't preserve all caller-saved
+# registers. See https://sourceware.org/bugzilla/show_bug.cgi?id=31372
+test-xfail-tst-gnu2-tls2 = yes
+
CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
+CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
+CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
+CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
endif
diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c
new file mode 100644
index 0000000000..7ac04d7f33
--- /dev/null
+++ b/elf/tst-gnu2-tls2.c
@@ -0,0 +1,122 @@
+/* Test TLSDESC relocation.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <dlfcn.h>
+#include <pthread.h>
+#include <support/xdlfcn.h>
+#include <support/xthread.h>
+#include <support/check.h>
+#include <support/test-driver.h>
+#include "tst-gnu2-tls2.h"
+
+#ifndef IS_SUPPORTED
+# define IS_SUPPORTED() true
+#endif
+
+/* An architecture can define it to clobber caller-saved registers in
+ malloc below to verify that the implicit TLSDESC call won't change
+ caller-saved registers. */
+#ifndef PREPARE_MALLOC
+# define PREPARE_MALLOC()
+#endif
+
+extern void * __libc_malloc (size_t);
+
+size_t malloc_counter = 0;
+
+void *
+malloc (size_t n)
+{
+ PREPARE_MALLOC ();
+ malloc_counter++;
+ return __libc_malloc (n);
+}
+
+static void *mod[3];
+#ifndef MOD
+# define MOD(i) "tst-gnu2-tls2mod" #i ".so"
+#endif
+static const char *modname[3] = { MOD(0), MOD(1), MOD(2) };
+#undef MOD
+
+static void
+open_mod (int i)
+{
+ mod[i] = xdlopen (modname[i], RTLD_LAZY);
+ printf ("open %s\n", modname[i]);
+}
+
+static void
+close_mod (int i)
+{
+ xdlclose (mod[i]);
+ mod[i] = NULL;
+ printf ("close %s\n", modname[i]);
+}
+
+static void
+access_mod (int i, const char *sym)
+{
+ struct tls var = { -1, -1, -1, -1 };
+ struct tls *(*f) (struct tls *) = xdlsym (mod[i], sym);
+ /* Check that our malloc is called. */
+ malloc_counter = 0;
+ struct tls *p = f (&var);
+ TEST_VERIFY (malloc_counter != 0);
+ printf ("access %s: %s() = %p\n", modname[i], sym, p);
+ TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) == 0);
+ ++(p->a);
+}
+
+static void *
+start (void *arg)
+{
+ /* The DTV generation is at the last dlopen of mod0 and the
+ entry for mod1 is NULL. */
+
+ open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */
+
+ /* Force the slow path in GNU2 TLS descriptor call. */
+ access_mod (1, "apply_tls");
+
+ return arg;
+}
+
+static int
+do_test (void)
+{
+ if (!IS_SUPPORTED ())
+ return EXIT_UNSUPPORTED;
+
+ open_mod (0);
+ open_mod (1);
+ open_mod (2);
+ close_mod (0);
+ close_mod (1); /* Create modid gap at mod1. */
+ open_mod (0); /* Reuse modid of mod0, bump generation count. */
+
+ /* Create a thread where DTV of mod1 is NULL. */
+ pthread_t t = xpthread_create (NULL, start, NULL);
+ xpthread_join (t);
+ return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h
new file mode 100644
index 0000000000..77964a57a3
--- /dev/null
+++ b/elf/tst-gnu2-tls2.h
@@ -0,0 +1,36 @@
+/* Test TLSDESC relocation.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <stdint.h>
+
+struct tls
+{
+ int64_t a, b, c, d;
+};
+
+extern struct tls *apply_tls (struct tls *);
+
+/* An architecture can define them to verify that clobber caller-saved
+ registers aren't changed by the implicit TLSDESC call. */
+#ifndef BEFORE_TLSDESC_CALL
+# define BEFORE_TLSDESC_CALL()
+#endif
+
+#ifndef AFTER_TLSDESC_CALL
+# define AFTER_TLSDESC_CALL()
+#endif
diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c
new file mode 100644
index 0000000000..45556a0e17
--- /dev/null
+++ b/elf/tst-gnu2-tls2mod0.c
@@ -0,0 +1,31 @@
+/* DSO used by tst-gnu2-tls2.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-gnu2-tls2.h"
+
+__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")));
+
+struct tls *
+apply_tls (struct tls *p)
+{
+ BEFORE_TLSDESC_CALL ();
+ tls_var0 = *p;
+ struct tls *ret = &tls_var0;
+ AFTER_TLSDESC_CALL ();
+ return ret;
+}
diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c
new file mode 100644
index 0000000000..e10b9dbc0a
--- /dev/null
+++ b/elf/tst-gnu2-tls2mod1.c
@@ -0,0 +1,31 @@
+/* DSO used by tst-gnu2-tls2.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-gnu2-tls2.h"
+
+__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidden")));
+
+struct tls *
+apply_tls (struct tls *p)
+{
+ BEFORE_TLSDESC_CALL ();
+ tls_var1[1] = *p;
+ struct tls *ret = &tls_var1[1];
+ AFTER_TLSDESC_CALL ();
+ return ret;
+}
diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c
new file mode 100644
index 0000000000..141af51e55
--- /dev/null
+++ b/elf/tst-gnu2-tls2mod2.c
@@ -0,0 +1,31 @@
+/* DSO used by tst-gnu2-tls2.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include "tst-gnu2-tls2.h"
+
+__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")));
+
+struct tls *
+apply_tls (struct tls *p)
+{
+ BEFORE_TLSDESC_CALL ();
+ tls_var2 = *p;
+ struct tls *ret = &tls_var2;
+ AFTER_TLSDESC_CALL ();
+ return ret;
+}
diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index fc1ef96587..50d74fe6e9 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.\n",
{
td->arg = _dl_make_tlsdesc_dynamic
(sym_map, sym->st_value + (ElfW(Word))td->arg);
- td->entry = _dl_tlsdesc_dynamic;
+ td->entry = GLRO(dl_x86_tlsdesc_dynamic);
}
else
# endif
diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
new file mode 100644
index 0000000000..3627028577
--- /dev/null
+++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
@@ -0,0 +1,190 @@
+/* Thread-local storage handling in the ELF dynamic linker. i386 version.
+ Copyright (C) 2004-2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#undef REGISTER_SAVE_AREA
+
+#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
+# error STATE_SAVE_ALIGNMENT must be multiple of 16
+#endif
+
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK
+# ifdef USE_FNSAVE
+# error USE_FNSAVE shouldn't be defined
+# endif
+# ifdef USE_FXSAVE
+/* Use fxsave to save all registers. */
+# define REGISTER_SAVE_AREA 512
+# endif
+#else
+# ifdef USE_FNSAVE
+/* Use fnsave to save x87 FPU stack registers. */
+# define REGISTER_SAVE_AREA 108
+# else
+# ifndef USE_FXSAVE
+# error USE_FXSAVE must be defined
+# endif
+/* Use fxsave to save all registers. Add 12 bytes to align the stack
+ to 16 bytes. */
+# define REGISTER_SAVE_AREA (512 + 12)
+# endif
+#endif
+
+ .hidden _dl_tlsdesc_dynamic
+ .global _dl_tlsdesc_dynamic
+ .type _dl_tlsdesc_dynamic,@function
+
+ /* This function is used for symbols that need dynamic TLS.
+
+ %eax points to the TLS descriptor, such that 0(%eax) points to
+ _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
+ tlsdesc_dynamic_arg object. It must return in %eax the offset
+ between the thread pointer and the object denoted by the
+ argument, without clobbering any registers.
+
+ The assembly code that follows is a rendition of the following
+ C code, hand-optimized a little bit.
+
+ptrdiff_t
+__attribute__ ((__regparm__ (1)))
+_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
+{
+ struct tlsdesc_dynamic_arg *td = tdp->arg;
+ dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
+ if (__builtin_expect (td->gen_count <= dtv[0].counter
+ && (dtv[td->tlsinfo.ti_module].pointer.val
+ != TLS_DTV_UNALLOCATED),
+ 1))
+ return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
+ - __thread_pointer;
+
+ return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
+}
+*/
+ cfi_startproc
+ .align 16
+_dl_tlsdesc_dynamic:
+ /* Like all TLS resolvers, preserve call-clobbered registers.
+ We need two scratch regs anyway. */
+ subl $32, %esp
+ cfi_adjust_cfa_offset (32)
+ movl %ecx, 20(%esp)
+ movl %edx, 24(%esp)
+ movl TLSDESC_ARG(%eax), %eax
+ movl %gs:DTV_OFFSET, %edx
+ movl TLSDESC_GEN_COUNT(%eax), %ecx
+ cmpl (%edx), %ecx
+ ja 2f
+ movl TLSDESC_MODID(%eax), %ecx
+ movl (%edx,%ecx,8), %edx
+ cmpl $-1, %edx
+ je 2f
+ movl TLSDESC_MODOFF(%eax), %eax
+ addl %edx, %eax
+1:
+ movl 20(%esp), %ecx
+ subl %gs:0, %eax
+ movl 24(%esp), %edx
+ addl $32, %esp
+ cfi_adjust_cfa_offset (-32)
+ ret
+ .p2align 4,,7
+2:
+ cfi_adjust_cfa_offset (32)
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK
+ movl %ebx, -28(%esp)
+ movl %esp, %ebx
+ cfi_def_cfa_register(%ebx)
+ and $-STATE_SAVE_ALIGNMENT, %esp
+#endif
+#ifdef REGISTER_SAVE_AREA
+ subl $REGISTER_SAVE_AREA, %esp
+# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
+ cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
+# endif
+#else
+# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
+# error DL_RUNTIME_RESOLVE_REALIGN_STACK must be true
+# endif
+ /* Allocate stack space of the required size to save the state. */
+ LOAD_PIC_REG (cx)
+ subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
+#endif
+#ifdef USE_FNSAVE
+ fnsave (%esp)
+#elif defined USE_FXSAVE
+ fxsave (%esp)
+#else
+ /* Save the argument for ___tls_get_addr in EAX. */
+ movl %eax, %ecx
+ movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
+ xorl %edx, %edx
+ /* Clear the XSAVE Header. */
+# ifdef USE_XSAVE
+ movl %edx, (512)(%esp)
+ movl %edx, (512 + 4 * 1)(%esp)
+ movl %edx, (512 + 4 * 2)(%esp)
+ movl %edx, (512 + 4 * 3)(%esp)
+# endif
+ movl %edx, (512 + 4 * 4)(%esp)
+ movl %edx, (512 + 4 * 5)(%esp)
+ movl %edx, (512 + 4 * 6)(%esp)
+ movl %edx, (512 + 4 * 7)(%esp)
+ movl %edx, (512 + 4 * 8)(%esp)
+ movl %edx, (512 + 4 * 9)(%esp)
+ movl %edx, (512 + 4 * 10)(%esp)
+ movl %edx, (512 + 4 * 11)(%esp)
+ movl %edx, (512 + 4 * 12)(%esp)
+ movl %edx, (512 + 4 * 13)(%esp)
+ movl %edx, (512 + 4 * 14)(%esp)
+ movl %edx, (512 + 4 * 15)(%esp)
+# ifdef USE_XSAVE
+ xsave (%esp)
+# else
+ xsavec (%esp)
+# endif
+ /* Restore the argument for ___tls_get_addr in EAX. */
+ movl %ecx, %eax
+#endif
+ call HIDDEN_JUMPTARGET (___tls_get_addr)
+ /* Get register content back. */
+#ifdef USE_FNSAVE
+ frstor (%esp)
+#elif defined USE_FXSAVE
+ fxrstor (%esp)
+#else
+ /* Save and retore ___tls_get_addr return value stored in EAX. */
+ movl %eax, %ecx
+ movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
+ xorl %edx, %edx
+ xrstor (%esp)
+ movl %ecx, %eax
+#endif
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK
+ mov %ebx, %esp
+ cfi_def_cfa_register(%esp)
+ movl -28(%esp), %ebx
+ cfi_restore(%ebx)
+#else
+ addl $REGISTER_SAVE_AREA, %esp
+ cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
+#endif
+ jmp 1b
+ cfi_endproc
+ .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+
+#undef STATE_SAVE_ALIGNMENT
diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
index 90d93caa0c..f002feee56 100644
--- a/sysdeps/i386/dl-tlsdesc.S
+++ b/sysdeps/i386/dl-tlsdesc.S
@@ -18,8 +18,27 @@
#include <sysdep.h>
#include <tls.h>
+#include <cpu-features-offsets.h>
+#include <features-offsets.h>
#include "tlsdesc.h"
+#ifndef DL_STACK_ALIGNMENT
+/* Due to GCC bug:
+
+ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
+
+ __tls_get_addr may be called with 4-byte stack alignment. Although
+ this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
+ that stack will be always aligned at 16 bytes. */
+# define DL_STACK_ALIGNMENT 4
+#endif
+
+/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or align
+ stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. */
+#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
+ (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
+ || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT)
+
.text
/* This function is used to compute the TP offset for symbols in
@@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak:
.size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
#ifdef SHARED
- .hidden _dl_tlsdesc_dynamic
- .global _dl_tlsdesc_dynamic
- .type _dl_tlsdesc_dynamic,@function
-
- /* This function is used for symbols that need dynamic TLS.
-
- %eax points to the TLS descriptor, such that 0(%eax) points to
- _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
- tlsdesc_dynamic_arg object. It must return in %eax the offset
- between the thread pointer and the object denoted by the
- argument, without clobbering any registers.
-
- The assembly code that follows is a rendition of the following
- C code, hand-optimized a little bit.
-
-ptrdiff_t
-__attribute__ ((__regparm__ (1)))
-_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
-{
- struct tlsdesc_dynamic_arg *td = tdp->arg;
- dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
- if (__builtin_expect (td->gen_count <= dtv[0].counter
- && (dtv[td->tlsinfo.ti_module].pointer.val
- != TLS_DTV_UNALLOCATED),
- 1))
- return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
- - __thread_pointer;
-
- return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
-}
-*/
- cfi_startproc
- .align 16
-_dl_tlsdesc_dynamic:
- /* Like all TLS resolvers, preserve call-clobbered registers.
- We need two scratch regs anyway. */
- subl $28, %esp
- cfi_adjust_cfa_offset (28)
- movl %ecx, 20(%esp)
- movl %edx, 24(%esp)
- movl TLSDESC_ARG(%eax), %eax
- movl %gs:DTV_OFFSET, %edx
- movl TLSDESC_GEN_COUNT(%eax), %ecx
- cmpl (%edx), %ecx
- ja .Lslow
- movl TLSDESC_MODID(%eax), %ecx
- movl (%edx,%ecx,8), %edx
- cmpl $-1, %edx
- je .Lslow
- movl TLSDESC_MODOFF(%eax), %eax
- addl %edx, %eax
-.Lret:
- movl 20(%esp), %ecx
- subl %gs:0, %eax
- movl 24(%esp), %edx
- addl $28, %esp
- cfi_adjust_cfa_offset (-28)
- ret
- .p2align 4,,7
-.Lslow:
- cfi_adjust_cfa_offset (28)
- call HIDDEN_JUMPTARGET (___tls_get_addr)
- jmp .Lret
- cfi_endproc
- .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+# define USE_FNSAVE
+# define MINIMUM_ALIGNMENT 4
+# define STATE_SAVE_ALIGNMENT 4
+# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave
+# include "dl-tlsdesc-dynamic.h"
+# undef _dl_tlsdesc_dynamic
+# undef MINIMUM_ALIGNMENT
+# undef USE_FNSAVE
+
+# define MINIMUM_ALIGNMENT 16
+
+# define USE_FXSAVE
+# define STATE_SAVE_ALIGNMENT 16
+# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
+# include "dl-tlsdesc-dynamic.h"
+# undef _dl_tlsdesc_dynamic
+# undef USE_FXSAVE
+
+# define USE_XSAVE
+# define STATE_SAVE_ALIGNMENT 64
+# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
+# include "dl-tlsdesc-dynamic.h"
+# undef _dl_tlsdesc_dynamic
+# undef USE_XSAVE
+
+# define USE_XSAVEC
+# define STATE_SAVE_ALIGNMENT 64
+# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
+# include "dl-tlsdesc-dynamic.h"
+# undef _dl_tlsdesc_dynamic
+# undef USE_XSAVEC
#endif /* SHARED */
diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
index 73b29cc78c..5311b594af 100644
--- a/sysdeps/x86/Makefile
+++ b/sysdeps/x86/Makefile
@@ -1,5 +1,5 @@
ifeq ($(subdir),csu)
-gen-as-const-headers += cpu-features-offsets.sym
+gen-as-const-headers += cpu-features-offsets.sym features-offsets.sym
endif
ifeq ($(subdir),elf)
@@ -86,6 +86,11 @@ endif
tst-ifunc-isa-2-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-SSE4_2,-AVX,-AVX2,-AVX512F
tst-ifunc-isa-2-static-ENV = $(tst-ifunc-isa-2-ENV)
tst-hwcap-tunables-ARGS = -- $(host-test-program-cmd)
+
+CFLAGS-tst-gnu2-tls2.c += -msse
+CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
+CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
+CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
endif
ifeq ($(subdir),math)
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 25e6622a79..835113b42f 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -27,8 +27,13 @@
extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
attribute_hidden;
-#if defined SHARED && defined __x86_64__
-# include <dl-plt-rewrite.h>
+#if defined SHARED
+extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden;
+extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden;
+extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden;
+
+# ifdef __x86_64__
+# include <dl-plt-rewrite.h>
static void
TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
@@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
: plt_rewrite_jmp);
}
}
+# else
+extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden;
+# endif
+#endif
+
+#ifdef __x86_64__
+extern void _dl_runtime_resolve_fxsave (void) attribute_hidden;
+extern void _dl_runtime_resolve_xsave (void) attribute_hidden;
+extern void _dl_runtime_resolve_xsavec (void) attribute_hidden;
#endif
#ifdef __LP64__
@@ -1130,6 +1144,44 @@ no_cpuid:
TUNABLE_CALLBACK (set_x86_shstk));
#endif
+ if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
+ {
+ if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC))
+ {
+#ifdef __x86_64__
+ GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsavec;
+#endif
+#ifdef SHARED
+ GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsavec;
+#endif
+ }
+ else
+ {
+#ifdef __x86_64__
+ GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsave;
+#endif
+#ifdef SHARED
+ GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsave;
+#endif
+ }
+ }
+ else
+ {
+#ifdef __x86_64__
+ GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_fxsave;
+# ifdef SHARED
+ GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
+# endif
+#else
+# ifdef SHARED
+ if (CPU_FEATURE_USABLE_P (cpu_features, FXSR))
+ GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
+ else
+ GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fnsave;
+# endif
+#endif
+ }
+
#ifdef SHARED
# ifdef __x86_64__
TUNABLE_GET (plt_rewrite, tunable_val_t *,
diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c
index ee957b4d70..5920d4b320 100644
--- a/sysdeps/x86/dl-procinfo.c
+++ b/sysdeps/x86/dl-procinfo.c
@@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9]
#else
,
#endif
+
+#if defined SHARED && !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL
+ ._dl_x86_tlsdesc_dynamic
+# else
+PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic
+# endif
+# ifndef PROCINFO_DECL
+= NULL
+# endif
+# ifdef PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/features-offsets.sym
similarity index 89%
rename from sysdeps/x86_64/features-offsets.sym
rename to sysdeps/x86/features-offsets.sym
index 9e4be3393a..77e990c705 100644
--- a/sysdeps/x86_64/features-offsets.sym
+++ b/sysdeps/x86/features-offsets.sym
@@ -3,4 +3,6 @@
#include <ldsodefs.h>
RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_global_ro, _dl_x86_cpu_features)
+#ifdef __x86_64__
RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _dl_x86_feature_1)
+#endif
diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
index 837fd28734..485cad9c02 100644
--- a/sysdeps/x86/sysdep.h
+++ b/sysdeps/x86/sysdep.h
@@ -70,6 +70,12 @@
| (1 << X86_XSTATE_ZMM_H_ID))
#endif
+/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL.
+ Compiler assumes that all registers, including x87 FPU stack registers,
+ are unchanged after CALL, except for EFLAGS and RAX/EAX. */
+#define TLSDESC_CALL_STATE_SAVE_MASK \
+ (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID))
+
/* Constants for bits in __x86_string_control: */
/* Avoid short distance REP MOVSB. */
diff --git a/sysdeps/x86/tst-gnu2-tls2.c b/sysdeps/x86/tst-gnu2-tls2.c
new file mode 100644
index 0000000000..de900a423b
--- /dev/null
+++ b/sysdeps/x86/tst-gnu2-tls2.c
@@ -0,0 +1,20 @@
+#ifndef __x86_64__
+#include <sys/platform/x86.h>
+
+#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2)
+#endif
+
+/* Clear XMM0...XMM7 */
+#define PREPARE_MALLOC() \
+{ \
+ asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \
+ asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \
+ asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \
+ asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \
+ asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \
+ asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \
+ asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \
+ asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \
+}
+
+#include <elf/tst-gnu2-tls2.c>
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 145f25e7f6..9337e95093 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -10,7 +10,7 @@ LDFLAGS-rtld += -Wl,-z,nomark-plt
endif
ifeq ($(subdir),csu)
-gen-as-const-headers += features-offsets.sym link-defines.sym
+gen-as-const-headers += link-defines.sym
endif
ifeq ($(subdir),gmon)
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index 6d605d0d32..ff5d45f7cb 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
int lazy, int profile)
{
Elf64_Addr *got;
- extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidden;
- extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidden;
- extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
@@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
/* Identify this shared object. */
*(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
- const struct cpu_features* cpu_features = __get_cpu_features ();
-
#ifdef SHARED
/* The got[2] entry contains the address of a function which gets
called to get the address of a so far unresolved function and
@@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
end in this function. */
if (__glibc_unlikely (profile))
{
+ const struct cpu_features* cpu_features = __get_cpu_features ();
if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
*(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
@@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
/* This function will get called to fix up the GOT entry
indicated by the offset on the stack, and then jump to
the resolved address. */
- if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
- || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
- *(ElfW(Addr) *) (got + 2)
- = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
- ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
- : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
- else
- *(ElfW(Addr) *) (got + 2)
- = (ElfW(Addr)) &_dl_runtime_resolve_fxsave;
+ *(ElfW(Addr) *) (got + 2)
+ = (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve);
}
}
@@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.\n",
{
td->arg = _dl_make_tlsdesc_dynamic
(sym_map, sym->st_value + reloc->r_addend);
- td->entry = _dl_tlsdesc_dynamic;
+ td->entry = GLRO(dl_x86_tlsdesc_dynamic);
}
else
# endif
diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
index 4d1d790fbb..06637a8154 100644
--- a/sysdeps/x86_64/dl-procinfo.c
+++ b/sysdeps/x86_64/dl-procinfo.c
@@ -41,5 +41,21 @@
#include <sysdeps/x86/dl-procinfo.c>
+#if !IS_IN (ldconfig)
+# if !defined PROCINFO_DECL && defined SHARED
+ ._dl_x86_64_runtime_resolve
+# else
+PROCINFO_CLASS void * _dl_x86_64_runtime_resolve
+# endif
+# ifndef PROCINFO_DECL
+= NULL
+# endif
+# if !defined SHARED || defined PROCINFO_DECL
+;
+# else
+,
+# endif
+#endif
+
#undef PROCINFO_DECL
#undef PROCINFO_CLASS
diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
new file mode 100644
index 0000000000..0c2e8d5320
--- /dev/null
+++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
@@ -0,0 +1,166 @@
+/* Thread-local storage handling in the ELF dynamic linker. x86_64 version.
+ Copyright (C) 2004-2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef SECTION
+# define SECTION(p) p
+#endif
+
+#undef REGISTER_SAVE_AREA
+#undef LOCAL_STORAGE_AREA
+#undef BASE
+
+#include "dl-trampoline-state.h"
+
+ .section SECTION(.text),"ax",@progbits
+
+ .hidden _dl_tlsdesc_dynamic
+ .global _dl_tlsdesc_dynamic
+ .type _dl_tlsdesc_dynamic,@function
+
+ /* %rax points to the TLS descriptor, such that 0(%rax) points to
+ _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
+ tlsdesc_dynamic_arg object. It must return in %rax the offset
+ between the thread pointer and the object denoted by the
+ argument, without clobbering any registers.
+
+ The assembly code that follows is a rendition of the following
+ C code, hand-optimized a little bit.
+
+ptrdiff_t
+_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
+{
+ struct tlsdesc_dynamic_arg *td = tdp->arg;
+ dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
+ if (__builtin_expect (td->gen_count <= dtv[0].counter
+ && (dtv[td->tlsinfo.ti_module].pointer.val
+ != TLS_DTV_UNALLOCATED),
+ 1))
+ return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
+ - __thread_pointer;
+
+ return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
+}
+*/
+ cfi_startproc
+ .align 16
+_dl_tlsdesc_dynamic:
+ _CET_ENDBR
+ /* Preserve call-clobbered registers that we modify.
+ We need two scratch regs anyway. */
+ movq %rsi, -16(%rsp)
+ mov %fs:DTV_OFFSET, %RSI_LP
+ movq %rdi, -8(%rsp)
+ movq TLSDESC_ARG(%rax), %rdi
+ movq (%rsi), %rax
+ cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
+ ja 2f
+ movq TLSDESC_MODID(%rdi), %rax
+ salq $4, %rax
+ movq (%rax,%rsi), %rax
+ cmpq $-1, %rax
+ je 2f
+ addq TLSDESC_MODOFF(%rdi), %rax
+1:
+ movq -16(%rsp), %rsi
+ sub %fs:0, %RAX_LP
+ movq -8(%rsp), %rdi
+ ret
+2:
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK
+ movq %rbx, -24(%rsp)
+ mov %RSP_LP, %RBX_LP
+ cfi_def_cfa_register(%rbx)
+ and $-STATE_SAVE_ALIGNMENT, %RSP_LP
+#endif
+#ifdef REGISTER_SAVE_AREA
+# if DL_RUNTIME_RESOLVE_REALIGN_STACK
+ /* STATE_SAVE_OFFSET has space for 8 integer registers. But we
+ need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
+ RBX above. */
+ sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
+# else
+ sub $REGISTER_SAVE_AREA, %RSP_LP
+ cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
+# endif
+#else
+ /* Allocate stack space of the required size to save the state. */
+ sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip), %RSP_LP
+#endif
+ /* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9,
+ r10 and r11. */
+ movq %rcx, REGISTER_SAVE_RCX(%rsp)
+ movq %rdx, REGISTER_SAVE_RDX(%rsp)
+ movq %r8, REGISTER_SAVE_R8(%rsp)
+ movq %r9, REGISTER_SAVE_R9(%rsp)
+ movq %r10, REGISTER_SAVE_R10(%rsp)
+ movq %r11, REGISTER_SAVE_R11(%rsp)
+#ifdef USE_FXSAVE
+ fxsave STATE_SAVE_OFFSET(%rsp)
+#else
+ movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
+ xorl %edx, %edx
+ /* Clear the XSAVE Header. */
+# ifdef USE_XSAVE
+ movq %rdx, (STATE_SAVE_OFFSET + 512)(%rsp)
+ movq %rdx, (STATE_SAVE_OFFSET + 512 + 8)(%rsp)
+# endif
+ movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 2)(%rsp)
+ movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 3)(%rsp)
+ movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 4)(%rsp)
+ movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 5)(%rsp)
+ movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 6)(%rsp)
+ movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 7)(%rsp)
+# ifdef USE_XSAVE
+ xsave STATE_SAVE_OFFSET(%rsp)
+# else
+ xsavec STATE_SAVE_OFFSET(%rsp)
+# endif
+#endif
+ /* %rdi already points to the tlsinfo data structure. */
+ call HIDDEN_JUMPTARGET (__tls_get_addr)
+ # Get register content back.
+#ifdef USE_FXSAVE
+ fxrstor STATE_SAVE_OFFSET(%rsp)
+#else
+ /* Save and retore __tls_get_addr return value stored in RAX. */
+ mov %RAX_LP, %RCX_LP
+ movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
+ xorl %edx, %edx
+ xrstor STATE_SAVE_OFFSET(%rsp)
+ mov %RCX_LP, %RAX_LP
+#endif
+ movq REGISTER_SAVE_R11(%rsp), %r11
+ movq REGISTER_SAVE_R10(%rsp), %r10
+ movq REGISTER_SAVE_R9(%rsp), %r9
+ movq REGISTER_SAVE_R8(%rsp), %r8
+ movq REGISTER_SAVE_RDX(%rsp), %rdx
+ movq REGISTER_SAVE_RCX(%rsp), %rcx
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK
+ mov %RBX_LP, %RSP_LP
+ cfi_def_cfa_register(%rsp)
+ movq -24(%rsp), %rbx
+ cfi_restore(%rbx)
+#else
+ add $REGISTER_SAVE_AREA, %RSP_LP
+ cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
+#endif
+ jmp 1b
+ cfi_endproc
+ .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+
+#undef STATE_SAVE_ALIGNMENT
diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
index f748af2ece..ea69f5223a 100644
--- a/sysdeps/x86_64/dl-tlsdesc.S
+++ b/sysdeps/x86_64/dl-tlsdesc.S
@@ -18,7 +18,19 @@
#include <sysdep.h>
#include <tls.h>
+#include <cpu-features-offsets.h>
+#include <features-offsets.h>
#include "tlsdesc.h"
+#include "dl-trampoline-save.h"
+
+/* Area on stack to save and restore registers used for parameter
+ passing when calling _dl_tlsdesc_dynamic. */
+#define REGISTER_SAVE_RCX 0
+#define REGISTER_SAVE_RDX (REGISTER_SAVE_RCX + 8)
+#define REGISTER_SAVE_R8 (REGISTER_SAVE_RDX + 8)
+#define REGISTER_SAVE_R9 (REGISTER_SAVE_R8 + 8)
+#define REGISTER_SAVE_R10 (REGISTER_SAVE_R9 + 8)
+#define REGISTER_SAVE_R11 (REGISTER_SAVE_R10 + 8)
.text
@@ -67,80 +79,24 @@ _dl_tlsdesc_undefweak:
.size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
#ifdef SHARED
- .hidden _dl_tlsdesc_dynamic
- .global _dl_tlsdesc_dynamic
- .type _dl_tlsdesc_dynamic,@function
-
- /* %rax points to the TLS descriptor, such that 0(%rax) points to
- _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
- tlsdesc_dynamic_arg object. It must return in %rax the offset
- between the thread pointer and the object denoted by the
- argument, without clobbering any registers.
-
- The assembly code that follows is a rendition of the following
- C code, hand-optimized a little bit.
-
-ptrdiff_t
-_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
-{
- struct tlsdesc_dynamic_arg *td = tdp->arg;
- dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
- if (__builtin_expect (td->gen_count <= dtv[0].counter
- && (dtv[td->tlsinfo.ti_module].pointer.val
- != TLS_DTV_UNALLOCATED),
- 1))
- return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
- - __thread_pointer;
-
- return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
-}
-*/
- cfi_startproc
- .align 16
-_dl_tlsdesc_dynamic:
- _CET_ENDBR
- /* Preserve call-clobbered registers that we modify.
- We need two scratch regs anyway. */
- movq %rsi, -16(%rsp)
- mov %fs:DTV_OFFSET, %RSI_LP
- movq %rdi, -8(%rsp)
- movq TLSDESC_ARG(%rax), %rdi
- movq (%rsi), %rax
- cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
- ja .Lslow
- movq TLSDESC_MODID(%rdi), %rax
- salq $4, %rax
- movq (%rax,%rsi), %rax
- cmpq $-1, %rax
- je .Lslow
- addq TLSDESC_MODOFF(%rdi), %rax
-.Lret:
- movq -16(%rsp), %rsi
- sub %fs:0, %RAX_LP
- movq -8(%rsp), %rdi
- ret
-.Lslow:
- /* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
- r10 and r11. Also, align the stack, that's off by 8 bytes. */
- subq $72, %rsp
- cfi_adjust_cfa_offset (72)
- movq %rdx, 8(%rsp)
- movq %rcx, 16(%rsp)
- movq %r8, 24(%rsp)
- movq %r9, 32(%rsp)
- movq %r10, 40(%rsp)
- movq %r11, 48(%rsp)
- /* %rdi already points to the tlsinfo data structure. */
- call HIDDEN_JUMPTARGET (__tls_get_addr)
- movq 8(%rsp), %rdx
- movq 16(%rsp), %rcx
- movq 24(%rsp), %r8
- movq 32(%rsp), %r9
- movq 40(%rsp), %r10
- movq 48(%rsp), %r11
- addq $72, %rsp
- cfi_adjust_cfa_offset (-72)
- jmp .Lret
- cfi_endproc
- .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
+# define USE_FXSAVE
+# define STATE_SAVE_ALIGNMENT 16
+# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
+# include "dl-tlsdesc-dynamic.h"
+# undef _dl_tlsdesc_dynamic
+# undef USE_FXSAVE
+
+# define USE_XSAVE
+# define STATE_SAVE_ALIGNMENT 64
+# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
+# include "dl-tlsdesc-dynamic.h"
+# undef _dl_tlsdesc_dynamic
+# undef USE_XSAVE
+
+# define USE_XSAVEC
+# define STATE_SAVE_ALIGNMENT 64
+# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
+# include "dl-tlsdesc-dynamic.h"
+# undef _dl_tlsdesc_dynamic
+# undef USE_XSAVEC
#endif /* SHARED */
diff --git a/sysdeps/x86_64/dl-trampoline-save.h b/sysdeps/x86_64/dl-trampoline-save.h
new file mode 100644
index 0000000000..84eac4a8ac
--- /dev/null
+++ b/sysdeps/x86_64/dl-trampoline-save.h
@@ -0,0 +1,34 @@
+/* x86-64 PLT trampoline register save macros.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef DL_STACK_ALIGNMENT
+/* Due to GCC bug:
+
+ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
+
+ __tls_get_addr may be called with 8-byte stack alignment. Although
+ this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
+ that stack will be always aligned at 16 bytes. */
+# define DL_STACK_ALIGNMENT 8
+#endif
+
+/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
+ stack to 16 bytes before calling _dl_fixup. */
+#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
+ (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
+ || 16 > DL_STACK_ALIGNMENT)
diff --git a/sysdeps/x86_64/dl-trampoline-state.h b/sysdeps/x86_64/dl-trampoline-state.h
new file mode 100644
index 0000000000..575f120797
--- /dev/null
+++ b/sysdeps/x86_64/dl-trampoline-state.h
@@ -0,0 +1,51 @@
+/* x86-64 PLT dl-trampoline state macros.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#if (STATE_SAVE_ALIGNMENT % 16) != 0
+# error STATE_SAVE_ALIGNMENT must be multiple of 16
+#endif
+
+#if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
+# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
+#endif
+
+#if DL_RUNTIME_RESOLVE_REALIGN_STACK
+/* Local stack area before jumping to function address: RBX. */
+# define LOCAL_STORAGE_AREA 8
+# define BASE rbx
+# ifdef USE_FXSAVE
+/* Use fxsave to save XMM registers. */
+# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
+# if (REGISTER_SAVE_AREA % 16) != 0
+# error REGISTER_SAVE_AREA must be multiple of 16
+# endif
+# endif
+#else
+# ifndef USE_FXSAVE
+# error USE_FXSAVE must be defined
+# endif
+/* Use fxsave to save XMM registers. */
+# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
+/* Local stack area before jumping to function address: All saved
+ registers. */
+# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
+# define BASE rsp
+# if (REGISTER_SAVE_AREA % 16) != 8
+# error REGISTER_SAVE_AREA must be odd multiple of 8
+# endif
+#endif
diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
index b2e7e0f69b..87c5137837 100644
--- a/sysdeps/x86_64/dl-trampoline.S
+++ b/sysdeps/x86_64/dl-trampoline.S
@@ -22,25 +22,7 @@
#include <features-offsets.h>
#include <link-defines.h>
#include <isa-level.h>
-
-#ifndef DL_STACK_ALIGNMENT
-/* Due to GCC bug:
-
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
-
- __tls_get_addr may be called with 8-byte stack alignment. Although
- this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
- that stack will be always aligned at 16 bytes. We use unaligned
- 16-byte move to load and store SSE registers, which has no penalty
- on modern processors if stack is 16-byte aligned. */
-# define DL_STACK_ALIGNMENT 8
-#endif
-
-/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
- stack to 16 bytes before calling _dl_fixup. */
-#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
- (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
- || 16 > DL_STACK_ALIGNMENT)
+#include "dl-trampoline-save.h"
/* Area on stack to save and restore registers used for parameter
passing when calling _dl_fixup. */
diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
index f55c6ea040..d9ccfb40d4 100644
--- a/sysdeps/x86_64/dl-trampoline.h
+++ b/sysdeps/x86_64/dl-trampoline.h
@@ -27,39 +27,7 @@
# undef LOCAL_STORAGE_AREA
# undef BASE
-# if (STATE_SAVE_ALIGNMENT % 16) != 0
-# error STATE_SAVE_ALIGNMENT must be multiple of 16
-# endif
-
-# if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
-# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
-# endif
-
-# if DL_RUNTIME_RESOLVE_REALIGN_STACK
-/* Local stack area before jumping to function address: RBX. */
-# define LOCAL_STORAGE_AREA 8
-# define BASE rbx
-# ifdef USE_FXSAVE
-/* Use fxsave to save XMM registers. */
-# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
-# if (REGISTER_SAVE_AREA % 16) != 0
-# error REGISTER_SAVE_AREA must be multiple of 16
-# endif
-# endif
-# else
-# ifndef USE_FXSAVE
-# error USE_FXSAVE must be defined
-# endif
-/* Use fxsave to save XMM registers. */
-# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
-/* Local stack area before jumping to function address: All saved
- registers. */
-# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
-# define BASE rsp
-# if (REGISTER_SAVE_AREA % 16) != 8
-# error REGISTER_SAVE_AREA must be odd multiple of 8
-# endif
-# endif
+# include "dl-trampoline-state.h"
.globl _dl_runtime_resolve
.hidden _dl_runtime_resolve
--
2.43.2
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v10 1/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
2024-02-26 14:37 ` [PATCH v10 1/1] " H.J. Lu
@ 2024-02-26 19:02 ` Noah Goldstein
2024-02-26 19:02 ` Noah Goldstein
0 siblings, 1 reply; 7+ messages in thread
From: Noah Goldstein @ 2024-02-26 19:02 UTC (permalink / raw)
To: H.J. Lu; +Cc: libc-alpha, adhemerval.zanella
On Mon, Feb 26, 2024 at 8:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> Compiler generates the following instruction sequence for GNU2 dynamic
> TLS access:
>
> leaq tls_var@TLSDESC(%rip), %rax
> call *tls_var@TLSCALL(%rax)
>
> or
>
> leal tls_var@TLSDESC(%ebx), %eax
> call *tls_var@TLSCALL(%eax)
>
> CALL instruction is transparent to compiler which assumes all registers,
> except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> path. __tls_get_addr is a normal function which doesn't preserve any
> caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> caller-saved registers, but didn't preserve any other caller-saved
> registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> XSAVE and XSAVEC to save and restore all caller-saved registers. This
> fixes BZ #31372.
>
> Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> to optimize elf_machine_runtime_setup.
> ---
> elf/Makefile | 18 ++
> elf/tst-gnu2-tls2.c | 122 ++++++++++++
> elf/tst-gnu2-tls2.h | 36 ++++
> elf/tst-gnu2-tls2mod0.c | 31 +++
> elf/tst-gnu2-tls2mod1.c | 31 +++
> elf/tst-gnu2-tls2mod2.c | 31 +++
> sysdeps/i386/dl-machine.h | 2 +-
> sysdeps/i386/dl-tlsdesc-dynamic.h | 190 +++++++++++++++++++
> sysdeps/i386/dl-tlsdesc.S | 115 +++++------
> sysdeps/x86/Makefile | 7 +-
> sysdeps/x86/cpu-features.c | 56 +++++-
> sysdeps/x86/dl-procinfo.c | 16 ++
> sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
> sysdeps/x86/sysdep.h | 6 +
> sysdeps/x86/tst-gnu2-tls2.c | 20 ++
> sysdeps/x86_64/Makefile | 2 +-
> sysdeps/x86_64/dl-machine.h | 19 +-
> sysdeps/x86_64/dl-procinfo.c | 16 ++
> sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
> sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
> sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
> sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
> sysdeps/x86_64/dl-trampoline.S | 20 +-
> sysdeps/x86_64/dl-trampoline.h | 34 +---
> 24 files changed, 920 insertions(+), 213 deletions(-)
> create mode 100644 elf/tst-gnu2-tls2.c
> create mode 100644 elf/tst-gnu2-tls2.h
> create mode 100644 elf/tst-gnu2-tls2mod0.c
> create mode 100644 elf/tst-gnu2-tls2mod1.c
> create mode 100644 elf/tst-gnu2-tls2mod2.c
> create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> create mode 100644 sysdeps/x86/tst-gnu2-tls2.c
> create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
>
> diff --git a/elf/Makefile b/elf/Makefile
> index 36c04baf02..02dc476e27 100644
> --- a/elf/Makefile
> +++ b/elf/Makefile
> @@ -424,6 +424,7 @@ tests += \
> tst-glibc-hwcaps-prepend \
> tst-global1 \
> tst-global2 \
> + tst-gnu2-tls2 \
> tst-initfinilazyfail \
> tst-initorder \
> tst-initorder2 \
> @@ -846,6 +847,9 @@ modules-names += \
> tst-filterobj-flt \
> tst-finilazyfailmod \
> tst-globalmod2 \
> + tst-gnu2-tls2mod0 \
> + tst-gnu2-tls2mod1 \
> + tst-gnu2-tls2mod2 \
> tst-initlazyfailmod \
> tst-initorder2a \
> tst-initorder2b \
> @@ -3044,8 +3048,22 @@ $(objpfx)tst-tlsgap.out: \
> $(objpfx)tst-tlsgap-mod0.so \
> $(objpfx)tst-tlsgap-mod1.so \
> $(objpfx)tst-tlsgap-mod2.so
> +
> +$(objpfx)tst-gnu2-tls2: $(shared-thread-library)
> +$(objpfx)tst-gnu2-tls2.out: \
> + $(objpfx)tst-gnu2-tls2mod0.so \
> + $(objpfx)tst-gnu2-tls2mod1.so \
> + $(objpfx)tst-gnu2-tls2mod2.so
> +
> ifeq (yes,$(have-mtls-dialect-gnu2))
> +# This test fails if dl_tlsdesc_dynamic doesn't preserve all caller-saved
> +# registers. See https://sourceware.org/bugzilla/show_bug.cgi?id=31372
> +test-xfail-tst-gnu2-tls2 = yes
> +
> CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
> CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
> CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
> +CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
> +CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
> +CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
> endif
> diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c
> new file mode 100644
> index 0000000000..7ac04d7f33
> --- /dev/null
> +++ b/elf/tst-gnu2-tls2.c
> @@ -0,0 +1,122 @@
> +/* Test TLSDESC relocation.
> + Copyright (C) 2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <dlfcn.h>
> +#include <pthread.h>
> +#include <support/xdlfcn.h>
> +#include <support/xthread.h>
> +#include <support/check.h>
> +#include <support/test-driver.h>
> +#include "tst-gnu2-tls2.h"
> +
> +#ifndef IS_SUPPORTED
> +# define IS_SUPPORTED() true
> +#endif
> +
> +/* An architecture can define it to clobber caller-saved registers in
> + malloc below to verify that the implicit TLSDESC call won't change
> + caller-saved registers. */
> +#ifndef PREPARE_MALLOC
> +# define PREPARE_MALLOC()
> +#endif
> +
> +extern void * __libc_malloc (size_t);
> +
> +size_t malloc_counter = 0;
> +
> +void *
> +malloc (size_t n)
> +{
> + PREPARE_MALLOC ();
> + malloc_counter++;
> + return __libc_malloc (n);
> +}
> +
> +static void *mod[3];
> +#ifndef MOD
> +# define MOD(i) "tst-gnu2-tls2mod" #i ".so"
> +#endif
> +static const char *modname[3] = { MOD(0), MOD(1), MOD(2) };
> +#undef MOD
> +
> +static void
> +open_mod (int i)
> +{
> + mod[i] = xdlopen (modname[i], RTLD_LAZY);
> + printf ("open %s\n", modname[i]);
> +}
> +
> +static void
> +close_mod (int i)
> +{
> + xdlclose (mod[i]);
> + mod[i] = NULL;
> + printf ("close %s\n", modname[i]);
> +}
> +
> +static void
> +access_mod (int i, const char *sym)
> +{
> + struct tls var = { -1, -1, -1, -1 };
> + struct tls *(*f) (struct tls *) = xdlsym (mod[i], sym);
> + /* Check that our malloc is called. */
> + malloc_counter = 0;
> + struct tls *p = f (&var);
> + TEST_VERIFY (malloc_counter != 0);
> + printf ("access %s: %s() = %p\n", modname[i], sym, p);
> + TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) == 0);
> + ++(p->a);
> +}
> +
> +static void *
> +start (void *arg)
> +{
> + /* The DTV generation is at the last dlopen of mod0 and the
> + entry for mod1 is NULL. */
> +
> + open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */
> +
> + /* Force the slow path in GNU2 TLS descriptor call. */
> + access_mod (1, "apply_tls");
> +
> + return arg;
> +}
> +
> +static int
> +do_test (void)
> +{
> + if (!IS_SUPPORTED ())
> + return EXIT_UNSUPPORTED;
> +
> + open_mod (0);
> + open_mod (1);
> + open_mod (2);
> + close_mod (0);
> + close_mod (1); /* Create modid gap at mod1. */
> + open_mod (0); /* Reuse modid of mod0, bump generation count. */
> +
> + /* Create a thread where DTV of mod1 is NULL. */
> + pthread_t t = xpthread_create (NULL, start, NULL);
> + xpthread_join (t);
> + return 0;
> +}
> +
> +#include <support/test-driver.c>
> diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h
> new file mode 100644
> index 0000000000..77964a57a3
> --- /dev/null
> +++ b/elf/tst-gnu2-tls2.h
> @@ -0,0 +1,36 @@
> +/* Test TLSDESC relocation.
> + Copyright (C) 2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#include <stdint.h>
> +
> +struct tls
> +{
> + int64_t a, b, c, d;
> +};
> +
> +extern struct tls *apply_tls (struct tls *);
> +
> +/* An architecture can define them to verify that clobber caller-saved
> + registers aren't changed by the implicit TLSDESC call. */
> +#ifndef BEFORE_TLSDESC_CALL
> +# define BEFORE_TLSDESC_CALL()
> +#endif
> +
> +#ifndef AFTER_TLSDESC_CALL
> +# define AFTER_TLSDESC_CALL()
> +#endif
> diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c
> new file mode 100644
> index 0000000000..45556a0e17
> --- /dev/null
> +++ b/elf/tst-gnu2-tls2mod0.c
> @@ -0,0 +1,31 @@
> +/* DSO used by tst-gnu2-tls2.
> + Copyright (C) 2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#include "tst-gnu2-tls2.h"
> +
> +__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")));
> +
> +struct tls *
> +apply_tls (struct tls *p)
> +{
> + BEFORE_TLSDESC_CALL ();
> + tls_var0 = *p;
> + struct tls *ret = &tls_var0;
> + AFTER_TLSDESC_CALL ();
> + return ret;
> +}
> diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c
> new file mode 100644
> index 0000000000..e10b9dbc0a
> --- /dev/null
> +++ b/elf/tst-gnu2-tls2mod1.c
> @@ -0,0 +1,31 @@
> +/* DSO used by tst-gnu2-tls2.
> + Copyright (C) 2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#include "tst-gnu2-tls2.h"
> +
> +__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidden")));
> +
> +struct tls *
> +apply_tls (struct tls *p)
> +{
> + BEFORE_TLSDESC_CALL ();
> + tls_var1[1] = *p;
> + struct tls *ret = &tls_var1[1];
> + AFTER_TLSDESC_CALL ();
> + return ret;
> +}
> diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c
> new file mode 100644
> index 0000000000..141af51e55
> --- /dev/null
> +++ b/elf/tst-gnu2-tls2mod2.c
> @@ -0,0 +1,31 @@
> +/* DSO used by tst-gnu2-tls2.
> + Copyright (C) 2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#include "tst-gnu2-tls2.h"
> +
> +__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")));
> +
> +struct tls *
> +apply_tls (struct tls *p)
> +{
> + BEFORE_TLSDESC_CALL ();
> + tls_var2 = *p;
> + struct tls *ret = &tls_var2;
> + AFTER_TLSDESC_CALL ();
> + return ret;
> +}
> diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> index fc1ef96587..50d74fe6e9 100644
> --- a/sysdeps/i386/dl-machine.h
> +++ b/sysdeps/i386/dl-machine.h
> @@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.\n",
> {
> td->arg = _dl_make_tlsdesc_dynamic
> (sym_map, sym->st_value + (ElfW(Word))td->arg);
> - td->entry = _dl_tlsdesc_dynamic;
> + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> }
> else
> # endif
> diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
> new file mode 100644
> index 0000000000..3627028577
> --- /dev/null
> +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
> @@ -0,0 +1,190 @@
> +/* Thread-local storage handling in the ELF dynamic linker. i386 version.
> + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#undef REGISTER_SAVE_AREA
> +
> +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
> +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> +#endif
> +
> +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> +# ifdef USE_FNSAVE
> +# error USE_FNSAVE shouldn't be defined
> +# endif
> +# ifdef USE_FXSAVE
> +/* Use fxsave to save all registers. */
> +# define REGISTER_SAVE_AREA 512
> +# endif
> +#else
> +# ifdef USE_FNSAVE
> +/* Use fnsave to save x87 FPU stack registers. */
> +# define REGISTER_SAVE_AREA 108
> +# else
> +# ifndef USE_FXSAVE
> +# error USE_FXSAVE must be defined
> +# endif
> +/* Use fxsave to save all registers. Add 12 bytes to align the stack
> + to 16 bytes. */
> +# define REGISTER_SAVE_AREA (512 + 12)
> +# endif
> +#endif
> +
> + .hidden _dl_tlsdesc_dynamic
> + .global _dl_tlsdesc_dynamic
> + .type _dl_tlsdesc_dynamic,@function
> +
> + /* This function is used for symbols that need dynamic TLS.
> +
> + %eax points to the TLS descriptor, such that 0(%eax) points to
> + _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> + tlsdesc_dynamic_arg object. It must return in %eax the offset
> + between the thread pointer and the object denoted by the
> + argument, without clobbering any registers.
> +
> + The assembly code that follows is a rendition of the following
> + C code, hand-optimized a little bit.
> +
> +ptrdiff_t
> +__attribute__ ((__regparm__ (1)))
> +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> +{
> + struct tlsdesc_dynamic_arg *td = tdp->arg;
> + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> + if (__builtin_expect (td->gen_count <= dtv[0].counter
> + && (dtv[td->tlsinfo.ti_module].pointer.val
> + != TLS_DTV_UNALLOCATED),
> + 1))
> + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> + - __thread_pointer;
> +
> + return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> +}
> +*/
> + cfi_startproc
> + .align 16
> +_dl_tlsdesc_dynamic:
> + /* Like all TLS resolvers, preserve call-clobbered registers.
> + We need two scratch regs anyway. */
> + subl $32, %esp
> + cfi_adjust_cfa_offset (32)
> + movl %ecx, 20(%esp)
> + movl %edx, 24(%esp)
> + movl TLSDESC_ARG(%eax), %eax
> + movl %gs:DTV_OFFSET, %edx
> + movl TLSDESC_GEN_COUNT(%eax), %ecx
> + cmpl (%edx), %ecx
> + ja 2f
> + movl TLSDESC_MODID(%eax), %ecx
> + movl (%edx,%ecx,8), %edx
> + cmpl $-1, %edx
> + je 2f
> + movl TLSDESC_MODOFF(%eax), %eax
> + addl %edx, %eax
> +1:
> + movl 20(%esp), %ecx
> + subl %gs:0, %eax
> + movl 24(%esp), %edx
> + addl $32, %esp
> + cfi_adjust_cfa_offset (-32)
> + ret
> + .p2align 4,,7
> +2:
> + cfi_adjust_cfa_offset (32)
> +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> + movl %ebx, -28(%esp)
> + movl %esp, %ebx
> + cfi_def_cfa_register(%ebx)
> + and $-STATE_SAVE_ALIGNMENT, %esp
> +#endif
> +#ifdef REGISTER_SAVE_AREA
> + subl $REGISTER_SAVE_AREA, %esp
> +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> +# endif
> +#else
> +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> +# error DL_RUNTIME_RESOLVE_REALIGN_STACK must be true
> +# endif
> + /* Allocate stack space of the required size to save the state. */
> + LOAD_PIC_REG (cx)
> + subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> +#endif
> +#ifdef USE_FNSAVE
> + fnsave (%esp)
> +#elif defined USE_FXSAVE
> + fxsave (%esp)
> +#else
> + /* Save the argument for ___tls_get_addr in EAX. */
> + movl %eax, %ecx
> + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> + xorl %edx, %edx
> + /* Clear the XSAVE Header. */
> +# ifdef USE_XSAVE
> + movl %edx, (512)(%esp)
> + movl %edx, (512 + 4 * 1)(%esp)
> + movl %edx, (512 + 4 * 2)(%esp)
> + movl %edx, (512 + 4 * 3)(%esp)
> +# endif
> + movl %edx, (512 + 4 * 4)(%esp)
> + movl %edx, (512 + 4 * 5)(%esp)
> + movl %edx, (512 + 4 * 6)(%esp)
> + movl %edx, (512 + 4 * 7)(%esp)
> + movl %edx, (512 + 4 * 8)(%esp)
> + movl %edx, (512 + 4 * 9)(%esp)
> + movl %edx, (512 + 4 * 10)(%esp)
> + movl %edx, (512 + 4 * 11)(%esp)
> + movl %edx, (512 + 4 * 12)(%esp)
> + movl %edx, (512 + 4 * 13)(%esp)
> + movl %edx, (512 + 4 * 14)(%esp)
> + movl %edx, (512 + 4 * 15)(%esp)
> +# ifdef USE_XSAVE
> + xsave (%esp)
> +# else
> + xsavec (%esp)
> +# endif
> + /* Restore the argument for ___tls_get_addr in EAX. */
> + movl %ecx, %eax
> +#endif
> + call HIDDEN_JUMPTARGET (___tls_get_addr)
> + /* Get register content back. */
> +#ifdef USE_FNSAVE
> + frstor (%esp)
> +#elif defined USE_FXSAVE
> + fxrstor (%esp)
> +#else
> + /* Save and retore ___tls_get_addr return value stored in EAX. */
> + movl %eax, %ecx
> + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> + xorl %edx, %edx
> + xrstor (%esp)
> + movl %ecx, %eax
> +#endif
> +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> + mov %ebx, %esp
> + cfi_def_cfa_register(%esp)
> + movl -28(%esp), %ebx
> + cfi_restore(%ebx)
> +#else
> + addl $REGISTER_SAVE_AREA, %esp
> + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> +#endif
> + jmp 1b
> + cfi_endproc
> + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> +
> +#undef STATE_SAVE_ALIGNMENT
> diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
> index 90d93caa0c..f002feee56 100644
> --- a/sysdeps/i386/dl-tlsdesc.S
> +++ b/sysdeps/i386/dl-tlsdesc.S
> @@ -18,8 +18,27 @@
>
> #include <sysdep.h>
> #include <tls.h>
> +#include <cpu-features-offsets.h>
> +#include <features-offsets.h>
> #include "tlsdesc.h"
>
> +#ifndef DL_STACK_ALIGNMENT
> +/* Due to GCC bug:
> +
> + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> +
> + __tls_get_addr may be called with 4-byte stack alignment. Although
> + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> + that stack will be always aligned at 16 bytes. */
> +# define DL_STACK_ALIGNMENT 4
> +#endif
> +
> +/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or align
> + stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. */
> +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> + || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT)
> +
> .text
>
> /* This function is used to compute the TP offset for symbols in
> @@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak:
> .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
>
> #ifdef SHARED
> - .hidden _dl_tlsdesc_dynamic
> - .global _dl_tlsdesc_dynamic
> - .type _dl_tlsdesc_dynamic,@function
> -
> - /* This function is used for symbols that need dynamic TLS.
> -
> - %eax points to the TLS descriptor, such that 0(%eax) points to
> - _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> - tlsdesc_dynamic_arg object. It must return in %eax the offset
> - between the thread pointer and the object denoted by the
> - argument, without clobbering any registers.
> -
> - The assembly code that follows is a rendition of the following
> - C code, hand-optimized a little bit.
> -
> -ptrdiff_t
> -__attribute__ ((__regparm__ (1)))
> -_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> -{
> - struct tlsdesc_dynamic_arg *td = tdp->arg;
> - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> - if (__builtin_expect (td->gen_count <= dtv[0].counter
> - && (dtv[td->tlsinfo.ti_module].pointer.val
> - != TLS_DTV_UNALLOCATED),
> - 1))
> - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> - - __thread_pointer;
> -
> - return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> -}
> -*/
> - cfi_startproc
> - .align 16
> -_dl_tlsdesc_dynamic:
> - /* Like all TLS resolvers, preserve call-clobbered registers.
> - We need two scratch regs anyway. */
> - subl $28, %esp
> - cfi_adjust_cfa_offset (28)
> - movl %ecx, 20(%esp)
> - movl %edx, 24(%esp)
> - movl TLSDESC_ARG(%eax), %eax
> - movl %gs:DTV_OFFSET, %edx
> - movl TLSDESC_GEN_COUNT(%eax), %ecx
> - cmpl (%edx), %ecx
> - ja .Lslow
> - movl TLSDESC_MODID(%eax), %ecx
> - movl (%edx,%ecx,8), %edx
> - cmpl $-1, %edx
> - je .Lslow
> - movl TLSDESC_MODOFF(%eax), %eax
> - addl %edx, %eax
> -.Lret:
> - movl 20(%esp), %ecx
> - subl %gs:0, %eax
> - movl 24(%esp), %edx
> - addl $28, %esp
> - cfi_adjust_cfa_offset (-28)
> - ret
> - .p2align 4,,7
> -.Lslow:
> - cfi_adjust_cfa_offset (28)
> - call HIDDEN_JUMPTARGET (___tls_get_addr)
> - jmp .Lret
> - cfi_endproc
> - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> +# define USE_FNSAVE
> +# define MINIMUM_ALIGNMENT 4
> +# define STATE_SAVE_ALIGNMENT 4
> +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave
> +# include "dl-tlsdesc-dynamic.h"
> +# undef _dl_tlsdesc_dynamic
> +# undef MINIMUM_ALIGNMENT
> +# undef USE_FNSAVE
> +
> +# define MINIMUM_ALIGNMENT 16
> +
> +# define USE_FXSAVE
> +# define STATE_SAVE_ALIGNMENT 16
> +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> +# include "dl-tlsdesc-dynamic.h"
> +# undef _dl_tlsdesc_dynamic
> +# undef USE_FXSAVE
> +
> +# define USE_XSAVE
> +# define STATE_SAVE_ALIGNMENT 64
> +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> +# include "dl-tlsdesc-dynamic.h"
> +# undef _dl_tlsdesc_dynamic
> +# undef USE_XSAVE
> +
> +# define USE_XSAVEC
> +# define STATE_SAVE_ALIGNMENT 64
> +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> +# include "dl-tlsdesc-dynamic.h"
> +# undef _dl_tlsdesc_dynamic
> +# undef USE_XSAVEC
> #endif /* SHARED */
> diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
> index 73b29cc78c..5311b594af 100644
> --- a/sysdeps/x86/Makefile
> +++ b/sysdeps/x86/Makefile
> @@ -1,5 +1,5 @@
> ifeq ($(subdir),csu)
> -gen-as-const-headers += cpu-features-offsets.sym
> +gen-as-const-headers += cpu-features-offsets.sym features-offsets.sym
> endif
>
> ifeq ($(subdir),elf)
> @@ -86,6 +86,11 @@ endif
> tst-ifunc-isa-2-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-SSE4_2,-AVX,-AVX2,-AVX512F
> tst-ifunc-isa-2-static-ENV = $(tst-ifunc-isa-2-ENV)
> tst-hwcap-tunables-ARGS = -- $(host-test-program-cmd)
> +
> +CFLAGS-tst-gnu2-tls2.c += -msse
> +CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
> +CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
> +CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
> endif
>
> ifeq ($(subdir),math)
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 25e6622a79..835113b42f 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -27,8 +27,13 @@
> extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
> attribute_hidden;
>
> -#if defined SHARED && defined __x86_64__
> -# include <dl-plt-rewrite.h>
> +#if defined SHARED
> +extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden;
> +extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden;
> +extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden;
> +
> +# ifdef __x86_64__
> +# include <dl-plt-rewrite.h>
>
> static void
> TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> @@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> : plt_rewrite_jmp);
> }
> }
> +# else
> +extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden;
> +# endif
> +#endif
> +
> +#ifdef __x86_64__
> +extern void _dl_runtime_resolve_fxsave (void) attribute_hidden;
> +extern void _dl_runtime_resolve_xsave (void) attribute_hidden;
> +extern void _dl_runtime_resolve_xsavec (void) attribute_hidden;
> #endif
>
> #ifdef __LP64__
> @@ -1130,6 +1144,44 @@ no_cpuid:
> TUNABLE_CALLBACK (set_x86_shstk));
> #endif
>
> + if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> + {
> + if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC))
> + {
> +#ifdef __x86_64__
> + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsavec;
> +#endif
> +#ifdef SHARED
> + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsavec;
> +#endif
> + }
> + else
> + {
> +#ifdef __x86_64__
> + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsave;
> +#endif
> +#ifdef SHARED
> + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsave;
> +#endif
> + }
> + }
> + else
> + {
> +#ifdef __x86_64__
> + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_fxsave;
> +# ifdef SHARED
> + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> +# endif
> +#else
> +# ifdef SHARED
> + if (CPU_FEATURE_USABLE_P (cpu_features, FXSR))
> + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> + else
> + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fnsave;
> +# endif
> +#endif
> + }
> +
> #ifdef SHARED
> # ifdef __x86_64__
> TUNABLE_GET (plt_rewrite, tunable_val_t *,
> diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c
> index ee957b4d70..5920d4b320 100644
> --- a/sysdeps/x86/dl-procinfo.c
> +++ b/sysdeps/x86/dl-procinfo.c
> @@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9]
> #else
> ,
> #endif
> +
> +#if defined SHARED && !IS_IN (ldconfig)
> +# if !defined PROCINFO_DECL
> + ._dl_x86_tlsdesc_dynamic
> +# else
> +PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic
> +# endif
> +# ifndef PROCINFO_DECL
> += NULL
> +# endif
> +# ifdef PROCINFO_DECL
> +;
> +# else
> +,
> +# endif
> +#endif
> diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/features-offsets.sym
> similarity index 89%
> rename from sysdeps/x86_64/features-offsets.sym
> rename to sysdeps/x86/features-offsets.sym
> index 9e4be3393a..77e990c705 100644
> --- a/sysdeps/x86_64/features-offsets.sym
> +++ b/sysdeps/x86/features-offsets.sym
> @@ -3,4 +3,6 @@
> #include <ldsodefs.h>
>
> RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_global_ro, _dl_x86_cpu_features)
> +#ifdef __x86_64__
> RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _dl_x86_feature_1)
> +#endif
> diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
> index 837fd28734..485cad9c02 100644
> --- a/sysdeps/x86/sysdep.h
> +++ b/sysdeps/x86/sysdep.h
> @@ -70,6 +70,12 @@
> | (1 << X86_XSTATE_ZMM_H_ID))
> #endif
>
> +/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL.
> + Compiler assumes that all registers, including x87 FPU stack registers,
> + are unchanged after CALL, except for EFLAGS and RAX/EAX. */
> +#define TLSDESC_CALL_STATE_SAVE_MASK \
> + (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID))
> +
> /* Constants for bits in __x86_string_control: */
>
> /* Avoid short distance REP MOVSB. */
> diff --git a/sysdeps/x86/tst-gnu2-tls2.c b/sysdeps/x86/tst-gnu2-tls2.c
> new file mode 100644
> index 0000000000..de900a423b
> --- /dev/null
> +++ b/sysdeps/x86/tst-gnu2-tls2.c
> @@ -0,0 +1,20 @@
> +#ifndef __x86_64__
> +#include <sys/platform/x86.h>
> +
> +#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2)
> +#endif
> +
> +/* Clear XMM0...XMM7 */
> +#define PREPARE_MALLOC() \
> +{ \
> + asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \
> + asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \
> + asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \
> + asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \
> + asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \
> + asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \
> + asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \
> + asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \
> +}
> +
> +#include <elf/tst-gnu2-tls2.c>
> diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> index 145f25e7f6..9337e95093 100644
> --- a/sysdeps/x86_64/Makefile
> +++ b/sysdeps/x86_64/Makefile
> @@ -10,7 +10,7 @@ LDFLAGS-rtld += -Wl,-z,nomark-plt
> endif
>
> ifeq ($(subdir),csu)
> -gen-as-const-headers += features-offsets.sym link-defines.sym
> +gen-as-const-headers += link-defines.sym
> endif
>
> ifeq ($(subdir),gmon)
> diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> index 6d605d0d32..ff5d45f7cb 100644
> --- a/sysdeps/x86_64/dl-machine.h
> +++ b/sysdeps/x86_64/dl-machine.h
> @@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> int lazy, int profile)
> {
> Elf64_Addr *got;
> - extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidden;
> - extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidden;
> - extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidden;
> extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
> extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
> extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
> @@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> /* Identify this shared object. */
> *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
>
> - const struct cpu_features* cpu_features = __get_cpu_features ();
> -
> #ifdef SHARED
> /* The got[2] entry contains the address of a function which gets
> called to get the address of a so far unresolved function and
> @@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> end in this function. */
> if (__glibc_unlikely (profile))
> {
> + const struct cpu_features* cpu_features = __get_cpu_features ();
> if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
> *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
> else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
> @@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> /* This function will get called to fix up the GOT entry
> indicated by the offset on the stack, and then jump to
> the resolved address. */
> - if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
> - || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> - *(ElfW(Addr) *) (got + 2)
> - = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
> - ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
> - : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
> - else
> - *(ElfW(Addr) *) (got + 2)
> - = (ElfW(Addr)) &_dl_runtime_resolve_fxsave;
> + *(ElfW(Addr) *) (got + 2)
> + = (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve);
> }
> }
>
> @@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.\n",
> {
> td->arg = _dl_make_tlsdesc_dynamic
> (sym_map, sym->st_value + reloc->r_addend);
> - td->entry = _dl_tlsdesc_dynamic;
> + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> }
> else
> # endif
> diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
> index 4d1d790fbb..06637a8154 100644
> --- a/sysdeps/x86_64/dl-procinfo.c
> +++ b/sysdeps/x86_64/dl-procinfo.c
> @@ -41,5 +41,21 @@
>
> #include <sysdeps/x86/dl-procinfo.c>
>
> +#if !IS_IN (ldconfig)
> +# if !defined PROCINFO_DECL && defined SHARED
> + ._dl_x86_64_runtime_resolve
> +# else
> +PROCINFO_CLASS void * _dl_x86_64_runtime_resolve
> +# endif
> +# ifndef PROCINFO_DECL
> += NULL
> +# endif
> +# if !defined SHARED || defined PROCINFO_DECL
> +;
> +# else
> +,
> +# endif
> +#endif
> +
> #undef PROCINFO_DECL
> #undef PROCINFO_CLASS
> diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> new file mode 100644
> index 0000000000..0c2e8d5320
> --- /dev/null
> +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> @@ -0,0 +1,166 @@
> +/* Thread-local storage handling in the ELF dynamic linker. x86_64 version.
> + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#ifndef SECTION
> +# define SECTION(p) p
> +#endif
> +
> +#undef REGISTER_SAVE_AREA
> +#undef LOCAL_STORAGE_AREA
> +#undef BASE
> +
> +#include "dl-trampoline-state.h"
> +
> + .section SECTION(.text),"ax",@progbits
> +
> + .hidden _dl_tlsdesc_dynamic
> + .global _dl_tlsdesc_dynamic
> + .type _dl_tlsdesc_dynamic,@function
> +
> + /* %rax points to the TLS descriptor, such that 0(%rax) points to
> + _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> + tlsdesc_dynamic_arg object. It must return in %rax the offset
> + between the thread pointer and the object denoted by the
> + argument, without clobbering any registers.
> +
> + The assembly code that follows is a rendition of the following
> + C code, hand-optimized a little bit.
> +
> +ptrdiff_t
> +_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> +{
> + struct tlsdesc_dynamic_arg *td = tdp->arg;
> + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> + if (__builtin_expect (td->gen_count <= dtv[0].counter
> + && (dtv[td->tlsinfo.ti_module].pointer.val
> + != TLS_DTV_UNALLOCATED),
> + 1))
> + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> + - __thread_pointer;
> +
> + return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> +}
> +*/
> + cfi_startproc
> + .align 16
> +_dl_tlsdesc_dynamic:
> + _CET_ENDBR
> + /* Preserve call-clobbered registers that we modify.
> + We need two scratch regs anyway. */
> + movq %rsi, -16(%rsp)
> + mov %fs:DTV_OFFSET, %RSI_LP
> + movq %rdi, -8(%rsp)
> + movq TLSDESC_ARG(%rax), %rdi
> + movq (%rsi), %rax
> + cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> + ja 2f
> + movq TLSDESC_MODID(%rdi), %rax
> + salq $4, %rax
> + movq (%rax,%rsi), %rax
> + cmpq $-1, %rax
> + je 2f
> + addq TLSDESC_MODOFF(%rdi), %rax
> +1:
> + movq -16(%rsp), %rsi
> + sub %fs:0, %RAX_LP
> + movq -8(%rsp), %rdi
> + ret
> +2:
> +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> + movq %rbx, -24(%rsp)
> + mov %RSP_LP, %RBX_LP
> + cfi_def_cfa_register(%rbx)
> + and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> +#endif
> +#ifdef REGISTER_SAVE_AREA
> +# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> + /* STATE_SAVE_OFFSET has space for 8 integer registers. But we
> + need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
> + RBX above. */
> + sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
> +# else
> + sub $REGISTER_SAVE_AREA, %RSP_LP
> + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> +# endif
> +#else
> + /* Allocate stack space of the required size to save the state. */
> + sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip), %RSP_LP
> +#endif
> + /* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9,
> + r10 and r11. */
> + movq %rcx, REGISTER_SAVE_RCX(%rsp)
> + movq %rdx, REGISTER_SAVE_RDX(%rsp)
> + movq %r8, REGISTER_SAVE_R8(%rsp)
> + movq %r9, REGISTER_SAVE_R9(%rsp)
> + movq %r10, REGISTER_SAVE_R10(%rsp)
> + movq %r11, REGISTER_SAVE_R11(%rsp)
> +#ifdef USE_FXSAVE
> + fxsave STATE_SAVE_OFFSET(%rsp)
> +#else
> + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> + xorl %edx, %edx
> + /* Clear the XSAVE Header. */
> +# ifdef USE_XSAVE
> + movq %rdx, (STATE_SAVE_OFFSET + 512)(%rsp)
> + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8)(%rsp)
> +# endif
> + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 2)(%rsp)
> + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 3)(%rsp)
> + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 4)(%rsp)
> + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 5)(%rsp)
> + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 6)(%rsp)
> + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 7)(%rsp)
> +# ifdef USE_XSAVE
> + xsave STATE_SAVE_OFFSET(%rsp)
> +# else
> + xsavec STATE_SAVE_OFFSET(%rsp)
> +# endif
> +#endif
> + /* %rdi already points to the tlsinfo data structure. */
> + call HIDDEN_JUMPTARGET (__tls_get_addr)
> + # Get register content back.
> +#ifdef USE_FXSAVE
> + fxrstor STATE_SAVE_OFFSET(%rsp)
> +#else
> + /* Save and retore __tls_get_addr return value stored in RAX. */
> + mov %RAX_LP, %RCX_LP
> + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> + xorl %edx, %edx
> + xrstor STATE_SAVE_OFFSET(%rsp)
> + mov %RCX_LP, %RAX_LP
> +#endif
> + movq REGISTER_SAVE_R11(%rsp), %r11
> + movq REGISTER_SAVE_R10(%rsp), %r10
> + movq REGISTER_SAVE_R9(%rsp), %r9
> + movq REGISTER_SAVE_R8(%rsp), %r8
> + movq REGISTER_SAVE_RDX(%rsp), %rdx
> + movq REGISTER_SAVE_RCX(%rsp), %rcx
> +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> + mov %RBX_LP, %RSP_LP
> + cfi_def_cfa_register(%rsp)
> + movq -24(%rsp), %rbx
> + cfi_restore(%rbx)
> +#else
> + add $REGISTER_SAVE_AREA, %RSP_LP
> + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> +#endif
> + jmp 1b
> + cfi_endproc
> + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> +
> +#undef STATE_SAVE_ALIGNMENT
> diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
> index f748af2ece..ea69f5223a 100644
> --- a/sysdeps/x86_64/dl-tlsdesc.S
> +++ b/sysdeps/x86_64/dl-tlsdesc.S
> @@ -18,7 +18,19 @@
>
> #include <sysdep.h>
> #include <tls.h>
> +#include <cpu-features-offsets.h>
> +#include <features-offsets.h>
> #include "tlsdesc.h"
> +#include "dl-trampoline-save.h"
> +
> +/* Area on stack to save and restore registers used for parameter
> + passing when calling _dl_tlsdesc_dynamic. */
> +#define REGISTER_SAVE_RCX 0
> +#define REGISTER_SAVE_RDX (REGISTER_SAVE_RCX + 8)
> +#define REGISTER_SAVE_R8 (REGISTER_SAVE_RDX + 8)
> +#define REGISTER_SAVE_R9 (REGISTER_SAVE_R8 + 8)
> +#define REGISTER_SAVE_R10 (REGISTER_SAVE_R9 + 8)
> +#define REGISTER_SAVE_R11 (REGISTER_SAVE_R10 + 8)
>
> .text
>
> @@ -67,80 +79,24 @@ _dl_tlsdesc_undefweak:
> .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
>
> #ifdef SHARED
> - .hidden _dl_tlsdesc_dynamic
> - .global _dl_tlsdesc_dynamic
> - .type _dl_tlsdesc_dynamic,@function
> -
> - /* %rax points to the TLS descriptor, such that 0(%rax) points to
> - _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> - tlsdesc_dynamic_arg object. It must return in %rax the offset
> - between the thread pointer and the object denoted by the
> - argument, without clobbering any registers.
> -
> - The assembly code that follows is a rendition of the following
> - C code, hand-optimized a little bit.
> -
> -ptrdiff_t
> -_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> -{
> - struct tlsdesc_dynamic_arg *td = tdp->arg;
> - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> - if (__builtin_expect (td->gen_count <= dtv[0].counter
> - && (dtv[td->tlsinfo.ti_module].pointer.val
> - != TLS_DTV_UNALLOCATED),
> - 1))
> - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> - - __thread_pointer;
> -
> - return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> -}
> -*/
> - cfi_startproc
> - .align 16
> -_dl_tlsdesc_dynamic:
> - _CET_ENDBR
> - /* Preserve call-clobbered registers that we modify.
> - We need two scratch regs anyway. */
> - movq %rsi, -16(%rsp)
> - mov %fs:DTV_OFFSET, %RSI_LP
> - movq %rdi, -8(%rsp)
> - movq TLSDESC_ARG(%rax), %rdi
> - movq (%rsi), %rax
> - cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> - ja .Lslow
> - movq TLSDESC_MODID(%rdi), %rax
> - salq $4, %rax
> - movq (%rax,%rsi), %rax
> - cmpq $-1, %rax
> - je .Lslow
> - addq TLSDESC_MODOFF(%rdi), %rax
> -.Lret:
> - movq -16(%rsp), %rsi
> - sub %fs:0, %RAX_LP
> - movq -8(%rsp), %rdi
> - ret
> -.Lslow:
> - /* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
> - r10 and r11. Also, align the stack, that's off by 8 bytes. */
> - subq $72, %rsp
> - cfi_adjust_cfa_offset (72)
> - movq %rdx, 8(%rsp)
> - movq %rcx, 16(%rsp)
> - movq %r8, 24(%rsp)
> - movq %r9, 32(%rsp)
> - movq %r10, 40(%rsp)
> - movq %r11, 48(%rsp)
> - /* %rdi already points to the tlsinfo data structure. */
> - call HIDDEN_JUMPTARGET (__tls_get_addr)
> - movq 8(%rsp), %rdx
> - movq 16(%rsp), %rcx
> - movq 24(%rsp), %r8
> - movq 32(%rsp), %r9
> - movq 40(%rsp), %r10
> - movq 48(%rsp), %r11
> - addq $72, %rsp
> - cfi_adjust_cfa_offset (-72)
> - jmp .Lret
> - cfi_endproc
> - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> +# define USE_FXSAVE
> +# define STATE_SAVE_ALIGNMENT 16
> +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> +# include "dl-tlsdesc-dynamic.h"
> +# undef _dl_tlsdesc_dynamic
> +# undef USE_FXSAVE
> +
> +# define USE_XSAVE
> +# define STATE_SAVE_ALIGNMENT 64
> +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> +# include "dl-tlsdesc-dynamic.h"
> +# undef _dl_tlsdesc_dynamic
> +# undef USE_XSAVE
> +
> +# define USE_XSAVEC
> +# define STATE_SAVE_ALIGNMENT 64
> +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> +# include "dl-tlsdesc-dynamic.h"
> +# undef _dl_tlsdesc_dynamic
> +# undef USE_XSAVEC
> #endif /* SHARED */
> diff --git a/sysdeps/x86_64/dl-trampoline-save.h b/sysdeps/x86_64/dl-trampoline-save.h
> new file mode 100644
> index 0000000000..84eac4a8ac
> --- /dev/null
> +++ b/sysdeps/x86_64/dl-trampoline-save.h
> @@ -0,0 +1,34 @@
> +/* x86-64 PLT trampoline register save macros.
> + Copyright (C) 2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#ifndef DL_STACK_ALIGNMENT
> +/* Due to GCC bug:
> +
> + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> +
> + __tls_get_addr may be called with 8-byte stack alignment. Although
> + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> + that stack will be always aligned at 16 bytes. */
> +# define DL_STACK_ALIGNMENT 8
> +#endif
> +
> +/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> + stack to 16 bytes before calling _dl_fixup. */
> +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> + || 16 > DL_STACK_ALIGNMENT)
> diff --git a/sysdeps/x86_64/dl-trampoline-state.h b/sysdeps/x86_64/dl-trampoline-state.h
> new file mode 100644
> index 0000000000..575f120797
> --- /dev/null
> +++ b/sysdeps/x86_64/dl-trampoline-state.h
> @@ -0,0 +1,51 @@
> +/* x86-64 PLT dl-trampoline state macros.
> + Copyright (C) 2024 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <https://www.gnu.org/licenses/>. */
> +
> +#if (STATE_SAVE_ALIGNMENT % 16) != 0
> +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> +#endif
> +
> +#if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> +# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> +#endif
> +
> +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> +/* Local stack area before jumping to function address: RBX. */
> +# define LOCAL_STORAGE_AREA 8
> +# define BASE rbx
> +# ifdef USE_FXSAVE
> +/* Use fxsave to save XMM registers. */
> +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> +# if (REGISTER_SAVE_AREA % 16) != 0
> +# error REGISTER_SAVE_AREA must be multiple of 16
> +# endif
> +# endif
> +#else
> +# ifndef USE_FXSAVE
> +# error USE_FXSAVE must be defined
> +# endif
> +/* Use fxsave to save XMM registers. */
> +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> +/* Local stack area before jumping to function address: All saved
> + registers. */
> +# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> +# define BASE rsp
> +# if (REGISTER_SAVE_AREA % 16) != 8
> +# error REGISTER_SAVE_AREA must be odd multiple of 8
> +# endif
> +#endif
> diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
> index b2e7e0f69b..87c5137837 100644
> --- a/sysdeps/x86_64/dl-trampoline.S
> +++ b/sysdeps/x86_64/dl-trampoline.S
> @@ -22,25 +22,7 @@
> #include <features-offsets.h>
> #include <link-defines.h>
> #include <isa-level.h>
> -
> -#ifndef DL_STACK_ALIGNMENT
> -/* Due to GCC bug:
> -
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> -
> - __tls_get_addr may be called with 8-byte stack alignment. Although
> - this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> - that stack will be always aligned at 16 bytes. We use unaligned
> - 16-byte move to load and store SSE registers, which has no penalty
> - on modern processors if stack is 16-byte aligned. */
> -# define DL_STACK_ALIGNMENT 8
> -#endif
> -
> -/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> - stack to 16 bytes before calling _dl_fixup. */
> -#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> - (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> - || 16 > DL_STACK_ALIGNMENT)
> +#include "dl-trampoline-save.h"
>
> /* Area on stack to save and restore registers used for parameter
> passing when calling _dl_fixup. */
> diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
> index f55c6ea040..d9ccfb40d4 100644
> --- a/sysdeps/x86_64/dl-trampoline.h
> +++ b/sysdeps/x86_64/dl-trampoline.h
> @@ -27,39 +27,7 @@
> # undef LOCAL_STORAGE_AREA
> # undef BASE
>
> -# if (STATE_SAVE_ALIGNMENT % 16) != 0
> -# error STATE_SAVE_ALIGNMENT must be multiple of 16
> -# endif
> -
> -# if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> -# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> -# endif
> -
> -# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> -/* Local stack area before jumping to function address: RBX. */
> -# define LOCAL_STORAGE_AREA 8
> -# define BASE rbx
> -# ifdef USE_FXSAVE
> -/* Use fxsave to save XMM registers. */
> -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> -# if (REGISTER_SAVE_AREA % 16) != 0
> -# error REGISTER_SAVE_AREA must be multiple of 16
> -# endif
> -# endif
> -# else
> -# ifndef USE_FXSAVE
> -# error USE_FXSAVE must be defined
> -# endif
> -/* Use fxsave to save XMM registers. */
> -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> -/* Local stack area before jumping to function address: All saved
> - registers. */
> -# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> -# define BASE rsp
> -# if (REGISTER_SAVE_AREA % 16) != 8
> -# error REGISTER_SAVE_AREA must be odd multiple of 8
> -# endif
> -# endif
> +# include "dl-trampoline-state.h"
>
> .globl _dl_runtime_resolve
> .hidden _dl_runtime_resolve
> --
> 2.43.2
>
This still LGTM.
Feel free to add my reviewed-by tag to future posts of the patch.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v10 1/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
2024-02-26 19:02 ` Noah Goldstein
@ 2024-02-26 19:02 ` Noah Goldstein
2024-02-26 19:06 ` H.J. Lu
0 siblings, 1 reply; 7+ messages in thread
From: Noah Goldstein @ 2024-02-26 19:02 UTC (permalink / raw)
To: H.J. Lu; +Cc: libc-alpha, adhemerval.zanella
On Mon, Feb 26, 2024 at 1:02 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Feb 26, 2024 at 8:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > Compiler generates the following instruction sequence for GNU2 dynamic
> > TLS access:
> >
> > leaq tls_var@TLSDESC(%rip), %rax
> > call *tls_var@TLSCALL(%rax)
> >
> > or
> >
> > leal tls_var@TLSDESC(%ebx), %eax
> > call *tls_var@TLSCALL(%eax)
> >
> > CALL instruction is transparent to compiler which assumes all registers,
> > except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > path. __tls_get_addr is a normal function which doesn't preserve any
> > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> > caller-saved registers, but didn't preserve any other caller-saved
> > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > XSAVE and XSAVEC to save and restore all caller-saved registers. This
> > fixes BZ #31372.
> >
> > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > to optimize elf_machine_runtime_setup.
> > ---
> > elf/Makefile | 18 ++
> > elf/tst-gnu2-tls2.c | 122 ++++++++++++
> > elf/tst-gnu2-tls2.h | 36 ++++
> > elf/tst-gnu2-tls2mod0.c | 31 +++
> > elf/tst-gnu2-tls2mod1.c | 31 +++
> > elf/tst-gnu2-tls2mod2.c | 31 +++
> > sysdeps/i386/dl-machine.h | 2 +-
> > sysdeps/i386/dl-tlsdesc-dynamic.h | 190 +++++++++++++++++++
> > sysdeps/i386/dl-tlsdesc.S | 115 +++++------
> > sysdeps/x86/Makefile | 7 +-
> > sysdeps/x86/cpu-features.c | 56 +++++-
> > sysdeps/x86/dl-procinfo.c | 16 ++
> > sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
> > sysdeps/x86/sysdep.h | 6 +
> > sysdeps/x86/tst-gnu2-tls2.c | 20 ++
> > sysdeps/x86_64/Makefile | 2 +-
> > sysdeps/x86_64/dl-machine.h | 19 +-
> > sysdeps/x86_64/dl-procinfo.c | 16 ++
> > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
> > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
> > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
> > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
> > sysdeps/x86_64/dl-trampoline.S | 20 +-
> > sysdeps/x86_64/dl-trampoline.h | 34 +---
> > 24 files changed, 920 insertions(+), 213 deletions(-)
> > create mode 100644 elf/tst-gnu2-tls2.c
> > create mode 100644 elf/tst-gnu2-tls2.h
> > create mode 100644 elf/tst-gnu2-tls2mod0.c
> > create mode 100644 elf/tst-gnu2-tls2mod1.c
> > create mode 100644 elf/tst-gnu2-tls2mod2.c
> > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> > rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> > create mode 100644 sysdeps/x86/tst-gnu2-tls2.c
> > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> > create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
> >
> > diff --git a/elf/Makefile b/elf/Makefile
> > index 36c04baf02..02dc476e27 100644
> > --- a/elf/Makefile
> > +++ b/elf/Makefile
> > @@ -424,6 +424,7 @@ tests += \
> > tst-glibc-hwcaps-prepend \
> > tst-global1 \
> > tst-global2 \
> > + tst-gnu2-tls2 \
> > tst-initfinilazyfail \
> > tst-initorder \
> > tst-initorder2 \
> > @@ -846,6 +847,9 @@ modules-names += \
> > tst-filterobj-flt \
> > tst-finilazyfailmod \
> > tst-globalmod2 \
> > + tst-gnu2-tls2mod0 \
> > + tst-gnu2-tls2mod1 \
> > + tst-gnu2-tls2mod2 \
> > tst-initlazyfailmod \
> > tst-initorder2a \
> > tst-initorder2b \
> > @@ -3044,8 +3048,22 @@ $(objpfx)tst-tlsgap.out: \
> > $(objpfx)tst-tlsgap-mod0.so \
> > $(objpfx)tst-tlsgap-mod1.so \
> > $(objpfx)tst-tlsgap-mod2.so
> > +
> > +$(objpfx)tst-gnu2-tls2: $(shared-thread-library)
> > +$(objpfx)tst-gnu2-tls2.out: \
> > + $(objpfx)tst-gnu2-tls2mod0.so \
> > + $(objpfx)tst-gnu2-tls2mod1.so \
> > + $(objpfx)tst-gnu2-tls2mod2.so
> > +
> > ifeq (yes,$(have-mtls-dialect-gnu2))
> > +# This test fails if dl_tlsdesc_dynamic doesn't preserve all caller-saved
> > +# registers. See https://sourceware.org/bugzilla/show_bug.cgi?id=31372
> > +test-xfail-tst-gnu2-tls2 = yes
> > +
> > CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
> > CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
> > CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
> > +CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
> > +CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
> > +CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
> > endif
> > diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c
> > new file mode 100644
> > index 0000000000..7ac04d7f33
> > --- /dev/null
> > +++ b/elf/tst-gnu2-tls2.c
> > @@ -0,0 +1,122 @@
> > +/* Test TLSDESC relocation.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <http://www.gnu.org/licenses/>. */
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <dlfcn.h>
> > +#include <pthread.h>
> > +#include <support/xdlfcn.h>
> > +#include <support/xthread.h>
> > +#include <support/check.h>
> > +#include <support/test-driver.h>
> > +#include "tst-gnu2-tls2.h"
> > +
> > +#ifndef IS_SUPPORTED
> > +# define IS_SUPPORTED() true
> > +#endif
> > +
> > +/* An architecture can define it to clobber caller-saved registers in
> > + malloc below to verify that the implicit TLSDESC call won't change
> > + caller-saved registers. */
> > +#ifndef PREPARE_MALLOC
> > +# define PREPARE_MALLOC()
> > +#endif
> > +
> > +extern void * __libc_malloc (size_t);
> > +
> > +size_t malloc_counter = 0;
> > +
> > +void *
> > +malloc (size_t n)
> > +{
> > + PREPARE_MALLOC ();
> > + malloc_counter++;
> > + return __libc_malloc (n);
> > +}
> > +
> > +static void *mod[3];
> > +#ifndef MOD
> > +# define MOD(i) "tst-gnu2-tls2mod" #i ".so"
> > +#endif
> > +static const char *modname[3] = { MOD(0), MOD(1), MOD(2) };
> > +#undef MOD
> > +
> > +static void
> > +open_mod (int i)
> > +{
> > + mod[i] = xdlopen (modname[i], RTLD_LAZY);
> > + printf ("open %s\n", modname[i]);
> > +}
> > +
> > +static void
> > +close_mod (int i)
> > +{
> > + xdlclose (mod[i]);
> > + mod[i] = NULL;
> > + printf ("close %s\n", modname[i]);
> > +}
> > +
> > +static void
> > +access_mod (int i, const char *sym)
> > +{
> > + struct tls var = { -1, -1, -1, -1 };
> > + struct tls *(*f) (struct tls *) = xdlsym (mod[i], sym);
> > + /* Check that our malloc is called. */
> > + malloc_counter = 0;
> > + struct tls *p = f (&var);
> > + TEST_VERIFY (malloc_counter != 0);
> > + printf ("access %s: %s() = %p\n", modname[i], sym, p);
> > + TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) == 0);
> > + ++(p->a);
> > +}
> > +
> > +static void *
> > +start (void *arg)
> > +{
> > + /* The DTV generation is at the last dlopen of mod0 and the
> > + entry for mod1 is NULL. */
> > +
> > + open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */
> > +
> > + /* Force the slow path in GNU2 TLS descriptor call. */
> > + access_mod (1, "apply_tls");
> > +
> > + return arg;
> > +}
> > +
> > +static int
> > +do_test (void)
> > +{
> > + if (!IS_SUPPORTED ())
> > + return EXIT_UNSUPPORTED;
> > +
> > + open_mod (0);
> > + open_mod (1);
> > + open_mod (2);
> > + close_mod (0);
> > + close_mod (1); /* Create modid gap at mod1. */
> > + open_mod (0); /* Reuse modid of mod0, bump generation count. */
> > +
> > + /* Create a thread where DTV of mod1 is NULL. */
> > + pthread_t t = xpthread_create (NULL, start, NULL);
> > + xpthread_join (t);
> > + return 0;
> > +}
> > +
> > +#include <support/test-driver.c>
> > diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h
> > new file mode 100644
> > index 0000000000..77964a57a3
> > --- /dev/null
> > +++ b/elf/tst-gnu2-tls2.h
> > @@ -0,0 +1,36 @@
> > +/* Test TLSDESC relocation.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#include <stdint.h>
> > +
> > +struct tls
> > +{
> > + int64_t a, b, c, d;
> > +};
> > +
> > +extern struct tls *apply_tls (struct tls *);
> > +
> > +/* An architecture can define them to verify that clobber caller-saved
> > + registers aren't changed by the implicit TLSDESC call. */
> > +#ifndef BEFORE_TLSDESC_CALL
> > +# define BEFORE_TLSDESC_CALL()
> > +#endif
> > +
> > +#ifndef AFTER_TLSDESC_CALL
> > +# define AFTER_TLSDESC_CALL()
> > +#endif
> > diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c
> > new file mode 100644
> > index 0000000000..45556a0e17
> > --- /dev/null
> > +++ b/elf/tst-gnu2-tls2mod0.c
> > @@ -0,0 +1,31 @@
> > +/* DSO used by tst-gnu2-tls2.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#include "tst-gnu2-tls2.h"
> > +
> > +__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")));
> > +
> > +struct tls *
> > +apply_tls (struct tls *p)
> > +{
> > + BEFORE_TLSDESC_CALL ();
> > + tls_var0 = *p;
> > + struct tls *ret = &tls_var0;
> > + AFTER_TLSDESC_CALL ();
> > + return ret;
> > +}
> > diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c
> > new file mode 100644
> > index 0000000000..e10b9dbc0a
> > --- /dev/null
> > +++ b/elf/tst-gnu2-tls2mod1.c
> > @@ -0,0 +1,31 @@
> > +/* DSO used by tst-gnu2-tls2.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#include "tst-gnu2-tls2.h"
> > +
> > +__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidden")));
> > +
> > +struct tls *
> > +apply_tls (struct tls *p)
> > +{
> > + BEFORE_TLSDESC_CALL ();
> > + tls_var1[1] = *p;
> > + struct tls *ret = &tls_var1[1];
> > + AFTER_TLSDESC_CALL ();
> > + return ret;
> > +}
> > diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c
> > new file mode 100644
> > index 0000000000..141af51e55
> > --- /dev/null
> > +++ b/elf/tst-gnu2-tls2mod2.c
> > @@ -0,0 +1,31 @@
> > +/* DSO used by tst-gnu2-tls2.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#include "tst-gnu2-tls2.h"
> > +
> > +__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")));
> > +
> > +struct tls *
> > +apply_tls (struct tls *p)
> > +{
> > + BEFORE_TLSDESC_CALL ();
> > + tls_var2 = *p;
> > + struct tls *ret = &tls_var2;
> > + AFTER_TLSDESC_CALL ();
> > + return ret;
> > +}
> > diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> > index fc1ef96587..50d74fe6e9 100644
> > --- a/sysdeps/i386/dl-machine.h
> > +++ b/sysdeps/i386/dl-machine.h
> > @@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.\n",
> > {
> > td->arg = _dl_make_tlsdesc_dynamic
> > (sym_map, sym->st_value + (ElfW(Word))td->arg);
> > - td->entry = _dl_tlsdesc_dynamic;
> > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > }
> > else
> > # endif
> > diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > new file mode 100644
> > index 0000000000..3627028577
> > --- /dev/null
> > +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > @@ -0,0 +1,190 @@
> > +/* Thread-local storage handling in the ELF dynamic linker. i386 version.
> > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#undef REGISTER_SAVE_AREA
> > +
> > +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
> > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > +#endif
> > +
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +# ifdef USE_FNSAVE
> > +# error USE_FNSAVE shouldn't be defined
> > +# endif
> > +# ifdef USE_FXSAVE
> > +/* Use fxsave to save all registers. */
> > +# define REGISTER_SAVE_AREA 512
> > +# endif
> > +#else
> > +# ifdef USE_FNSAVE
> > +/* Use fnsave to save x87 FPU stack registers. */
> > +# define REGISTER_SAVE_AREA 108
> > +# else
> > +# ifndef USE_FXSAVE
> > +# error USE_FXSAVE must be defined
> > +# endif
> > +/* Use fxsave to save all registers. Add 12 bytes to align the stack
> > + to 16 bytes. */
> > +# define REGISTER_SAVE_AREA (512 + 12)
> > +# endif
> > +#endif
> > +
> > + .hidden _dl_tlsdesc_dynamic
> > + .global _dl_tlsdesc_dynamic
> > + .type _dl_tlsdesc_dynamic,@function
> > +
> > + /* This function is used for symbols that need dynamic TLS.
> > +
> > + %eax points to the TLS descriptor, such that 0(%eax) points to
> > + _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > + tlsdesc_dynamic_arg object. It must return in %eax the offset
> > + between the thread pointer and the object denoted by the
> > + argument, without clobbering any registers.
> > +
> > + The assembly code that follows is a rendition of the following
> > + C code, hand-optimized a little bit.
> > +
> > +ptrdiff_t
> > +__attribute__ ((__regparm__ (1)))
> > +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > +{
> > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > + != TLS_DTV_UNALLOCATED),
> > + 1))
> > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > + - __thread_pointer;
> > +
> > + return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > +}
> > +*/
> > + cfi_startproc
> > + .align 16
> > +_dl_tlsdesc_dynamic:
> > + /* Like all TLS resolvers, preserve call-clobbered registers.
> > + We need two scratch regs anyway. */
> > + subl $32, %esp
> > + cfi_adjust_cfa_offset (32)
> > + movl %ecx, 20(%esp)
> > + movl %edx, 24(%esp)
> > + movl TLSDESC_ARG(%eax), %eax
> > + movl %gs:DTV_OFFSET, %edx
> > + movl TLSDESC_GEN_COUNT(%eax), %ecx
> > + cmpl (%edx), %ecx
> > + ja 2f
> > + movl TLSDESC_MODID(%eax), %ecx
> > + movl (%edx,%ecx,8), %edx
> > + cmpl $-1, %edx
> > + je 2f
> > + movl TLSDESC_MODOFF(%eax), %eax
> > + addl %edx, %eax
> > +1:
> > + movl 20(%esp), %ecx
> > + subl %gs:0, %eax
> > + movl 24(%esp), %edx
> > + addl $32, %esp
> > + cfi_adjust_cfa_offset (-32)
> > + ret
> > + .p2align 4,,7
> > +2:
> > + cfi_adjust_cfa_offset (32)
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > + movl %ebx, -28(%esp)
> > + movl %esp, %ebx
> > + cfi_def_cfa_register(%ebx)
> > + and $-STATE_SAVE_ALIGNMENT, %esp
> > +#endif
> > +#ifdef REGISTER_SAVE_AREA
> > + subl $REGISTER_SAVE_AREA, %esp
> > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > +# endif
> > +#else
> > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +# error DL_RUNTIME_RESOLVE_REALIGN_STACK must be true
> > +# endif
> > + /* Allocate stack space of the required size to save the state. */
> > + LOAD_PIC_REG (cx)
> > + subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> > +#endif
> > +#ifdef USE_FNSAVE
> > + fnsave (%esp)
> > +#elif defined USE_FXSAVE
> > + fxsave (%esp)
> > +#else
> > + /* Save the argument for ___tls_get_addr in EAX. */
> > + movl %eax, %ecx
> > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > + xorl %edx, %edx
> > + /* Clear the XSAVE Header. */
> > +# ifdef USE_XSAVE
> > + movl %edx, (512)(%esp)
> > + movl %edx, (512 + 4 * 1)(%esp)
> > + movl %edx, (512 + 4 * 2)(%esp)
> > + movl %edx, (512 + 4 * 3)(%esp)
> > +# endif
> > + movl %edx, (512 + 4 * 4)(%esp)
> > + movl %edx, (512 + 4 * 5)(%esp)
> > + movl %edx, (512 + 4 * 6)(%esp)
> > + movl %edx, (512 + 4 * 7)(%esp)
> > + movl %edx, (512 + 4 * 8)(%esp)
> > + movl %edx, (512 + 4 * 9)(%esp)
> > + movl %edx, (512 + 4 * 10)(%esp)
> > + movl %edx, (512 + 4 * 11)(%esp)
> > + movl %edx, (512 + 4 * 12)(%esp)
> > + movl %edx, (512 + 4 * 13)(%esp)
> > + movl %edx, (512 + 4 * 14)(%esp)
> > + movl %edx, (512 + 4 * 15)(%esp)
> > +# ifdef USE_XSAVE
> > + xsave (%esp)
> > +# else
> > + xsavec (%esp)
> > +# endif
> > + /* Restore the argument for ___tls_get_addr in EAX. */
> > + movl %ecx, %eax
> > +#endif
> > + call HIDDEN_JUMPTARGET (___tls_get_addr)
> > + /* Get register content back. */
> > +#ifdef USE_FNSAVE
> > + frstor (%esp)
> > +#elif defined USE_FXSAVE
> > + fxrstor (%esp)
> > +#else
> > + /* Save and retore ___tls_get_addr return value stored in EAX. */
> > + movl %eax, %ecx
> > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > + xorl %edx, %edx
> > + xrstor (%esp)
> > + movl %ecx, %eax
> > +#endif
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > + mov %ebx, %esp
> > + cfi_def_cfa_register(%esp)
> > + movl -28(%esp), %ebx
> > + cfi_restore(%ebx)
> > +#else
> > + addl $REGISTER_SAVE_AREA, %esp
> > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > +#endif
> > + jmp 1b
> > + cfi_endproc
> > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > +
> > +#undef STATE_SAVE_ALIGNMENT
> > diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
> > index 90d93caa0c..f002feee56 100644
> > --- a/sysdeps/i386/dl-tlsdesc.S
> > +++ b/sysdeps/i386/dl-tlsdesc.S
> > @@ -18,8 +18,27 @@
> >
> > #include <sysdep.h>
> > #include <tls.h>
> > +#include <cpu-features-offsets.h>
> > +#include <features-offsets.h>
> > #include "tlsdesc.h"
> >
> > +#ifndef DL_STACK_ALIGNMENT
> > +/* Due to GCC bug:
> > +
> > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > +
> > + __tls_get_addr may be called with 4-byte stack alignment. Although
> > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > + that stack will be always aligned at 16 bytes. */
> > +# define DL_STACK_ALIGNMENT 4
> > +#endif
> > +
> > +/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or align
> > + stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. */
> > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > + || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT)
> > +
> > .text
> >
> > /* This function is used to compute the TP offset for symbols in
> > @@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak:
> > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> >
> > #ifdef SHARED
> > - .hidden _dl_tlsdesc_dynamic
> > - .global _dl_tlsdesc_dynamic
> > - .type _dl_tlsdesc_dynamic,@function
> > -
> > - /* This function is used for symbols that need dynamic TLS.
> > -
> > - %eax points to the TLS descriptor, such that 0(%eax) points to
> > - _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > - tlsdesc_dynamic_arg object. It must return in %eax the offset
> > - between the thread pointer and the object denoted by the
> > - argument, without clobbering any registers.
> > -
> > - The assembly code that follows is a rendition of the following
> > - C code, hand-optimized a little bit.
> > -
> > -ptrdiff_t
> > -__attribute__ ((__regparm__ (1)))
> > -_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > -{
> > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > - != TLS_DTV_UNALLOCATED),
> > - 1))
> > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > - - __thread_pointer;
> > -
> > - return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > -}
> > -*/
> > - cfi_startproc
> > - .align 16
> > -_dl_tlsdesc_dynamic:
> > - /* Like all TLS resolvers, preserve call-clobbered registers.
> > - We need two scratch regs anyway. */
> > - subl $28, %esp
> > - cfi_adjust_cfa_offset (28)
> > - movl %ecx, 20(%esp)
> > - movl %edx, 24(%esp)
> > - movl TLSDESC_ARG(%eax), %eax
> > - movl %gs:DTV_OFFSET, %edx
> > - movl TLSDESC_GEN_COUNT(%eax), %ecx
> > - cmpl (%edx), %ecx
> > - ja .Lslow
> > - movl TLSDESC_MODID(%eax), %ecx
> > - movl (%edx,%ecx,8), %edx
> > - cmpl $-1, %edx
> > - je .Lslow
> > - movl TLSDESC_MODOFF(%eax), %eax
> > - addl %edx, %eax
> > -.Lret:
> > - movl 20(%esp), %ecx
> > - subl %gs:0, %eax
> > - movl 24(%esp), %edx
> > - addl $28, %esp
> > - cfi_adjust_cfa_offset (-28)
> > - ret
> > - .p2align 4,,7
> > -.Lslow:
> > - cfi_adjust_cfa_offset (28)
> > - call HIDDEN_JUMPTARGET (___tls_get_addr)
> > - jmp .Lret
> > - cfi_endproc
> > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > +# define USE_FNSAVE
> > +# define MINIMUM_ALIGNMENT 4
> > +# define STATE_SAVE_ALIGNMENT 4
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef MINIMUM_ALIGNMENT
> > +# undef USE_FNSAVE
> > +
> > +# define MINIMUM_ALIGNMENT 16
> > +
> > +# define USE_FXSAVE
> > +# define STATE_SAVE_ALIGNMENT 16
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_FXSAVE
> > +
> > +# define USE_XSAVE
> > +# define STATE_SAVE_ALIGNMENT 64
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_XSAVE
> > +
> > +# define USE_XSAVEC
> > +# define STATE_SAVE_ALIGNMENT 64
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_XSAVEC
> > #endif /* SHARED */
> > diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
> > index 73b29cc78c..5311b594af 100644
> > --- a/sysdeps/x86/Makefile
> > +++ b/sysdeps/x86/Makefile
> > @@ -1,5 +1,5 @@
> > ifeq ($(subdir),csu)
> > -gen-as-const-headers += cpu-features-offsets.sym
> > +gen-as-const-headers += cpu-features-offsets.sym features-offsets.sym
> > endif
> >
> > ifeq ($(subdir),elf)
> > @@ -86,6 +86,11 @@ endif
> > tst-ifunc-isa-2-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-SSE4_2,-AVX,-AVX2,-AVX512F
> > tst-ifunc-isa-2-static-ENV = $(tst-ifunc-isa-2-ENV)
> > tst-hwcap-tunables-ARGS = -- $(host-test-program-cmd)
> > +
> > +CFLAGS-tst-gnu2-tls2.c += -msse
> > +CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
> > +CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
> > +CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
> > endif
> >
> > ifeq ($(subdir),math)
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index 25e6622a79..835113b42f 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -27,8 +27,13 @@
> > extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
> > attribute_hidden;
> >
> > -#if defined SHARED && defined __x86_64__
> > -# include <dl-plt-rewrite.h>
> > +#if defined SHARED
> > +extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden;
> > +extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden;
> > +extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden;
> > +
> > +# ifdef __x86_64__
> > +# include <dl-plt-rewrite.h>
> >
> > static void
> > TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > @@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > : plt_rewrite_jmp);
> > }
> > }
> > +# else
> > +extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden;
> > +# endif
> > +#endif
> > +
> > +#ifdef __x86_64__
> > +extern void _dl_runtime_resolve_fxsave (void) attribute_hidden;
> > +extern void _dl_runtime_resolve_xsave (void) attribute_hidden;
> > +extern void _dl_runtime_resolve_xsavec (void) attribute_hidden;
> > #endif
> >
> > #ifdef __LP64__
> > @@ -1130,6 +1144,44 @@ no_cpuid:
> > TUNABLE_CALLBACK (set_x86_shstk));
> > #endif
> >
> > + if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > + {
> > + if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC))
> > + {
> > +#ifdef __x86_64__
> > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsavec;
> > +#endif
> > +#ifdef SHARED
> > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsavec;
> > +#endif
> > + }
> > + else
> > + {
> > +#ifdef __x86_64__
> > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsave;
> > +#endif
> > +#ifdef SHARED
> > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsave;
> > +#endif
> > + }
> > + }
> > + else
> > + {
> > +#ifdef __x86_64__
> > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_fxsave;
> > +# ifdef SHARED
> > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > +# endif
> > +#else
> > +# ifdef SHARED
> > + if (CPU_FEATURE_USABLE_P (cpu_features, FXSR))
> > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > + else
> > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fnsave;
> > +# endif
> > +#endif
> > + }
> > +
> > #ifdef SHARED
> > # ifdef __x86_64__
> > TUNABLE_GET (plt_rewrite, tunable_val_t *,
> > diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c
> > index ee957b4d70..5920d4b320 100644
> > --- a/sysdeps/x86/dl-procinfo.c
> > +++ b/sysdeps/x86/dl-procinfo.c
> > @@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9]
> > #else
> > ,
> > #endif
> > +
> > +#if defined SHARED && !IS_IN (ldconfig)
> > +# if !defined PROCINFO_DECL
> > + ._dl_x86_tlsdesc_dynamic
> > +# else
> > +PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic
> > +# endif
> > +# ifndef PROCINFO_DECL
> > += NULL
> > +# endif
> > +# ifdef PROCINFO_DECL
> > +;
> > +# else
> > +,
> > +# endif
> > +#endif
> > diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/features-offsets.sym
> > similarity index 89%
> > rename from sysdeps/x86_64/features-offsets.sym
> > rename to sysdeps/x86/features-offsets.sym
> > index 9e4be3393a..77e990c705 100644
> > --- a/sysdeps/x86_64/features-offsets.sym
> > +++ b/sysdeps/x86/features-offsets.sym
> > @@ -3,4 +3,6 @@
> > #include <ldsodefs.h>
> >
> > RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_global_ro, _dl_x86_cpu_features)
> > +#ifdef __x86_64__
> > RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _dl_x86_feature_1)
> > +#endif
> > diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
> > index 837fd28734..485cad9c02 100644
> > --- a/sysdeps/x86/sysdep.h
> > +++ b/sysdeps/x86/sysdep.h
> > @@ -70,6 +70,12 @@
> > | (1 << X86_XSTATE_ZMM_H_ID))
> > #endif
> >
> > +/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL.
> > + Compiler assumes that all registers, including x87 FPU stack registers,
> > + are unchanged after CALL, except for EFLAGS and RAX/EAX. */
> > +#define TLSDESC_CALL_STATE_SAVE_MASK \
> > + (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID))
> > +
> > /* Constants for bits in __x86_string_control: */
> >
> > /* Avoid short distance REP MOVSB. */
> > diff --git a/sysdeps/x86/tst-gnu2-tls2.c b/sysdeps/x86/tst-gnu2-tls2.c
> > new file mode 100644
> > index 0000000000..de900a423b
> > --- /dev/null
> > +++ b/sysdeps/x86/tst-gnu2-tls2.c
> > @@ -0,0 +1,20 @@
> > +#ifndef __x86_64__
> > +#include <sys/platform/x86.h>
> > +
> > +#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2)
> > +#endif
> > +
> > +/* Clear XMM0...XMM7 */
> > +#define PREPARE_MALLOC() \
> > +{ \
> > + asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \
> > + asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \
> > + asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \
> > + asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \
> > + asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \
> > + asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \
> > + asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \
> > + asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \
> > +}
> > +
> > +#include <elf/tst-gnu2-tls2.c>
> > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> > index 145f25e7f6..9337e95093 100644
> > --- a/sysdeps/x86_64/Makefile
> > +++ b/sysdeps/x86_64/Makefile
> > @@ -10,7 +10,7 @@ LDFLAGS-rtld += -Wl,-z,nomark-plt
> > endif
> >
> > ifeq ($(subdir),csu)
> > -gen-as-const-headers += features-offsets.sym link-defines.sym
> > +gen-as-const-headers += link-defines.sym
> > endif
> >
> > ifeq ($(subdir),gmon)
> > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> > index 6d605d0d32..ff5d45f7cb 100644
> > --- a/sysdeps/x86_64/dl-machine.h
> > +++ b/sysdeps/x86_64/dl-machine.h
> > @@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > int lazy, int profile)
> > {
> > Elf64_Addr *got;
> > - extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidden;
> > - extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidden;
> > - extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidden;
> > extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
> > extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
> > extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
> > @@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > /* Identify this shared object. */
> > *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
> >
> > - const struct cpu_features* cpu_features = __get_cpu_features ();
> > -
> > #ifdef SHARED
> > /* The got[2] entry contains the address of a function which gets
> > called to get the address of a so far unresolved function and
> > @@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > end in this function. */
> > if (__glibc_unlikely (profile))
> > {
> > + const struct cpu_features* cpu_features = __get_cpu_features ();
> > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
> > *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
> > else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
> > @@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > /* This function will get called to fix up the GOT entry
> > indicated by the offset on the stack, and then jump to
> > the resolved address. */
> > - if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
> > - || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > - *(ElfW(Addr) *) (got + 2)
> > - = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
> > - ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
> > - : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
> > - else
> > - *(ElfW(Addr) *) (got + 2)
> > - = (ElfW(Addr)) &_dl_runtime_resolve_fxsave;
> > + *(ElfW(Addr) *) (got + 2)
> > + = (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve);
> > }
> > }
> >
> > @@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.\n",
> > {
> > td->arg = _dl_make_tlsdesc_dynamic
> > (sym_map, sym->st_value + reloc->r_addend);
> > - td->entry = _dl_tlsdesc_dynamic;
> > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > }
> > else
> > # endif
> > diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
> > index 4d1d790fbb..06637a8154 100644
> > --- a/sysdeps/x86_64/dl-procinfo.c
> > +++ b/sysdeps/x86_64/dl-procinfo.c
> > @@ -41,5 +41,21 @@
> >
> > #include <sysdeps/x86/dl-procinfo.c>
> >
> > +#if !IS_IN (ldconfig)
> > +# if !defined PROCINFO_DECL && defined SHARED
> > + ._dl_x86_64_runtime_resolve
> > +# else
> > +PROCINFO_CLASS void * _dl_x86_64_runtime_resolve
> > +# endif
> > +# ifndef PROCINFO_DECL
> > += NULL
> > +# endif
> > +# if !defined SHARED || defined PROCINFO_DECL
> > +;
> > +# else
> > +,
> > +# endif
> > +#endif
> > +
> > #undef PROCINFO_DECL
> > #undef PROCINFO_CLASS
> > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > new file mode 100644
> > index 0000000000..0c2e8d5320
> > --- /dev/null
> > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > @@ -0,0 +1,166 @@
> > +/* Thread-local storage handling in the ELF dynamic linker. x86_64 version.
> > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#ifndef SECTION
> > +# define SECTION(p) p
> > +#endif
> > +
> > +#undef REGISTER_SAVE_AREA
> > +#undef LOCAL_STORAGE_AREA
> > +#undef BASE
> > +
> > +#include "dl-trampoline-state.h"
> > +
> > + .section SECTION(.text),"ax",@progbits
> > +
> > + .hidden _dl_tlsdesc_dynamic
> > + .global _dl_tlsdesc_dynamic
> > + .type _dl_tlsdesc_dynamic,@function
> > +
> > + /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > + _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > + tlsdesc_dynamic_arg object. It must return in %rax the offset
> > + between the thread pointer and the object denoted by the
> > + argument, without clobbering any registers.
> > +
> > + The assembly code that follows is a rendition of the following
> > + C code, hand-optimized a little bit.
> > +
> > +ptrdiff_t
> > +_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > +{
> > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > + != TLS_DTV_UNALLOCATED),
> > + 1))
> > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > + - __thread_pointer;
> > +
> > + return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > +}
> > +*/
> > + cfi_startproc
> > + .align 16
> > +_dl_tlsdesc_dynamic:
> > + _CET_ENDBR
> > + /* Preserve call-clobbered registers that we modify.
> > + We need two scratch regs anyway. */
> > + movq %rsi, -16(%rsp)
> > + mov %fs:DTV_OFFSET, %RSI_LP
> > + movq %rdi, -8(%rsp)
> > + movq TLSDESC_ARG(%rax), %rdi
> > + movq (%rsi), %rax
> > + cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > + ja 2f
> > + movq TLSDESC_MODID(%rdi), %rax
> > + salq $4, %rax
> > + movq (%rax,%rsi), %rax
> > + cmpq $-1, %rax
> > + je 2f
> > + addq TLSDESC_MODOFF(%rdi), %rax
> > +1:
> > + movq -16(%rsp), %rsi
> > + sub %fs:0, %RAX_LP
> > + movq -8(%rsp), %rdi
> > + ret
> > +2:
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > + movq %rbx, -24(%rsp)
> > + mov %RSP_LP, %RBX_LP
> > + cfi_def_cfa_register(%rbx)
> > + and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > +#endif
> > +#ifdef REGISTER_SAVE_AREA
> > +# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > + /* STATE_SAVE_OFFSET has space for 8 integer registers. But we
> > + need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
> > + RBX above. */
> > + sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
> > +# else
> > + sub $REGISTER_SAVE_AREA, %RSP_LP
> > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > +# endif
> > +#else
> > + /* Allocate stack space of the required size to save the state. */
> > + sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip), %RSP_LP
> > +#endif
> > + /* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9,
> > + r10 and r11. */
> > + movq %rcx, REGISTER_SAVE_RCX(%rsp)
> > + movq %rdx, REGISTER_SAVE_RDX(%rsp)
> > + movq %r8, REGISTER_SAVE_R8(%rsp)
> > + movq %r9, REGISTER_SAVE_R9(%rsp)
> > + movq %r10, REGISTER_SAVE_R10(%rsp)
> > + movq %r11, REGISTER_SAVE_R11(%rsp)
> > +#ifdef USE_FXSAVE
> > + fxsave STATE_SAVE_OFFSET(%rsp)
> > +#else
> > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > + xorl %edx, %edx
> > + /* Clear the XSAVE Header. */
> > +# ifdef USE_XSAVE
> > + movq %rdx, (STATE_SAVE_OFFSET + 512)(%rsp)
> > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8)(%rsp)
> > +# endif
> > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 2)(%rsp)
> > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 3)(%rsp)
> > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 4)(%rsp)
> > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 5)(%rsp)
> > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 6)(%rsp)
> > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 7)(%rsp)
> > +# ifdef USE_XSAVE
> > + xsave STATE_SAVE_OFFSET(%rsp)
> > +# else
> > + xsavec STATE_SAVE_OFFSET(%rsp)
> > +# endif
> > +#endif
> > + /* %rdi already points to the tlsinfo data structure. */
> > + call HIDDEN_JUMPTARGET (__tls_get_addr)
> > + # Get register content back.
> > +#ifdef USE_FXSAVE
> > + fxrstor STATE_SAVE_OFFSET(%rsp)
> > +#else
> > + /* Save and retore __tls_get_addr return value stored in RAX. */
> > + mov %RAX_LP, %RCX_LP
> > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > + xorl %edx, %edx
> > + xrstor STATE_SAVE_OFFSET(%rsp)
> > + mov %RCX_LP, %RAX_LP
> > +#endif
> > + movq REGISTER_SAVE_R11(%rsp), %r11
> > + movq REGISTER_SAVE_R10(%rsp), %r10
> > + movq REGISTER_SAVE_R9(%rsp), %r9
> > + movq REGISTER_SAVE_R8(%rsp), %r8
> > + movq REGISTER_SAVE_RDX(%rsp), %rdx
> > + movq REGISTER_SAVE_RCX(%rsp), %rcx
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > + mov %RBX_LP, %RSP_LP
> > + cfi_def_cfa_register(%rsp)
> > + movq -24(%rsp), %rbx
> > + cfi_restore(%rbx)
> > +#else
> > + add $REGISTER_SAVE_AREA, %RSP_LP
> > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > +#endif
> > + jmp 1b
> > + cfi_endproc
> > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > +
> > +#undef STATE_SAVE_ALIGNMENT
> > diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
> > index f748af2ece..ea69f5223a 100644
> > --- a/sysdeps/x86_64/dl-tlsdesc.S
> > +++ b/sysdeps/x86_64/dl-tlsdesc.S
> > @@ -18,7 +18,19 @@
> >
> > #include <sysdep.h>
> > #include <tls.h>
> > +#include <cpu-features-offsets.h>
> > +#include <features-offsets.h>
> > #include "tlsdesc.h"
> > +#include "dl-trampoline-save.h"
> > +
> > +/* Area on stack to save and restore registers used for parameter
> > + passing when calling _dl_tlsdesc_dynamic. */
> > +#define REGISTER_SAVE_RCX 0
> > +#define REGISTER_SAVE_RDX (REGISTER_SAVE_RCX + 8)
> > +#define REGISTER_SAVE_R8 (REGISTER_SAVE_RDX + 8)
> > +#define REGISTER_SAVE_R9 (REGISTER_SAVE_R8 + 8)
> > +#define REGISTER_SAVE_R10 (REGISTER_SAVE_R9 + 8)
> > +#define REGISTER_SAVE_R11 (REGISTER_SAVE_R10 + 8)
> >
> > .text
> >
> > @@ -67,80 +79,24 @@ _dl_tlsdesc_undefweak:
> > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> >
> > #ifdef SHARED
> > - .hidden _dl_tlsdesc_dynamic
> > - .global _dl_tlsdesc_dynamic
> > - .type _dl_tlsdesc_dynamic,@function
> > -
> > - /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > - _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > - tlsdesc_dynamic_arg object. It must return in %rax the offset
> > - between the thread pointer and the object denoted by the
> > - argument, without clobbering any registers.
> > -
> > - The assembly code that follows is a rendition of the following
> > - C code, hand-optimized a little bit.
> > -
> > -ptrdiff_t
> > -_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > -{
> > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > - != TLS_DTV_UNALLOCATED),
> > - 1))
> > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > - - __thread_pointer;
> > -
> > - return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > -}
> > -*/
> > - cfi_startproc
> > - .align 16
> > -_dl_tlsdesc_dynamic:
> > - _CET_ENDBR
> > - /* Preserve call-clobbered registers that we modify.
> > - We need two scratch regs anyway. */
> > - movq %rsi, -16(%rsp)
> > - mov %fs:DTV_OFFSET, %RSI_LP
> > - movq %rdi, -8(%rsp)
> > - movq TLSDESC_ARG(%rax), %rdi
> > - movq (%rsi), %rax
> > - cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > - ja .Lslow
> > - movq TLSDESC_MODID(%rdi), %rax
> > - salq $4, %rax
> > - movq (%rax,%rsi), %rax
> > - cmpq $-1, %rax
> > - je .Lslow
> > - addq TLSDESC_MODOFF(%rdi), %rax
> > -.Lret:
> > - movq -16(%rsp), %rsi
> > - sub %fs:0, %RAX_LP
> > - movq -8(%rsp), %rdi
> > - ret
> > -.Lslow:
> > - /* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
> > - r10 and r11. Also, align the stack, that's off by 8 bytes. */
> > - subq $72, %rsp
> > - cfi_adjust_cfa_offset (72)
> > - movq %rdx, 8(%rsp)
> > - movq %rcx, 16(%rsp)
> > - movq %r8, 24(%rsp)
> > - movq %r9, 32(%rsp)
> > - movq %r10, 40(%rsp)
> > - movq %r11, 48(%rsp)
> > - /* %rdi already points to the tlsinfo data structure. */
> > - call HIDDEN_JUMPTARGET (__tls_get_addr)
> > - movq 8(%rsp), %rdx
> > - movq 16(%rsp), %rcx
> > - movq 24(%rsp), %r8
> > - movq 32(%rsp), %r9
> > - movq 40(%rsp), %r10
> > - movq 48(%rsp), %r11
> > - addq $72, %rsp
> > - cfi_adjust_cfa_offset (-72)
> > - jmp .Lret
> > - cfi_endproc
> > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > +# define USE_FXSAVE
> > +# define STATE_SAVE_ALIGNMENT 16
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_FXSAVE
> > +
> > +# define USE_XSAVE
> > +# define STATE_SAVE_ALIGNMENT 64
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_XSAVE
> > +
> > +# define USE_XSAVEC
> > +# define STATE_SAVE_ALIGNMENT 64
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_XSAVEC
> > #endif /* SHARED */
> > diff --git a/sysdeps/x86_64/dl-trampoline-save.h b/sysdeps/x86_64/dl-trampoline-save.h
> > new file mode 100644
> > index 0000000000..84eac4a8ac
> > --- /dev/null
> > +++ b/sysdeps/x86_64/dl-trampoline-save.h
> > @@ -0,0 +1,34 @@
> > +/* x86-64 PLT trampoline register save macros.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#ifndef DL_STACK_ALIGNMENT
> > +/* Due to GCC bug:
> > +
> > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > +
> > + __tls_get_addr may be called with 8-byte stack alignment. Although
> > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > + that stack will be always aligned at 16 bytes. */
> > +# define DL_STACK_ALIGNMENT 8
> > +#endif
> > +
> > +/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > + stack to 16 bytes before calling _dl_fixup. */
> > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > + || 16 > DL_STACK_ALIGNMENT)
> > diff --git a/sysdeps/x86_64/dl-trampoline-state.h b/sysdeps/x86_64/dl-trampoline-state.h
> > new file mode 100644
> > index 0000000000..575f120797
> > --- /dev/null
> > +++ b/sysdeps/x86_64/dl-trampoline-state.h
> > @@ -0,0 +1,51 @@
> > +/* x86-64 PLT dl-trampoline state macros.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <https://www.gnu.org/licenses/>. */
> > +
> > +#if (STATE_SAVE_ALIGNMENT % 16) != 0
> > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > +#endif
> > +
> > +#if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > +# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > +#endif
> > +
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +/* Local stack area before jumping to function address: RBX. */
> > +# define LOCAL_STORAGE_AREA 8
> > +# define BASE rbx
> > +# ifdef USE_FXSAVE
> > +/* Use fxsave to save XMM registers. */
> > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > +# if (REGISTER_SAVE_AREA % 16) != 0
> > +# error REGISTER_SAVE_AREA must be multiple of 16
> > +# endif
> > +# endif
> > +#else
> > +# ifndef USE_FXSAVE
> > +# error USE_FXSAVE must be defined
> > +# endif
> > +/* Use fxsave to save XMM registers. */
> > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > +/* Local stack area before jumping to function address: All saved
> > + registers. */
> > +# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > +# define BASE rsp
> > +# if (REGISTER_SAVE_AREA % 16) != 8
> > +# error REGISTER_SAVE_AREA must be odd multiple of 8
> > +# endif
> > +#endif
> > diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
> > index b2e7e0f69b..87c5137837 100644
> > --- a/sysdeps/x86_64/dl-trampoline.S
> > +++ b/sysdeps/x86_64/dl-trampoline.S
> > @@ -22,25 +22,7 @@
> > #include <features-offsets.h>
> > #include <link-defines.h>
> > #include <isa-level.h>
> > -
> > -#ifndef DL_STACK_ALIGNMENT
> > -/* Due to GCC bug:
> > -
> > - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > -
> > - __tls_get_addr may be called with 8-byte stack alignment. Although
> > - this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > - that stack will be always aligned at 16 bytes. We use unaligned
> > - 16-byte move to load and store SSE registers, which has no penalty
> > - on modern processors if stack is 16-byte aligned. */
> > -# define DL_STACK_ALIGNMENT 8
> > -#endif
> > -
> > -/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > - stack to 16 bytes before calling _dl_fixup. */
> > -#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > - (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > - || 16 > DL_STACK_ALIGNMENT)
> > +#include "dl-trampoline-save.h"
> >
> > /* Area on stack to save and restore registers used for parameter
> > passing when calling _dl_fixup. */
> > diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
> > index f55c6ea040..d9ccfb40d4 100644
> > --- a/sysdeps/x86_64/dl-trampoline.h
> > +++ b/sysdeps/x86_64/dl-trampoline.h
> > @@ -27,39 +27,7 @@
> > # undef LOCAL_STORAGE_AREA
> > # undef BASE
> >
> > -# if (STATE_SAVE_ALIGNMENT % 16) != 0
> > -# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > -# endif
> > -
> > -# if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > -# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > -# endif
> > -
> > -# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > -/* Local stack area before jumping to function address: RBX. */
> > -# define LOCAL_STORAGE_AREA 8
> > -# define BASE rbx
> > -# ifdef USE_FXSAVE
> > -/* Use fxsave to save XMM registers. */
> > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > -# if (REGISTER_SAVE_AREA % 16) != 0
> > -# error REGISTER_SAVE_AREA must be multiple of 16
> > -# endif
> > -# endif
> > -# else
> > -# ifndef USE_FXSAVE
> > -# error USE_FXSAVE must be defined
> > -# endif
> > -/* Use fxsave to save XMM registers. */
> > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > -/* Local stack area before jumping to function address: All saved
> > - registers. */
> > -# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > -# define BASE rsp
> > -# if (REGISTER_SAVE_AREA % 16) != 8
> > -# error REGISTER_SAVE_AREA must be odd multiple of 8
> > -# endif
> > -# endif
> > +# include "dl-trampoline-state.h"
> >
> > .globl _dl_runtime_resolve
> > .hidden _dl_runtime_resolve
> > --
> > 2.43.2
> >
>
> This still LGTM.
> Feel free to add my reviewed-by tag to future posts of the patch.
> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Please wait for community consensus before pushing
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v10 1/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
2024-02-26 19:02 ` Noah Goldstein
@ 2024-02-26 19:06 ` H.J. Lu
2024-02-27 17:00 ` Noah Goldstein
0 siblings, 1 reply; 7+ messages in thread
From: H.J. Lu @ 2024-02-26 19:06 UTC (permalink / raw)
To: Noah Goldstein; +Cc: libc-alpha, adhemerval.zanella
On Mon, Feb 26, 2024 at 11:02 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Feb 26, 2024 at 1:02 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Mon, Feb 26, 2024 at 8:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > Compiler generates the following instruction sequence for GNU2 dynamic
> > > TLS access:
> > >
> > > leaq tls_var@TLSDESC(%rip), %rax
> > > call *tls_var@TLSCALL(%rax)
> > >
> > > or
> > >
> > > leal tls_var@TLSDESC(%ebx), %eax
> > > call *tls_var@TLSCALL(%eax)
> > >
> > > CALL instruction is transparent to compiler which assumes all registers,
> > > except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> > > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > > path. __tls_get_addr is a normal function which doesn't preserve any
> > > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> > > caller-saved registers, but didn't preserve any other caller-saved
> > > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > > XSAVE and XSAVEC to save and restore all caller-saved registers. This
> > > fixes BZ #31372.
> > >
> > > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > > to optimize elf_machine_runtime_setup.
> > > ---
> > > elf/Makefile | 18 ++
> > > elf/tst-gnu2-tls2.c | 122 ++++++++++++
> > > elf/tst-gnu2-tls2.h | 36 ++++
> > > elf/tst-gnu2-tls2mod0.c | 31 +++
> > > elf/tst-gnu2-tls2mod1.c | 31 +++
> > > elf/tst-gnu2-tls2mod2.c | 31 +++
> > > sysdeps/i386/dl-machine.h | 2 +-
> > > sysdeps/i386/dl-tlsdesc-dynamic.h | 190 +++++++++++++++++++
> > > sysdeps/i386/dl-tlsdesc.S | 115 +++++------
> > > sysdeps/x86/Makefile | 7 +-
> > > sysdeps/x86/cpu-features.c | 56 +++++-
> > > sysdeps/x86/dl-procinfo.c | 16 ++
> > > sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
> > > sysdeps/x86/sysdep.h | 6 +
> > > sysdeps/x86/tst-gnu2-tls2.c | 20 ++
> > > sysdeps/x86_64/Makefile | 2 +-
> > > sysdeps/x86_64/dl-machine.h | 19 +-
> > > sysdeps/x86_64/dl-procinfo.c | 16 ++
> > > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
> > > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
> > > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
> > > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
> > > sysdeps/x86_64/dl-trampoline.S | 20 +-
> > > sysdeps/x86_64/dl-trampoline.h | 34 +---
> > > 24 files changed, 920 insertions(+), 213 deletions(-)
> > > create mode 100644 elf/tst-gnu2-tls2.c
> > > create mode 100644 elf/tst-gnu2-tls2.h
> > > create mode 100644 elf/tst-gnu2-tls2mod0.c
> > > create mode 100644 elf/tst-gnu2-tls2mod1.c
> > > create mode 100644 elf/tst-gnu2-tls2mod2.c
> > > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> > > rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> > > create mode 100644 sysdeps/x86/tst-gnu2-tls2.c
> > > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> > > create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
> > >
> > > diff --git a/elf/Makefile b/elf/Makefile
> > > index 36c04baf02..02dc476e27 100644
> > > --- a/elf/Makefile
> > > +++ b/elf/Makefile
> > > @@ -424,6 +424,7 @@ tests += \
> > > tst-glibc-hwcaps-prepend \
> > > tst-global1 \
> > > tst-global2 \
> > > + tst-gnu2-tls2 \
> > > tst-initfinilazyfail \
> > > tst-initorder \
> > > tst-initorder2 \
> > > @@ -846,6 +847,9 @@ modules-names += \
> > > tst-filterobj-flt \
> > > tst-finilazyfailmod \
> > > tst-globalmod2 \
> > > + tst-gnu2-tls2mod0 \
> > > + tst-gnu2-tls2mod1 \
> > > + tst-gnu2-tls2mod2 \
> > > tst-initlazyfailmod \
> > > tst-initorder2a \
> > > tst-initorder2b \
> > > @@ -3044,8 +3048,22 @@ $(objpfx)tst-tlsgap.out: \
> > > $(objpfx)tst-tlsgap-mod0.so \
> > > $(objpfx)tst-tlsgap-mod1.so \
> > > $(objpfx)tst-tlsgap-mod2.so
> > > +
> > > +$(objpfx)tst-gnu2-tls2: $(shared-thread-library)
> > > +$(objpfx)tst-gnu2-tls2.out: \
> > > + $(objpfx)tst-gnu2-tls2mod0.so \
> > > + $(objpfx)tst-gnu2-tls2mod1.so \
> > > + $(objpfx)tst-gnu2-tls2mod2.so
> > > +
> > > ifeq (yes,$(have-mtls-dialect-gnu2))
> > > +# This test fails if dl_tlsdesc_dynamic doesn't preserve all caller-saved
> > > +# registers. See https://sourceware.org/bugzilla/show_bug.cgi?id=31372
> > > +test-xfail-tst-gnu2-tls2 = yes
> > > +
> > > CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
> > > CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
> > > CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
> > > +CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
> > > +CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
> > > +CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
> > > endif
> > > diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c
> > > new file mode 100644
> > > index 0000000000..7ac04d7f33
> > > --- /dev/null
> > > +++ b/elf/tst-gnu2-tls2.c
> > > @@ -0,0 +1,122 @@
> > > +/* Test TLSDESC relocation.
> > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <http://www.gnu.org/licenses/>. */
> > > +
> > > +#include <stdio.h>
> > > +#include <stdlib.h>
> > > +#include <string.h>
> > > +#include <dlfcn.h>
> > > +#include <pthread.h>
> > > +#include <support/xdlfcn.h>
> > > +#include <support/xthread.h>
> > > +#include <support/check.h>
> > > +#include <support/test-driver.h>
> > > +#include "tst-gnu2-tls2.h"
> > > +
> > > +#ifndef IS_SUPPORTED
> > > +# define IS_SUPPORTED() true
> > > +#endif
> > > +
> > > +/* An architecture can define it to clobber caller-saved registers in
> > > + malloc below to verify that the implicit TLSDESC call won't change
> > > + caller-saved registers. */
> > > +#ifndef PREPARE_MALLOC
> > > +# define PREPARE_MALLOC()
> > > +#endif
> > > +
> > > +extern void * __libc_malloc (size_t);
> > > +
> > > +size_t malloc_counter = 0;
> > > +
> > > +void *
> > > +malloc (size_t n)
> > > +{
> > > + PREPARE_MALLOC ();
> > > + malloc_counter++;
> > > + return __libc_malloc (n);
> > > +}
> > > +
> > > +static void *mod[3];
> > > +#ifndef MOD
> > > +# define MOD(i) "tst-gnu2-tls2mod" #i ".so"
> > > +#endif
> > > +static const char *modname[3] = { MOD(0), MOD(1), MOD(2) };
> > > +#undef MOD
> > > +
> > > +static void
> > > +open_mod (int i)
> > > +{
> > > + mod[i] = xdlopen (modname[i], RTLD_LAZY);
> > > + printf ("open %s\n", modname[i]);
> > > +}
> > > +
> > > +static void
> > > +close_mod (int i)
> > > +{
> > > + xdlclose (mod[i]);
> > > + mod[i] = NULL;
> > > + printf ("close %s\n", modname[i]);
> > > +}
> > > +
> > > +static void
> > > +access_mod (int i, const char *sym)
> > > +{
> > > + struct tls var = { -1, -1, -1, -1 };
> > > + struct tls *(*f) (struct tls *) = xdlsym (mod[i], sym);
> > > + /* Check that our malloc is called. */
> > > + malloc_counter = 0;
> > > + struct tls *p = f (&var);
> > > + TEST_VERIFY (malloc_counter != 0);
> > > + printf ("access %s: %s() = %p\n", modname[i], sym, p);
> > > + TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) == 0);
> > > + ++(p->a);
> > > +}
> > > +
> > > +static void *
> > > +start (void *arg)
> > > +{
> > > + /* The DTV generation is at the last dlopen of mod0 and the
> > > + entry for mod1 is NULL. */
> > > +
> > > + open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */
> > > +
> > > + /* Force the slow path in GNU2 TLS descriptor call. */
> > > + access_mod (1, "apply_tls");
> > > +
> > > + return arg;
> > > +}
> > > +
> > > +static int
> > > +do_test (void)
> > > +{
> > > + if (!IS_SUPPORTED ())
> > > + return EXIT_UNSUPPORTED;
> > > +
> > > + open_mod (0);
> > > + open_mod (1);
> > > + open_mod (2);
> > > + close_mod (0);
> > > + close_mod (1); /* Create modid gap at mod1. */
> > > + open_mod (0); /* Reuse modid of mod0, bump generation count. */
> > > +
> > > + /* Create a thread where DTV of mod1 is NULL. */
> > > + pthread_t t = xpthread_create (NULL, start, NULL);
> > > + xpthread_join (t);
> > > + return 0;
> > > +}
> > > +
> > > +#include <support/test-driver.c>
> > > diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h
> > > new file mode 100644
> > > index 0000000000..77964a57a3
> > > --- /dev/null
> > > +++ b/elf/tst-gnu2-tls2.h
> > > @@ -0,0 +1,36 @@
> > > +/* Test TLSDESC relocation.
> > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +struct tls
> > > +{
> > > + int64_t a, b, c, d;
> > > +};
> > > +
> > > +extern struct tls *apply_tls (struct tls *);
> > > +
> > > +/* An architecture can define them to verify that clobber caller-saved
> > > + registers aren't changed by the implicit TLSDESC call. */
> > > +#ifndef BEFORE_TLSDESC_CALL
> > > +# define BEFORE_TLSDESC_CALL()
> > > +#endif
> > > +
> > > +#ifndef AFTER_TLSDESC_CALL
> > > +# define AFTER_TLSDESC_CALL()
> > > +#endif
> > > diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c
> > > new file mode 100644
> > > index 0000000000..45556a0e17
> > > --- /dev/null
> > > +++ b/elf/tst-gnu2-tls2mod0.c
> > > @@ -0,0 +1,31 @@
> > > +/* DSO used by tst-gnu2-tls2.
> > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#include "tst-gnu2-tls2.h"
> > > +
> > > +__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")));
> > > +
> > > +struct tls *
> > > +apply_tls (struct tls *p)
> > > +{
> > > + BEFORE_TLSDESC_CALL ();
> > > + tls_var0 = *p;
> > > + struct tls *ret = &tls_var0;
> > > + AFTER_TLSDESC_CALL ();
> > > + return ret;
> > > +}
> > > diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c
> > > new file mode 100644
> > > index 0000000000..e10b9dbc0a
> > > --- /dev/null
> > > +++ b/elf/tst-gnu2-tls2mod1.c
> > > @@ -0,0 +1,31 @@
> > > +/* DSO used by tst-gnu2-tls2.
> > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#include "tst-gnu2-tls2.h"
> > > +
> > > +__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidden")));
> > > +
> > > +struct tls *
> > > +apply_tls (struct tls *p)
> > > +{
> > > + BEFORE_TLSDESC_CALL ();
> > > + tls_var1[1] = *p;
> > > + struct tls *ret = &tls_var1[1];
> > > + AFTER_TLSDESC_CALL ();
> > > + return ret;
> > > +}
> > > diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c
> > > new file mode 100644
> > > index 0000000000..141af51e55
> > > --- /dev/null
> > > +++ b/elf/tst-gnu2-tls2mod2.c
> > > @@ -0,0 +1,31 @@
> > > +/* DSO used by tst-gnu2-tls2.
> > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#include "tst-gnu2-tls2.h"
> > > +
> > > +__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")));
> > > +
> > > +struct tls *
> > > +apply_tls (struct tls *p)
> > > +{
> > > + BEFORE_TLSDESC_CALL ();
> > > + tls_var2 = *p;
> > > + struct tls *ret = &tls_var2;
> > > + AFTER_TLSDESC_CALL ();
> > > + return ret;
> > > +}
> > > diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> > > index fc1ef96587..50d74fe6e9 100644
> > > --- a/sysdeps/i386/dl-machine.h
> > > +++ b/sysdeps/i386/dl-machine.h
> > > @@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.\n",
> > > {
> > > td->arg = _dl_make_tlsdesc_dynamic
> > > (sym_map, sym->st_value + (ElfW(Word))td->arg);
> > > - td->entry = _dl_tlsdesc_dynamic;
> > > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > > }
> > > else
> > > # endif
> > > diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > new file mode 100644
> > > index 0000000000..3627028577
> > > --- /dev/null
> > > +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > @@ -0,0 +1,190 @@
> > > +/* Thread-local storage handling in the ELF dynamic linker. i386 version.
> > > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#undef REGISTER_SAVE_AREA
> > > +
> > > +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
> > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > +#endif
> > > +
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > +# ifdef USE_FNSAVE
> > > +# error USE_FNSAVE shouldn't be defined
> > > +# endif
> > > +# ifdef USE_FXSAVE
> > > +/* Use fxsave to save all registers. */
> > > +# define REGISTER_SAVE_AREA 512
> > > +# endif
> > > +#else
> > > +# ifdef USE_FNSAVE
> > > +/* Use fnsave to save x87 FPU stack registers. */
> > > +# define REGISTER_SAVE_AREA 108
> > > +# else
> > > +# ifndef USE_FXSAVE
> > > +# error USE_FXSAVE must be defined
> > > +# endif
> > > +/* Use fxsave to save all registers. Add 12 bytes to align the stack
> > > + to 16 bytes. */
> > > +# define REGISTER_SAVE_AREA (512 + 12)
> > > +# endif
> > > +#endif
> > > +
> > > + .hidden _dl_tlsdesc_dynamic
> > > + .global _dl_tlsdesc_dynamic
> > > + .type _dl_tlsdesc_dynamic,@function
> > > +
> > > + /* This function is used for symbols that need dynamic TLS.
> > > +
> > > + %eax points to the TLS descriptor, such that 0(%eax) points to
> > > + _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > > + tlsdesc_dynamic_arg object. It must return in %eax the offset
> > > + between the thread pointer and the object denoted by the
> > > + argument, without clobbering any registers.
> > > +
> > > + The assembly code that follows is a rendition of the following
> > > + C code, hand-optimized a little bit.
> > > +
> > > +ptrdiff_t
> > > +__attribute__ ((__regparm__ (1)))
> > > +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > +{
> > > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > > + != TLS_DTV_UNALLOCATED),
> > > + 1))
> > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > + - __thread_pointer;
> > > +
> > > + return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > +}
> > > +*/
> > > + cfi_startproc
> > > + .align 16
> > > +_dl_tlsdesc_dynamic:
> > > + /* Like all TLS resolvers, preserve call-clobbered registers.
> > > + We need two scratch regs anyway. */
> > > + subl $32, %esp
> > > + cfi_adjust_cfa_offset (32)
> > > + movl %ecx, 20(%esp)
> > > + movl %edx, 24(%esp)
> > > + movl TLSDESC_ARG(%eax), %eax
> > > + movl %gs:DTV_OFFSET, %edx
> > > + movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > + cmpl (%edx), %ecx
> > > + ja 2f
> > > + movl TLSDESC_MODID(%eax), %ecx
> > > + movl (%edx,%ecx,8), %edx
> > > + cmpl $-1, %edx
> > > + je 2f
> > > + movl TLSDESC_MODOFF(%eax), %eax
> > > + addl %edx, %eax
> > > +1:
> > > + movl 20(%esp), %ecx
> > > + subl %gs:0, %eax
> > > + movl 24(%esp), %edx
> > > + addl $32, %esp
> > > + cfi_adjust_cfa_offset (-32)
> > > + ret
> > > + .p2align 4,,7
> > > +2:
> > > + cfi_adjust_cfa_offset (32)
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > + movl %ebx, -28(%esp)
> > > + movl %esp, %ebx
> > > + cfi_def_cfa_register(%ebx)
> > > + and $-STATE_SAVE_ALIGNMENT, %esp
> > > +#endif
> > > +#ifdef REGISTER_SAVE_AREA
> > > + subl $REGISTER_SAVE_AREA, %esp
> > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > +# endif
> > > +#else
> > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > +# error DL_RUNTIME_RESOLVE_REALIGN_STACK must be true
> > > +# endif
> > > + /* Allocate stack space of the required size to save the state. */
> > > + LOAD_PIC_REG (cx)
> > > + subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> > > +#endif
> > > +#ifdef USE_FNSAVE
> > > + fnsave (%esp)
> > > +#elif defined USE_FXSAVE
> > > + fxsave (%esp)
> > > +#else
> > > + /* Save the argument for ___tls_get_addr in EAX. */
> > > + movl %eax, %ecx
> > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > + xorl %edx, %edx
> > > + /* Clear the XSAVE Header. */
> > > +# ifdef USE_XSAVE
> > > + movl %edx, (512)(%esp)
> > > + movl %edx, (512 + 4 * 1)(%esp)
> > > + movl %edx, (512 + 4 * 2)(%esp)
> > > + movl %edx, (512 + 4 * 3)(%esp)
> > > +# endif
> > > + movl %edx, (512 + 4 * 4)(%esp)
> > > + movl %edx, (512 + 4 * 5)(%esp)
> > > + movl %edx, (512 + 4 * 6)(%esp)
> > > + movl %edx, (512 + 4 * 7)(%esp)
> > > + movl %edx, (512 + 4 * 8)(%esp)
> > > + movl %edx, (512 + 4 * 9)(%esp)
> > > + movl %edx, (512 + 4 * 10)(%esp)
> > > + movl %edx, (512 + 4 * 11)(%esp)
> > > + movl %edx, (512 + 4 * 12)(%esp)
> > > + movl %edx, (512 + 4 * 13)(%esp)
> > > + movl %edx, (512 + 4 * 14)(%esp)
> > > + movl %edx, (512 + 4 * 15)(%esp)
> > > +# ifdef USE_XSAVE
> > > + xsave (%esp)
> > > +# else
> > > + xsavec (%esp)
> > > +# endif
> > > + /* Restore the argument for ___tls_get_addr in EAX. */
> > > + movl %ecx, %eax
> > > +#endif
> > > + call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > + /* Get register content back. */
> > > +#ifdef USE_FNSAVE
> > > + frstor (%esp)
> > > +#elif defined USE_FXSAVE
> > > + fxrstor (%esp)
> > > +#else
> > > + /* Save and retore ___tls_get_addr return value stored in EAX. */
> > > + movl %eax, %ecx
> > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > + xorl %edx, %edx
> > > + xrstor (%esp)
> > > + movl %ecx, %eax
> > > +#endif
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > + mov %ebx, %esp
> > > + cfi_def_cfa_register(%esp)
> > > + movl -28(%esp), %ebx
> > > + cfi_restore(%ebx)
> > > +#else
> > > + addl $REGISTER_SAVE_AREA, %esp
> > > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > > +#endif
> > > + jmp 1b
> > > + cfi_endproc
> > > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > +
> > > +#undef STATE_SAVE_ALIGNMENT
> > > diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
> > > index 90d93caa0c..f002feee56 100644
> > > --- a/sysdeps/i386/dl-tlsdesc.S
> > > +++ b/sysdeps/i386/dl-tlsdesc.S
> > > @@ -18,8 +18,27 @@
> > >
> > > #include <sysdep.h>
> > > #include <tls.h>
> > > +#include <cpu-features-offsets.h>
> > > +#include <features-offsets.h>
> > > #include "tlsdesc.h"
> > >
> > > +#ifndef DL_STACK_ALIGNMENT
> > > +/* Due to GCC bug:
> > > +
> > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > +
> > > + __tls_get_addr may be called with 4-byte stack alignment. Although
> > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > + that stack will be always aligned at 16 bytes. */
> > > +# define DL_STACK_ALIGNMENT 4
> > > +#endif
> > > +
> > > +/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or align
> > > + stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. */
> > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > + || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT)
> > > +
> > > .text
> > >
> > > /* This function is used to compute the TP offset for symbols in
> > > @@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak:
> > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> > >
> > > #ifdef SHARED
> > > - .hidden _dl_tlsdesc_dynamic
> > > - .global _dl_tlsdesc_dynamic
> > > - .type _dl_tlsdesc_dynamic,@function
> > > -
> > > - /* This function is used for symbols that need dynamic TLS.
> > > -
> > > - %eax points to the TLS descriptor, such that 0(%eax) points to
> > > - _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > > - tlsdesc_dynamic_arg object. It must return in %eax the offset
> > > - between the thread pointer and the object denoted by the
> > > - argument, without clobbering any registers.
> > > -
> > > - The assembly code that follows is a rendition of the following
> > > - C code, hand-optimized a little bit.
> > > -
> > > -ptrdiff_t
> > > -__attribute__ ((__regparm__ (1)))
> > > -_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > -{
> > > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > > - != TLS_DTV_UNALLOCATED),
> > > - 1))
> > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > - - __thread_pointer;
> > > -
> > > - return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > -}
> > > -*/
> > > - cfi_startproc
> > > - .align 16
> > > -_dl_tlsdesc_dynamic:
> > > - /* Like all TLS resolvers, preserve call-clobbered registers.
> > > - We need two scratch regs anyway. */
> > > - subl $28, %esp
> > > - cfi_adjust_cfa_offset (28)
> > > - movl %ecx, 20(%esp)
> > > - movl %edx, 24(%esp)
> > > - movl TLSDESC_ARG(%eax), %eax
> > > - movl %gs:DTV_OFFSET, %edx
> > > - movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > - cmpl (%edx), %ecx
> > > - ja .Lslow
> > > - movl TLSDESC_MODID(%eax), %ecx
> > > - movl (%edx,%ecx,8), %edx
> > > - cmpl $-1, %edx
> > > - je .Lslow
> > > - movl TLSDESC_MODOFF(%eax), %eax
> > > - addl %edx, %eax
> > > -.Lret:
> > > - movl 20(%esp), %ecx
> > > - subl %gs:0, %eax
> > > - movl 24(%esp), %edx
> > > - addl $28, %esp
> > > - cfi_adjust_cfa_offset (-28)
> > > - ret
> > > - .p2align 4,,7
> > > -.Lslow:
> > > - cfi_adjust_cfa_offset (28)
> > > - call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > - jmp .Lret
> > > - cfi_endproc
> > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > +# define USE_FNSAVE
> > > +# define MINIMUM_ALIGNMENT 4
> > > +# define STATE_SAVE_ALIGNMENT 4
> > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave
> > > +# include "dl-tlsdesc-dynamic.h"
> > > +# undef _dl_tlsdesc_dynamic
> > > +# undef MINIMUM_ALIGNMENT
> > > +# undef USE_FNSAVE
> > > +
> > > +# define MINIMUM_ALIGNMENT 16
> > > +
> > > +# define USE_FXSAVE
> > > +# define STATE_SAVE_ALIGNMENT 16
> > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > > +# include "dl-tlsdesc-dynamic.h"
> > > +# undef _dl_tlsdesc_dynamic
> > > +# undef USE_FXSAVE
> > > +
> > > +# define USE_XSAVE
> > > +# define STATE_SAVE_ALIGNMENT 64
> > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > > +# include "dl-tlsdesc-dynamic.h"
> > > +# undef _dl_tlsdesc_dynamic
> > > +# undef USE_XSAVE
> > > +
> > > +# define USE_XSAVEC
> > > +# define STATE_SAVE_ALIGNMENT 64
> > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > > +# include "dl-tlsdesc-dynamic.h"
> > > +# undef _dl_tlsdesc_dynamic
> > > +# undef USE_XSAVEC
> > > #endif /* SHARED */
> > > diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
> > > index 73b29cc78c..5311b594af 100644
> > > --- a/sysdeps/x86/Makefile
> > > +++ b/sysdeps/x86/Makefile
> > > @@ -1,5 +1,5 @@
> > > ifeq ($(subdir),csu)
> > > -gen-as-const-headers += cpu-features-offsets.sym
> > > +gen-as-const-headers += cpu-features-offsets.sym features-offsets.sym
> > > endif
> > >
> > > ifeq ($(subdir),elf)
> > > @@ -86,6 +86,11 @@ endif
> > > tst-ifunc-isa-2-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-SSE4_2,-AVX,-AVX2,-AVX512F
> > > tst-ifunc-isa-2-static-ENV = $(tst-ifunc-isa-2-ENV)
> > > tst-hwcap-tunables-ARGS = -- $(host-test-program-cmd)
> > > +
> > > +CFLAGS-tst-gnu2-tls2.c += -msse
> > > +CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
> > > +CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
> > > +CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
> > > endif
> > >
> > > ifeq ($(subdir),math)
> > > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > > index 25e6622a79..835113b42f 100644
> > > --- a/sysdeps/x86/cpu-features.c
> > > +++ b/sysdeps/x86/cpu-features.c
> > > @@ -27,8 +27,13 @@
> > > extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
> > > attribute_hidden;
> > >
> > > -#if defined SHARED && defined __x86_64__
> > > -# include <dl-plt-rewrite.h>
> > > +#if defined SHARED
> > > +extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden;
> > > +extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden;
> > > +extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden;
> > > +
> > > +# ifdef __x86_64__
> > > +# include <dl-plt-rewrite.h>
> > >
> > > static void
> > > TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > > @@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > > : plt_rewrite_jmp);
> > > }
> > > }
> > > +# else
> > > +extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden;
> > > +# endif
> > > +#endif
> > > +
> > > +#ifdef __x86_64__
> > > +extern void _dl_runtime_resolve_fxsave (void) attribute_hidden;
> > > +extern void _dl_runtime_resolve_xsave (void) attribute_hidden;
> > > +extern void _dl_runtime_resolve_xsavec (void) attribute_hidden;
> > > #endif
> > >
> > > #ifdef __LP64__
> > > @@ -1130,6 +1144,44 @@ no_cpuid:
> > > TUNABLE_CALLBACK (set_x86_shstk));
> > > #endif
> > >
> > > + if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > > + {
> > > + if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC))
> > > + {
> > > +#ifdef __x86_64__
> > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsavec;
> > > +#endif
> > > +#ifdef SHARED
> > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsavec;
> > > +#endif
> > > + }
> > > + else
> > > + {
> > > +#ifdef __x86_64__
> > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsave;
> > > +#endif
> > > +#ifdef SHARED
> > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsave;
> > > +#endif
> > > + }
> > > + }
> > > + else
> > > + {
> > > +#ifdef __x86_64__
> > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_fxsave;
> > > +# ifdef SHARED
> > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > > +# endif
> > > +#else
> > > +# ifdef SHARED
> > > + if (CPU_FEATURE_USABLE_P (cpu_features, FXSR))
> > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > > + else
> > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fnsave;
> > > +# endif
> > > +#endif
> > > + }
> > > +
> > > #ifdef SHARED
> > > # ifdef __x86_64__
> > > TUNABLE_GET (plt_rewrite, tunable_val_t *,
> > > diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c
> > > index ee957b4d70..5920d4b320 100644
> > > --- a/sysdeps/x86/dl-procinfo.c
> > > +++ b/sysdeps/x86/dl-procinfo.c
> > > @@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9]
> > > #else
> > > ,
> > > #endif
> > > +
> > > +#if defined SHARED && !IS_IN (ldconfig)
> > > +# if !defined PROCINFO_DECL
> > > + ._dl_x86_tlsdesc_dynamic
> > > +# else
> > > +PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic
> > > +# endif
> > > +# ifndef PROCINFO_DECL
> > > += NULL
> > > +# endif
> > > +# ifdef PROCINFO_DECL
> > > +;
> > > +# else
> > > +,
> > > +# endif
> > > +#endif
> > > diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/features-offsets.sym
> > > similarity index 89%
> > > rename from sysdeps/x86_64/features-offsets.sym
> > > rename to sysdeps/x86/features-offsets.sym
> > > index 9e4be3393a..77e990c705 100644
> > > --- a/sysdeps/x86_64/features-offsets.sym
> > > +++ b/sysdeps/x86/features-offsets.sym
> > > @@ -3,4 +3,6 @@
> > > #include <ldsodefs.h>
> > >
> > > RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_global_ro, _dl_x86_cpu_features)
> > > +#ifdef __x86_64__
> > > RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _dl_x86_feature_1)
> > > +#endif
> > > diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
> > > index 837fd28734..485cad9c02 100644
> > > --- a/sysdeps/x86/sysdep.h
> > > +++ b/sysdeps/x86/sysdep.h
> > > @@ -70,6 +70,12 @@
> > > | (1 << X86_XSTATE_ZMM_H_ID))
> > > #endif
> > >
> > > +/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL.
> > > + Compiler assumes that all registers, including x87 FPU stack registers,
> > > + are unchanged after CALL, except for EFLAGS and RAX/EAX. */
> > > +#define TLSDESC_CALL_STATE_SAVE_MASK \
> > > + (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID))
> > > +
> > > /* Constants for bits in __x86_string_control: */
> > >
> > > /* Avoid short distance REP MOVSB. */
> > > diff --git a/sysdeps/x86/tst-gnu2-tls2.c b/sysdeps/x86/tst-gnu2-tls2.c
> > > new file mode 100644
> > > index 0000000000..de900a423b
> > > --- /dev/null
> > > +++ b/sysdeps/x86/tst-gnu2-tls2.c
> > > @@ -0,0 +1,20 @@
> > > +#ifndef __x86_64__
> > > +#include <sys/platform/x86.h>
> > > +
> > > +#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2)
> > > +#endif
> > > +
> > > +/* Clear XMM0...XMM7 */
> > > +#define PREPARE_MALLOC() \
> > > +{ \
> > > + asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \
> > > + asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \
> > > + asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \
> > > + asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \
> > > + asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \
> > > + asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \
> > > + asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \
> > > + asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \
> > > +}
> > > +
> > > +#include <elf/tst-gnu2-tls2.c>
> > > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> > > index 145f25e7f6..9337e95093 100644
> > > --- a/sysdeps/x86_64/Makefile
> > > +++ b/sysdeps/x86_64/Makefile
> > > @@ -10,7 +10,7 @@ LDFLAGS-rtld += -Wl,-z,nomark-plt
> > > endif
> > >
> > > ifeq ($(subdir),csu)
> > > -gen-as-const-headers += features-offsets.sym link-defines.sym
> > > +gen-as-const-headers += link-defines.sym
> > > endif
> > >
> > > ifeq ($(subdir),gmon)
> > > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> > > index 6d605d0d32..ff5d45f7cb 100644
> > > --- a/sysdeps/x86_64/dl-machine.h
> > > +++ b/sysdeps/x86_64/dl-machine.h
> > > @@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > int lazy, int profile)
> > > {
> > > Elf64_Addr *got;
> > > - extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidden;
> > > - extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidden;
> > > - extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidden;
> > > extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
> > > extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
> > > extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
> > > @@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > /* Identify this shared object. */
> > > *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
> > >
> > > - const struct cpu_features* cpu_features = __get_cpu_features ();
> > > -
> > > #ifdef SHARED
> > > /* The got[2] entry contains the address of a function which gets
> > > called to get the address of a so far unresolved function and
> > > @@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > end in this function. */
> > > if (__glibc_unlikely (profile))
> > > {
> > > + const struct cpu_features* cpu_features = __get_cpu_features ();
> > > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
> > > *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
> > > else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
> > > @@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > /* This function will get called to fix up the GOT entry
> > > indicated by the offset on the stack, and then jump to
> > > the resolved address. */
> > > - if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
> > > - || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > > - *(ElfW(Addr) *) (got + 2)
> > > - = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
> > > - ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
> > > - : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
> > > - else
> > > - *(ElfW(Addr) *) (got + 2)
> > > - = (ElfW(Addr)) &_dl_runtime_resolve_fxsave;
> > > + *(ElfW(Addr) *) (got + 2)
> > > + = (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve);
> > > }
> > > }
> > >
> > > @@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.\n",
> > > {
> > > td->arg = _dl_make_tlsdesc_dynamic
> > > (sym_map, sym->st_value + reloc->r_addend);
> > > - td->entry = _dl_tlsdesc_dynamic;
> > > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > > }
> > > else
> > > # endif
> > > diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
> > > index 4d1d790fbb..06637a8154 100644
> > > --- a/sysdeps/x86_64/dl-procinfo.c
> > > +++ b/sysdeps/x86_64/dl-procinfo.c
> > > @@ -41,5 +41,21 @@
> > >
> > > #include <sysdeps/x86/dl-procinfo.c>
> > >
> > > +#if !IS_IN (ldconfig)
> > > +# if !defined PROCINFO_DECL && defined SHARED
> > > + ._dl_x86_64_runtime_resolve
> > > +# else
> > > +PROCINFO_CLASS void * _dl_x86_64_runtime_resolve
> > > +# endif
> > > +# ifndef PROCINFO_DECL
> > > += NULL
> > > +# endif
> > > +# if !defined SHARED || defined PROCINFO_DECL
> > > +;
> > > +# else
> > > +,
> > > +# endif
> > > +#endif
> > > +
> > > #undef PROCINFO_DECL
> > > #undef PROCINFO_CLASS
> > > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > new file mode 100644
> > > index 0000000000..0c2e8d5320
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > @@ -0,0 +1,166 @@
> > > +/* Thread-local storage handling in the ELF dynamic linker. x86_64 version.
> > > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#ifndef SECTION
> > > +# define SECTION(p) p
> > > +#endif
> > > +
> > > +#undef REGISTER_SAVE_AREA
> > > +#undef LOCAL_STORAGE_AREA
> > > +#undef BASE
> > > +
> > > +#include "dl-trampoline-state.h"
> > > +
> > > + .section SECTION(.text),"ax",@progbits
> > > +
> > > + .hidden _dl_tlsdesc_dynamic
> > > + .global _dl_tlsdesc_dynamic
> > > + .type _dl_tlsdesc_dynamic,@function
> > > +
> > > + /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > > + _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > > + tlsdesc_dynamic_arg object. It must return in %rax the offset
> > > + between the thread pointer and the object denoted by the
> > > + argument, without clobbering any registers.
> > > +
> > > + The assembly code that follows is a rendition of the following
> > > + C code, hand-optimized a little bit.
> > > +
> > > +ptrdiff_t
> > > +_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > > +{
> > > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > > + != TLS_DTV_UNALLOCATED),
> > > + 1))
> > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > + - __thread_pointer;
> > > +
> > > + return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > > +}
> > > +*/
> > > + cfi_startproc
> > > + .align 16
> > > +_dl_tlsdesc_dynamic:
> > > + _CET_ENDBR
> > > + /* Preserve call-clobbered registers that we modify.
> > > + We need two scratch regs anyway. */
> > > + movq %rsi, -16(%rsp)
> > > + mov %fs:DTV_OFFSET, %RSI_LP
> > > + movq %rdi, -8(%rsp)
> > > + movq TLSDESC_ARG(%rax), %rdi
> > > + movq (%rsi), %rax
> > > + cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > > + ja 2f
> > > + movq TLSDESC_MODID(%rdi), %rax
> > > + salq $4, %rax
> > > + movq (%rax,%rsi), %rax
> > > + cmpq $-1, %rax
> > > + je 2f
> > > + addq TLSDESC_MODOFF(%rdi), %rax
> > > +1:
> > > + movq -16(%rsp), %rsi
> > > + sub %fs:0, %RAX_LP
> > > + movq -8(%rsp), %rdi
> > > + ret
> > > +2:
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > + movq %rbx, -24(%rsp)
> > > + mov %RSP_LP, %RBX_LP
> > > + cfi_def_cfa_register(%rbx)
> > > + and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > > +#endif
> > > +#ifdef REGISTER_SAVE_AREA
> > > +# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > + /* STATE_SAVE_OFFSET has space for 8 integer registers. But we
> > > + need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
> > > + RBX above. */
> > > + sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
> > > +# else
> > > + sub $REGISTER_SAVE_AREA, %RSP_LP
> > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > +# endif
> > > +#else
> > > + /* Allocate stack space of the required size to save the state. */
> > > + sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip), %RSP_LP
> > > +#endif
> > > + /* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9,
> > > + r10 and r11. */
> > > + movq %rcx, REGISTER_SAVE_RCX(%rsp)
> > > + movq %rdx, REGISTER_SAVE_RDX(%rsp)
> > > + movq %r8, REGISTER_SAVE_R8(%rsp)
> > > + movq %r9, REGISTER_SAVE_R9(%rsp)
> > > + movq %r10, REGISTER_SAVE_R10(%rsp)
> > > + movq %r11, REGISTER_SAVE_R11(%rsp)
> > > +#ifdef USE_FXSAVE
> > > + fxsave STATE_SAVE_OFFSET(%rsp)
> > > +#else
> > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > + xorl %edx, %edx
> > > + /* Clear the XSAVE Header. */
> > > +# ifdef USE_XSAVE
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512)(%rsp)
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8)(%rsp)
> > > +# endif
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 2)(%rsp)
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 3)(%rsp)
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 4)(%rsp)
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 5)(%rsp)
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 6)(%rsp)
> > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 7)(%rsp)
> > > +# ifdef USE_XSAVE
> > > + xsave STATE_SAVE_OFFSET(%rsp)
> > > +# else
> > > + xsavec STATE_SAVE_OFFSET(%rsp)
> > > +# endif
> > > +#endif
> > > + /* %rdi already points to the tlsinfo data structure. */
> > > + call HIDDEN_JUMPTARGET (__tls_get_addr)
> > > + # Get register content back.
> > > +#ifdef USE_FXSAVE
> > > + fxrstor STATE_SAVE_OFFSET(%rsp)
> > > +#else
> > > + /* Save and retore __tls_get_addr return value stored in RAX. */
> > > + mov %RAX_LP, %RCX_LP
> > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > + xorl %edx, %edx
> > > + xrstor STATE_SAVE_OFFSET(%rsp)
> > > + mov %RCX_LP, %RAX_LP
> > > +#endif
> > > + movq REGISTER_SAVE_R11(%rsp), %r11
> > > + movq REGISTER_SAVE_R10(%rsp), %r10
> > > + movq REGISTER_SAVE_R9(%rsp), %r9
> > > + movq REGISTER_SAVE_R8(%rsp), %r8
> > > + movq REGISTER_SAVE_RDX(%rsp), %rdx
> > > + movq REGISTER_SAVE_RCX(%rsp), %rcx
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > + mov %RBX_LP, %RSP_LP
> > > + cfi_def_cfa_register(%rsp)
> > > + movq -24(%rsp), %rbx
> > > + cfi_restore(%rbx)
> > > +#else
> > > + add $REGISTER_SAVE_AREA, %RSP_LP
> > > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > > +#endif
> > > + jmp 1b
> > > + cfi_endproc
> > > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > +
> > > +#undef STATE_SAVE_ALIGNMENT
> > > diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
> > > index f748af2ece..ea69f5223a 100644
> > > --- a/sysdeps/x86_64/dl-tlsdesc.S
> > > +++ b/sysdeps/x86_64/dl-tlsdesc.S
> > > @@ -18,7 +18,19 @@
> > >
> > > #include <sysdep.h>
> > > #include <tls.h>
> > > +#include <cpu-features-offsets.h>
> > > +#include <features-offsets.h>
> > > #include "tlsdesc.h"
> > > +#include "dl-trampoline-save.h"
> > > +
> > > +/* Area on stack to save and restore registers used for parameter
> > > + passing when calling _dl_tlsdesc_dynamic. */
> > > +#define REGISTER_SAVE_RCX 0
> > > +#define REGISTER_SAVE_RDX (REGISTER_SAVE_RCX + 8)
> > > +#define REGISTER_SAVE_R8 (REGISTER_SAVE_RDX + 8)
> > > +#define REGISTER_SAVE_R9 (REGISTER_SAVE_R8 + 8)
> > > +#define REGISTER_SAVE_R10 (REGISTER_SAVE_R9 + 8)
> > > +#define REGISTER_SAVE_R11 (REGISTER_SAVE_R10 + 8)
> > >
> > > .text
> > >
> > > @@ -67,80 +79,24 @@ _dl_tlsdesc_undefweak:
> > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> > >
> > > #ifdef SHARED
> > > - .hidden _dl_tlsdesc_dynamic
> > > - .global _dl_tlsdesc_dynamic
> > > - .type _dl_tlsdesc_dynamic,@function
> > > -
> > > - /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > > - _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > > - tlsdesc_dynamic_arg object. It must return in %rax the offset
> > > - between the thread pointer and the object denoted by the
> > > - argument, without clobbering any registers.
> > > -
> > > - The assembly code that follows is a rendition of the following
> > > - C code, hand-optimized a little bit.
> > > -
> > > -ptrdiff_t
> > > -_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > > -{
> > > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > > - != TLS_DTV_UNALLOCATED),
> > > - 1))
> > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > - - __thread_pointer;
> > > -
> > > - return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > > -}
> > > -*/
> > > - cfi_startproc
> > > - .align 16
> > > -_dl_tlsdesc_dynamic:
> > > - _CET_ENDBR
> > > - /* Preserve call-clobbered registers that we modify.
> > > - We need two scratch regs anyway. */
> > > - movq %rsi, -16(%rsp)
> > > - mov %fs:DTV_OFFSET, %RSI_LP
> > > - movq %rdi, -8(%rsp)
> > > - movq TLSDESC_ARG(%rax), %rdi
> > > - movq (%rsi), %rax
> > > - cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > > - ja .Lslow
> > > - movq TLSDESC_MODID(%rdi), %rax
> > > - salq $4, %rax
> > > - movq (%rax,%rsi), %rax
> > > - cmpq $-1, %rax
> > > - je .Lslow
> > > - addq TLSDESC_MODOFF(%rdi), %rax
> > > -.Lret:
> > > - movq -16(%rsp), %rsi
> > > - sub %fs:0, %RAX_LP
> > > - movq -8(%rsp), %rdi
> > > - ret
> > > -.Lslow:
> > > - /* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
> > > - r10 and r11. Also, align the stack, that's off by 8 bytes. */
> > > - subq $72, %rsp
> > > - cfi_adjust_cfa_offset (72)
> > > - movq %rdx, 8(%rsp)
> > > - movq %rcx, 16(%rsp)
> > > - movq %r8, 24(%rsp)
> > > - movq %r9, 32(%rsp)
> > > - movq %r10, 40(%rsp)
> > > - movq %r11, 48(%rsp)
> > > - /* %rdi already points to the tlsinfo data structure. */
> > > - call HIDDEN_JUMPTARGET (__tls_get_addr)
> > > - movq 8(%rsp), %rdx
> > > - movq 16(%rsp), %rcx
> > > - movq 24(%rsp), %r8
> > > - movq 32(%rsp), %r9
> > > - movq 40(%rsp), %r10
> > > - movq 48(%rsp), %r11
> > > - addq $72, %rsp
> > > - cfi_adjust_cfa_offset (-72)
> > > - jmp .Lret
> > > - cfi_endproc
> > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > +# define USE_FXSAVE
> > > +# define STATE_SAVE_ALIGNMENT 16
> > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > > +# include "dl-tlsdesc-dynamic.h"
> > > +# undef _dl_tlsdesc_dynamic
> > > +# undef USE_FXSAVE
> > > +
> > > +# define USE_XSAVE
> > > +# define STATE_SAVE_ALIGNMENT 64
> > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > > +# include "dl-tlsdesc-dynamic.h"
> > > +# undef _dl_tlsdesc_dynamic
> > > +# undef USE_XSAVE
> > > +
> > > +# define USE_XSAVEC
> > > +# define STATE_SAVE_ALIGNMENT 64
> > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > > +# include "dl-tlsdesc-dynamic.h"
> > > +# undef _dl_tlsdesc_dynamic
> > > +# undef USE_XSAVEC
> > > #endif /* SHARED */
> > > diff --git a/sysdeps/x86_64/dl-trampoline-save.h b/sysdeps/x86_64/dl-trampoline-save.h
> > > new file mode 100644
> > > index 0000000000..84eac4a8ac
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/dl-trampoline-save.h
> > > @@ -0,0 +1,34 @@
> > > +/* x86-64 PLT trampoline register save macros.
> > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#ifndef DL_STACK_ALIGNMENT
> > > +/* Due to GCC bug:
> > > +
> > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > +
> > > + __tls_get_addr may be called with 8-byte stack alignment. Although
> > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > + that stack will be always aligned at 16 bytes. */
> > > +# define DL_STACK_ALIGNMENT 8
> > > +#endif
> > > +
> > > +/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > > + stack to 16 bytes before calling _dl_fixup. */
> > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > + || 16 > DL_STACK_ALIGNMENT)
> > > diff --git a/sysdeps/x86_64/dl-trampoline-state.h b/sysdeps/x86_64/dl-trampoline-state.h
> > > new file mode 100644
> > > index 0000000000..575f120797
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/dl-trampoline-state.h
> > > @@ -0,0 +1,51 @@
> > > +/* x86-64 PLT dl-trampoline state macros.
> > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > + This file is part of the GNU C Library.
> > > +
> > > + The GNU C Library is free software; you can redistribute it and/or
> > > + modify it under the terms of the GNU Lesser General Public
> > > + License as published by the Free Software Foundation; either
> > > + version 2.1 of the License, or (at your option) any later version.
> > > +
> > > + The GNU C Library is distributed in the hope that it will be useful,
> > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + Lesser General Public License for more details.
> > > +
> > > + You should have received a copy of the GNU Lesser General Public
> > > + License along with the GNU C Library; if not, see
> > > + <https://www.gnu.org/licenses/>. */
> > > +
> > > +#if (STATE_SAVE_ALIGNMENT % 16) != 0
> > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > +#endif
> > > +
> > > +#if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > > +# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > > +#endif
> > > +
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > +/* Local stack area before jumping to function address: RBX. */
> > > +# define LOCAL_STORAGE_AREA 8
> > > +# define BASE rbx
> > > +# ifdef USE_FXSAVE
> > > +/* Use fxsave to save XMM registers. */
> > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > > +# if (REGISTER_SAVE_AREA % 16) != 0
> > > +# error REGISTER_SAVE_AREA must be multiple of 16
> > > +# endif
> > > +# endif
> > > +#else
> > > +# ifndef USE_FXSAVE
> > > +# error USE_FXSAVE must be defined
> > > +# endif
> > > +/* Use fxsave to save XMM registers. */
> > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > > +/* Local stack area before jumping to function address: All saved
> > > + registers. */
> > > +# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > > +# define BASE rsp
> > > +# if (REGISTER_SAVE_AREA % 16) != 8
> > > +# error REGISTER_SAVE_AREA must be odd multiple of 8
> > > +# endif
> > > +#endif
> > > diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
> > > index b2e7e0f69b..87c5137837 100644
> > > --- a/sysdeps/x86_64/dl-trampoline.S
> > > +++ b/sysdeps/x86_64/dl-trampoline.S
> > > @@ -22,25 +22,7 @@
> > > #include <features-offsets.h>
> > > #include <link-defines.h>
> > > #include <isa-level.h>
> > > -
> > > -#ifndef DL_STACK_ALIGNMENT
> > > -/* Due to GCC bug:
> > > -
> > > - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > -
> > > - __tls_get_addr may be called with 8-byte stack alignment. Although
> > > - this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > - that stack will be always aligned at 16 bytes. We use unaligned
> > > - 16-byte move to load and store SSE registers, which has no penalty
> > > - on modern processors if stack is 16-byte aligned. */
> > > -# define DL_STACK_ALIGNMENT 8
> > > -#endif
> > > -
> > > -/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > > - stack to 16 bytes before calling _dl_fixup. */
> > > -#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > - (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > - || 16 > DL_STACK_ALIGNMENT)
> > > +#include "dl-trampoline-save.h"
> > >
> > > /* Area on stack to save and restore registers used for parameter
> > > passing when calling _dl_fixup. */
> > > diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
> > > index f55c6ea040..d9ccfb40d4 100644
> > > --- a/sysdeps/x86_64/dl-trampoline.h
> > > +++ b/sysdeps/x86_64/dl-trampoline.h
> > > @@ -27,39 +27,7 @@
> > > # undef LOCAL_STORAGE_AREA
> > > # undef BASE
> > >
> > > -# if (STATE_SAVE_ALIGNMENT % 16) != 0
> > > -# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > -# endif
> > > -
> > > -# if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > > -# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > > -# endif
> > > -
> > > -# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > -/* Local stack area before jumping to function address: RBX. */
> > > -# define LOCAL_STORAGE_AREA 8
> > > -# define BASE rbx
> > > -# ifdef USE_FXSAVE
> > > -/* Use fxsave to save XMM registers. */
> > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > > -# if (REGISTER_SAVE_AREA % 16) != 0
> > > -# error REGISTER_SAVE_AREA must be multiple of 16
> > > -# endif
> > > -# endif
> > > -# else
> > > -# ifndef USE_FXSAVE
> > > -# error USE_FXSAVE must be defined
> > > -# endif
> > > -/* Use fxsave to save XMM registers. */
> > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > > -/* Local stack area before jumping to function address: All saved
> > > - registers. */
> > > -# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > > -# define BASE rsp
> > > -# if (REGISTER_SAVE_AREA % 16) != 8
> > > -# error REGISTER_SAVE_AREA must be odd multiple of 8
> > > -# endif
> > > -# endif
> > > +# include "dl-trampoline-state.h"
> > >
> > > .globl _dl_runtime_resolve
> > > .hidden _dl_runtime_resolve
> > > --
> > > 2.43.2
> > >
> >
> > This still LGTM.
> > Feel free to add my reviewed-by tag to future posts of the patch.
> > Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
>
> Please wait for community consensus before pushing
It was agreed that I should send out the v10 patch with xfail on
the new test. Someone just needs to double check that my
v10 patch does it.
--
H.J.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v10 1/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
2024-02-26 19:06 ` H.J. Lu
@ 2024-02-27 17:00 ` Noah Goldstein
2024-03-08 20:10 ` H.J. Lu
0 siblings, 1 reply; 7+ messages in thread
From: Noah Goldstein @ 2024-02-27 17:00 UTC (permalink / raw)
To: H.J. Lu; +Cc: libc-alpha, adhemerval.zanella
On Mon, Feb 26, 2024 at 1:06 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Feb 26, 2024 at 11:02 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Mon, Feb 26, 2024 at 1:02 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Mon, Feb 26, 2024 at 8:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > Compiler generates the following instruction sequence for GNU2 dynamic
> > > > TLS access:
> > > >
> > > > leaq tls_var@TLSDESC(%rip), %rax
> > > > call *tls_var@TLSCALL(%rax)
> > > >
> > > > or
> > > >
> > > > leal tls_var@TLSDESC(%ebx), %eax
> > > > call *tls_var@TLSCALL(%eax)
> > > >
> > > > CALL instruction is transparent to compiler which assumes all registers,
> > > > except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> > > > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > > > path. __tls_get_addr is a normal function which doesn't preserve any
> > > > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> > > > caller-saved registers, but didn't preserve any other caller-saved
> > > > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > > > XSAVE and XSAVEC to save and restore all caller-saved registers. This
> > > > fixes BZ #31372.
> > > >
> > > > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > > > to optimize elf_machine_runtime_setup.
> > > > ---
> > > > elf/Makefile | 18 ++
> > > > elf/tst-gnu2-tls2.c | 122 ++++++++++++
> > > > elf/tst-gnu2-tls2.h | 36 ++++
> > > > elf/tst-gnu2-tls2mod0.c | 31 +++
> > > > elf/tst-gnu2-tls2mod1.c | 31 +++
> > > > elf/tst-gnu2-tls2mod2.c | 31 +++
> > > > sysdeps/i386/dl-machine.h | 2 +-
> > > > sysdeps/i386/dl-tlsdesc-dynamic.h | 190 +++++++++++++++++++
> > > > sysdeps/i386/dl-tlsdesc.S | 115 +++++------
> > > > sysdeps/x86/Makefile | 7 +-
> > > > sysdeps/x86/cpu-features.c | 56 +++++-
> > > > sysdeps/x86/dl-procinfo.c | 16 ++
> > > > sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
> > > > sysdeps/x86/sysdep.h | 6 +
> > > > sysdeps/x86/tst-gnu2-tls2.c | 20 ++
> > > > sysdeps/x86_64/Makefile | 2 +-
> > > > sysdeps/x86_64/dl-machine.h | 19 +-
> > > > sysdeps/x86_64/dl-procinfo.c | 16 ++
> > > > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
> > > > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
> > > > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
> > > > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
> > > > sysdeps/x86_64/dl-trampoline.S | 20 +-
> > > > sysdeps/x86_64/dl-trampoline.h | 34 +---
> > > > 24 files changed, 920 insertions(+), 213 deletions(-)
> > > > create mode 100644 elf/tst-gnu2-tls2.c
> > > > create mode 100644 elf/tst-gnu2-tls2.h
> > > > create mode 100644 elf/tst-gnu2-tls2mod0.c
> > > > create mode 100644 elf/tst-gnu2-tls2mod1.c
> > > > create mode 100644 elf/tst-gnu2-tls2mod2.c
> > > > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> > > > create mode 100644 sysdeps/x86/tst-gnu2-tls2.c
> > > > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> > > > create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
> > > >
> > > > diff --git a/elf/Makefile b/elf/Makefile
> > > > index 36c04baf02..02dc476e27 100644
> > > > --- a/elf/Makefile
> > > > +++ b/elf/Makefile
> > > > @@ -424,6 +424,7 @@ tests += \
> > > > tst-glibc-hwcaps-prepend \
> > > > tst-global1 \
> > > > tst-global2 \
> > > > + tst-gnu2-tls2 \
> > > > tst-initfinilazyfail \
> > > > tst-initorder \
> > > > tst-initorder2 \
> > > > @@ -846,6 +847,9 @@ modules-names += \
> > > > tst-filterobj-flt \
> > > > tst-finilazyfailmod \
> > > > tst-globalmod2 \
> > > > + tst-gnu2-tls2mod0 \
> > > > + tst-gnu2-tls2mod1 \
> > > > + tst-gnu2-tls2mod2 \
> > > > tst-initlazyfailmod \
> > > > tst-initorder2a \
> > > > tst-initorder2b \
> > > > @@ -3044,8 +3048,22 @@ $(objpfx)tst-tlsgap.out: \
> > > > $(objpfx)tst-tlsgap-mod0.so \
> > > > $(objpfx)tst-tlsgap-mod1.so \
> > > > $(objpfx)tst-tlsgap-mod2.so
> > > > +
> > > > +$(objpfx)tst-gnu2-tls2: $(shared-thread-library)
> > > > +$(objpfx)tst-gnu2-tls2.out: \
> > > > + $(objpfx)tst-gnu2-tls2mod0.so \
> > > > + $(objpfx)tst-gnu2-tls2mod1.so \
> > > > + $(objpfx)tst-gnu2-tls2mod2.so
> > > > +
> > > > ifeq (yes,$(have-mtls-dialect-gnu2))
> > > > +# This test fails if dl_tlsdesc_dynamic doesn't preserve all caller-saved
> > > > +# registers. See https://sourceware.org/bugzilla/show_bug.cgi?id=31372
> > > > +test-xfail-tst-gnu2-tls2 = yes
> > > > +
> > > > CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
> > > > CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
> > > > CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
> > > > +CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
> > > > +CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
> > > > +CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
> > > > endif
> > > > diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c
> > > > new file mode 100644
> > > > index 0000000000..7ac04d7f33
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2.c
> > > > @@ -0,0 +1,122 @@
> > > > +/* Test TLSDESC relocation.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <http://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include <stdio.h>
> > > > +#include <stdlib.h>
> > > > +#include <string.h>
> > > > +#include <dlfcn.h>
> > > > +#include <pthread.h>
> > > > +#include <support/xdlfcn.h>
> > > > +#include <support/xthread.h>
> > > > +#include <support/check.h>
> > > > +#include <support/test-driver.h>
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +#ifndef IS_SUPPORTED
> > > > +# define IS_SUPPORTED() true
> > > > +#endif
> > > > +
> > > > +/* An architecture can define it to clobber caller-saved registers in
> > > > + malloc below to verify that the implicit TLSDESC call won't change
> > > > + caller-saved registers. */
> > > > +#ifndef PREPARE_MALLOC
> > > > +# define PREPARE_MALLOC()
> > > > +#endif
> > > > +
> > > > +extern void * __libc_malloc (size_t);
> > > > +
> > > > +size_t malloc_counter = 0;
> > > > +
> > > > +void *
> > > > +malloc (size_t n)
> > > > +{
> > > > + PREPARE_MALLOC ();
> > > > + malloc_counter++;
> > > > + return __libc_malloc (n);
> > > > +}
> > > > +
> > > > +static void *mod[3];
> > > > +#ifndef MOD
> > > > +# define MOD(i) "tst-gnu2-tls2mod" #i ".so"
> > > > +#endif
> > > > +static const char *modname[3] = { MOD(0), MOD(1), MOD(2) };
> > > > +#undef MOD
> > > > +
> > > > +static void
> > > > +open_mod (int i)
> > > > +{
> > > > + mod[i] = xdlopen (modname[i], RTLD_LAZY);
> > > > + printf ("open %s\n", modname[i]);
> > > > +}
> > > > +
> > > > +static void
> > > > +close_mod (int i)
> > > > +{
> > > > + xdlclose (mod[i]);
> > > > + mod[i] = NULL;
> > > > + printf ("close %s\n", modname[i]);
> > > > +}
> > > > +
> > > > +static void
> > > > +access_mod (int i, const char *sym)
> > > > +{
> > > > + struct tls var = { -1, -1, -1, -1 };
> > > > + struct tls *(*f) (struct tls *) = xdlsym (mod[i], sym);
> > > > + /* Check that our malloc is called. */
> > > > + malloc_counter = 0;
> > > > + struct tls *p = f (&var);
> > > > + TEST_VERIFY (malloc_counter != 0);
> > > > + printf ("access %s: %s() = %p\n", modname[i], sym, p);
> > > > + TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) == 0);
> > > > + ++(p->a);
> > > > +}
> > > > +
> > > > +static void *
> > > > +start (void *arg)
> > > > +{
> > > > + /* The DTV generation is at the last dlopen of mod0 and the
> > > > + entry for mod1 is NULL. */
> > > > +
> > > > + open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */
> > > > +
> > > > + /* Force the slow path in GNU2 TLS descriptor call. */
> > > > + access_mod (1, "apply_tls");
> > > > +
> > > > + return arg;
> > > > +}
> > > > +
> > > > +static int
> > > > +do_test (void)
> > > > +{
> > > > + if (!IS_SUPPORTED ())
> > > > + return EXIT_UNSUPPORTED;
> > > > +
> > > > + open_mod (0);
> > > > + open_mod (1);
> > > > + open_mod (2);
> > > > + close_mod (0);
> > > > + close_mod (1); /* Create modid gap at mod1. */
> > > > + open_mod (0); /* Reuse modid of mod0, bump generation count. */
> > > > +
> > > > + /* Create a thread where DTV of mod1 is NULL. */
> > > > + pthread_t t = xpthread_create (NULL, start, NULL);
> > > > + xpthread_join (t);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +#include <support/test-driver.c>
> > > > diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h
> > > > new file mode 100644
> > > > index 0000000000..77964a57a3
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2.h
> > > > @@ -0,0 +1,36 @@
> > > > +/* Test TLSDESC relocation.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include <stdint.h>
> > > > +
> > > > +struct tls
> > > > +{
> > > > + int64_t a, b, c, d;
> > > > +};
> > > > +
> > > > +extern struct tls *apply_tls (struct tls *);
> > > > +
> > > > +/* An architecture can define them to verify that clobber caller-saved
> > > > + registers aren't changed by the implicit TLSDESC call. */
> > > > +#ifndef BEFORE_TLSDESC_CALL
> > > > +# define BEFORE_TLSDESC_CALL()
> > > > +#endif
> > > > +
> > > > +#ifndef AFTER_TLSDESC_CALL
> > > > +# define AFTER_TLSDESC_CALL()
> > > > +#endif
> > > > diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c
> > > > new file mode 100644
> > > > index 0000000000..45556a0e17
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2mod0.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* DSO used by tst-gnu2-tls2.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")));
> > > > +
> > > > +struct tls *
> > > > +apply_tls (struct tls *p)
> > > > +{
> > > > + BEFORE_TLSDESC_CALL ();
> > > > + tls_var0 = *p;
> > > > + struct tls *ret = &tls_var0;
> > > > + AFTER_TLSDESC_CALL ();
> > > > + return ret;
> > > > +}
> > > > diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c
> > > > new file mode 100644
> > > > index 0000000000..e10b9dbc0a
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2mod1.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* DSO used by tst-gnu2-tls2.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidden")));
> > > > +
> > > > +struct tls *
> > > > +apply_tls (struct tls *p)
> > > > +{
> > > > + BEFORE_TLSDESC_CALL ();
> > > > + tls_var1[1] = *p;
> > > > + struct tls *ret = &tls_var1[1];
> > > > + AFTER_TLSDESC_CALL ();
> > > > + return ret;
> > > > +}
> > > > diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c
> > > > new file mode 100644
> > > > index 0000000000..141af51e55
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2mod2.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* DSO used by tst-gnu2-tls2.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")));
> > > > +
> > > > +struct tls *
> > > > +apply_tls (struct tls *p)
> > > > +{
> > > > + BEFORE_TLSDESC_CALL ();
> > > > + tls_var2 = *p;
> > > > + struct tls *ret = &tls_var2;
> > > > + AFTER_TLSDESC_CALL ();
> > > > + return ret;
> > > > +}
> > > > diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> > > > index fc1ef96587..50d74fe6e9 100644
> > > > --- a/sysdeps/i386/dl-machine.h
> > > > +++ b/sysdeps/i386/dl-machine.h
> > > > @@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.\n",
> > > > {
> > > > td->arg = _dl_make_tlsdesc_dynamic
> > > > (sym_map, sym->st_value + (ElfW(Word))td->arg);
> > > > - td->entry = _dl_tlsdesc_dynamic;
> > > > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > > > }
> > > > else
> > > > # endif
> > > > diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > new file mode 100644
> > > > index 0000000000..3627028577
> > > > --- /dev/null
> > > > +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > @@ -0,0 +1,190 @@
> > > > +/* Thread-local storage handling in the ELF dynamic linker. i386 version.
> > > > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#undef REGISTER_SAVE_AREA
> > > > +
> > > > +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
> > > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > > +#endif
> > > > +
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > +# ifdef USE_FNSAVE
> > > > +# error USE_FNSAVE shouldn't be defined
> > > > +# endif
> > > > +# ifdef USE_FXSAVE
> > > > +/* Use fxsave to save all registers. */
> > > > +# define REGISTER_SAVE_AREA 512
> > > > +# endif
> > > > +#else
> > > > +# ifdef USE_FNSAVE
> > > > +/* Use fnsave to save x87 FPU stack registers. */
> > > > +# define REGISTER_SAVE_AREA 108
> > > > +# else
> > > > +# ifndef USE_FXSAVE
> > > > +# error USE_FXSAVE must be defined
> > > > +# endif
> > > > +/* Use fxsave to save all registers. Add 12 bytes to align the stack
> > > > + to 16 bytes. */
> > > > +# define REGISTER_SAVE_AREA (512 + 12)
> > > > +# endif
> > > > +#endif
> > > > +
> > > > + .hidden _dl_tlsdesc_dynamic
> > > > + .global _dl_tlsdesc_dynamic
> > > > + .type _dl_tlsdesc_dynamic,@function
> > > > +
> > > > + /* This function is used for symbols that need dynamic TLS.
> > > > +
> > > > + %eax points to the TLS descriptor, such that 0(%eax) points to
> > > > + _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > > > + tlsdesc_dynamic_arg object. It must return in %eax the offset
> > > > + between the thread pointer and the object denoted by the
> > > > + argument, without clobbering any registers.
> > > > +
> > > > + The assembly code that follows is a rendition of the following
> > > > + C code, hand-optimized a little bit.
> > > > +
> > > > +ptrdiff_t
> > > > +__attribute__ ((__regparm__ (1)))
> > > > +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > > +{
> > > > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > + != TLS_DTV_UNALLOCATED),
> > > > + 1))
> > > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > + - __thread_pointer;
> > > > +
> > > > + return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > > +}
> > > > +*/
> > > > + cfi_startproc
> > > > + .align 16
> > > > +_dl_tlsdesc_dynamic:
> > > > + /* Like all TLS resolvers, preserve call-clobbered registers.
> > > > + We need two scratch regs anyway. */
> > > > + subl $32, %esp
> > > > + cfi_adjust_cfa_offset (32)
> > > > + movl %ecx, 20(%esp)
> > > > + movl %edx, 24(%esp)
> > > > + movl TLSDESC_ARG(%eax), %eax
> > > > + movl %gs:DTV_OFFSET, %edx
> > > > + movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > > + cmpl (%edx), %ecx
> > > > + ja 2f
> > > > + movl TLSDESC_MODID(%eax), %ecx
> > > > + movl (%edx,%ecx,8), %edx
> > > > + cmpl $-1, %edx
> > > > + je 2f
> > > > + movl TLSDESC_MODOFF(%eax), %eax
> > > > + addl %edx, %eax
> > > > +1:
> > > > + movl 20(%esp), %ecx
> > > > + subl %gs:0, %eax
> > > > + movl 24(%esp), %edx
> > > > + addl $32, %esp
> > > > + cfi_adjust_cfa_offset (-32)
> > > > + ret
> > > > + .p2align 4,,7
> > > > +2:
> > > > + cfi_adjust_cfa_offset (32)
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + movl %ebx, -28(%esp)
> > > > + movl %esp, %ebx
> > > > + cfi_def_cfa_register(%ebx)
> > > > + and $-STATE_SAVE_ALIGNMENT, %esp
> > > > +#endif
> > > > +#ifdef REGISTER_SAVE_AREA
> > > > + subl $REGISTER_SAVE_AREA, %esp
> > > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > > +# endif
> > > > +#else
> > > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > +# error DL_RUNTIME_RESOLVE_REALIGN_STACK must be true
> > > > +# endif
> > > > + /* Allocate stack space of the required size to save the state. */
> > > > + LOAD_PIC_REG (cx)
> > > > + subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> > > > +#endif
> > > > +#ifdef USE_FNSAVE
> > > > + fnsave (%esp)
> > > > +#elif defined USE_FXSAVE
> > > > + fxsave (%esp)
> > > > +#else
> > > > + /* Save the argument for ___tls_get_addr in EAX. */
> > > > + movl %eax, %ecx
> > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > + xorl %edx, %edx
> > > > + /* Clear the XSAVE Header. */
> > > > +# ifdef USE_XSAVE
> > > > + movl %edx, (512)(%esp)
> > > > + movl %edx, (512 + 4 * 1)(%esp)
> > > > + movl %edx, (512 + 4 * 2)(%esp)
> > > > + movl %edx, (512 + 4 * 3)(%esp)
> > > > +# endif
> > > > + movl %edx, (512 + 4 * 4)(%esp)
> > > > + movl %edx, (512 + 4 * 5)(%esp)
> > > > + movl %edx, (512 + 4 * 6)(%esp)
> > > > + movl %edx, (512 + 4 * 7)(%esp)
> > > > + movl %edx, (512 + 4 * 8)(%esp)
> > > > + movl %edx, (512 + 4 * 9)(%esp)
> > > > + movl %edx, (512 + 4 * 10)(%esp)
> > > > + movl %edx, (512 + 4 * 11)(%esp)
> > > > + movl %edx, (512 + 4 * 12)(%esp)
> > > > + movl %edx, (512 + 4 * 13)(%esp)
> > > > + movl %edx, (512 + 4 * 14)(%esp)
> > > > + movl %edx, (512 + 4 * 15)(%esp)
> > > > +# ifdef USE_XSAVE
> > > > + xsave (%esp)
> > > > +# else
> > > > + xsavec (%esp)
> > > > +# endif
> > > > + /* Restore the argument for ___tls_get_addr in EAX. */
> > > > + movl %ecx, %eax
> > > > +#endif
> > > > + call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > > + /* Get register content back. */
> > > > +#ifdef USE_FNSAVE
> > > > + frstor (%esp)
> > > > +#elif defined USE_FXSAVE
> > > > + fxrstor (%esp)
> > > > +#else
> > > > + /* Save and retore ___tls_get_addr return value stored in EAX. */
> > > > + movl %eax, %ecx
> > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > + xorl %edx, %edx
> > > > + xrstor (%esp)
> > > > + movl %ecx, %eax
> > > > +#endif
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + mov %ebx, %esp
> > > > + cfi_def_cfa_register(%esp)
> > > > + movl -28(%esp), %ebx
> > > > + cfi_restore(%ebx)
> > > > +#else
> > > > + addl $REGISTER_SAVE_AREA, %esp
> > > > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > > > +#endif
> > > > + jmp 1b
> > > > + cfi_endproc
> > > > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > +
> > > > +#undef STATE_SAVE_ALIGNMENT
> > > > diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
> > > > index 90d93caa0c..f002feee56 100644
> > > > --- a/sysdeps/i386/dl-tlsdesc.S
> > > > +++ b/sysdeps/i386/dl-tlsdesc.S
> > > > @@ -18,8 +18,27 @@
> > > >
> > > > #include <sysdep.h>
> > > > #include <tls.h>
> > > > +#include <cpu-features-offsets.h>
> > > > +#include <features-offsets.h>
> > > > #include "tlsdesc.h"
> > > >
> > > > +#ifndef DL_STACK_ALIGNMENT
> > > > +/* Due to GCC bug:
> > > > +
> > > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > > +
> > > > + __tls_get_addr may be called with 4-byte stack alignment. Although
> > > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > > + that stack will be always aligned at 16 bytes. */
> > > > +# define DL_STACK_ALIGNMENT 4
> > > > +#endif
> > > > +
> > > > +/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or align
> > > > + stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. */
> > > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > > + || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT)
> > > > +
> > > > .text
> > > >
> > > > /* This function is used to compute the TP offset for symbols in
> > > > @@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak:
> > > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> > > >
> > > > #ifdef SHARED
> > > > - .hidden _dl_tlsdesc_dynamic
> > > > - .global _dl_tlsdesc_dynamic
> > > > - .type _dl_tlsdesc_dynamic,@function
> > > > -
> > > > - /* This function is used for symbols that need dynamic TLS.
> > > > -
> > > > - %eax points to the TLS descriptor, such that 0(%eax) points to
> > > > - _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > > > - tlsdesc_dynamic_arg object. It must return in %eax the offset
> > > > - between the thread pointer and the object denoted by the
> > > > - argument, without clobbering any registers.
> > > > -
> > > > - The assembly code that follows is a rendition of the following
> > > > - C code, hand-optimized a little bit.
> > > > -
> > > > -ptrdiff_t
> > > > -__attribute__ ((__regparm__ (1)))
> > > > -_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > > -{
> > > > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > - != TLS_DTV_UNALLOCATED),
> > > > - 1))
> > > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > - - __thread_pointer;
> > > > -
> > > > - return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > > -}
> > > > -*/
> > > > - cfi_startproc
> > > > - .align 16
> > > > -_dl_tlsdesc_dynamic:
> > > > - /* Like all TLS resolvers, preserve call-clobbered registers.
> > > > - We need two scratch regs anyway. */
> > > > - subl $28, %esp
> > > > - cfi_adjust_cfa_offset (28)
> > > > - movl %ecx, 20(%esp)
> > > > - movl %edx, 24(%esp)
> > > > - movl TLSDESC_ARG(%eax), %eax
> > > > - movl %gs:DTV_OFFSET, %edx
> > > > - movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > > - cmpl (%edx), %ecx
> > > > - ja .Lslow
> > > > - movl TLSDESC_MODID(%eax), %ecx
> > > > - movl (%edx,%ecx,8), %edx
> > > > - cmpl $-1, %edx
> > > > - je .Lslow
> > > > - movl TLSDESC_MODOFF(%eax), %eax
> > > > - addl %edx, %eax
> > > > -.Lret:
> > > > - movl 20(%esp), %ecx
> > > > - subl %gs:0, %eax
> > > > - movl 24(%esp), %edx
> > > > - addl $28, %esp
> > > > - cfi_adjust_cfa_offset (-28)
> > > > - ret
> > > > - .p2align 4,,7
> > > > -.Lslow:
> > > > - cfi_adjust_cfa_offset (28)
> > > > - call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > > - jmp .Lret
> > > > - cfi_endproc
> > > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > +# define USE_FNSAVE
> > > > +# define MINIMUM_ALIGNMENT 4
> > > > +# define STATE_SAVE_ALIGNMENT 4
> > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave
> > > > +# include "dl-tlsdesc-dynamic.h"
> > > > +# undef _dl_tlsdesc_dynamic
> > > > +# undef MINIMUM_ALIGNMENT
> > > > +# undef USE_FNSAVE
> > > > +
> > > > +# define MINIMUM_ALIGNMENT 16
> > > > +
> > > > +# define USE_FXSAVE
> > > > +# define STATE_SAVE_ALIGNMENT 16
> > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > > > +# include "dl-tlsdesc-dynamic.h"
> > > > +# undef _dl_tlsdesc_dynamic
> > > > +# undef USE_FXSAVE
> > > > +
> > > > +# define USE_XSAVE
> > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > > > +# include "dl-tlsdesc-dynamic.h"
> > > > +# undef _dl_tlsdesc_dynamic
> > > > +# undef USE_XSAVE
> > > > +
> > > > +# define USE_XSAVEC
> > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > > > +# include "dl-tlsdesc-dynamic.h"
> > > > +# undef _dl_tlsdesc_dynamic
> > > > +# undef USE_XSAVEC
> > > > #endif /* SHARED */
> > > > diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
> > > > index 73b29cc78c..5311b594af 100644
> > > > --- a/sysdeps/x86/Makefile
> > > > +++ b/sysdeps/x86/Makefile
> > > > @@ -1,5 +1,5 @@
> > > > ifeq ($(subdir),csu)
> > > > -gen-as-const-headers += cpu-features-offsets.sym
> > > > +gen-as-const-headers += cpu-features-offsets.sym features-offsets.sym
> > > > endif
> > > >
> > > > ifeq ($(subdir),elf)
> > > > @@ -86,6 +86,11 @@ endif
> > > > tst-ifunc-isa-2-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-SSE4_2,-AVX,-AVX2,-AVX512F
> > > > tst-ifunc-isa-2-static-ENV = $(tst-ifunc-isa-2-ENV)
> > > > tst-hwcap-tunables-ARGS = -- $(host-test-program-cmd)
> > > > +
> > > > +CFLAGS-tst-gnu2-tls2.c += -msse
> > > > +CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
> > > > +CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
> > > > +CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
> > > > endif
> > > >
> > > > ifeq ($(subdir),math)
> > > > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > > > index 25e6622a79..835113b42f 100644
> > > > --- a/sysdeps/x86/cpu-features.c
> > > > +++ b/sysdeps/x86/cpu-features.c
> > > > @@ -27,8 +27,13 @@
> > > > extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
> > > > attribute_hidden;
> > > >
> > > > -#if defined SHARED && defined __x86_64__
> > > > -# include <dl-plt-rewrite.h>
> > > > +#if defined SHARED
> > > > +extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden;
> > > > +extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden;
> > > > +extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden;
> > > > +
> > > > +# ifdef __x86_64__
> > > > +# include <dl-plt-rewrite.h>
> > > >
> > > > static void
> > > > TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > > > @@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > > > : plt_rewrite_jmp);
> > > > }
> > > > }
> > > > +# else
> > > > +extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden;
> > > > +# endif
> > > > +#endif
> > > > +
> > > > +#ifdef __x86_64__
> > > > +extern void _dl_runtime_resolve_fxsave (void) attribute_hidden;
> > > > +extern void _dl_runtime_resolve_xsave (void) attribute_hidden;
> > > > +extern void _dl_runtime_resolve_xsavec (void) attribute_hidden;
> > > > #endif
> > > >
> > > > #ifdef __LP64__
> > > > @@ -1130,6 +1144,44 @@ no_cpuid:
> > > > TUNABLE_CALLBACK (set_x86_shstk));
> > > > #endif
> > > >
> > > > + if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > > > + {
> > > > + if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC))
> > > > + {
> > > > +#ifdef __x86_64__
> > > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsavec;
> > > > +#endif
> > > > +#ifdef SHARED
> > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsavec;
> > > > +#endif
> > > > + }
> > > > + else
> > > > + {
> > > > +#ifdef __x86_64__
> > > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsave;
> > > > +#endif
> > > > +#ifdef SHARED
> > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsave;
> > > > +#endif
> > > > + }
> > > > + }
> > > > + else
> > > > + {
> > > > +#ifdef __x86_64__
> > > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_fxsave;
> > > > +# ifdef SHARED
> > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > > > +# endif
> > > > +#else
> > > > +# ifdef SHARED
> > > > + if (CPU_FEATURE_USABLE_P (cpu_features, FXSR))
> > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > > > + else
> > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fnsave;
> > > > +# endif
> > > > +#endif
> > > > + }
> > > > +
> > > > #ifdef SHARED
> > > > # ifdef __x86_64__
> > > > TUNABLE_GET (plt_rewrite, tunable_val_t *,
> > > > diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c
> > > > index ee957b4d70..5920d4b320 100644
> > > > --- a/sysdeps/x86/dl-procinfo.c
> > > > +++ b/sysdeps/x86/dl-procinfo.c
> > > > @@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9]
> > > > #else
> > > > ,
> > > > #endif
> > > > +
> > > > +#if defined SHARED && !IS_IN (ldconfig)
> > > > +# if !defined PROCINFO_DECL
> > > > + ._dl_x86_tlsdesc_dynamic
> > > > +# else
> > > > +PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic
> > > > +# endif
> > > > +# ifndef PROCINFO_DECL
> > > > += NULL
> > > > +# endif
> > > > +# ifdef PROCINFO_DECL
> > > > +;
> > > > +# else
> > > > +,
> > > > +# endif
> > > > +#endif
> > > > diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/features-offsets.sym
> > > > similarity index 89%
> > > > rename from sysdeps/x86_64/features-offsets.sym
> > > > rename to sysdeps/x86/features-offsets.sym
> > > > index 9e4be3393a..77e990c705 100644
> > > > --- a/sysdeps/x86_64/features-offsets.sym
> > > > +++ b/sysdeps/x86/features-offsets.sym
> > > > @@ -3,4 +3,6 @@
> > > > #include <ldsodefs.h>
> > > >
> > > > RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_global_ro, _dl_x86_cpu_features)
> > > > +#ifdef __x86_64__
> > > > RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _dl_x86_feature_1)
> > > > +#endif
> > > > diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
> > > > index 837fd28734..485cad9c02 100644
> > > > --- a/sysdeps/x86/sysdep.h
> > > > +++ b/sysdeps/x86/sysdep.h
> > > > @@ -70,6 +70,12 @@
> > > > | (1 << X86_XSTATE_ZMM_H_ID))
> > > > #endif
> > > >
> > > > +/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL.
> > > > + Compiler assumes that all registers, including x87 FPU stack registers,
> > > > + are unchanged after CALL, except for EFLAGS and RAX/EAX. */
> > > > +#define TLSDESC_CALL_STATE_SAVE_MASK \
> > > > + (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID))
> > > > +
> > > > /* Constants for bits in __x86_string_control: */
> > > >
> > > > /* Avoid short distance REP MOVSB. */
> > > > diff --git a/sysdeps/x86/tst-gnu2-tls2.c b/sysdeps/x86/tst-gnu2-tls2.c
> > > > new file mode 100644
> > > > index 0000000000..de900a423b
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86/tst-gnu2-tls2.c
> > > > @@ -0,0 +1,20 @@
> > > > +#ifndef __x86_64__
> > > > +#include <sys/platform/x86.h>
> > > > +
> > > > +#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2)
> > > > +#endif
> > > > +
> > > > +/* Clear XMM0...XMM7 */
> > > > +#define PREPARE_MALLOC() \
> > > > +{ \
> > > > + asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \
> > > > + asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \
> > > > + asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \
> > > > + asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \
> > > > + asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \
> > > > + asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \
> > > > + asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \
> > > > + asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \
> > > > +}
> > > > +
> > > > +#include <elf/tst-gnu2-tls2.c>
> > > > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> > > > index 145f25e7f6..9337e95093 100644
> > > > --- a/sysdeps/x86_64/Makefile
> > > > +++ b/sysdeps/x86_64/Makefile
> > > > @@ -10,7 +10,7 @@ LDFLAGS-rtld += -Wl,-z,nomark-plt
> > > > endif
> > > >
> > > > ifeq ($(subdir),csu)
> > > > -gen-as-const-headers += features-offsets.sym link-defines.sym
> > > > +gen-as-const-headers += link-defines.sym
> > > > endif
> > > >
> > > > ifeq ($(subdir),gmon)
> > > > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> > > > index 6d605d0d32..ff5d45f7cb 100644
> > > > --- a/sysdeps/x86_64/dl-machine.h
> > > > +++ b/sysdeps/x86_64/dl-machine.h
> > > > @@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > int lazy, int profile)
> > > > {
> > > > Elf64_Addr *got;
> > > > - extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidden;
> > > > - extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidden;
> > > > - extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidden;
> > > > extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
> > > > extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
> > > > extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
> > > > @@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > /* Identify this shared object. */
> > > > *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
> > > >
> > > > - const struct cpu_features* cpu_features = __get_cpu_features ();
> > > > -
> > > > #ifdef SHARED
> > > > /* The got[2] entry contains the address of a function which gets
> > > > called to get the address of a so far unresolved function and
> > > > @@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > end in this function. */
> > > > if (__glibc_unlikely (profile))
> > > > {
> > > > + const struct cpu_features* cpu_features = __get_cpu_features ();
> > > > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
> > > > *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
> > > > else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
> > > > @@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > /* This function will get called to fix up the GOT entry
> > > > indicated by the offset on the stack, and then jump to
> > > > the resolved address. */
> > > > - if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
> > > > - || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > > > - *(ElfW(Addr) *) (got + 2)
> > > > - = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
> > > > - ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
> > > > - : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
> > > > - else
> > > > - *(ElfW(Addr) *) (got + 2)
> > > > - = (ElfW(Addr)) &_dl_runtime_resolve_fxsave;
> > > > + *(ElfW(Addr) *) (got + 2)
> > > > + = (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve);
> > > > }
> > > > }
> > > >
> > > > @@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.\n",
> > > > {
> > > > td->arg = _dl_make_tlsdesc_dynamic
> > > > (sym_map, sym->st_value + reloc->r_addend);
> > > > - td->entry = _dl_tlsdesc_dynamic;
> > > > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > > > }
> > > > else
> > > > # endif
> > > > diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
> > > > index 4d1d790fbb..06637a8154 100644
> > > > --- a/sysdeps/x86_64/dl-procinfo.c
> > > > +++ b/sysdeps/x86_64/dl-procinfo.c
> > > > @@ -41,5 +41,21 @@
> > > >
> > > > #include <sysdeps/x86/dl-procinfo.c>
> > > >
> > > > +#if !IS_IN (ldconfig)
> > > > +# if !defined PROCINFO_DECL && defined SHARED
> > > > + ._dl_x86_64_runtime_resolve
> > > > +# else
> > > > +PROCINFO_CLASS void * _dl_x86_64_runtime_resolve
> > > > +# endif
> > > > +# ifndef PROCINFO_DECL
> > > > += NULL
> > > > +# endif
> > > > +# if !defined SHARED || defined PROCINFO_DECL
> > > > +;
> > > > +# else
> > > > +,
> > > > +# endif
> > > > +#endif
> > > > +
> > > > #undef PROCINFO_DECL
> > > > #undef PROCINFO_CLASS
> > > > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > > new file mode 100644
> > > > index 0000000000..0c2e8d5320
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > > @@ -0,0 +1,166 @@
> > > > +/* Thread-local storage handling in the ELF dynamic linker. x86_64 version.
> > > > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#ifndef SECTION
> > > > +# define SECTION(p) p
> > > > +#endif
> > > > +
> > > > +#undef REGISTER_SAVE_AREA
> > > > +#undef LOCAL_STORAGE_AREA
> > > > +#undef BASE
> > > > +
> > > > +#include "dl-trampoline-state.h"
> > > > +
> > > > + .section SECTION(.text),"ax",@progbits
> > > > +
> > > > + .hidden _dl_tlsdesc_dynamic
> > > > + .global _dl_tlsdesc_dynamic
> > > > + .type _dl_tlsdesc_dynamic,@function
> > > > +
> > > > + /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > > > + _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > > > + tlsdesc_dynamic_arg object. It must return in %rax the offset
> > > > + between the thread pointer and the object denoted by the
> > > > + argument, without clobbering any registers.
> > > > +
> > > > + The assembly code that follows is a rendition of the following
> > > > + C code, hand-optimized a little bit.
> > > > +
> > > > +ptrdiff_t
> > > > +_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > > > +{
> > > > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > + != TLS_DTV_UNALLOCATED),
> > > > + 1))
> > > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > + - __thread_pointer;
> > > > +
> > > > + return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > > > +}
> > > > +*/
> > > > + cfi_startproc
> > > > + .align 16
> > > > +_dl_tlsdesc_dynamic:
> > > > + _CET_ENDBR
> > > > + /* Preserve call-clobbered registers that we modify.
> > > > + We need two scratch regs anyway. */
> > > > + movq %rsi, -16(%rsp)
> > > > + mov %fs:DTV_OFFSET, %RSI_LP
> > > > + movq %rdi, -8(%rsp)
> > > > + movq TLSDESC_ARG(%rax), %rdi
> > > > + movq (%rsi), %rax
> > > > + cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > > > + ja 2f
> > > > + movq TLSDESC_MODID(%rdi), %rax
> > > > + salq $4, %rax
> > > > + movq (%rax,%rsi), %rax
> > > > + cmpq $-1, %rax
> > > > + je 2f
> > > > + addq TLSDESC_MODOFF(%rdi), %rax
> > > > +1:
> > > > + movq -16(%rsp), %rsi
> > > > + sub %fs:0, %RAX_LP
> > > > + movq -8(%rsp), %rdi
> > > > + ret
> > > > +2:
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + movq %rbx, -24(%rsp)
> > > > + mov %RSP_LP, %RBX_LP
> > > > + cfi_def_cfa_register(%rbx)
> > > > + and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > > > +#endif
> > > > +#ifdef REGISTER_SAVE_AREA
> > > > +# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + /* STATE_SAVE_OFFSET has space for 8 integer registers. But we
> > > > + need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
> > > > + RBX above. */
> > > > + sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
> > > > +# else
> > > > + sub $REGISTER_SAVE_AREA, %RSP_LP
> > > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > > +# endif
> > > > +#else
> > > > + /* Allocate stack space of the required size to save the state. */
> > > > + sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip), %RSP_LP
> > > > +#endif
> > > > + /* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9,
> > > > + r10 and r11. */
> > > > + movq %rcx, REGISTER_SAVE_RCX(%rsp)
> > > > + movq %rdx, REGISTER_SAVE_RDX(%rsp)
> > > > + movq %r8, REGISTER_SAVE_R8(%rsp)
> > > > + movq %r9, REGISTER_SAVE_R9(%rsp)
> > > > + movq %r10, REGISTER_SAVE_R10(%rsp)
> > > > + movq %r11, REGISTER_SAVE_R11(%rsp)
> > > > +#ifdef USE_FXSAVE
> > > > + fxsave STATE_SAVE_OFFSET(%rsp)
> > > > +#else
> > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > + xorl %edx, %edx
> > > > + /* Clear the XSAVE Header. */
> > > > +# ifdef USE_XSAVE
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512)(%rsp)
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8)(%rsp)
> > > > +# endif
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 2)(%rsp)
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 3)(%rsp)
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 4)(%rsp)
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 5)(%rsp)
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 6)(%rsp)
> > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 7)(%rsp)
> > > > +# ifdef USE_XSAVE
> > > > + xsave STATE_SAVE_OFFSET(%rsp)
> > > > +# else
> > > > + xsavec STATE_SAVE_OFFSET(%rsp)
> > > > +# endif
> > > > +#endif
> > > > + /* %rdi already points to the tlsinfo data structure. */
> > > > + call HIDDEN_JUMPTARGET (__tls_get_addr)
> > > > + # Get register content back.
> > > > +#ifdef USE_FXSAVE
> > > > + fxrstor STATE_SAVE_OFFSET(%rsp)
> > > > +#else
> > > > + /* Save and retore __tls_get_addr return value stored in RAX. */
> > > > + mov %RAX_LP, %RCX_LP
> > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > + xorl %edx, %edx
> > > > + xrstor STATE_SAVE_OFFSET(%rsp)
> > > > + mov %RCX_LP, %RAX_LP
> > > > +#endif
> > > > + movq REGISTER_SAVE_R11(%rsp), %r11
> > > > + movq REGISTER_SAVE_R10(%rsp), %r10
> > > > + movq REGISTER_SAVE_R9(%rsp), %r9
> > > > + movq REGISTER_SAVE_R8(%rsp), %r8
> > > > + movq REGISTER_SAVE_RDX(%rsp), %rdx
> > > > + movq REGISTER_SAVE_RCX(%rsp), %rcx
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + mov %RBX_LP, %RSP_LP
> > > > + cfi_def_cfa_register(%rsp)
> > > > + movq -24(%rsp), %rbx
> > > > + cfi_restore(%rbx)
> > > > +#else
> > > > + add $REGISTER_SAVE_AREA, %RSP_LP
> > > > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > > > +#endif
> > > > + jmp 1b
> > > > + cfi_endproc
> > > > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > +
> > > > +#undef STATE_SAVE_ALIGNMENT
> > > > diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
> > > > index f748af2ece..ea69f5223a 100644
> > > > --- a/sysdeps/x86_64/dl-tlsdesc.S
> > > > +++ b/sysdeps/x86_64/dl-tlsdesc.S
> > > > @@ -18,7 +18,19 @@
> > > >
> > > > #include <sysdep.h>
> > > > #include <tls.h>
> > > > +#include <cpu-features-offsets.h>
> > > > +#include <features-offsets.h>
> > > > #include "tlsdesc.h"
> > > > +#include "dl-trampoline-save.h"
> > > > +
> > > > +/* Area on stack to save and restore registers used for parameter
> > > > + passing when calling _dl_tlsdesc_dynamic. */
> > > > +#define REGISTER_SAVE_RCX 0
> > > > +#define REGISTER_SAVE_RDX (REGISTER_SAVE_RCX + 8)
> > > > +#define REGISTER_SAVE_R8 (REGISTER_SAVE_RDX + 8)
> > > > +#define REGISTER_SAVE_R9 (REGISTER_SAVE_R8 + 8)
> > > > +#define REGISTER_SAVE_R10 (REGISTER_SAVE_R9 + 8)
> > > > +#define REGISTER_SAVE_R11 (REGISTER_SAVE_R10 + 8)
> > > >
> > > > .text
> > > >
> > > > @@ -67,80 +79,24 @@ _dl_tlsdesc_undefweak:
> > > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> > > >
> > > > #ifdef SHARED
> > > > - .hidden _dl_tlsdesc_dynamic
> > > > - .global _dl_tlsdesc_dynamic
> > > > - .type _dl_tlsdesc_dynamic,@function
> > > > -
> > > > - /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > > > - _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > > > - tlsdesc_dynamic_arg object. It must return in %rax the offset
> > > > - between the thread pointer and the object denoted by the
> > > > - argument, without clobbering any registers.
> > > > -
> > > > - The assembly code that follows is a rendition of the following
> > > > - C code, hand-optimized a little bit.
> > > > -
> > > > -ptrdiff_t
> > > > -_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > > > -{
> > > > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > - != TLS_DTV_UNALLOCATED),
> > > > - 1))
> > > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > - - __thread_pointer;
> > > > -
> > > > - return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > > > -}
> > > > -*/
> > > > - cfi_startproc
> > > > - .align 16
> > > > -_dl_tlsdesc_dynamic:
> > > > - _CET_ENDBR
> > > > - /* Preserve call-clobbered registers that we modify.
> > > > - We need two scratch regs anyway. */
> > > > - movq %rsi, -16(%rsp)
> > > > - mov %fs:DTV_OFFSET, %RSI_LP
> > > > - movq %rdi, -8(%rsp)
> > > > - movq TLSDESC_ARG(%rax), %rdi
> > > > - movq (%rsi), %rax
> > > > - cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > > > - ja .Lslow
> > > > - movq TLSDESC_MODID(%rdi), %rax
> > > > - salq $4, %rax
> > > > - movq (%rax,%rsi), %rax
> > > > - cmpq $-1, %rax
> > > > - je .Lslow
> > > > - addq TLSDESC_MODOFF(%rdi), %rax
> > > > -.Lret:
> > > > - movq -16(%rsp), %rsi
> > > > - sub %fs:0, %RAX_LP
> > > > - movq -8(%rsp), %rdi
> > > > - ret
> > > > -.Lslow:
> > > > - /* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
> > > > - r10 and r11. Also, align the stack, that's off by 8 bytes. */
> > > > - subq $72, %rsp
> > > > - cfi_adjust_cfa_offset (72)
> > > > - movq %rdx, 8(%rsp)
> > > > - movq %rcx, 16(%rsp)
> > > > - movq %r8, 24(%rsp)
> > > > - movq %r9, 32(%rsp)
> > > > - movq %r10, 40(%rsp)
> > > > - movq %r11, 48(%rsp)
> > > > - /* %rdi already points to the tlsinfo data structure. */
> > > > - call HIDDEN_JUMPTARGET (__tls_get_addr)
> > > > - movq 8(%rsp), %rdx
> > > > - movq 16(%rsp), %rcx
> > > > - movq 24(%rsp), %r8
> > > > - movq 32(%rsp), %r9
> > > > - movq 40(%rsp), %r10
> > > > - movq 48(%rsp), %r11
> > > > - addq $72, %rsp
> > > > - cfi_adjust_cfa_offset (-72)
> > > > - jmp .Lret
> > > > - cfi_endproc
> > > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > +# define USE_FXSAVE
> > > > +# define STATE_SAVE_ALIGNMENT 16
> > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > > > +# include "dl-tlsdesc-dynamic.h"
> > > > +# undef _dl_tlsdesc_dynamic
> > > > +# undef USE_FXSAVE
> > > > +
> > > > +# define USE_XSAVE
> > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > > > +# include "dl-tlsdesc-dynamic.h"
> > > > +# undef _dl_tlsdesc_dynamic
> > > > +# undef USE_XSAVE
> > > > +
> > > > +# define USE_XSAVEC
> > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > > > +# include "dl-tlsdesc-dynamic.h"
> > > > +# undef _dl_tlsdesc_dynamic
> > > > +# undef USE_XSAVEC
> > > > #endif /* SHARED */
> > > > diff --git a/sysdeps/x86_64/dl-trampoline-save.h b/sysdeps/x86_64/dl-trampoline-save.h
> > > > new file mode 100644
> > > > index 0000000000..84eac4a8ac
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/dl-trampoline-save.h
> > > > @@ -0,0 +1,34 @@
> > > > +/* x86-64 PLT trampoline register save macros.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#ifndef DL_STACK_ALIGNMENT
> > > > +/* Due to GCC bug:
> > > > +
> > > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > > +
> > > > + __tls_get_addr may be called with 8-byte stack alignment. Although
> > > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > > + that stack will be always aligned at 16 bytes. */
> > > > +# define DL_STACK_ALIGNMENT 8
> > > > +#endif
> > > > +
> > > > +/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > > > + stack to 16 bytes before calling _dl_fixup. */
> > > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > > + || 16 > DL_STACK_ALIGNMENT)
> > > > diff --git a/sysdeps/x86_64/dl-trampoline-state.h b/sysdeps/x86_64/dl-trampoline-state.h
> > > > new file mode 100644
> > > > index 0000000000..575f120797
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/dl-trampoline-state.h
> > > > @@ -0,0 +1,51 @@
> > > > +/* x86-64 PLT dl-trampoline state macros.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#if (STATE_SAVE_ALIGNMENT % 16) != 0
> > > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > > +#endif
> > > > +
> > > > +#if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > > > +# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > > > +#endif
> > > > +
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > +/* Local stack area before jumping to function address: RBX. */
> > > > +# define LOCAL_STORAGE_AREA 8
> > > > +# define BASE rbx
> > > > +# ifdef USE_FXSAVE
> > > > +/* Use fxsave to save XMM registers. */
> > > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > > > +# if (REGISTER_SAVE_AREA % 16) != 0
> > > > +# error REGISTER_SAVE_AREA must be multiple of 16
> > > > +# endif
> > > > +# endif
> > > > +#else
> > > > +# ifndef USE_FXSAVE
> > > > +# error USE_FXSAVE must be defined
> > > > +# endif
> > > > +/* Use fxsave to save XMM registers. */
> > > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > > > +/* Local stack area before jumping to function address: All saved
> > > > + registers. */
> > > > +# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > > > +# define BASE rsp
> > > > +# if (REGISTER_SAVE_AREA % 16) != 8
> > > > +# error REGISTER_SAVE_AREA must be odd multiple of 8
> > > > +# endif
> > > > +#endif
> > > > diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
> > > > index b2e7e0f69b..87c5137837 100644
> > > > --- a/sysdeps/x86_64/dl-trampoline.S
> > > > +++ b/sysdeps/x86_64/dl-trampoline.S
> > > > @@ -22,25 +22,7 @@
> > > > #include <features-offsets.h>
> > > > #include <link-defines.h>
> > > > #include <isa-level.h>
> > > > -
> > > > -#ifndef DL_STACK_ALIGNMENT
> > > > -/* Due to GCC bug:
> > > > -
> > > > - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > > -
> > > > - __tls_get_addr may be called with 8-byte stack alignment. Although
> > > > - this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > > - that stack will be always aligned at 16 bytes. We use unaligned
> > > > - 16-byte move to load and store SSE registers, which has no penalty
> > > > - on modern processors if stack is 16-byte aligned. */
> > > > -# define DL_STACK_ALIGNMENT 8
> > > > -#endif
> > > > -
> > > > -/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > > > - stack to 16 bytes before calling _dl_fixup. */
> > > > -#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > > - (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > > - || 16 > DL_STACK_ALIGNMENT)
> > > > +#include "dl-trampoline-save.h"
> > > >
> > > > /* Area on stack to save and restore registers used for parameter
> > > > passing when calling _dl_fixup. */
> > > > diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
> > > > index f55c6ea040..d9ccfb40d4 100644
> > > > --- a/sysdeps/x86_64/dl-trampoline.h
> > > > +++ b/sysdeps/x86_64/dl-trampoline.h
> > > > @@ -27,39 +27,7 @@
> > > > # undef LOCAL_STORAGE_AREA
> > > > # undef BASE
> > > >
> > > > -# if (STATE_SAVE_ALIGNMENT % 16) != 0
> > > > -# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > > -# endif
> > > > -
> > > > -# if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > > > -# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > > > -# endif
> > > > -
> > > > -# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > -/* Local stack area before jumping to function address: RBX. */
> > > > -# define LOCAL_STORAGE_AREA 8
> > > > -# define BASE rbx
> > > > -# ifdef USE_FXSAVE
> > > > -/* Use fxsave to save XMM registers. */
> > > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > > > -# if (REGISTER_SAVE_AREA % 16) != 0
> > > > -# error REGISTER_SAVE_AREA must be multiple of 16
> > > > -# endif
> > > > -# endif
> > > > -# else
> > > > -# ifndef USE_FXSAVE
> > > > -# error USE_FXSAVE must be defined
> > > > -# endif
> > > > -/* Use fxsave to save XMM registers. */
> > > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > > > -/* Local stack area before jumping to function address: All saved
> > > > - registers. */
> > > > -# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > > > -# define BASE rsp
> > > > -# if (REGISTER_SAVE_AREA % 16) != 8
> > > > -# error REGISTER_SAVE_AREA must be odd multiple of 8
> > > > -# endif
> > > > -# endif
> > > > +# include "dl-trampoline-state.h"
> > > >
> > > > .globl _dl_runtime_resolve
> > > > .hidden _dl_runtime_resolve
> > > > --
> > > > 2.43.2
> > > >
> > >
> > > This still LGTM.
> > > Feel free to add my reviewed-by tag to future posts of the patch.
> > > Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
> >
> > Please wait for community consensus before pushing
>
> It was agreed that I should send out the v10 patch with xfail on
> the new test. Someone just needs to double check that my
> v10 patch does it.
Okay, give it 24 hour. If you no further concerns are brought up
think its okay to push.
>
> --
> H.J.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v10 1/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
2024-02-27 17:00 ` Noah Goldstein
@ 2024-03-08 20:10 ` H.J. Lu
0 siblings, 0 replies; 7+ messages in thread
From: H.J. Lu @ 2024-03-08 20:10 UTC (permalink / raw)
To: Noah Goldstein, Florian Weimer, Sunil K Pandey, Libc-stable Mailing List
Cc: libc-alpha, adhemerval.zanella
On Tue, Feb 27, 2024 at 9:00 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Feb 26, 2024 at 1:06 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, Feb 26, 2024 at 11:02 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Mon, Feb 26, 2024 at 1:02 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > On Mon, Feb 26, 2024 at 8:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > > >
> > > > > Compiler generates the following instruction sequence for GNU2 dynamic
> > > > > TLS access:
> > > > >
> > > > > leaq tls_var@TLSDESC(%rip), %rax
> > > > > call *tls_var@TLSCALL(%rax)
> > > > >
> > > > > or
> > > > >
> > > > > leal tls_var@TLSDESC(%ebx), %eax
> > > > > call *tls_var@TLSCALL(%eax)
> > > > >
> > > > > CALL instruction is transparent to compiler which assumes all registers,
> > > > > except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> > > > > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > > > > path. __tls_get_addr is a normal function which doesn't preserve any
> > > > > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> > > > > caller-saved registers, but didn't preserve any other caller-saved
> > > > > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > > > > XSAVE and XSAVEC to save and restore all caller-saved registers. This
> > > > > fixes BZ #31372.
> > > > >
> > > > > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > > > > to optimize elf_machine_runtime_setup.
> > > > > ---
> > > > > elf/Makefile | 18 ++
> > > > > elf/tst-gnu2-tls2.c | 122 ++++++++++++
> > > > > elf/tst-gnu2-tls2.h | 36 ++++
> > > > > elf/tst-gnu2-tls2mod0.c | 31 +++
> > > > > elf/tst-gnu2-tls2mod1.c | 31 +++
> > > > > elf/tst-gnu2-tls2mod2.c | 31 +++
> > > > > sysdeps/i386/dl-machine.h | 2 +-
> > > > > sysdeps/i386/dl-tlsdesc-dynamic.h | 190 +++++++++++++++++++
> > > > > sysdeps/i386/dl-tlsdesc.S | 115 +++++------
> > > > > sysdeps/x86/Makefile | 7 +-
> > > > > sysdeps/x86/cpu-features.c | 56 +++++-
> > > > > sysdeps/x86/dl-procinfo.c | 16 ++
> > > > > sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
> > > > > sysdeps/x86/sysdep.h | 6 +
> > > > > sysdeps/x86/tst-gnu2-tls2.c | 20 ++
> > > > > sysdeps/x86_64/Makefile | 2 +-
> > > > > sysdeps/x86_64/dl-machine.h | 19 +-
> > > > > sysdeps/x86_64/dl-procinfo.c | 16 ++
> > > > > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
> > > > > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
> > > > > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
> > > > > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
> > > > > sysdeps/x86_64/dl-trampoline.S | 20 +-
> > > > > sysdeps/x86_64/dl-trampoline.h | 34 +---
> > > > > 24 files changed, 920 insertions(+), 213 deletions(-)
> > > > > create mode 100644 elf/tst-gnu2-tls2.c
> > > > > create mode 100644 elf/tst-gnu2-tls2.h
> > > > > create mode 100644 elf/tst-gnu2-tls2mod0.c
> > > > > create mode 100644 elf/tst-gnu2-tls2mod1.c
> > > > > create mode 100644 elf/tst-gnu2-tls2mod2.c
> > > > > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > > rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> > > > > create mode 100644 sysdeps/x86/tst-gnu2-tls2.c
> > > > > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > > > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> > > > > create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
> > > > >
> > > > > diff --git a/elf/Makefile b/elf/Makefile
> > > > > index 36c04baf02..02dc476e27 100644
> > > > > --- a/elf/Makefile
> > > > > +++ b/elf/Makefile
> > > > > @@ -424,6 +424,7 @@ tests += \
> > > > > tst-glibc-hwcaps-prepend \
> > > > > tst-global1 \
> > > > > tst-global2 \
> > > > > + tst-gnu2-tls2 \
> > > > > tst-initfinilazyfail \
> > > > > tst-initorder \
> > > > > tst-initorder2 \
> > > > > @@ -846,6 +847,9 @@ modules-names += \
> > > > > tst-filterobj-flt \
> > > > > tst-finilazyfailmod \
> > > > > tst-globalmod2 \
> > > > > + tst-gnu2-tls2mod0 \
> > > > > + tst-gnu2-tls2mod1 \
> > > > > + tst-gnu2-tls2mod2 \
> > > > > tst-initlazyfailmod \
> > > > > tst-initorder2a \
> > > > > tst-initorder2b \
> > > > > @@ -3044,8 +3048,22 @@ $(objpfx)tst-tlsgap.out: \
> > > > > $(objpfx)tst-tlsgap-mod0.so \
> > > > > $(objpfx)tst-tlsgap-mod1.so \
> > > > > $(objpfx)tst-tlsgap-mod2.so
> > > > > +
> > > > > +$(objpfx)tst-gnu2-tls2: $(shared-thread-library)
> > > > > +$(objpfx)tst-gnu2-tls2.out: \
> > > > > + $(objpfx)tst-gnu2-tls2mod0.so \
> > > > > + $(objpfx)tst-gnu2-tls2mod1.so \
> > > > > + $(objpfx)tst-gnu2-tls2mod2.so
> > > > > +
> > > > > ifeq (yes,$(have-mtls-dialect-gnu2))
> > > > > +# This test fails if dl_tlsdesc_dynamic doesn't preserve all caller-saved
> > > > > +# registers. See https://sourceware.org/bugzilla/show_bug.cgi?id=31372
> > > > > +test-xfail-tst-gnu2-tls2 = yes
> > > > > +
> > > > > CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
> > > > > CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
> > > > > CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
> > > > > +CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
> > > > > +CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
> > > > > +CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
> > > > > endif
> > > > > diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c
> > > > > new file mode 100644
> > > > > index 0000000000..7ac04d7f33
> > > > > --- /dev/null
> > > > > +++ b/elf/tst-gnu2-tls2.c
> > > > > @@ -0,0 +1,122 @@
> > > > > +/* Test TLSDESC relocation.
> > > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <http://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#include <stdio.h>
> > > > > +#include <stdlib.h>
> > > > > +#include <string.h>
> > > > > +#include <dlfcn.h>
> > > > > +#include <pthread.h>
> > > > > +#include <support/xdlfcn.h>
> > > > > +#include <support/xthread.h>
> > > > > +#include <support/check.h>
> > > > > +#include <support/test-driver.h>
> > > > > +#include "tst-gnu2-tls2.h"
> > > > > +
> > > > > +#ifndef IS_SUPPORTED
> > > > > +# define IS_SUPPORTED() true
> > > > > +#endif
> > > > > +
> > > > > +/* An architecture can define it to clobber caller-saved registers in
> > > > > + malloc below to verify that the implicit TLSDESC call won't change
> > > > > + caller-saved registers. */
> > > > > +#ifndef PREPARE_MALLOC
> > > > > +# define PREPARE_MALLOC()
> > > > > +#endif
> > > > > +
> > > > > +extern void * __libc_malloc (size_t);
> > > > > +
> > > > > +size_t malloc_counter = 0;
> > > > > +
> > > > > +void *
> > > > > +malloc (size_t n)
> > > > > +{
> > > > > + PREPARE_MALLOC ();
> > > > > + malloc_counter++;
> > > > > + return __libc_malloc (n);
> > > > > +}
> > > > > +
> > > > > +static void *mod[3];
> > > > > +#ifndef MOD
> > > > > +# define MOD(i) "tst-gnu2-tls2mod" #i ".so"
> > > > > +#endif
> > > > > +static const char *modname[3] = { MOD(0), MOD(1), MOD(2) };
> > > > > +#undef MOD
> > > > > +
> > > > > +static void
> > > > > +open_mod (int i)
> > > > > +{
> > > > > + mod[i] = xdlopen (modname[i], RTLD_LAZY);
> > > > > + printf ("open %s\n", modname[i]);
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +close_mod (int i)
> > > > > +{
> > > > > + xdlclose (mod[i]);
> > > > > + mod[i] = NULL;
> > > > > + printf ("close %s\n", modname[i]);
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +access_mod (int i, const char *sym)
> > > > > +{
> > > > > + struct tls var = { -1, -1, -1, -1 };
> > > > > + struct tls *(*f) (struct tls *) = xdlsym (mod[i], sym);
> > > > > + /* Check that our malloc is called. */
> > > > > + malloc_counter = 0;
> > > > > + struct tls *p = f (&var);
> > > > > + TEST_VERIFY (malloc_counter != 0);
> > > > > + printf ("access %s: %s() = %p\n", modname[i], sym, p);
> > > > > + TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) == 0);
> > > > > + ++(p->a);
> > > > > +}
> > > > > +
> > > > > +static void *
> > > > > +start (void *arg)
> > > > > +{
> > > > > + /* The DTV generation is at the last dlopen of mod0 and the
> > > > > + entry for mod1 is NULL. */
> > > > > +
> > > > > + open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */
> > > > > +
> > > > > + /* Force the slow path in GNU2 TLS descriptor call. */
> > > > > + access_mod (1, "apply_tls");
> > > > > +
> > > > > + return arg;
> > > > > +}
> > > > > +
> > > > > +static int
> > > > > +do_test (void)
> > > > > +{
> > > > > + if (!IS_SUPPORTED ())
> > > > > + return EXIT_UNSUPPORTED;
> > > > > +
> > > > > + open_mod (0);
> > > > > + open_mod (1);
> > > > > + open_mod (2);
> > > > > + close_mod (0);
> > > > > + close_mod (1); /* Create modid gap at mod1. */
> > > > > + open_mod (0); /* Reuse modid of mod0, bump generation count. */
> > > > > +
> > > > > + /* Create a thread where DTV of mod1 is NULL. */
> > > > > + pthread_t t = xpthread_create (NULL, start, NULL);
> > > > > + xpthread_join (t);
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +#include <support/test-driver.c>
> > > > > diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h
> > > > > new file mode 100644
> > > > > index 0000000000..77964a57a3
> > > > > --- /dev/null
> > > > > +++ b/elf/tst-gnu2-tls2.h
> > > > > @@ -0,0 +1,36 @@
> > > > > +/* Test TLSDESC relocation.
> > > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#include <stdint.h>
> > > > > +
> > > > > +struct tls
> > > > > +{
> > > > > + int64_t a, b, c, d;
> > > > > +};
> > > > > +
> > > > > +extern struct tls *apply_tls (struct tls *);
> > > > > +
> > > > > +/* An architecture can define them to verify that clobber caller-saved
> > > > > + registers aren't changed by the implicit TLSDESC call. */
> > > > > +#ifndef BEFORE_TLSDESC_CALL
> > > > > +# define BEFORE_TLSDESC_CALL()
> > > > > +#endif
> > > > > +
> > > > > +#ifndef AFTER_TLSDESC_CALL
> > > > > +# define AFTER_TLSDESC_CALL()
> > > > > +#endif
> > > > > diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c
> > > > > new file mode 100644
> > > > > index 0000000000..45556a0e17
> > > > > --- /dev/null
> > > > > +++ b/elf/tst-gnu2-tls2mod0.c
> > > > > @@ -0,0 +1,31 @@
> > > > > +/* DSO used by tst-gnu2-tls2.
> > > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#include "tst-gnu2-tls2.h"
> > > > > +
> > > > > +__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")));
> > > > > +
> > > > > +struct tls *
> > > > > +apply_tls (struct tls *p)
> > > > > +{
> > > > > + BEFORE_TLSDESC_CALL ();
> > > > > + tls_var0 = *p;
> > > > > + struct tls *ret = &tls_var0;
> > > > > + AFTER_TLSDESC_CALL ();
> > > > > + return ret;
> > > > > +}
> > > > > diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c
> > > > > new file mode 100644
> > > > > index 0000000000..e10b9dbc0a
> > > > > --- /dev/null
> > > > > +++ b/elf/tst-gnu2-tls2mod1.c
> > > > > @@ -0,0 +1,31 @@
> > > > > +/* DSO used by tst-gnu2-tls2.
> > > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#include "tst-gnu2-tls2.h"
> > > > > +
> > > > > +__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidden")));
> > > > > +
> > > > > +struct tls *
> > > > > +apply_tls (struct tls *p)
> > > > > +{
> > > > > + BEFORE_TLSDESC_CALL ();
> > > > > + tls_var1[1] = *p;
> > > > > + struct tls *ret = &tls_var1[1];
> > > > > + AFTER_TLSDESC_CALL ();
> > > > > + return ret;
> > > > > +}
> > > > > diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c
> > > > > new file mode 100644
> > > > > index 0000000000..141af51e55
> > > > > --- /dev/null
> > > > > +++ b/elf/tst-gnu2-tls2mod2.c
> > > > > @@ -0,0 +1,31 @@
> > > > > +/* DSO used by tst-gnu2-tls2.
> > > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#include "tst-gnu2-tls2.h"
> > > > > +
> > > > > +__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")));
> > > > > +
> > > > > +struct tls *
> > > > > +apply_tls (struct tls *p)
> > > > > +{
> > > > > + BEFORE_TLSDESC_CALL ();
> > > > > + tls_var2 = *p;
> > > > > + struct tls *ret = &tls_var2;
> > > > > + AFTER_TLSDESC_CALL ();
> > > > > + return ret;
> > > > > +}
> > > > > diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> > > > > index fc1ef96587..50d74fe6e9 100644
> > > > > --- a/sysdeps/i386/dl-machine.h
> > > > > +++ b/sysdeps/i386/dl-machine.h
> > > > > @@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.\n",
> > > > > {
> > > > > td->arg = _dl_make_tlsdesc_dynamic
> > > > > (sym_map, sym->st_value + (ElfW(Word))td->arg);
> > > > > - td->entry = _dl_tlsdesc_dynamic;
> > > > > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > > > > }
> > > > > else
> > > > > # endif
> > > > > diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > > new file mode 100644
> > > > > index 0000000000..3627028577
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > > @@ -0,0 +1,190 @@
> > > > > +/* Thread-local storage handling in the ELF dynamic linker. i386 version.
> > > > > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#undef REGISTER_SAVE_AREA
> > > > > +
> > > > > +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
> > > > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > > > +#endif
> > > > > +
> > > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > +# ifdef USE_FNSAVE
> > > > > +# error USE_FNSAVE shouldn't be defined
> > > > > +# endif
> > > > > +# ifdef USE_FXSAVE
> > > > > +/* Use fxsave to save all registers. */
> > > > > +# define REGISTER_SAVE_AREA 512
> > > > > +# endif
> > > > > +#else
> > > > > +# ifdef USE_FNSAVE
> > > > > +/* Use fnsave to save x87 FPU stack registers. */
> > > > > +# define REGISTER_SAVE_AREA 108
> > > > > +# else
> > > > > +# ifndef USE_FXSAVE
> > > > > +# error USE_FXSAVE must be defined
> > > > > +# endif
> > > > > +/* Use fxsave to save all registers. Add 12 bytes to align the stack
> > > > > + to 16 bytes. */
> > > > > +# define REGISTER_SAVE_AREA (512 + 12)
> > > > > +# endif
> > > > > +#endif
> > > > > +
> > > > > + .hidden _dl_tlsdesc_dynamic
> > > > > + .global _dl_tlsdesc_dynamic
> > > > > + .type _dl_tlsdesc_dynamic,@function
> > > > > +
> > > > > + /* This function is used for symbols that need dynamic TLS.
> > > > > +
> > > > > + %eax points to the TLS descriptor, such that 0(%eax) points to
> > > > > + _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > > > > + tlsdesc_dynamic_arg object. It must return in %eax the offset
> > > > > + between the thread pointer and the object denoted by the
> > > > > + argument, without clobbering any registers.
> > > > > +
> > > > > + The assembly code that follows is a rendition of the following
> > > > > + C code, hand-optimized a little bit.
> > > > > +
> > > > > +ptrdiff_t
> > > > > +__attribute__ ((__regparm__ (1)))
> > > > > +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > > > +{
> > > > > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > > + != TLS_DTV_UNALLOCATED),
> > > > > + 1))
> > > > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > > + - __thread_pointer;
> > > > > +
> > > > > + return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > > > +}
> > > > > +*/
> > > > > + cfi_startproc
> > > > > + .align 16
> > > > > +_dl_tlsdesc_dynamic:
> > > > > + /* Like all TLS resolvers, preserve call-clobbered registers.
> > > > > + We need two scratch regs anyway. */
> > > > > + subl $32, %esp
> > > > > + cfi_adjust_cfa_offset (32)
> > > > > + movl %ecx, 20(%esp)
> > > > > + movl %edx, 24(%esp)
> > > > > + movl TLSDESC_ARG(%eax), %eax
> > > > > + movl %gs:DTV_OFFSET, %edx
> > > > > + movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > > > + cmpl (%edx), %ecx
> > > > > + ja 2f
> > > > > + movl TLSDESC_MODID(%eax), %ecx
> > > > > + movl (%edx,%ecx,8), %edx
> > > > > + cmpl $-1, %edx
> > > > > + je 2f
> > > > > + movl TLSDESC_MODOFF(%eax), %eax
> > > > > + addl %edx, %eax
> > > > > +1:
> > > > > + movl 20(%esp), %ecx
> > > > > + subl %gs:0, %eax
> > > > > + movl 24(%esp), %edx
> > > > > + addl $32, %esp
> > > > > + cfi_adjust_cfa_offset (-32)
> > > > > + ret
> > > > > + .p2align 4,,7
> > > > > +2:
> > > > > + cfi_adjust_cfa_offset (32)
> > > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > + movl %ebx, -28(%esp)
> > > > > + movl %esp, %ebx
> > > > > + cfi_def_cfa_register(%ebx)
> > > > > + and $-STATE_SAVE_ALIGNMENT, %esp
> > > > > +#endif
> > > > > +#ifdef REGISTER_SAVE_AREA
> > > > > + subl $REGISTER_SAVE_AREA, %esp
> > > > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > > > +# endif
> > > > > +#else
> > > > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > +# error DL_RUNTIME_RESOLVE_REALIGN_STACK must be true
> > > > > +# endif
> > > > > + /* Allocate stack space of the required size to save the state. */
> > > > > + LOAD_PIC_REG (cx)
> > > > > + subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> > > > > +#endif
> > > > > +#ifdef USE_FNSAVE
> > > > > + fnsave (%esp)
> > > > > +#elif defined USE_FXSAVE
> > > > > + fxsave (%esp)
> > > > > +#else
> > > > > + /* Save the argument for ___tls_get_addr in EAX. */
> > > > > + movl %eax, %ecx
> > > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > > + xorl %edx, %edx
> > > > > + /* Clear the XSAVE Header. */
> > > > > +# ifdef USE_XSAVE
> > > > > + movl %edx, (512)(%esp)
> > > > > + movl %edx, (512 + 4 * 1)(%esp)
> > > > > + movl %edx, (512 + 4 * 2)(%esp)
> > > > > + movl %edx, (512 + 4 * 3)(%esp)
> > > > > +# endif
> > > > > + movl %edx, (512 + 4 * 4)(%esp)
> > > > > + movl %edx, (512 + 4 * 5)(%esp)
> > > > > + movl %edx, (512 + 4 * 6)(%esp)
> > > > > + movl %edx, (512 + 4 * 7)(%esp)
> > > > > + movl %edx, (512 + 4 * 8)(%esp)
> > > > > + movl %edx, (512 + 4 * 9)(%esp)
> > > > > + movl %edx, (512 + 4 * 10)(%esp)
> > > > > + movl %edx, (512 + 4 * 11)(%esp)
> > > > > + movl %edx, (512 + 4 * 12)(%esp)
> > > > > + movl %edx, (512 + 4 * 13)(%esp)
> > > > > + movl %edx, (512 + 4 * 14)(%esp)
> > > > > + movl %edx, (512 + 4 * 15)(%esp)
> > > > > +# ifdef USE_XSAVE
> > > > > + xsave (%esp)
> > > > > +# else
> > > > > + xsavec (%esp)
> > > > > +# endif
> > > > > + /* Restore the argument for ___tls_get_addr in EAX. */
> > > > > + movl %ecx, %eax
> > > > > +#endif
> > > > > + call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > > > + /* Get register content back. */
> > > > > +#ifdef USE_FNSAVE
> > > > > + frstor (%esp)
> > > > > +#elif defined USE_FXSAVE
> > > > > + fxrstor (%esp)
> > > > > +#else
> > > > > + /* Save and retore ___tls_get_addr return value stored in EAX. */
> > > > > + movl %eax, %ecx
> > > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > > + xorl %edx, %edx
> > > > > + xrstor (%esp)
> > > > > + movl %ecx, %eax
> > > > > +#endif
> > > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > + mov %ebx, %esp
> > > > > + cfi_def_cfa_register(%esp)
> > > > > + movl -28(%esp), %ebx
> > > > > + cfi_restore(%ebx)
> > > > > +#else
> > > > > + addl $REGISTER_SAVE_AREA, %esp
> > > > > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > > > > +#endif
> > > > > + jmp 1b
> > > > > + cfi_endproc
> > > > > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > > +
> > > > > +#undef STATE_SAVE_ALIGNMENT
> > > > > diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
> > > > > index 90d93caa0c..f002feee56 100644
> > > > > --- a/sysdeps/i386/dl-tlsdesc.S
> > > > > +++ b/sysdeps/i386/dl-tlsdesc.S
> > > > > @@ -18,8 +18,27 @@
> > > > >
> > > > > #include <sysdep.h>
> > > > > #include <tls.h>
> > > > > +#include <cpu-features-offsets.h>
> > > > > +#include <features-offsets.h>
> > > > > #include "tlsdesc.h"
> > > > >
> > > > > +#ifndef DL_STACK_ALIGNMENT
> > > > > +/* Due to GCC bug:
> > > > > +
> > > > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > > > +
> > > > > + __tls_get_addr may be called with 4-byte stack alignment. Although
> > > > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > > > + that stack will be always aligned at 16 bytes. */
> > > > > +# define DL_STACK_ALIGNMENT 4
> > > > > +#endif
> > > > > +
> > > > > +/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or align
> > > > > + stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. */
> > > > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > > > + || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT)
> > > > > +
> > > > > .text
> > > > >
> > > > > /* This function is used to compute the TP offset for symbols in
> > > > > @@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak:
> > > > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> > > > >
> > > > > #ifdef SHARED
> > > > > - .hidden _dl_tlsdesc_dynamic
> > > > > - .global _dl_tlsdesc_dynamic
> > > > > - .type _dl_tlsdesc_dynamic,@function
> > > > > -
> > > > > - /* This function is used for symbols that need dynamic TLS.
> > > > > -
> > > > > - %eax points to the TLS descriptor, such that 0(%eax) points to
> > > > > - _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > > > > - tlsdesc_dynamic_arg object. It must return in %eax the offset
> > > > > - between the thread pointer and the object denoted by the
> > > > > - argument, without clobbering any registers.
> > > > > -
> > > > > - The assembly code that follows is a rendition of the following
> > > > > - C code, hand-optimized a little bit.
> > > > > -
> > > > > -ptrdiff_t
> > > > > -__attribute__ ((__regparm__ (1)))
> > > > > -_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > > > -{
> > > > > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > > - != TLS_DTV_UNALLOCATED),
> > > > > - 1))
> > > > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > > - - __thread_pointer;
> > > > > -
> > > > > - return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > > > -}
> > > > > -*/
> > > > > - cfi_startproc
> > > > > - .align 16
> > > > > -_dl_tlsdesc_dynamic:
> > > > > - /* Like all TLS resolvers, preserve call-clobbered registers.
> > > > > - We need two scratch regs anyway. */
> > > > > - subl $28, %esp
> > > > > - cfi_adjust_cfa_offset (28)
> > > > > - movl %ecx, 20(%esp)
> > > > > - movl %edx, 24(%esp)
> > > > > - movl TLSDESC_ARG(%eax), %eax
> > > > > - movl %gs:DTV_OFFSET, %edx
> > > > > - movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > > > - cmpl (%edx), %ecx
> > > > > - ja .Lslow
> > > > > - movl TLSDESC_MODID(%eax), %ecx
> > > > > - movl (%edx,%ecx,8), %edx
> > > > > - cmpl $-1, %edx
> > > > > - je .Lslow
> > > > > - movl TLSDESC_MODOFF(%eax), %eax
> > > > > - addl %edx, %eax
> > > > > -.Lret:
> > > > > - movl 20(%esp), %ecx
> > > > > - subl %gs:0, %eax
> > > > > - movl 24(%esp), %edx
> > > > > - addl $28, %esp
> > > > > - cfi_adjust_cfa_offset (-28)
> > > > > - ret
> > > > > - .p2align 4,,7
> > > > > -.Lslow:
> > > > > - cfi_adjust_cfa_offset (28)
> > > > > - call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > > > - jmp .Lret
> > > > > - cfi_endproc
> > > > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > > +# define USE_FNSAVE
> > > > > +# define MINIMUM_ALIGNMENT 4
> > > > > +# define STATE_SAVE_ALIGNMENT 4
> > > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave
> > > > > +# include "dl-tlsdesc-dynamic.h"
> > > > > +# undef _dl_tlsdesc_dynamic
> > > > > +# undef MINIMUM_ALIGNMENT
> > > > > +# undef USE_FNSAVE
> > > > > +
> > > > > +# define MINIMUM_ALIGNMENT 16
> > > > > +
> > > > > +# define USE_FXSAVE
> > > > > +# define STATE_SAVE_ALIGNMENT 16
> > > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > > > > +# include "dl-tlsdesc-dynamic.h"
> > > > > +# undef _dl_tlsdesc_dynamic
> > > > > +# undef USE_FXSAVE
> > > > > +
> > > > > +# define USE_XSAVE
> > > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > > > > +# include "dl-tlsdesc-dynamic.h"
> > > > > +# undef _dl_tlsdesc_dynamic
> > > > > +# undef USE_XSAVE
> > > > > +
> > > > > +# define USE_XSAVEC
> > > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > > > > +# include "dl-tlsdesc-dynamic.h"
> > > > > +# undef _dl_tlsdesc_dynamic
> > > > > +# undef USE_XSAVEC
> > > > > #endif /* SHARED */
> > > > > diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
> > > > > index 73b29cc78c..5311b594af 100644
> > > > > --- a/sysdeps/x86/Makefile
> > > > > +++ b/sysdeps/x86/Makefile
> > > > > @@ -1,5 +1,5 @@
> > > > > ifeq ($(subdir),csu)
> > > > > -gen-as-const-headers += cpu-features-offsets.sym
> > > > > +gen-as-const-headers += cpu-features-offsets.sym features-offsets.sym
> > > > > endif
> > > > >
> > > > > ifeq ($(subdir),elf)
> > > > > @@ -86,6 +86,11 @@ endif
> > > > > tst-ifunc-isa-2-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-SSE4_2,-AVX,-AVX2,-AVX512F
> > > > > tst-ifunc-isa-2-static-ENV = $(tst-ifunc-isa-2-ENV)
> > > > > tst-hwcap-tunables-ARGS = -- $(host-test-program-cmd)
> > > > > +
> > > > > +CFLAGS-tst-gnu2-tls2.c += -msse
> > > > > +CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
> > > > > +CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
> > > > > +CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
> > > > > endif
> > > > >
> > > > > ifeq ($(subdir),math)
> > > > > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > > > > index 25e6622a79..835113b42f 100644
> > > > > --- a/sysdeps/x86/cpu-features.c
> > > > > +++ b/sysdeps/x86/cpu-features.c
> > > > > @@ -27,8 +27,13 @@
> > > > > extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
> > > > > attribute_hidden;
> > > > >
> > > > > -#if defined SHARED && defined __x86_64__
> > > > > -# include <dl-plt-rewrite.h>
> > > > > +#if defined SHARED
> > > > > +extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden;
> > > > > +extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden;
> > > > > +extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden;
> > > > > +
> > > > > +# ifdef __x86_64__
> > > > > +# include <dl-plt-rewrite.h>
> > > > >
> > > > > static void
> > > > > TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > > > > @@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > > > > : plt_rewrite_jmp);
> > > > > }
> > > > > }
> > > > > +# else
> > > > > +extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden;
> > > > > +# endif
> > > > > +#endif
> > > > > +
> > > > > +#ifdef __x86_64__
> > > > > +extern void _dl_runtime_resolve_fxsave (void) attribute_hidden;
> > > > > +extern void _dl_runtime_resolve_xsave (void) attribute_hidden;
> > > > > +extern void _dl_runtime_resolve_xsavec (void) attribute_hidden;
> > > > > #endif
> > > > >
> > > > > #ifdef __LP64__
> > > > > @@ -1130,6 +1144,44 @@ no_cpuid:
> > > > > TUNABLE_CALLBACK (set_x86_shstk));
> > > > > #endif
> > > > >
> > > > > + if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > > > > + {
> > > > > + if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC))
> > > > > + {
> > > > > +#ifdef __x86_64__
> > > > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsavec;
> > > > > +#endif
> > > > > +#ifdef SHARED
> > > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsavec;
> > > > > +#endif
> > > > > + }
> > > > > + else
> > > > > + {
> > > > > +#ifdef __x86_64__
> > > > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsave;
> > > > > +#endif
> > > > > +#ifdef SHARED
> > > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsave;
> > > > > +#endif
> > > > > + }
> > > > > + }
> > > > > + else
> > > > > + {
> > > > > +#ifdef __x86_64__
> > > > > + GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_fxsave;
> > > > > +# ifdef SHARED
> > > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > > > > +# endif
> > > > > +#else
> > > > > +# ifdef SHARED
> > > > > + if (CPU_FEATURE_USABLE_P (cpu_features, FXSR))
> > > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > > > > + else
> > > > > + GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fnsave;
> > > > > +# endif
> > > > > +#endif
> > > > > + }
> > > > > +
> > > > > #ifdef SHARED
> > > > > # ifdef __x86_64__
> > > > > TUNABLE_GET (plt_rewrite, tunable_val_t *,
> > > > > diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c
> > > > > index ee957b4d70..5920d4b320 100644
> > > > > --- a/sysdeps/x86/dl-procinfo.c
> > > > > +++ b/sysdeps/x86/dl-procinfo.c
> > > > > @@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9]
> > > > > #else
> > > > > ,
> > > > > #endif
> > > > > +
> > > > > +#if defined SHARED && !IS_IN (ldconfig)
> > > > > +# if !defined PROCINFO_DECL
> > > > > + ._dl_x86_tlsdesc_dynamic
> > > > > +# else
> > > > > +PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic
> > > > > +# endif
> > > > > +# ifndef PROCINFO_DECL
> > > > > += NULL
> > > > > +# endif
> > > > > +# ifdef PROCINFO_DECL
> > > > > +;
> > > > > +# else
> > > > > +,
> > > > > +# endif
> > > > > +#endif
> > > > > diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/features-offsets.sym
> > > > > similarity index 89%
> > > > > rename from sysdeps/x86_64/features-offsets.sym
> > > > > rename to sysdeps/x86/features-offsets.sym
> > > > > index 9e4be3393a..77e990c705 100644
> > > > > --- a/sysdeps/x86_64/features-offsets.sym
> > > > > +++ b/sysdeps/x86/features-offsets.sym
> > > > > @@ -3,4 +3,6 @@
> > > > > #include <ldsodefs.h>
> > > > >
> > > > > RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_global_ro, _dl_x86_cpu_features)
> > > > > +#ifdef __x86_64__
> > > > > RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _dl_x86_feature_1)
> > > > > +#endif
> > > > > diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
> > > > > index 837fd28734..485cad9c02 100644
> > > > > --- a/sysdeps/x86/sysdep.h
> > > > > +++ b/sysdeps/x86/sysdep.h
> > > > > @@ -70,6 +70,12 @@
> > > > > | (1 << X86_XSTATE_ZMM_H_ID))
> > > > > #endif
> > > > >
> > > > > +/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL.
> > > > > + Compiler assumes that all registers, including x87 FPU stack registers,
> > > > > + are unchanged after CALL, except for EFLAGS and RAX/EAX. */
> > > > > +#define TLSDESC_CALL_STATE_SAVE_MASK \
> > > > > + (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID))
> > > > > +
> > > > > /* Constants for bits in __x86_string_control: */
> > > > >
> > > > > /* Avoid short distance REP MOVSB. */
> > > > > diff --git a/sysdeps/x86/tst-gnu2-tls2.c b/sysdeps/x86/tst-gnu2-tls2.c
> > > > > new file mode 100644
> > > > > index 0000000000..de900a423b
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86/tst-gnu2-tls2.c
> > > > > @@ -0,0 +1,20 @@
> > > > > +#ifndef __x86_64__
> > > > > +#include <sys/platform/x86.h>
> > > > > +
> > > > > +#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2)
> > > > > +#endif
> > > > > +
> > > > > +/* Clear XMM0...XMM7 */
> > > > > +#define PREPARE_MALLOC() \
> > > > > +{ \
> > > > > + asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \
> > > > > + asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \
> > > > > + asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \
> > > > > + asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \
> > > > > + asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \
> > > > > + asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \
> > > > > + asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \
> > > > > + asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \
> > > > > +}
> > > > > +
> > > > > +#include <elf/tst-gnu2-tls2.c>
> > > > > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> > > > > index 145f25e7f6..9337e95093 100644
> > > > > --- a/sysdeps/x86_64/Makefile
> > > > > +++ b/sysdeps/x86_64/Makefile
> > > > > @@ -10,7 +10,7 @@ LDFLAGS-rtld += -Wl,-z,nomark-plt
> > > > > endif
> > > > >
> > > > > ifeq ($(subdir),csu)
> > > > > -gen-as-const-headers += features-offsets.sym link-defines.sym
> > > > > +gen-as-const-headers += link-defines.sym
> > > > > endif
> > > > >
> > > > > ifeq ($(subdir),gmon)
> > > > > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> > > > > index 6d605d0d32..ff5d45f7cb 100644
> > > > > --- a/sysdeps/x86_64/dl-machine.h
> > > > > +++ b/sysdeps/x86_64/dl-machine.h
> > > > > @@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > > int lazy, int profile)
> > > > > {
> > > > > Elf64_Addr *got;
> > > > > - extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidden;
> > > > > - extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidden;
> > > > > - extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidden;
> > > > > extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
> > > > > extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
> > > > > extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
> > > > > @@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > > /* Identify this shared object. */
> > > > > *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
> > > > >
> > > > > - const struct cpu_features* cpu_features = __get_cpu_features ();
> > > > > -
> > > > > #ifdef SHARED
> > > > > /* The got[2] entry contains the address of a function which gets
> > > > > called to get the address of a so far unresolved function and
> > > > > @@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > > end in this function. */
> > > > > if (__glibc_unlikely (profile))
> > > > > {
> > > > > + const struct cpu_features* cpu_features = __get_cpu_features ();
> > > > > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
> > > > > *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
> > > > > else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
> > > > > @@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> > > > > /* This function will get called to fix up the GOT entry
> > > > > indicated by the offset on the stack, and then jump to
> > > > > the resolved address. */
> > > > > - if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
> > > > > - || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > > > > - *(ElfW(Addr) *) (got + 2)
> > > > > - = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
> > > > > - ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
> > > > > - : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
> > > > > - else
> > > > > - *(ElfW(Addr) *) (got + 2)
> > > > > - = (ElfW(Addr)) &_dl_runtime_resolve_fxsave;
> > > > > + *(ElfW(Addr) *) (got + 2)
> > > > > + = (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve);
> > > > > }
> > > > > }
> > > > >
> > > > > @@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.\n",
> > > > > {
> > > > > td->arg = _dl_make_tlsdesc_dynamic
> > > > > (sym_map, sym->st_value + reloc->r_addend);
> > > > > - td->entry = _dl_tlsdesc_dynamic;
> > > > > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > > > > }
> > > > > else
> > > > > # endif
> > > > > diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
> > > > > index 4d1d790fbb..06637a8154 100644
> > > > > --- a/sysdeps/x86_64/dl-procinfo.c
> > > > > +++ b/sysdeps/x86_64/dl-procinfo.c
> > > > > @@ -41,5 +41,21 @@
> > > > >
> > > > > #include <sysdeps/x86/dl-procinfo.c>
> > > > >
> > > > > +#if !IS_IN (ldconfig)
> > > > > +# if !defined PROCINFO_DECL && defined SHARED
> > > > > + ._dl_x86_64_runtime_resolve
> > > > > +# else
> > > > > +PROCINFO_CLASS void * _dl_x86_64_runtime_resolve
> > > > > +# endif
> > > > > +# ifndef PROCINFO_DECL
> > > > > += NULL
> > > > > +# endif
> > > > > +# if !defined SHARED || defined PROCINFO_DECL
> > > > > +;
> > > > > +# else
> > > > > +,
> > > > > +# endif
> > > > > +#endif
> > > > > +
> > > > > #undef PROCINFO_DECL
> > > > > #undef PROCINFO_CLASS
> > > > > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > > > new file mode 100644
> > > > > index 0000000000..0c2e8d5320
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > > > @@ -0,0 +1,166 @@
> > > > > +/* Thread-local storage handling in the ELF dynamic linker. x86_64 version.
> > > > > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#ifndef SECTION
> > > > > +# define SECTION(p) p
> > > > > +#endif
> > > > > +
> > > > > +#undef REGISTER_SAVE_AREA
> > > > > +#undef LOCAL_STORAGE_AREA
> > > > > +#undef BASE
> > > > > +
> > > > > +#include "dl-trampoline-state.h"
> > > > > +
> > > > > + .section SECTION(.text),"ax",@progbits
> > > > > +
> > > > > + .hidden _dl_tlsdesc_dynamic
> > > > > + .global _dl_tlsdesc_dynamic
> > > > > + .type _dl_tlsdesc_dynamic,@function
> > > > > +
> > > > > + /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > > > > + _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > > > > + tlsdesc_dynamic_arg object. It must return in %rax the offset
> > > > > + between the thread pointer and the object denoted by the
> > > > > + argument, without clobbering any registers.
> > > > > +
> > > > > + The assembly code that follows is a rendition of the following
> > > > > + C code, hand-optimized a little bit.
> > > > > +
> > > > > +ptrdiff_t
> > > > > +_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > > > > +{
> > > > > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > > + != TLS_DTV_UNALLOCATED),
> > > > > + 1))
> > > > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > > + - __thread_pointer;
> > > > > +
> > > > > + return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > > > > +}
> > > > > +*/
> > > > > + cfi_startproc
> > > > > + .align 16
> > > > > +_dl_tlsdesc_dynamic:
> > > > > + _CET_ENDBR
> > > > > + /* Preserve call-clobbered registers that we modify.
> > > > > + We need two scratch regs anyway. */
> > > > > + movq %rsi, -16(%rsp)
> > > > > + mov %fs:DTV_OFFSET, %RSI_LP
> > > > > + movq %rdi, -8(%rsp)
> > > > > + movq TLSDESC_ARG(%rax), %rdi
> > > > > + movq (%rsi), %rax
> > > > > + cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > > > > + ja 2f
> > > > > + movq TLSDESC_MODID(%rdi), %rax
> > > > > + salq $4, %rax
> > > > > + movq (%rax,%rsi), %rax
> > > > > + cmpq $-1, %rax
> > > > > + je 2f
> > > > > + addq TLSDESC_MODOFF(%rdi), %rax
> > > > > +1:
> > > > > + movq -16(%rsp), %rsi
> > > > > + sub %fs:0, %RAX_LP
> > > > > + movq -8(%rsp), %rdi
> > > > > + ret
> > > > > +2:
> > > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > + movq %rbx, -24(%rsp)
> > > > > + mov %RSP_LP, %RBX_LP
> > > > > + cfi_def_cfa_register(%rbx)
> > > > > + and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > > > > +#endif
> > > > > +#ifdef REGISTER_SAVE_AREA
> > > > > +# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > + /* STATE_SAVE_OFFSET has space for 8 integer registers. But we
> > > > > + need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
> > > > > + RBX above. */
> > > > > + sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
> > > > > +# else
> > > > > + sub $REGISTER_SAVE_AREA, %RSP_LP
> > > > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > > > +# endif
> > > > > +#else
> > > > > + /* Allocate stack space of the required size to save the state. */
> > > > > + sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip), %RSP_LP
> > > > > +#endif
> > > > > + /* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9,
> > > > > + r10 and r11. */
> > > > > + movq %rcx, REGISTER_SAVE_RCX(%rsp)
> > > > > + movq %rdx, REGISTER_SAVE_RDX(%rsp)
> > > > > + movq %r8, REGISTER_SAVE_R8(%rsp)
> > > > > + movq %r9, REGISTER_SAVE_R9(%rsp)
> > > > > + movq %r10, REGISTER_SAVE_R10(%rsp)
> > > > > + movq %r11, REGISTER_SAVE_R11(%rsp)
> > > > > +#ifdef USE_FXSAVE
> > > > > + fxsave STATE_SAVE_OFFSET(%rsp)
> > > > > +#else
> > > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > > + xorl %edx, %edx
> > > > > + /* Clear the XSAVE Header. */
> > > > > +# ifdef USE_XSAVE
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512)(%rsp)
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8)(%rsp)
> > > > > +# endif
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 2)(%rsp)
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 3)(%rsp)
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 4)(%rsp)
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 5)(%rsp)
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 6)(%rsp)
> > > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 7)(%rsp)
> > > > > +# ifdef USE_XSAVE
> > > > > + xsave STATE_SAVE_OFFSET(%rsp)
> > > > > +# else
> > > > > + xsavec STATE_SAVE_OFFSET(%rsp)
> > > > > +# endif
> > > > > +#endif
> > > > > + /* %rdi already points to the tlsinfo data structure. */
> > > > > + call HIDDEN_JUMPTARGET (__tls_get_addr)
> > > > > + # Get register content back.
> > > > > +#ifdef USE_FXSAVE
> > > > > + fxrstor STATE_SAVE_OFFSET(%rsp)
> > > > > +#else
> > > > > + /* Save and retore __tls_get_addr return value stored in RAX. */
> > > > > + mov %RAX_LP, %RCX_LP
> > > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > > + xorl %edx, %edx
> > > > > + xrstor STATE_SAVE_OFFSET(%rsp)
> > > > > + mov %RCX_LP, %RAX_LP
> > > > > +#endif
> > > > > + movq REGISTER_SAVE_R11(%rsp), %r11
> > > > > + movq REGISTER_SAVE_R10(%rsp), %r10
> > > > > + movq REGISTER_SAVE_R9(%rsp), %r9
> > > > > + movq REGISTER_SAVE_R8(%rsp), %r8
> > > > > + movq REGISTER_SAVE_RDX(%rsp), %rdx
> > > > > + movq REGISTER_SAVE_RCX(%rsp), %rcx
> > > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > + mov %RBX_LP, %RSP_LP
> > > > > + cfi_def_cfa_register(%rsp)
> > > > > + movq -24(%rsp), %rbx
> > > > > + cfi_restore(%rbx)
> > > > > +#else
> > > > > + add $REGISTER_SAVE_AREA, %RSP_LP
> > > > > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > > > > +#endif
> > > > > + jmp 1b
> > > > > + cfi_endproc
> > > > > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > > +
> > > > > +#undef STATE_SAVE_ALIGNMENT
> > > > > diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.S
> > > > > index f748af2ece..ea69f5223a 100644
> > > > > --- a/sysdeps/x86_64/dl-tlsdesc.S
> > > > > +++ b/sysdeps/x86_64/dl-tlsdesc.S
> > > > > @@ -18,7 +18,19 @@
> > > > >
> > > > > #include <sysdep.h>
> > > > > #include <tls.h>
> > > > > +#include <cpu-features-offsets.h>
> > > > > +#include <features-offsets.h>
> > > > > #include "tlsdesc.h"
> > > > > +#include "dl-trampoline-save.h"
> > > > > +
> > > > > +/* Area on stack to save and restore registers used for parameter
> > > > > + passing when calling _dl_tlsdesc_dynamic. */
> > > > > +#define REGISTER_SAVE_RCX 0
> > > > > +#define REGISTER_SAVE_RDX (REGISTER_SAVE_RCX + 8)
> > > > > +#define REGISTER_SAVE_R8 (REGISTER_SAVE_RDX + 8)
> > > > > +#define REGISTER_SAVE_R9 (REGISTER_SAVE_R8 + 8)
> > > > > +#define REGISTER_SAVE_R10 (REGISTER_SAVE_R9 + 8)
> > > > > +#define REGISTER_SAVE_R11 (REGISTER_SAVE_R10 + 8)
> > > > >
> > > > > .text
> > > > >
> > > > > @@ -67,80 +79,24 @@ _dl_tlsdesc_undefweak:
> > > > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> > > > >
> > > > > #ifdef SHARED
> > > > > - .hidden _dl_tlsdesc_dynamic
> > > > > - .global _dl_tlsdesc_dynamic
> > > > > - .type _dl_tlsdesc_dynamic,@function
> > > > > -
> > > > > - /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > > > > - _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > > > > - tlsdesc_dynamic_arg object. It must return in %rax the offset
> > > > > - between the thread pointer and the object denoted by the
> > > > > - argument, without clobbering any registers.
> > > > > -
> > > > > - The assembly code that follows is a rendition of the following
> > > > > - C code, hand-optimized a little bit.
> > > > > -
> > > > > -ptrdiff_t
> > > > > -_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > > > > -{
> > > > > - struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > > - dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > > - if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > > - && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > > - != TLS_DTV_UNALLOCATED),
> > > > > - 1))
> > > > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > > - - __thread_pointer;
> > > > > -
> > > > > - return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > > > > -}
> > > > > -*/
> > > > > - cfi_startproc
> > > > > - .align 16
> > > > > -_dl_tlsdesc_dynamic:
> > > > > - _CET_ENDBR
> > > > > - /* Preserve call-clobbered registers that we modify.
> > > > > - We need two scratch regs anyway. */
> > > > > - movq %rsi, -16(%rsp)
> > > > > - mov %fs:DTV_OFFSET, %RSI_LP
> > > > > - movq %rdi, -8(%rsp)
> > > > > - movq TLSDESC_ARG(%rax), %rdi
> > > > > - movq (%rsi), %rax
> > > > > - cmpq %rax, TLSDESC_GEN_COUNT(%rdi)
> > > > > - ja .Lslow
> > > > > - movq TLSDESC_MODID(%rdi), %rax
> > > > > - salq $4, %rax
> > > > > - movq (%rax,%rsi), %rax
> > > > > - cmpq $-1, %rax
> > > > > - je .Lslow
> > > > > - addq TLSDESC_MODOFF(%rdi), %rax
> > > > > -.Lret:
> > > > > - movq -16(%rsp), %rsi
> > > > > - sub %fs:0, %RAX_LP
> > > > > - movq -8(%rsp), %rdi
> > > > > - ret
> > > > > -.Lslow:
> > > > > - /* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9,
> > > > > - r10 and r11. Also, align the stack, that's off by 8 bytes. */
> > > > > - subq $72, %rsp
> > > > > - cfi_adjust_cfa_offset (72)
> > > > > - movq %rdx, 8(%rsp)
> > > > > - movq %rcx, 16(%rsp)
> > > > > - movq %r8, 24(%rsp)
> > > > > - movq %r9, 32(%rsp)
> > > > > - movq %r10, 40(%rsp)
> > > > > - movq %r11, 48(%rsp)
> > > > > - /* %rdi already points to the tlsinfo data structure. */
> > > > > - call HIDDEN_JUMPTARGET (__tls_get_addr)
> > > > > - movq 8(%rsp), %rdx
> > > > > - movq 16(%rsp), %rcx
> > > > > - movq 24(%rsp), %r8
> > > > > - movq 32(%rsp), %r9
> > > > > - movq 40(%rsp), %r10
> > > > > - movq 48(%rsp), %r11
> > > > > - addq $72, %rsp
> > > > > - cfi_adjust_cfa_offset (-72)
> > > > > - jmp .Lret
> > > > > - cfi_endproc
> > > > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > > > +# define USE_FXSAVE
> > > > > +# define STATE_SAVE_ALIGNMENT 16
> > > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > > > > +# include "dl-tlsdesc-dynamic.h"
> > > > > +# undef _dl_tlsdesc_dynamic
> > > > > +# undef USE_FXSAVE
> > > > > +
> > > > > +# define USE_XSAVE
> > > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > > > > +# include "dl-tlsdesc-dynamic.h"
> > > > > +# undef _dl_tlsdesc_dynamic
> > > > > +# undef USE_XSAVE
> > > > > +
> > > > > +# define USE_XSAVEC
> > > > > +# define STATE_SAVE_ALIGNMENT 64
> > > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > > > > +# include "dl-tlsdesc-dynamic.h"
> > > > > +# undef _dl_tlsdesc_dynamic
> > > > > +# undef USE_XSAVEC
> > > > > #endif /* SHARED */
> > > > > diff --git a/sysdeps/x86_64/dl-trampoline-save.h b/sysdeps/x86_64/dl-trampoline-save.h
> > > > > new file mode 100644
> > > > > index 0000000000..84eac4a8ac
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/dl-trampoline-save.h
> > > > > @@ -0,0 +1,34 @@
> > > > > +/* x86-64 PLT trampoline register save macros.
> > > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#ifndef DL_STACK_ALIGNMENT
> > > > > +/* Due to GCC bug:
> > > > > +
> > > > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > > > +
> > > > > + __tls_get_addr may be called with 8-byte stack alignment. Although
> > > > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > > > + that stack will be always aligned at 16 bytes. */
> > > > > +# define DL_STACK_ALIGNMENT 8
> > > > > +#endif
> > > > > +
> > > > > +/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > > > > + stack to 16 bytes before calling _dl_fixup. */
> > > > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > > > + || 16 > DL_STACK_ALIGNMENT)
> > > > > diff --git a/sysdeps/x86_64/dl-trampoline-state.h b/sysdeps/x86_64/dl-trampoline-state.h
> > > > > new file mode 100644
> > > > > index 0000000000..575f120797
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/dl-trampoline-state.h
> > > > > @@ -0,0 +1,51 @@
> > > > > +/* x86-64 PLT dl-trampoline state macros.
> > > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > > + This file is part of the GNU C Library.
> > > > > +
> > > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > > + modify it under the terms of the GNU Lesser General Public
> > > > > + License as published by the Free Software Foundation; either
> > > > > + version 2.1 of the License, or (at your option) any later version.
> > > > > +
> > > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > > + Lesser General Public License for more details.
> > > > > +
> > > > > + You should have received a copy of the GNU Lesser General Public
> > > > > + License along with the GNU C Library; if not, see
> > > > > + <https://www.gnu.org/licenses/>. */
> > > > > +
> > > > > +#if (STATE_SAVE_ALIGNMENT % 16) != 0
> > > > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > > > +#endif
> > > > > +
> > > > > +#if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > > > > +# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > > > > +#endif
> > > > > +
> > > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > +/* Local stack area before jumping to function address: RBX. */
> > > > > +# define LOCAL_STORAGE_AREA 8
> > > > > +# define BASE rbx
> > > > > +# ifdef USE_FXSAVE
> > > > > +/* Use fxsave to save XMM registers. */
> > > > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > > > > +# if (REGISTER_SAVE_AREA % 16) != 0
> > > > > +# error REGISTER_SAVE_AREA must be multiple of 16
> > > > > +# endif
> > > > > +# endif
> > > > > +#else
> > > > > +# ifndef USE_FXSAVE
> > > > > +# error USE_FXSAVE must be defined
> > > > > +# endif
> > > > > +/* Use fxsave to save XMM registers. */
> > > > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > > > > +/* Local stack area before jumping to function address: All saved
> > > > > + registers. */
> > > > > +# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > > > > +# define BASE rsp
> > > > > +# if (REGISTER_SAVE_AREA % 16) != 8
> > > > > +# error REGISTER_SAVE_AREA must be odd multiple of 8
> > > > > +# endif
> > > > > +#endif
> > > > > diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
> > > > > index b2e7e0f69b..87c5137837 100644
> > > > > --- a/sysdeps/x86_64/dl-trampoline.S
> > > > > +++ b/sysdeps/x86_64/dl-trampoline.S
> > > > > @@ -22,25 +22,7 @@
> > > > > #include <features-offsets.h>
> > > > > #include <link-defines.h>
> > > > > #include <isa-level.h>
> > > > > -
> > > > > -#ifndef DL_STACK_ALIGNMENT
> > > > > -/* Due to GCC bug:
> > > > > -
> > > > > - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > > > > -
> > > > > - __tls_get_addr may be called with 8-byte stack alignment. Although
> > > > > - this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > > > > - that stack will be always aligned at 16 bytes. We use unaligned
> > > > > - 16-byte move to load and store SSE registers, which has no penalty
> > > > > - on modern processors if stack is 16-byte aligned. */
> > > > > -# define DL_STACK_ALIGNMENT 8
> > > > > -#endif
> > > > > -
> > > > > -/* True if _dl_runtime_resolve should align stack for STATE_SAVE or align
> > > > > - stack to 16 bytes before calling _dl_fixup. */
> > > > > -#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > > > > - (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > > > > - || 16 > DL_STACK_ALIGNMENT)
> > > > > +#include "dl-trampoline-save.h"
> > > > >
> > > > > /* Area on stack to save and restore registers used for parameter
> > > > > passing when calling _dl_fixup. */
> > > > > diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
> > > > > index f55c6ea040..d9ccfb40d4 100644
> > > > > --- a/sysdeps/x86_64/dl-trampoline.h
> > > > > +++ b/sysdeps/x86_64/dl-trampoline.h
> > > > > @@ -27,39 +27,7 @@
> > > > > # undef LOCAL_STORAGE_AREA
> > > > > # undef BASE
> > > > >
> > > > > -# if (STATE_SAVE_ALIGNMENT % 16) != 0
> > > > > -# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > > > -# endif
> > > > > -
> > > > > -# if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > > > > -# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > > > > -# endif
> > > > > -
> > > > > -# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > > -/* Local stack area before jumping to function address: RBX. */
> > > > > -# define LOCAL_STORAGE_AREA 8
> > > > > -# define BASE rbx
> > > > > -# ifdef USE_FXSAVE
> > > > > -/* Use fxsave to save XMM registers. */
> > > > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > > > > -# if (REGISTER_SAVE_AREA % 16) != 0
> > > > > -# error REGISTER_SAVE_AREA must be multiple of 16
> > > > > -# endif
> > > > > -# endif
> > > > > -# else
> > > > > -# ifndef USE_FXSAVE
> > > > > -# error USE_FXSAVE must be defined
> > > > > -# endif
> > > > > -/* Use fxsave to save XMM registers. */
> > > > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > > > > -/* Local stack area before jumping to function address: All saved
> > > > > - registers. */
> > > > > -# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > > > > -# define BASE rsp
> > > > > -# if (REGISTER_SAVE_AREA % 16) != 8
> > > > > -# error REGISTER_SAVE_AREA must be odd multiple of 8
> > > > > -# endif
> > > > > -# endif
> > > > > +# include "dl-trampoline-state.h"
> > > > >
> > > > > .globl _dl_runtime_resolve
> > > > > .hidden _dl_runtime_resolve
> > > > > --
> > > > > 2.43.2
> > > > >
> > > >
> > > > This still LGTM.
> > > > Feel free to add my reviewed-by tag to future posts of the patch.
> > > > Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
> > >
> > > Please wait for community consensus before pushing
> >
> > It was agreed that I should send out the v10 patch with xfail on
> > the new test. Someone just needs to double check that my
> > v10 patch does it.
> Okay, give it 24 hour. If you no further concerns are brought up
> think its okay to push.
OK to backport it to release branches?
--
H.J.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-03-08 20:11 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-26 14:37 [PATCH v10 0/1] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers H.J. Lu
2024-02-26 14:37 ` [PATCH v10 1/1] " H.J. Lu
2024-02-26 19:02 ` Noah Goldstein
2024-02-26 19:02 ` Noah Goldstein
2024-02-26 19:06 ` H.J. Lu
2024-02-27 17:00 ` Noah Goldstein
2024-03-08 20:10 ` H.J. Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).