From: "H.J. Lu" <hjl.tools@gmail.com>
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v4 2/2] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
Date: Wed, 14 Feb 2024 16:23:34 -0800 [thread overview]
Message-ID: <Zc1Zhqd16gEV60gO@gmail.com> (raw)
In-Reply-To: <CAFUsyf+=OoSpazOabLndvX2QaXqm+0B90-0d4R=uv3h_AqEymg@mail.gmail.com>
On Wed, Feb 14, 2024 at 11:57:20PM +0000, Noah Goldstein wrote:
> On Wed, Feb 14, 2024 at 11:21 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Wed, Feb 14, 2024 at 10:44:20PM +0000, Noah Goldstein wrote:
> > > On Tue, Feb 13, 2024 at 4:15 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > Compiler generates the following instruction sequence for GNU2 dynamic
> > > > TLS access:
> > > >
> > > > leaq tls_var@TLSDESC(%rip), %rax
> > > > call *tls_var@TLSCALL(%rax)
> > > >
> > > > or
> > > >
> > > > leal tls_var@TLSDESC(%ebx), %eax
> > > > call *tls_var@TLSCALL(%eax)
> > > >
> > > > CALL instruction is transparent to compiler which assumes all registers,
> > > > except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> > > > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > > > path. __tls_get_addr is a normal function which doesn't preserve any
> > > > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> > > > caller-saved registers, but didn't preserve any other caller-saved
> > > > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > > > XSAVE and XSAVEC to save and restore all caller-saved registers. This
> > > > fixes BZ #31372.
> > > >
> > > > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > > > to optimize elf_machine_runtime_setup.
> > > > ---
> > > > elf/Makefile | 19 ++
> > > > elf/malloc-for-test.c | 32 ++++
> > > > elf/malloc-for-test.map | 6 +
> > > > elf/tst-gnu2-tls2.c | 97 ++++++++++
> > > > elf/tst-gnu2-tls2.h | 26 +++
> > > > elf/tst-gnu2-tls2mod0.c | 28 +++
> > > > elf/tst-gnu2-tls2mod1.c | 28 +++
> > > > elf/tst-gnu2-tls2mod2.c | 28 +++
> > > > sysdeps/i386/dl-machine.h | 2 +-
> > > > sysdeps/i386/dl-tlsdesc-dynamic.h | 187 +++++++++++++++++++
> > > > sysdeps/i386/dl-tlsdesc.S | 115 +++++-------
> > > > sysdeps/i386/tst-gnu2-tls2.c | 5 +
> > > > sysdeps/x86/Makefile | 7 +-
> > > > sysdeps/x86/cpu-features.c | 56 +++++-
> > > > sysdeps/x86/dl-procinfo.c | 16 ++
> > > > sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
> > > > sysdeps/x86/malloc-for-test.c | 33 ++++
> > > > sysdeps/x86/sysdep.h | 6 +
> > > > sysdeps/x86_64/Makefile | 2 +-
> > > > sysdeps/x86_64/dl-machine.h | 19 +-
> > > > sysdeps/x86_64/dl-procinfo.c | 16 ++
> > > > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
> > > > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
> > > > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
> > > > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
> > > > sysdeps/x86_64/dl-trampoline.S | 20 +-
> > > > sysdeps/x86_64/dl-trampoline.h | 34 +---
> > > > 27 files changed, 930 insertions(+), 213 deletions(-)
> > > > create mode 100644 elf/malloc-for-test.c
> > > > create mode 100644 elf/malloc-for-test.map
> > > > create mode 100644 elf/tst-gnu2-tls2.c
> > > > create mode 100644 elf/tst-gnu2-tls2.h
> > > > create mode 100644 elf/tst-gnu2-tls2mod0.c
> > > > create mode 100644 elf/tst-gnu2-tls2mod1.c
> > > > create mode 100644 elf/tst-gnu2-tls2mod2.c
> > > > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > create mode 100644 sysdeps/i386/tst-gnu2-tls2.c
> > > > rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> > > > create mode 100644 sysdeps/x86/malloc-for-test.c
> > > > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> > > > create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
> > > >
> > > > diff --git a/elf/Makefile b/elf/Makefile
> > > > index 5d78b659ce..e0665d2007 100644
> > > > --- a/elf/Makefile
> > > > +++ b/elf/Makefile
> > > > @@ -424,6 +424,7 @@ tests += \
> > > > tst-glibc-hwcaps-prepend \
> > > > tst-global1 \
> > > > tst-global2 \
> > > > + tst-gnu2-tls2 \
> > > > tst-initfinilazyfail \
> > > > tst-initorder \
> > > > tst-initorder2 \
> > > > @@ -699,6 +700,7 @@ modules-names += \
> > > > libtracemod5-1 \
> > > > ltglobmod1 \
> > > > ltglobmod2 \
> > > > + malloc-for-test \
> > > > neededobj1 \
> > > > neededobj2 \
> > > > neededobj3 \
> > > > @@ -846,6 +848,9 @@ modules-names += \
> > > > tst-filterobj-flt \
> > > > tst-finilazyfailmod \
> > > > tst-globalmod2 \
> > > > + tst-gnu2-tls2mod0 \
> > > > + tst-gnu2-tls2mod1 \
> > > > + tst-gnu2-tls2mod2 \
> > > > tst-initlazyfailmod \
> > > > tst-initorder2a \
> > > > tst-initorder2b \
> > > > @@ -3044,8 +3049,22 @@ $(objpfx)tst-tlsgap.out: \
> > > > $(objpfx)tst-tlsgap-mod0.so \
> > > > $(objpfx)tst-tlsgap-mod1.so \
> > > > $(objpfx)tst-tlsgap-mod2.so
> > > > +
> > > > +$(objpfx)tst-gnu2-tls2: \
> > > > + $(shared-thread-library) \
> > > > + $(objpfx)malloc-for-test.so
> > > > +$(objpfx)tst-gnu2-tls2.out: \
> > > > + $(objpfx)tst-gnu2-tls2mod0.so \
> > > > + $(objpfx)tst-gnu2-tls2mod1.so \
> > > > + $(objpfx)tst-gnu2-tls2mod2.so
> > > > +
> > > > +LDFLAGS-malloc-for-test.so += -Wl,--version-script=malloc-for-test.map
> > > > +
> > > > ifeq (yes,$(have-mtls-dialect-gnu2))
> > > > CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
> > > > CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
> > > > CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
> > > > +CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
> > > > +CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
> > > > +CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
> > > > endif
> > > > diff --git a/elf/malloc-for-test.c b/elf/malloc-for-test.c
> > > > new file mode 100644
> > > > index 0000000000..1bec69eda7
> > > > --- /dev/null
> > > > +++ b/elf/malloc-for-test.c
> > > > @@ -0,0 +1,32 @@
> > > > +/* A malloc for intercept test.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <http://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include <stdlib.h>
> > > > +
> > > > +extern void * __libc_malloc (size_t);
> > > > +
> > > > +#ifndef PREPARE_MALLOC
> > > > +# define PREPARE_MALLOC()
> > > > +#endif
> > > > +
> > > > +void *
> > > > +malloc (size_t n)
> > > > +{
> > > > + PREPARE_MALLOC ();
> > > > + return __libc_malloc (n);
> > > > +}
> > > > diff --git a/elf/malloc-for-test.map b/elf/malloc-for-test.map
> > > > new file mode 100644
> > > > index 0000000000..8437cf4346
> > > > --- /dev/null
> > > > +++ b/elf/malloc-for-test.map
> > > > @@ -0,0 +1,6 @@
> > > > +GLIBC_2.0 {
> > > > + global:
> > > > + malloc;
> > > > + local:
> > > > + *;
> > > > +};
> > > > diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c
> > > > new file mode 100644
> > > > index 0000000000..34427f9a0f
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2.c
> > > > @@ -0,0 +1,97 @@
> > > > +/* Test TLSDESC relocation.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <http://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include <stdio.h>
> > > > +#include <string.h>
> > > > +#include <dlfcn.h>
> > > > +#include <pthread.h>
> > > > +#include <support/xdlfcn.h>
> > > > +#include <support/xthread.h>
> > > > +#include <support/check.h>
> > > > +#include <support/test-driver.h>
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +#ifndef IS_SUPPORTED
> > > > +# define IS_SUPPORTED() true
> > > > +#endif
> > > > +
> > > > +static void *mod[3];
> > > > +#define MOD(i) "tst-gnu2-tls2mod" #i ".so"
> > > > +static const char *modname[3] = { MOD(0), MOD(1), MOD(2) };
> > > > +#undef MOD
> > > > +
> > > > +static void
> > > > +open_mod (int i)
> > > > +{
> > > > + mod[i] = xdlopen (modname[i], RTLD_LAZY);
> > > > + printf ("open %s\n", modname[i]);
> > > > +}
> > > > +
> > > > +static void
> > > > +close_mod (int i)
> > > > +{
> > > > + xdlclose (mod[i]);
> > > > + mod[i] = NULL;
> > > > + printf ("close %s\n", modname[i]);
> > > > +}
> > > > +
> > > > +static void
> > > > +access_mod (int i, const char *sym)
> > > > +{
> > > > + struct tls var = { -1, -1, -1, -1 };
> > > > + struct tls *(*f) (struct tls *) = xdlsym (mod[i], sym);
> > > > + struct tls *p = f (&var);
> > > > + printf ("access %s: %s() = %p\n", modname[i], sym, p);
> > > > + TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) == 0);
> > > > + ++(p->a);
> > > > +}
> > > > +
> > > > +static void *
> > > > +start (void *arg)
> > > > +{
> > > > + /* The DTV generation is at the last dlopen of mod0 and the
> > > > + entry for mod1 is NULL. */
> > > > +
> > > > + open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */
> > > > +
> > > > + /* Force the slow path in GNU2 TLS descriptor call. */
> > > > + access_mod (1, "apply_tls");
> > > > +
> > > > + return arg;
> > > > +}
> > > > +
> > > > +static int
> > > > +do_test (void)
> > > > +{
> > > > + if (!IS_SUPPORTED ())
> > > > + return EXIT_UNSUPPORTED;
> > > > +
> > > > + open_mod (0);
> > > > + open_mod (1);
> > > > + open_mod (2);
> > > > + close_mod (0);
> > > > + close_mod (1); /* Create modid gap at mod1. */
> > > > + open_mod (0); /* Reuse modid of mod0, bump generation count. */
> > > > +
> > > > + /* Create a thread where DTV of mod1 is NULL. */
> > > > + pthread_t t = xpthread_create (NULL, start, NULL);
> > > > + xpthread_join (t);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +#include <support/test-driver.c>
> > > > diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h
> > > > new file mode 100644
> > > > index 0000000000..e33f4dbe27
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2.h
> > > > @@ -0,0 +1,26 @@
> > > > +/* Test TLSDESC relocation.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include <stdint.h>
> > > > +
> > > > +struct tls
> > > > +{
> > > > + int64_t a, b, c, d;
> > > > +};
> > > > +
> > > > +extern struct tls *apply_tls (struct tls *);
> > > > diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c
> > > > new file mode 100644
> > > > index 0000000000..67dc0d464d
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2mod0.c
> > > > @@ -0,0 +1,28 @@
> > > > +/* DSO used by tst-gnu2-tls2.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")));
> > > > +
> > > > +struct tls *
> > > > +apply_tls (struct tls *p)
> > > > +{
> > > > + tls_var0 = *p;
> > > > + return &tls_var0;
> > > > +}
> > > > diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c
> > > > new file mode 100644
> > > > index 0000000000..a4ae6db24f
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2mod1.c
> > > > @@ -0,0 +1,28 @@
> > > > +/* DSO used by tst-gnu2-tls2.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidden")));
> > > > +
> > > > +struct tls *
> > > > +apply_tls (struct tls *p)
> > > > +{
> > > > + tls_var1[1] = *p;
> > > > + return &tls_var1[1];
> > > > +}
> > > > diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c
> > > > new file mode 100644
> > > > index 0000000000..2d13921717
> > > > --- /dev/null
> > > > +++ b/elf/tst-gnu2-tls2mod2.c
> > > > @@ -0,0 +1,28 @@
> > > > +/* DSO used by tst-gnu2-tls2.
> > > > + Copyright (C) 2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#include "tst-gnu2-tls2.h"
> > > > +
> > > > +__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")));
> > > > +
> > > > +struct tls *
> > > > +apply_tls (struct tls *p)
> > > > +{
> > > > + tls_var2 = *p;
> > > > + return &tls_var2;
> > > > +}
> > > > diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> > > > index fc1ef96587..50d74fe6e9 100644
> > > > --- a/sysdeps/i386/dl-machine.h
> > > > +++ b/sysdeps/i386/dl-machine.h
> > > > @@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.\n",
> > > > {
> > > > td->arg = _dl_make_tlsdesc_dynamic
> > > > (sym_map, sym->st_value + (ElfW(Word))td->arg);
> > > > - td->entry = _dl_tlsdesc_dynamic;
> > > > + td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> > > > }
> > > > else
> > > > # endif
> > > > diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > new file mode 100644
> > > > index 0000000000..675e56d32d
> > > > --- /dev/null
> > > > +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > > > @@ -0,0 +1,187 @@
> > > > +/* Thread-local storage handling in the ELF dynamic linker. i386 version.
> > > > + Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > > > + This file is part of the GNU C Library.
> > > > +
> > > > + The GNU C Library is free software; you can redistribute it and/or
> > > > + modify it under the terms of the GNU Lesser General Public
> > > > + License as published by the Free Software Foundation; either
> > > > + version 2.1 of the License, or (at your option) any later version.
> > > > +
> > > > + The GNU C Library is distributed in the hope that it will be useful,
> > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > > + Lesser General Public License for more details.
> > > > +
> > > > + You should have received a copy of the GNU Lesser General Public
> > > > + License along with the GNU C Library; if not, see
> > > > + <https://www.gnu.org/licenses/>. */
> > > > +
> > > > +#undef REGISTER_SAVE_AREA
> > > > +
> > > > +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
> > > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > > +#endif
> > > > +
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > +# ifdef USE_FNSAVE
> > > > +# error USE_FNSAVE shouldn't be defined
> > > > +# endif
> > > > +# ifdef USE_FXSAVE
> > > > +/* Use fxsave to save all registers. */
> > > > +# define REGISTER_SAVE_AREA 512
> > > > +# endif
> > > > +#else
> > > > +# ifdef USE_FNSAVE
> > > > +/* Use fnsave to save x87 FPU stack registers. */
> > > > +# define REGISTER_SAVE_AREA 108
> > > > +# else
> > > > +# ifndef USE_FXSAVE
> > > > +# error USE_FXSAVE must be defined
> > > > +# endif
> > > > +/* Use fxsave to save all registers. Add 12 bytes to align the stack
> > > > + to 16 bytes. */
> > > > +# define REGISTER_SAVE_AREA (512 + 12)
> > > > +# endif
> > > > +#endif
> > > > +
> > > > + .hidden _dl_tlsdesc_dynamic
> > > > + .global _dl_tlsdesc_dynamic
> > > > + .type _dl_tlsdesc_dynamic,@function
> > > > +
> > > > + /* This function is used for symbols that need dynamic TLS.
> > > > +
> > > > + %eax points to the TLS descriptor, such that 0(%eax) points to
> > > > + _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > > > + tlsdesc_dynamic_arg object. It must return in %eax the offset
> > > > + between the thread pointer and the object denoted by the
> > > > + argument, without clobbering any registers.
> > > > +
> > > > + The assembly code that follows is a rendition of the following
> > > > + C code, hand-optimized a little bit.
> > > > +
> > > > +ptrdiff_t
> > > > +__attribute__ ((__regparm__ (1)))
> > > > +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > > +{
> > > > + struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > > + dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > > + if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > > + && (dtv[td->tlsinfo.ti_module].pointer.val
> > > > + != TLS_DTV_UNALLOCATED),
> > > > + 1))
> > > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > > + - __thread_pointer;
> > > > +
> > > > + return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > > +}
> > > > +*/
> > > > + cfi_startproc
> > > > + .align 16
> > > > +_dl_tlsdesc_dynamic:
> > > > + /* Like all TLS resolvers, preserve call-clobbered registers.
> > > > + We need two scratch regs anyway. */
> > > > + subl $32, %esp
> > > > + cfi_adjust_cfa_offset (32)
> > > > + movl %ecx, 20(%esp)
> > > > + movl %edx, 24(%esp)
> > > > + movl TLSDESC_ARG(%eax), %eax
> > > > + movl %gs:DTV_OFFSET, %edx
> > > > + movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > > + cmpl (%edx), %ecx
> > > > + ja 2f
> > > > + movl TLSDESC_MODID(%eax), %ecx
> > > > + movl (%edx,%ecx,8), %edx
> > > > + cmpl $-1, %edx
> > > > + je 2f
> > > > + movl TLSDESC_MODOFF(%eax), %eax
> > > > + addl %edx, %eax
> > > > +1:
> > > > + movl 20(%esp), %ecx
> > > > + subl %gs:0, %eax
> > > > + movl 24(%esp), %edx
> > > > + addl $32, %esp
> > > > + cfi_adjust_cfa_offset (-32)
> > > > + ret
> > > > + .p2align 4,,7
> > > > +2:
> > > > + cfi_adjust_cfa_offset (32)
> > > Extraneous AFAICT.
> >
> > This was in the existing code. The label 2 can only be reached by
> > a jump. When the label 2 is reached, this CFA adjustment is to tell
> > debugger that CFA isn't changed the CFA directive above.
> >
> > >
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + movl %ebx, -28(%esp)
> > > > + movl %esp, %ebx
> > > > + cfi_def_cfa_register(%ebx)
> > > > + and $-STATE_SAVE_ALIGNMENT, %esp
> > > > +#endif
> > > > +#ifdef REGISTER_SAVE_AREA
> > > > + subl $REGISTER_SAVE_AREA, %esp
> > > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > > +# endif
> > > > +#else
> > > > + # Allocate stack space of the required size to save the state.
> > > > + LOAD_PIC_REG (cx)
> > > > + subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> > > > +#endif
> > > > +#ifdef USE_FNSAVE
> > > > + fnsave (%esp)
> > > > +#elif defined USE_FXSAVE
> > > > + fxsave (%esp)
> > > > +#else
> > > > + # Save the argument for ___tls_get_addr in EAX.
> > > > + movl %eax, %ecx
> > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > + xorl %edx, %edx
> > > > + # Clear the XSAVE Header.
> > > > +# ifdef USE_XSAVE
> > > > + movl %edx, (512)(%esp)
> > > > + movl %edx, (512 + 4 * 1)(%esp)
> > > > + movl %edx, (512 + 4 * 2)(%esp)
> > > > + movl %edx, (512 + 4 * 3)(%esp)
> > > > +# endif
> > > > + movl %edx, (512 + 4 * 4)(%esp)
> > > > + movl %edx, (512 + 4 * 5)(%esp)
> > > > + movl %edx, (512 + 4 * 6)(%esp)
> > > > + movl %edx, (512 + 4 * 7)(%esp)
> > > > + movl %edx, (512 + 4 * 8)(%esp)
> > > > + movl %edx, (512 + 4 * 9)(%esp)
> > > > + movl %edx, (512 + 4 * 10)(%esp)
> > > > + movl %edx, (512 + 4 * 11)(%esp)
> > > > + movl %edx, (512 + 4 * 12)(%esp)
> > > > + movl %edx, (512 + 4 * 13)(%esp)
> > > > + movl %edx, (512 + 4 * 14)(%esp)
> > > > + movl %edx, (512 + 4 * 15)(%esp)
> > > > +# ifdef USE_XSAVE
> > > > + xsave (%esp)
> > > > +# else
> > > > + xsavec (%esp)
> > > > +# endif
> > > > + # Restore the argument for ___tls_get_addr in EAX.
> > > > + movl %ecx, %eax
> > > > +#endif
> > > > + call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > > + # Get register content back.
> > > > +#ifdef USE_FNSAVE
> > > > + frstor (%esp)
> > > > +#elif defined USE_FXSAVE
> > > > + fxrstor (%esp)
> > > > +#else
> > > > + /* Save and retore ___tls_get_addr return value stored in EAX. */
> > > > + movl %eax, %ecx
> > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > > + xorl %edx, %edx
> > > > + xrstor (%esp)
> > > > + movl %ecx, %eax
> > > > +#endif
> > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > > + mov %ebx, %esp
> > > > + cfi_def_cfa_register(%esp)
> > > > + movl -28(%esp), %ebx
> > > > + cfi_restore(%ebx)
> > > > +#else
> > > > + addl $REGISTER_SAVE_AREA, %esp
> > > > + cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > > The use of `REGISTER_SAVE_AREA` above is guarded by an
> > > `#ifdef REGISTER_SAVE_AREA`
> > > and uses
> > > `_rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip)`
> > > otherwise.
> > > Would expect same here?
> >
> > REGISTER_SAVE_AREA is only used by fnsave and fxsave which
> > expect the fixed area.
> >
> > _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip)
> > is used by xsave and xsavec whose saved area size depends on
> > the enabled features.
> >
> > 2 things are different.
>
> My point is that we setup the stack above with ifdef i.e
> ```
> #ifdef REGISTER_SAVE_AREA
> subl $REGISTER_SAVE_AREA, %esp
> #else
> subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx),
> %esp
> #endif
> ```
> Shouldnt you have the same ifdef for restoring?
The actual code is
#ifdef REGISTER_SAVE_AREA
subl $REGISTER_SAVE_AREA, %esp
# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
# endif
#else
# Allocate stack space of the required size to save the state.
LOAD_PIC_REG (cx)
subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
#endif
I am not sure how your suggestion should work.
H.J.
next prev parent reply other threads:[~2024-02-15 0:23 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-13 4:14 [PATCH v4 0/2] " H.J. Lu
2024-02-13 4:15 ` [PATCH v4 1/2] x86-64: Save APX registers in ld.so trampoline H.J. Lu
2024-02-13 4:15 ` [PATCH v4 2/2] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers H.J. Lu
2024-02-14 22:44 ` Noah Goldstein
2024-02-14 23:21 ` H.J. Lu
2024-02-14 23:57 ` Noah Goldstein
2024-02-15 0:23 ` H.J. Lu [this message]
2024-02-15 23:05 ` Adhemerval Zanella Netto
2024-02-15 23:15 ` H.J. Lu
2024-02-16 6:23 ` Florian Weimer
2024-02-16 11:59 ` H.J. Lu
2024-02-16 12:18 ` Florian Weimer
2024-02-16 12:20 ` H.J. Lu
2024-02-16 12:37 ` H.J. Lu
2024-02-16 12:47 ` Adhemerval Zanella Netto
2024-02-16 12:58 ` H.J. Lu
2024-02-16 13:24 ` Adhemerval Zanella Netto
2024-02-16 14:25 ` H.J. Lu
2024-02-16 13:06 ` Florian Weimer
2024-02-16 13:24 ` H.J. Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zc1Zhqd16gEV60gO@gmail.com \
--to=hjl.tools@gmail.com \
--cc=goldstein.w.n@gmail.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).