From: "H.J. Lu" <hjl.tools@gmail.com>
To: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org, Florian Weimer <fweimer@redhat.com>,
Fangrui Song <maskray@google.com>,
"Carlos O'Donell" <carlos@redhat.com>
Subject: Re: [PATCH v4 2/2] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
Date: Thu, 15 Feb 2024 15:15:10 -0800 [thread overview]
Message-ID: <CAMe9rOqichkHy=AhGZE4m=4QC6LpVQYgB3V9V5nSHn9oiXi3hg@mail.gmail.com> (raw)
In-Reply-To: <ce327884-642d-4ff4-86df-10ff08804a58@linaro.org>
On Thu, Feb 15, 2024 at 3:05 PM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 13/02/24 01:15, H.J. Lu wrote:
> > Compiler generates the following instruction sequence for GNU2 dynamic
> > TLS access:
> >
> > leaq tls_var@TLSDESC(%rip), %rax
> > call *tls_var@TLSCALL(%rax)
> >
> > or
> >
> > leal tls_var@TLSDESC(%ebx), %eax
> > call *tls_var@TLSCALL(%eax)
> >
> > CALL instruction is transparent to compiler which assumes all registers,
> > except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > path. __tls_get_addr is a normal function which doesn't preserve any
> > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> > caller-saved registers, but didn't preserve any other caller-saved
> > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > XSAVE and XSAVEC to save and restore all caller-saved registers. This
> > fixes BZ #31372.
> >
> > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > to optimize elf_machine_runtime_setup.
> > ---
> > elf/Makefile | 19 ++
> > elf/malloc-for-test.c | 32 ++++
> > elf/malloc-for-test.map | 6 +
> > elf/tst-gnu2-tls2.c | 97 ++++++++++
> > elf/tst-gnu2-tls2.h | 26 +++
> > elf/tst-gnu2-tls2mod0.c | 28 +++
> > elf/tst-gnu2-tls2mod1.c | 28 +++
> > elf/tst-gnu2-tls2mod2.c | 28 +++
> > sysdeps/i386/dl-machine.h | 2 +-
> > sysdeps/i386/dl-tlsdesc-dynamic.h | 187 +++++++++++++++++++
> > sysdeps/i386/dl-tlsdesc.S | 115 +++++-------
> > sysdeps/i386/tst-gnu2-tls2.c | 5 +
> > sysdeps/x86/Makefile | 7 +-
> > sysdeps/x86/cpu-features.c | 56 +++++-
> > sysdeps/x86/dl-procinfo.c | 16 ++
> > sysdeps/{x86_64 => x86}/features-offsets.sym | 2 +
> > sysdeps/x86/malloc-for-test.c | 33 ++++
> > sysdeps/x86/sysdep.h | 6 +
> > sysdeps/x86_64/Makefile | 2 +-
> > sysdeps/x86_64/dl-machine.h | 19 +-
> > sysdeps/x86_64/dl-procinfo.c | 16 ++
> > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++
> > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++-------
> > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++
> > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++
> > sysdeps/x86_64/dl-trampoline.S | 20 +-
> > sysdeps/x86_64/dl-trampoline.h | 34 +---
> > 27 files changed, 930 insertions(+), 213 deletions(-)
> > create mode 100644 elf/malloc-for-test.c
> > create mode 100644 elf/malloc-for-test.map
> > create mode 100644 elf/tst-gnu2-tls2.c
> > create mode 100644 elf/tst-gnu2-tls2.h
> > create mode 100644 elf/tst-gnu2-tls2mod0.c
> > create mode 100644 elf/tst-gnu2-tls2mod1.c
> > create mode 100644 elf/tst-gnu2-tls2mod2.c
> > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> > create mode 100644 sysdeps/i386/tst-gnu2-tls2.c
> > rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> > create mode 100644 sysdeps/x86/malloc-for-test.c
> > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> > create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
> >
> > diff --git a/elf/Makefile b/elf/Makefile
> > index 5d78b659ce..e0665d2007 100644
> > --- a/elf/Makefile
> > +++ b/elf/Makefile
> > @@ -424,6 +424,7 @@ tests += \
> > tst-glibc-hwcaps-prepend \
> > tst-global1 \
> > tst-global2 \
> > + tst-gnu2-tls2 \
> > tst-initfinilazyfail \
> > tst-initorder \
> > tst-initorder2 \
> > @@ -699,6 +700,7 @@ modules-names += \
> > libtracemod5-1 \
> > ltglobmod1 \
> > ltglobmod2 \
> > + malloc-for-test \
> > neededobj1 \
> > neededobj2 \
> > neededobj3 \
> > @@ -846,6 +848,9 @@ modules-names += \
> > tst-filterobj-flt \
> > tst-finilazyfailmod \
> > tst-globalmod2 \
> > + tst-gnu2-tls2mod0 \
> > + tst-gnu2-tls2mod1 \
> > + tst-gnu2-tls2mod2 \
> > tst-initlazyfailmod \
> > tst-initorder2a \
> > tst-initorder2b \
> > @@ -3044,8 +3049,22 @@ $(objpfx)tst-tlsgap.out: \
> > $(objpfx)tst-tlsgap-mod0.so \
> > $(objpfx)tst-tlsgap-mod1.so \
> > $(objpfx)tst-tlsgap-mod2.so
> > +
> > +$(objpfx)tst-gnu2-tls2: \
> > + $(shared-thread-library) \
> > + $(objpfx)malloc-for-test.so
> > +$(objpfx)tst-gnu2-tls2.out: \
> > + $(objpfx)tst-gnu2-tls2mod0.so \
> > + $(objpfx)tst-gnu2-tls2mod1.so \
> > + $(objpfx)tst-gnu2-tls2mod2.so
> > +
> > +LDFLAGS-malloc-for-test.so += -Wl,--version-script=malloc-for-test.map
> > +
> > ifeq (yes,$(have-mtls-dialect-gnu2))
> > CFLAGS-tst-tlsgap-mod0.c += -mtls-dialect=gnu2
> > CFLAGS-tst-tlsgap-mod1.c += -mtls-dialect=gnu2
> > CFLAGS-tst-tlsgap-mod2.c += -mtls-dialect=gnu2
> > +CFLAGS-tst-gnu2-tls2mod0.c += -mtls-dialect=gnu2
> > +CFLAGS-tst-gnu2-tls2mod1.c += -mtls-dialect=gnu2
> > +CFLAGS-tst-gnu2-tls2mod2.c += -mtls-dialect=gnu2
> > endif
> > diff --git a/elf/malloc-for-test.c b/elf/malloc-for-test.c
> > new file mode 100644
> > index 0000000000..1bec69eda7
> > --- /dev/null
> > +++ b/elf/malloc-for-test.c
> > @@ -0,0 +1,32 @@
> > +/* A malloc for intercept test.
> > + Copyright (C) 2024 Free Software Foundation, Inc.
> > + This file is part of the GNU C Library.
> > +
> > + The GNU C Library is free software; you can redistribute it and/or
> > + modify it under the terms of the GNU Lesser General Public
> > + License as published by the Free Software Foundation; either
> > + version 2.1 of the License, or (at your option) any later version.
> > +
> > + The GNU C Library is distributed in the hope that it will be useful,
> > + but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + Lesser General Public License for more details.
> > +
> > + You should have received a copy of the GNU Lesser General Public
> > + License along with the GNU C Library; if not, see
> > + <http://www.gnu.org/licenses/>. */
> > +
> > +#include <stdlib.h>
> > +
> > +extern void * __libc_malloc (size_t);
> > +
> > +#ifndef PREPARE_MALLOC
> > +# define PREPARE_MALLOC()
> > +#endif
> > +
> > +void *
> > +malloc (size_t n)
> > +{
> > + PREPARE_MALLOC ();
>
> It is not clear to me how exactly this adds proper coverage without
> actually set the affected registers *before* the TLS variable access
> and later check its value has not changed. In fact, on x86_64 it is
> passing on my system even without the test actually work as expected
> (see below).
It depends on the compiler version. Newer GCC will generate vector
load/store on x86-64 to copy a structure.
> And I think we also need to add arch-specific rules to build the test
> with only the base ABI, and add extra macros to clobber and check
> the expected registers that _dl_tlsdesc_dynamic should save/restore.
>
> It would be slightly more trick on ABIs that already have a large set
> or register (like x86_64-v1 and armv8-a).
>
> > + return __libc_malloc (n);
> > +}
> > diff --git a/elf/malloc-for-test.map b/elf/malloc-for-test.map
> > new file mode 100644
> > index 0000000000..8437cf4346
> > --- /dev/null
> > +++ b/elf/malloc-for-test.map
> > @@ -0,0 +1,6 @@
> > +GLIBC_2.0 {
>
> You need to use the correct version to override the malloc:
>
> $ gdb --args tst-gnu2-tls2 --direct
> [...]
> (gdb) b apply_tls
> (gdb) r
> Thread 2 "tst-gnu2-tls2" hit Breakpoint 1, apply_tls (p=0x7ffff7bfee80) at tst-gnu2-tls2mod1.c:25
> 25 {
> (gdb) b malloc
> Breakpoint 2 at 0x7ffff7ca8ad0: malloc. (3 locations)
> (gdb) c
> Continuing.
>
> Thread 2 "test-gnu2-tls2" hit Breakpoint 2, __GI___libc_malloc (bytes=3200) at malloc.c:3294
> 3294 {
> (gdb) bt
> #0 __GI___libc_malloc (bytes=3200) at malloc.c:3294
> #1 0x00007ffff7fda3de in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
> #2 allocate_dtv_entry (size=<optimized out>, alignment=16) at ../elf/dl-tls.c:679
> #3 allocate_and_init (map=0x7ffff0000bd0) at ../elf/dl-tls.c:704
> #4 tls_get_addr_tail (ti=0x7ffff0001240, dtv=0x55555555e340, the_map=0x7ffff0000bd0) at ../elf/dl-tls.c:904
> #5 0x00007ffff7fdda2e in _dl_tlsdesc_dynamic_xsavec () at ../sysdeps/x86_64/dl-tlsdesc-dynamic.h:135
> #6 0x00007ffff7fb0155 in apply_tls (p=0xc80) at tst-gnu2-tls2mod1.c:27
> #7 0x0000555555556965 in access_mod (i=1, sym=0x555555559022 "apply_tls") at tst-gnu2-tls2.c:58
> #8 start (arg=0x0) at tst-gnu2-tls2.c:73
> #9 0x00007ffff7c96a82 in start_thread (arg=<optimized out>) at pthread_create.c:447
> #10 0x00007ffff7d1b13c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
>
> By using GLIBC_2.2.5 instead of GLIBC_2.0, now I see:
>
> (gdb) bt
> #0 malloc (n=3200) at ../elf/malloc-for-test.c:29
> #1 0x00007ffff7fda3de in malloc (size=<optimized out>) at ../include/rtld-malloc.h:56
> #2 allocate_dtv_entry (size=<optimized out>, alignment=16) at ../elf/dl-tls.c:679
> #3 allocate_and_init (map=0x7ffff0000bd0) at ../elf/dl-tls.c:704
> #4 tls_get_addr_tail (ti=0x7ffff0001240, dtv=0x55555555e340, the_map=0x7ffff0000bd0) at ../elf/dl-tls.c:904
> #5 0x00007ffff7fdda2e in _dl_tlsdesc_dynamic_xsavec () at ../sysdeps/x86_64/dl-tlsdesc-dynamic.h:135
> #6 0x00007ffff7fb0155 in apply_tls (p=0xc80) at tst-gnu2-tls2mod1.c:27
> #7 0x0000555555556965 in access_mod (i=1, sym=0x555555559022 "apply_tls") at tst-gnu2-tls2.c:58
> #8 start (arg=0x0) at tst-gnu2-tls2.c:73
> #9 0x00007ffff7c96a82 in start_thread (arg=<optimized out>) at pthread_create.c:447
> #10 0x00007ffff7d1b13c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
>
> So you will need either to get the correct version or parameterize
> the map file for each ABI (which is just a handful that actually
> support TLS descriptors).
An arch may need a different version map.
> Which made me realize that preloading malloc won't interpose ld
> implementation since 3a0ecccb599a6b1ad4b149dc569c0080e92d057b unless
> the malloc replacement also exports the malloc with the proper ABI
> version.
True.
> I think it is an unexpected change from BZ#25486, but at the same
> time I don't think this is a bad one. We will need to add all the
> machinery required to save/restore the caller-saved register for
> each ABI that supports TLS descriptors because even glibc malloc
> might call internal functions that might use such ABI extension
> (for instance mem* and str* functions).
>
> All this made me realize that the TLS descriptor slow path
> is *far* from maintainable, as we discussed on the weekly call.
> And I think we *should* move away from it. Some issues:
>
> * To properly support _dl_tlsdesc_dynamic on ABI with vector
> extensions, it would either need to pessimize code generation
> for TLS access (so the compiler would add all the required
> instructions to save/restore the caller-saved registers) or
> move the complexity to libc.
>
> * The latter would make the libc to require either a quite complex
> _dl_tlsdesc_dynamic, which would either need to probe hardware
> support to provide the multiple code paths or add the support
> through iFUNC.
True.
> * ARM also has the issue and I think it has not seen this issue
> because gnu2 is not the default TLS ABI and gcc likely won't change
> in nearby future. And to properly fix it, it would require to add
> something like what you are doing for x86 to support the multiple
> vector extensions (VFP, VFP3, NEON).
True.
> * Loongsong is finishing its TLSDESC ABI support on gcc/binutils, and
> most likely would require quite similar support to proper support
> LSX, LASX.
>
> * I think RISC-V would also have a similar issue for its vector ABI.
>
> So I think we really should reevaluate the BZ#16133 fix that we reverted
> on 2.20 [1] [2]. if I recall correctly (I need to go through again my
> notes about this issue), two main issues triggered the revert:
>
> 1. It broke LSAN;
> 2. Lazy allocation is an explicit feature [3].
>
> For 1. I think it should be doable to fix on sanitizer, either by adding
> more hacks to get the correct TLS size or by providing a proper ABI.
>
> However for 2. I think it is past time that we accept that lazy allocation
> was a nice idea, but it adds a *lot* of maintainability burden that
> it is not paying off.
>
> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1f33d36a8a9e78c81bed59b47f260723f56bb7e6
> [2] https://sourceware.org/legacy-ml/libc-alpha/2013-09/msg00721.html
> [3] https://sourceware.org/legacy-ml/libc-alpha/2014-01/msg00287.html
>
We need a short-team fix before the slow path is removed.
--
H.J.
next prev parent reply other threads:[~2024-02-15 23:15 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-13 4:14 [PATCH v4 0/2] " H.J. Lu
2024-02-13 4:15 ` [PATCH v4 1/2] x86-64: Save APX registers in ld.so trampoline H.J. Lu
2024-02-13 4:15 ` [PATCH v4 2/2] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers H.J. Lu
2024-02-14 22:44 ` Noah Goldstein
2024-02-14 23:21 ` H.J. Lu
2024-02-14 23:57 ` Noah Goldstein
2024-02-15 0:23 ` H.J. Lu
2024-02-15 23:05 ` Adhemerval Zanella Netto
2024-02-15 23:15 ` H.J. Lu [this message]
2024-02-16 6:23 ` Florian Weimer
2024-02-16 11:59 ` H.J. Lu
2024-02-16 12:18 ` Florian Weimer
2024-02-16 12:20 ` H.J. Lu
2024-02-16 12:37 ` H.J. Lu
2024-02-16 12:47 ` Adhemerval Zanella Netto
2024-02-16 12:58 ` H.J. Lu
2024-02-16 13:24 ` Adhemerval Zanella Netto
2024-02-16 14:25 ` H.J. Lu
2024-02-16 13:06 ` Florian Weimer
2024-02-16 13:24 ` H.J. Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAMe9rOqichkHy=AhGZE4m=4QC6LpVQYgB3V9V5nSHn9oiXi3hg@mail.gmail.com' \
--to=hjl.tools@gmail.com \
--cc=adhemerval.zanella@linaro.org \
--cc=carlos@redhat.com \
--cc=fweimer@redhat.com \
--cc=libc-alpha@sourceware.org \
--cc=maskray@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).