From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x1130.google.com (mail-yw1-x1130.google.com [IPv6:2607:f8b0:4864:20::1130]) by sourceware.org (Postfix) with ESMTPS id A14D1386D635 for ; Thu, 15 Feb 2024 23:15:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A14D1386D635 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A14D1386D635 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::1130 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708038950; cv=none; b=gGyq7Ro3hucugknRFmeomQunfdRaVfpugflriojCXVoHED2gclEEHMnUeovDJeqZho+fj0SeDp8gY3tuJb7zFsh9wwjR8sxO95rnhewke/Ic1GEFpnpc6bGo2D+1ukus6mpsP1p+cRR4URQB3Tw2Itk8wSKa3BVgym+OHCRCknA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708038950; c=relaxed/simple; bh=z6aMT51d0Dqy26opQcsIdlsYQHrhjocL5xEWdKK4okI=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=gd0q+b+nDOs3bV+WMRu+meo4mStTfyadE/VYj456fehE5bpWQ/AF57295kjXTcw+/Z1buI/0iA0lzyb9cvBkuxmKA3FfShXM7SYcGsRCeiiNF/w0vr2L6Ho4M4/ezh85A0LaK6MsynxvNHaXgia0MHH9oVi3EBfqeU1b8rcOCL8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yw1-x1130.google.com with SMTP id 00721157ae682-6077444cb51so18997387b3.1 for ; Thu, 15 Feb 2024 15:15:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708038947; x=1708643747; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=IeyRJB7UnH9h53Enrh8Nnv/Q6deUgmQCy2uEILko/Dc=; b=HT1lwUeVqcRGLHesXV3S9EzLuIkgDAB9b8SlKrclHiW7bO+CiWZdYSGbgd/wpMEVca Co6eiPLl+L1S01ivOhe+kOB7KcHxlYWLvJ2cw0AVvkwZt7UI/3pGU7ChC47im50pTHSo vGbPFPPQAUwOYTozaZO/Ig7XL5SMEaKk2SEZJbg9pjPgD3/K3JmqSrnbMDdEzC45dqE4 8ZoxEDRJxya4XjwEBg9w+xJy78Xva3LuFyOCZDRE7d1Gug291d0GaAHGsN4yvX8JVIDx WRWvFhFZX74mu89XKruF6JOpgj99VkRYcxNtDYd4NjdZJZeAdmeNrQVrjeBIWy7jNfqy aSJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708038947; x=1708643747; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IeyRJB7UnH9h53Enrh8Nnv/Q6deUgmQCy2uEILko/Dc=; b=T0fe3OhiKJL2mBPyAvfhZQ/ZRZ+I+MhR88E4axjBW1YZ0wedADp7AGpV0m1O6g98Cb 4sdwiqy/YMbaVeF5JkMlLHOdAs8EdzpSB/mmwBGkZRr6gF0eP3Gr1vC/+lAxSpRzrQ+t fIrOTkqe3aFHU3MOilgolgzqGestZaMacLohLaGaoB0ZIlEfDrRiDoFtN4VwyB27AFUa 6uvFsryJEKm3GoiEdDdkLLQDeNmXBuI/cRFnJiD5vB77gvVGNo/eXUSyCylyCdHUHUw2 8ZVL/42vtuiki0EUl/X49BNNjq0W86nkiePoJ9EOZ4eqcLQo170utL114ClomUogQVcP QcGA== X-Gm-Message-State: AOJu0YyHZoqzVLrIi20zmXPmFfpp1zTui0ajlRMH9FmWHJbGoOmAVFiB jDqqoMMSq0ENcgRgUuTjXDGBkbro2Zy9g+HEQxwMCRjMyx47qgs/OelSx184arjP+xcDiTH+otO 2zfEyQPN0wJWAML9RCHIWqT0QuRs= X-Google-Smtp-Source: AGHT+IGOBEDXlo3GfUCXAk5pKzVKFlWFY4OKOp96ziLk2PXkV2/zcMSxhNo5aPxM+do6L1/wtMBYedaWaOOdy8kGh20= X-Received: by 2002:a0d:d514:0:b0:5ff:9175:b102 with SMTP id x20-20020a0dd514000000b005ff9175b102mr2248930ywd.25.1708038946599; Thu, 15 Feb 2024 15:15:46 -0800 (PST) MIME-Version: 1.0 References: <20240213041501.2494232-1-hjl.tools@gmail.com> <20240213041501.2494232-3-hjl.tools@gmail.com> In-Reply-To: From: "H.J. Lu" Date: Thu, 15 Feb 2024 15:15:10 -0800 Message-ID: Subject: Re: [PATCH v4 2/2] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers To: Adhemerval Zanella Netto Cc: libc-alpha@sourceware.org, Florian Weimer , Fangrui Song , "Carlos O'Donell" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3020.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Feb 15, 2024 at 3:05=E2=80=AFPM Adhemerval Zanella Netto wrote: > > > > On 13/02/24 01:15, H.J. Lu wrote: > > Compiler generates the following instruction sequence for GNU2 dynamic > > TLS access: > > > > leaq tls_var@TLSDESC(%rip), %rax > > call *tls_var@TLSCALL(%rax) > > > > or > > > > leal tls_var@TLSDESC(%ebx), %eax > > call *tls_var@TLSCALL(%eax) > > > > CALL instruction is transparent to compiler which assumes all registers= , > > except for EFLAGS and RAX/EAX, are unchanged after CALL. When > > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow > > path. __tls_get_addr is a normal function which doesn't preserve any > > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer > > caller-saved registers, but didn't preserve any other caller-saved > > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE, > > XSAVE and XSAVEC to save and restore all caller-saved registers. This > > fixes BZ #31372. > > > > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic) > > to optimize elf_machine_runtime_setup. > > --- > > elf/Makefile | 19 ++ > > elf/malloc-for-test.c | 32 ++++ > > elf/malloc-for-test.map | 6 + > > elf/tst-gnu2-tls2.c | 97 ++++++++++ > > elf/tst-gnu2-tls2.h | 26 +++ > > elf/tst-gnu2-tls2mod0.c | 28 +++ > > elf/tst-gnu2-tls2mod1.c | 28 +++ > > elf/tst-gnu2-tls2mod2.c | 28 +++ > > sysdeps/i386/dl-machine.h | 2 +- > > sysdeps/i386/dl-tlsdesc-dynamic.h | 187 +++++++++++++++++++ > > sysdeps/i386/dl-tlsdesc.S | 115 +++++------- > > sysdeps/i386/tst-gnu2-tls2.c | 5 + > > sysdeps/x86/Makefile | 7 +- > > sysdeps/x86/cpu-features.c | 56 +++++- > > sysdeps/x86/dl-procinfo.c | 16 ++ > > sysdeps/{x86_64 =3D> x86}/features-offsets.sym | 2 + > > sysdeps/x86/malloc-for-test.c | 33 ++++ > > sysdeps/x86/sysdep.h | 6 + > > sysdeps/x86_64/Makefile | 2 +- > > sysdeps/x86_64/dl-machine.h | 19 +- > > sysdeps/x86_64/dl-procinfo.c | 16 ++ > > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++ > > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++------- > > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++ > > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++ > > sysdeps/x86_64/dl-trampoline.S | 20 +- > > sysdeps/x86_64/dl-trampoline.h | 34 +--- > > 27 files changed, 930 insertions(+), 213 deletions(-) > > create mode 100644 elf/malloc-for-test.c > > create mode 100644 elf/malloc-for-test.map > > create mode 100644 elf/tst-gnu2-tls2.c > > create mode 100644 elf/tst-gnu2-tls2.h > > create mode 100644 elf/tst-gnu2-tls2mod0.c > > create mode 100644 elf/tst-gnu2-tls2mod1.c > > create mode 100644 elf/tst-gnu2-tls2mod2.c > > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h > > create mode 100644 sysdeps/i386/tst-gnu2-tls2.c > > rename sysdeps/{x86_64 =3D> x86}/features-offsets.sym (89%) > > create mode 100644 sysdeps/x86/malloc-for-test.c > > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h > > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h > > create mode 100644 sysdeps/x86_64/dl-trampoline-state.h > > > > diff --git a/elf/Makefile b/elf/Makefile > > index 5d78b659ce..e0665d2007 100644 > > --- a/elf/Makefile > > +++ b/elf/Makefile > > @@ -424,6 +424,7 @@ tests +=3D \ > > tst-glibc-hwcaps-prepend \ > > tst-global1 \ > > tst-global2 \ > > + tst-gnu2-tls2 \ > > tst-initfinilazyfail \ > > tst-initorder \ > > tst-initorder2 \ > > @@ -699,6 +700,7 @@ modules-names +=3D \ > > libtracemod5-1 \ > > ltglobmod1 \ > > ltglobmod2 \ > > + malloc-for-test \ > > neededobj1 \ > > neededobj2 \ > > neededobj3 \ > > @@ -846,6 +848,9 @@ modules-names +=3D \ > > tst-filterobj-flt \ > > tst-finilazyfailmod \ > > tst-globalmod2 \ > > + tst-gnu2-tls2mod0 \ > > + tst-gnu2-tls2mod1 \ > > + tst-gnu2-tls2mod2 \ > > tst-initlazyfailmod \ > > tst-initorder2a \ > > tst-initorder2b \ > > @@ -3044,8 +3049,22 @@ $(objpfx)tst-tlsgap.out: \ > > $(objpfx)tst-tlsgap-mod0.so \ > > $(objpfx)tst-tlsgap-mod1.so \ > > $(objpfx)tst-tlsgap-mod2.so > > + > > +$(objpfx)tst-gnu2-tls2: \ > > + $(shared-thread-library) \ > > + $(objpfx)malloc-for-test.so > > +$(objpfx)tst-gnu2-tls2.out: \ > > + $(objpfx)tst-gnu2-tls2mod0.so \ > > + $(objpfx)tst-gnu2-tls2mod1.so \ > > + $(objpfx)tst-gnu2-tls2mod2.so > > + > > +LDFLAGS-malloc-for-test.so +=3D -Wl,--version-script=3Dmalloc-for-test= .map > > + > > ifeq (yes,$(have-mtls-dialect-gnu2)) > > CFLAGS-tst-tlsgap-mod0.c +=3D -mtls-dialect=3Dgnu2 > > CFLAGS-tst-tlsgap-mod1.c +=3D -mtls-dialect=3Dgnu2 > > CFLAGS-tst-tlsgap-mod2.c +=3D -mtls-dialect=3Dgnu2 > > +CFLAGS-tst-gnu2-tls2mod0.c +=3D -mtls-dialect=3Dgnu2 > > +CFLAGS-tst-gnu2-tls2mod1.c +=3D -mtls-dialect=3Dgnu2 > > +CFLAGS-tst-gnu2-tls2mod2.c +=3D -mtls-dialect=3Dgnu2 > > endif > > diff --git a/elf/malloc-for-test.c b/elf/malloc-for-test.c > > new file mode 100644 > > index 0000000000..1bec69eda7 > > --- /dev/null > > +++ b/elf/malloc-for-test.c > > @@ -0,0 +1,32 @@ > > +/* A malloc for intercept test. > > + Copyright (C) 2024 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful= , > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library; if not, see > > + . */ > > + > > +#include > > + > > +extern void * __libc_malloc (size_t); > > + > > +#ifndef PREPARE_MALLOC > > +# define PREPARE_MALLOC() > > +#endif > > + > > +void * > > +malloc (size_t n) > > +{ > > + PREPARE_MALLOC (); > > It is not clear to me how exactly this adds proper coverage without > actually set the affected registers *before* the TLS variable access > and later check its value has not changed. In fact, on x86_64 it is > passing on my system even without the test actually work as expected > (see below). It depends on the compiler version. Newer GCC will generate vector load/store on x86-64 to copy a structure. > And I think we also need to add arch-specific rules to build the test > with only the base ABI, and add extra macros to clobber and check > the expected registers that _dl_tlsdesc_dynamic should save/restore. > > It would be slightly more trick on ABIs that already have a large set > or register (like x86_64-v1 and armv8-a). > > > + return __libc_malloc (n); > > +} > > diff --git a/elf/malloc-for-test.map b/elf/malloc-for-test.map > > new file mode 100644 > > index 0000000000..8437cf4346 > > --- /dev/null > > +++ b/elf/malloc-for-test.map > > @@ -0,0 +1,6 @@ > > +GLIBC_2.0 { > > You need to use the correct version to override the malloc: > > $ gdb --args tst-gnu2-tls2 --direct > [...] > (gdb) b apply_tls > (gdb) r > Thread 2 "tst-gnu2-tls2" hit Breakpoint 1, apply_tls (p=3D0x7ffff7bfee80)= at tst-gnu2-tls2mod1.c:25 > 25 { > (gdb) b malloc > Breakpoint 2 at 0x7ffff7ca8ad0: malloc. (3 locations) > (gdb) c > Continuing. > > Thread 2 "test-gnu2-tls2" hit Breakpoint 2, __GI___libc_malloc (bytes=3D3= 200) at malloc.c:3294 > 3294 { > (gdb) bt > #0 __GI___libc_malloc (bytes=3D3200) at malloc.c:3294 > #1 0x00007ffff7fda3de in malloc (size=3D) at ../include/rt= ld-malloc.h:56 > #2 allocate_dtv_entry (size=3D, alignment=3D16) at ../elf/= dl-tls.c:679 > #3 allocate_and_init (map=3D0x7ffff0000bd0) at ../elf/dl-tls.c:704 > #4 tls_get_addr_tail (ti=3D0x7ffff0001240, dtv=3D0x55555555e340, the_map= =3D0x7ffff0000bd0) at ../elf/dl-tls.c:904 > #5 0x00007ffff7fdda2e in _dl_tlsdesc_dynamic_xsavec () at ../sysdeps/x86_= 64/dl-tlsdesc-dynamic.h:135 > #6 0x00007ffff7fb0155 in apply_tls (p=3D0xc80) at tst-gnu2-tls2mod1.c:27 > #7 0x0000555555556965 in access_mod (i=3D1, sym=3D0x555555559022 "apply_t= ls") at tst-gnu2-tls2.c:58 > #8 start (arg=3D0x0) at tst-gnu2-tls2.c:73 > #9 0x00007ffff7c96a82 in start_thread (arg=3D) at pthread_= create.c:447 > #10 0x00007ffff7d1b13c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/= clone3.S:78 > > By using GLIBC_2.2.5 instead of GLIBC_2.0, now I see: > > (gdb) bt > #0 malloc (n=3D3200) at ../elf/malloc-for-test.c:29 > #1 0x00007ffff7fda3de in malloc (size=3D) at ../include/rt= ld-malloc.h:56 > #2 allocate_dtv_entry (size=3D, alignment=3D16) at ../elf/= dl-tls.c:679 > #3 allocate_and_init (map=3D0x7ffff0000bd0) at ../elf/dl-tls.c:704 > #4 tls_get_addr_tail (ti=3D0x7ffff0001240, dtv=3D0x55555555e340, the_map= =3D0x7ffff0000bd0) at ../elf/dl-tls.c:904 > #5 0x00007ffff7fdda2e in _dl_tlsdesc_dynamic_xsavec () at ../sysdeps/x86_= 64/dl-tlsdesc-dynamic.h:135 > #6 0x00007ffff7fb0155 in apply_tls (p=3D0xc80) at tst-gnu2-tls2mod1.c:27 > #7 0x0000555555556965 in access_mod (i=3D1, sym=3D0x555555559022 "apply_t= ls") at tst-gnu2-tls2.c:58 > #8 start (arg=3D0x0) at tst-gnu2-tls2.c:73 > #9 0x00007ffff7c96a82 in start_thread (arg=3D) at pthread_= create.c:447 > #10 0x00007ffff7d1b13c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/= clone3.S:78 > > So you will need either to get the correct version or parameterize > the map file for each ABI (which is just a handful that actually > support TLS descriptors). An arch may need a different version map. > Which made me realize that preloading malloc won't interpose ld > implementation since 3a0ecccb599a6b1ad4b149dc569c0080e92d057b unless > the malloc replacement also exports the malloc with the proper ABI > version. True. > I think it is an unexpected change from BZ#25486, but at the same > time I don't think this is a bad one. We will need to add all the > machinery required to save/restore the caller-saved register for > each ABI that supports TLS descriptors because even glibc malloc > might call internal functions that might use such ABI extension > (for instance mem* and str* functions). > > All this made me realize that the TLS descriptor slow path > is *far* from maintainable, as we discussed on the weekly call. > And I think we *should* move away from it. Some issues: > > * To properly support _dl_tlsdesc_dynamic on ABI with vector > extensions, it would either need to pessimize code generation > for TLS access (so the compiler would add all the required > instructions to save/restore the caller-saved registers) or > move the complexity to libc. > > * The latter would make the libc to require either a quite complex > _dl_tlsdesc_dynamic, which would either need to probe hardware > support to provide the multiple code paths or add the support > through iFUNC. True. > * ARM also has the issue and I think it has not seen this issue > because gnu2 is not the default TLS ABI and gcc likely won't change > in nearby future. And to properly fix it, it would require to add > something like what you are doing for x86 to support the multiple > vector extensions (VFP, VFP3, NEON). True. > * Loongsong is finishing its TLSDESC ABI support on gcc/binutils, and > most likely would require quite similar support to proper support > LSX, LASX. > > * I think RISC-V would also have a similar issue for its vector ABI. > > So I think we really should reevaluate the BZ#16133 fix that we reverted > on 2.20 [1] [2]. if I recall correctly (I need to go through again my > notes about this issue), two main issues triggered the revert: > > 1. It broke LSAN; > 2. Lazy allocation is an explicit feature [3]. > > For 1. I think it should be doable to fix on sanitizer, either by adding > more hacks to get the correct TLS size or by providing a proper ABI. > > However for 2. I think it is past time that we accept that lazy allocatio= n > was a nice idea, but it adds a *lot* of maintainability burden that > it is not paying off. > > [1] https://sourceware.org/git/?p=3Dglibc.git;a=3Dcommitdiff;h=3D1f33d36a= 8a9e78c81bed59b47f260723f56bb7e6 > [2] https://sourceware.org/legacy-ml/libc-alpha/2013-09/msg00721.html > [3] https://sourceware.org/legacy-ml/libc-alpha/2014-01/msg00287.html > We need a short-team fix before the slow path is removed. --=20 H.J.