From: "H.J. Lu"
Date: Fri, 16 Feb 2024 04:15:36 -0800
Subject: Re: [PATCH v7 2/2] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
To: Noah Goldstein
Cc: libc-alpha@sourceware.org, fweimer@redhat.com, adhemerval.zanella@linaro.org
References: <20240216002114.2255406-1-hjl.tools@gmail.com> <20240216002114.2255406-3-hjl.tools@gmail.com>

On Fri, Feb 16, 2024 at 12:05 AM Noah Goldstein wrote:
>
> On Fri, Feb 16, 2024 at 12:21 AM H.J. Lu wrote:
> >
> > Compiler generates the following instruction sequence for GNU2 dynamic
> > TLS access:
> >
> >         leaq tls_var@TLSDESC(%rip), %rax
> >         call *tls_var@TLSCALL(%rax)
> >
> > or
> >
> >         leal tls_var@TLSDESC(%ebx), %eax
> >         call *tls_var@TLSCALL(%eax)
> >
> > CALL instruction is transparent to compiler which assumes all registers,
> > except for EFLAGS and RAX/EAX, are unchanged after CALL. When
> > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > path. __tls_get_addr is a normal function which doesn't preserve any
> > caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
> > caller-saved registers, but didn't preserve any other caller-saved
> > registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > XSAVE and XSAVEC to save and restore all caller-saved registers. This
> > fixes BZ #31372.
> >
> > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > to optimize elf_machine_runtime_setup.
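For readers following the thread, the variant selection described in the
commit message can be modelled roughly in C as below. This is only a sketch
under stated assumptions: the enum, the struct and the function name are
hypothetical and are not glibc identifiers. The actual logic is in the
sysdeps/x86/cpu-features.c hunk of the quoted patch, which stores function
pointers in GLRO(dl_x86_tlsdesc_dynamic) rather than returning an enum.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; glibc itself stores a
   function pointer in GLRO(dl_x86_tlsdesc_dynamic).  */
enum tlsdesc_variant
{
  VARIANT_FNSAVE,
  VARIANT_FXSAVE,
  VARIANT_XSAVE,
  VARIANT_XSAVEC
};

/* Hypothetical summary of the CPU properties the patch consults.  */
struct cpu_model
{
  size_t xsave_state_size;  /* 0 means XSAVE is not usable.  */
  bool has_xsavec;          /* XSAVEC usable.  */
  bool has_fxsr;            /* FXSAVE/FXRSTOR usable (matters on i386).  */
  bool is_x86_64;           /* x86-64 always has FXSAVE.  */
};

/* Pick the save/restore flavour in the same order as the patch:
   XSAVEC, then XSAVE, then FXSAVE, and FNSAVE only on i386 CPUs
   without FXSR.  */
enum tlsdesc_variant
select_tlsdesc_variant (const struct cpu_model *cpu)
{
  if (cpu->xsave_state_size != 0)
    return cpu->has_xsavec ? VARIANT_XSAVEC : VARIANT_XSAVE;
  if (cpu->is_x86_64 || cpu->has_fxsr)
    return VARIANT_FXSAVE;
  return VARIANT_FNSAVE;
}
```

The point of doing this once at startup is that the relocation code can then
assign a precomputed entry point instead of re-testing CPU features for every
TLS descriptor.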
> > ---
> >  elf/Makefile                                 |  36 +++-
> >  elf/malloc-for-test.c                        |  32 ++++
> >  elf/malloc-for-test.map.in                   |   8 +
> >  elf/tst-gnu2-tls2.c                          |  97 ++++++++++
> >  elf/tst-gnu2-tls2.h                          |  26 +++
> >  elf/tst-gnu2-tls2mod0.c                      |  28 +++
> >  elf/tst-gnu2-tls2mod1.c                      |  28 +++
> >  elf/tst-gnu2-tls2mod2.c                      |  28 +++
> >  sysdeps/i386/dl-machine.h                    |   2 +-
> >  sysdeps/i386/dl-tlsdesc-dynamic.h            | 190 +++++++++++++++++++
> >  sysdeps/i386/dl-tlsdesc.S                    | 115 +++++------
> >  sysdeps/i386/tst-gnu2-tls2.c                 |   5 +
> >  sysdeps/x86/Makefile                         |   7 +-
> >  sysdeps/x86/cpu-features.c                   |  56 +++++-
> >  sysdeps/x86/dl-procinfo.c                    |  16 ++
> >  sysdeps/{x86_64 => x86}/features-offsets.sym |   2 +
> >  sysdeps/x86/malloc-for-test.c                |  33 ++++
> >  sysdeps/x86/sysdep.h                         |   6 +
> >  sysdeps/x86_64/Makefile                      |   2 +-
> >  sysdeps/x86_64/dl-machine.h                  |  19 +-
> >  sysdeps/x86_64/dl-procinfo.c                 |  16 ++
> >  sysdeps/x86_64/dl-tlsdesc-dynamic.h          | 166 ++++++++++++++++
> >  sysdeps/x86_64/dl-tlsdesc.S                  | 108 ++++-------
> >  sysdeps/x86_64/dl-trampoline-save.h          |  34 ++++
> >  sysdeps/x86_64/dl-trampoline-state.h         |  51 +++++
> >  sysdeps/x86_64/dl-trampoline.S               |  20 +-
> >  sysdeps/x86_64/dl-trampoline.h               |  34 +---
> >  27 files changed, 950 insertions(+), 215 deletions(-)
> >  create mode 100644 elf/malloc-for-test.c
> >  create mode 100644 elf/malloc-for-test.map.in
> >  create mode 100644 elf/tst-gnu2-tls2.c
> >  create mode 100644 elf/tst-gnu2-tls2.h
> >  create mode 100644 elf/tst-gnu2-tls2mod0.c
> >  create mode 100644 elf/tst-gnu2-tls2mod1.c
> >  create mode 100644 elf/tst-gnu2-tls2mod2.c
> >  create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h
> >  create mode 100644 sysdeps/i386/tst-gnu2-tls2.c
> >  rename sysdeps/{x86_64 => x86}/features-offsets.sym (89%)
> >  create mode 100644 sysdeps/x86/malloc-for-test.c
> >  create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h
> >  create mode 100644 sysdeps/x86_64/dl-trampoline-save.h
> >  create mode 100644 sysdeps/x86_64/dl-trampoline-state.h
> >
...
> > diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > new file mode 100644
> > index 0000000000..c857c68c55
> > --- /dev/null
> > +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h
> > @@ -0,0 +1,190 @@
> > +/* Thread-local storage handling in the ELF dynamic linker. i386 version.
> > +   Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   . */
> > +
> > +#undef REGISTER_SAVE_AREA
> > +
> > +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) != 0
> > +# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > +#endif
> > +
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +# ifdef USE_FNSAVE
> > +#  error USE_FNSAVE shouldn't be defined
> > +# endif
> > +# ifdef USE_FXSAVE
> > +/* Use fxsave to save all registers. */
> > +#  define REGISTER_SAVE_AREA 512
> > +# endif
> > +#else
> > +# ifdef USE_FNSAVE
> > +/* Use fnsave to save x87 FPU stack registers. */
> > +#  define REGISTER_SAVE_AREA 108
> > +# else
> > +#  ifndef USE_FXSAVE
> > +#   error USE_FXSAVE must be defined
> > +#  endif
> > +/* Use fxsave to save all registers. Add 12 bytes to align the stack
> > +   to 16 bytes. */
> > +#  define REGISTER_SAVE_AREA (512 + 12)
> > +# endif
> > +#endif
> > +
> > +        .hidden _dl_tlsdesc_dynamic
> > +        .global _dl_tlsdesc_dynamic
> > +        .type _dl_tlsdesc_dynamic,@function
> > +
> > +     /* This function is used for symbols that need dynamic TLS.
> > +
> nit: comment start at line start.

This is copied from the original dl-tlsdesc.S. I prefer to leave
it alone.

> > +        %eax points to the TLS descriptor, such that 0(%eax) points to
> > +        _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > +        tlsdesc_dynamic_arg object. It must return in %eax the offset
> > +        between the thread pointer and the object denoted by the
> > +        argument, without clobbering any registers.
> > +
> > +        The assembly code that follows is a rendition of the following
> > +        C code, hand-optimized a little bit.
> > +
> > +ptrdiff_t
> > +__attribute__ ((__regparm__ (1)))
> > +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > +{
> > +  struct tlsdesc_dynamic_arg *td = tdp->arg;
> > +  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > +  if (__builtin_expect (td->gen_count <= dtv[0].counter
> > +                        && (dtv[td->tlsinfo.ti_module].pointer.val
> > +                            != TLS_DTV_UNALLOCATED),
> > +                        1))
> > +    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > +      - __thread_pointer;
> > +
> > +  return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > +}
> > +*/
> > +        cfi_startproc
> > +        .align 16
> > +_dl_tlsdesc_dynamic:
> > +        /* Like all TLS resolvers, preserve call-clobbered registers.
> > +           We need two scratch regs anyway. */
> > +        subl $32, %esp
> > +        cfi_adjust_cfa_offset (32)
> > +        movl %ecx, 20(%esp)
> > +        movl %edx, 24(%esp)
> > +        movl TLSDESC_ARG(%eax), %eax
> > +        movl %gs:DTV_OFFSET, %edx
> > +        movl TLSDESC_GEN_COUNT(%eax), %ecx
> > +        cmpl (%edx), %ecx
> > +        ja 2f
> > +        movl TLSDESC_MODID(%eax), %ecx
> > +        movl (%edx,%ecx,8), %edx
> maybe 8 -> TLSDESC_DTV_SIZE?

This is copied from the original dl-tlsdesc.S. I prefer to leave
it alone.

> > +        cmpl $-1, %edx
> -1 -> TLS_DTV_UNALLOCATED

This is copied from the original dl-tlsdesc.S. I prefer to leave
it alone.

> > +        je 2f
> > +        movl TLSDESC_MODOFF(%eax), %eax
> > +        addl %edx, %eax
> > +1:
> > +        movl 20(%esp), %ecx
> > +        subl %gs:0, %eax
> > +        movl 24(%esp), %edx
> > +        addl $32, %esp
> > +        cfi_adjust_cfa_offset (-32)
> > +        ret
> > +        .p2align 4,,7
> > +2:
> > +        cfi_adjust_cfa_offset (32)
> I still don't understand what this cfi is for?
> You already have `cfi_adjust_cfa_offset (32)` above right after
> the `subl $32, %esp`

There are

        subl $32, %esp
        cfi_adjust_cfa_offset (32)
        ...
        addl $32, %esp
        cfi_adjust_cfa_offset (-32)
        ret
        .p2align 4,,7
2:

What is the CFA at label 2 for GDB? GDB only consumes CFI directives,
and those apply in address order, not in control-flow order. The (-32)
adjustment before `ret` is still in effect at label 2, even though the
code that jumps to 2 has not popped the 32 bytes yet, so the CFA offset
must be adjusted by 32 again there.

> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +        movl %ebx, -28(%esp)
> > +        movl %esp, %ebx
> > +        cfi_def_cfa_register(%ebx)
> > +        and $-STATE_SAVE_ALIGNMENT, %esp
> > +#endif
> > +#ifdef REGISTER_SAVE_AREA
> > +        subl $REGISTER_SAVE_AREA, %esp
> > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +        cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > +# endif
> > +#else
> > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +#  error DL_RUNTIME_RESOLVE_REALIGN_STACK must be true
> > +# endif
> > +        # Allocate stack space of the required size to save the state.
> nit: comment with /* or //
> likewise below.

Will fix them.

> > +        LOAD_PIC_REG (cx)
> > +        subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> > +#endif
> > +#ifdef USE_FNSAVE
> > +        fnsave (%esp)
> > +#elif defined USE_FXSAVE
> > +        fxsave (%esp)
> > +#else
> > +        # Save the argument for ___tls_get_addr in EAX.
> > +        movl %eax, %ecx
> > +        movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > +        xorl %edx, %edx
> > +        # Clear the XSAVE Header.
> > +# ifdef USE_XSAVE
> > +        movl %edx, (512)(%esp)
> > +        movl %edx, (512 + 4 * 1)(%esp)
> > +        movl %edx, (512 + 4 * 2)(%esp)
> > +        movl %edx, (512 + 4 * 3)(%esp)
> > +# endif
> > +        movl %edx, (512 + 4 * 4)(%esp)
> > +        movl %edx, (512 + 4 * 5)(%esp)
> > +        movl %edx, (512 + 4 * 6)(%esp)
> > +        movl %edx, (512 + 4 * 7)(%esp)
> > +        movl %edx, (512 + 4 * 8)(%esp)
> > +        movl %edx, (512 + 4 * 9)(%esp)
> > +        movl %edx, (512 + 4 * 10)(%esp)
> > +        movl %edx, (512 + 4 * 11)(%esp)
> > +        movl %edx, (512 + 4 * 12)(%esp)
> > +        movl %edx, (512 + 4 * 13)(%esp)
> > +        movl %edx, (512 + 4 * 14)(%esp)
> > +        movl %edx, (512 + 4 * 15)(%esp)
> > +# ifdef USE_XSAVE
> > +        xsave (%esp)
> > +# else
> > +        xsavec (%esp)
> > +# endif
> > +        # Restore the argument for ___tls_get_addr in EAX.
> > +        movl %ecx, %eax
> > +#endif
> > +        call HIDDEN_JUMPTARGET (___tls_get_addr)
> > +        # Get register content back.
> > +#ifdef USE_FNSAVE
> > +        frstor (%esp)
> > +#elif defined USE_FXSAVE
> > +        fxrstor (%esp)
> > +#else
> > +        /* Save and retore ___tls_get_addr return value stored in EAX. */
> > +        movl %eax, %ecx
> > +        movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > +        xorl %edx, %edx
> > +        xrstor (%esp)
> > +        movl %ecx, %eax
> > +#endif
> > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > +        mov %ebx, %esp
> > +        cfi_def_cfa_register(%esp)
> > +        movl -28(%esp), %ebx
> > +        cfi_restore(%ebx)
> > +#else
> > +        addl $REGISTER_SAVE_AREA, %esp
> > +        cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > +#endif
> > +        jmp 1b
> > +        cfi_endproc
> > +        .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > +
> > +#undef STATE_SAVE_ALIGNMENT
> > diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S
> > index 90d93caa0c..f002feee56 100644
> > --- a/sysdeps/i386/dl-tlsdesc.S
> > +++ b/sysdeps/i386/dl-tlsdesc.S
> > @@ -18,8 +18,27 @@
> >
> >  #include
> >  #include
> > +#include
> > +#include
> >  #include "tlsdesc.h"
> >
> > +#ifndef DL_STACK_ALIGNMENT
> > +/* Due to GCC bug:
> > +
> > +   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
> > +
> > +   __tls_get_addr may be called with 4-byte stack alignment. Although
> > +   this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
> > +   that stack will be always aligned at 16 bytes. */
> > +# define DL_STACK_ALIGNMENT 4
> > +#endif
> > +
> > +/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or align
> > +   stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. */
> > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \
> > +  (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \
> > +   || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT)
> > +
> >          .text
> >
> >       /* This function is used to compute the TP offset for symbols in
> > @@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak:
> >          .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak
> >
> >  #ifdef SHARED
> > -        .hidden _dl_tlsdesc_dynamic
> > -        .global _dl_tlsdesc_dynamic
> > -        .type _dl_tlsdesc_dynamic,@function
> > -
> > -     /* This function is used for symbols that need dynamic TLS.
> > -
> > -        %eax points to the TLS descriptor, such that 0(%eax) points to
> > -        _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct
> > -        tlsdesc_dynamic_arg object. It must return in %eax the offset
> > -        between the thread pointer and the object denoted by the
> > -        argument, without clobbering any registers.
> > -
> > -        The assembly code that follows is a rendition of the following
> > -        C code, hand-optimized a little bit.
> > -
> > -ptrdiff_t
> > -__attribute__ ((__regparm__ (1)))
> > -_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > -{
> > -  struct tlsdesc_dynamic_arg *td = tdp->arg;
> > -  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > -  if (__builtin_expect (td->gen_count <= dtv[0].counter
> > -                        && (dtv[td->tlsinfo.ti_module].pointer.val
> > -                            != TLS_DTV_UNALLOCATED),
> > -                        1))
> > -    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > -      - __thread_pointer;
> > -
> > -  return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > -}
> > -*/
> > -        cfi_startproc
> > -        .align 16
> > -_dl_tlsdesc_dynamic:
> > -        /* Like all TLS resolvers, preserve call-clobbered registers.
> > -           We need two scratch regs anyway.
*/
> > -        subl $28, %esp
> > -        cfi_adjust_cfa_offset (28)
> > -        movl %ecx, 20(%esp)
> > -        movl %edx, 24(%esp)
> > -        movl TLSDESC_ARG(%eax), %eax
> > -        movl %gs:DTV_OFFSET, %edx
> > -        movl TLSDESC_GEN_COUNT(%eax), %ecx
> > -        cmpl (%edx), %ecx
> > -        ja .Lslow
> > -        movl TLSDESC_MODID(%eax), %ecx
> > -        movl (%edx,%ecx,8), %edx
> > -        cmpl $-1, %edx
> > -        je .Lslow
> > -        movl TLSDESC_MODOFF(%eax), %eax
> > -        addl %edx, %eax
> > -.Lret:
> > -        movl 20(%esp), %ecx
> > -        subl %gs:0, %eax
> > -        movl 24(%esp), %edx
> > -        addl $28, %esp
> > -        cfi_adjust_cfa_offset (-28)
> > -        ret
> > -        .p2align 4,,7
> > -.Lslow:
> > -        cfi_adjust_cfa_offset (28)
> > -        call HIDDEN_JUMPTARGET (___tls_get_addr)
> > -        jmp .Lret
> > -        cfi_endproc
> > -        .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > +# define USE_FNSAVE
> > +# define MINIMUM_ALIGNMENT 4
> > +# define STATE_SAVE_ALIGNMENT 4
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef MINIMUM_ALIGNMENT
> > +# undef USE_FNSAVE
> > +
> > +# define MINIMUM_ALIGNMENT 16
> > +
> > +# define USE_FXSAVE
> > +# define STATE_SAVE_ALIGNMENT 16
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_FXSAVE
> > +
> > +# define USE_XSAVE
> > +# define STATE_SAVE_ALIGNMENT 64
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_XSAVE
> > +
> > +# define USE_XSAVEC
> > +# define STATE_SAVE_ALIGNMENT 64
> > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec
> > +# include "dl-tlsdesc-dynamic.h"
> > +# undef _dl_tlsdesc_dynamic
> > +# undef USE_XSAVEC
> >  #endif /* SHARED */
> > diff --git a/sysdeps/i386/tst-gnu2-tls2.c b/sysdeps/i386/tst-gnu2-tls2.c
> > new file mode 100644
> > index 0000000000..92e7fbff89
> > --- /dev/null
> > +++ b/sysdeps/i386/tst-gnu2-tls2.c
> > @@ -0,0 +1,5 @@
> > +#include
> > +
> > +#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2)
> > +
> > +#include
> > diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile
> > index 73b29cc78c..581086305d 100644
> > --- a/sysdeps/x86/Makefile
> > +++ b/sysdeps/x86/Makefile
> > @@ -1,5 +1,5 @@
> >  ifeq ($(subdir),csu)
> > -gen-as-const-headers += cpu-features-offsets.sym
> > +gen-as-const-headers += cpu-features-offsets.sym features-offsets.sym
> >  endif
> >
> >  ifeq ($(subdir),elf)
> > @@ -86,6 +86,11 @@ endif
> >  tst-ifunc-isa-2-ENV = GLIBC_TUNABLES=glibc.cpu.hwcaps=-SSE4_2,-AVX,-AVX2,-AVX512F
> >  tst-ifunc-isa-2-static-ENV = $(tst-ifunc-isa-2-ENV)
> >  tst-hwcap-tunables-ARGS = -- $(host-test-program-cmd)
> > +
> > +CFLAGS-malloc-for-test.c += -msse2
> > +CFLAGS-tst-gnu2-tls2mod0.c += -msse2 -mtune=haswell
> > +CFLAGS-tst-gnu2-tls2mod1.c += -msse2 -mtune=haswell
> > +CFLAGS-tst-gnu2-tls2mod2.c += -msse2 -mtune=haswell
> >  endif
> >
> >  ifeq ($(subdir),math)
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index 25e6622a79..835113b42f 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -27,8 +27,13 @@
> >  extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *)
> >    attribute_hidden;
> >
> > -#if defined SHARED && defined __x86_64__
> > -# include
> > +#if defined SHARED
> > +extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden;
> > +extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden;
> > +extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden;
> > +
> > +# ifdef __x86_64__
> > +# include
> >
> >  static void
> >  TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> > @@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp)
> >                   : plt_rewrite_jmp);
> >      }
> >  }
> > +# else
> > +extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden;
> > +# endif
> > +#endif
> > +
> > +#ifdef __x86_64__
> > +extern void _dl_runtime_resolve_fxsave (void) attribute_hidden;
> > +extern void _dl_runtime_resolve_xsave (void) attribute_hidden;
> > +extern void _dl_runtime_resolve_xsavec (void) attribute_hidden;
> >  #endif
> >
> >  #ifdef __LP64__
> > @@ -1130,6 +1144,44 @@ no_cpuid:
> >                 TUNABLE_CALLBACK (set_x86_shstk));
> >  #endif
> >
> > +  if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > +    {
> > +      if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC))
> > +        {
> > +#ifdef __x86_64__
> > +          GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsavec;
> > +#endif
> > +#ifdef SHARED
> > +          GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsavec;
> > +#endif
> > +        }
> > +      else
> > +        {
> > +#ifdef __x86_64__
> > +          GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_xsave;
> > +#endif
> > +#ifdef SHARED
> > +          GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_xsave;
> > +#endif
> > +        }
> > +    }
> > +  else
> > +    {
> > +#ifdef __x86_64__
> > +      GLRO(dl_x86_64_runtime_resolve) = _dl_runtime_resolve_fxsave;
> > +# ifdef SHARED
> > +      GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > +# endif
> > +#else
> > +# ifdef SHARED
> > +      if (CPU_FEATURE_USABLE_P (cpu_features, FXSR))
> > +        GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fxsave;
> > +      else
> > +        GLRO(dl_x86_tlsdesc_dynamic) = _dl_tlsdesc_dynamic_fnsave;
> > +# endif
> > +#endif
> > +    }
> > +
> >  #ifdef SHARED
> >  # ifdef __x86_64__
> >    TUNABLE_GET (plt_rewrite, tunable_val_t *,
> > diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c
> > index ee957b4d70..5920d4b320 100644
> > --- a/sysdeps/x86/dl-procinfo.c
> > +++ b/sysdeps/x86/dl-procinfo.c
> > @@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9]
> >  #else
> >  ,
> >  #endif
> > +
> > +#if defined SHARED && !IS_IN (ldconfig)
> > +# if !defined PROCINFO_DECL
> > +  ._dl_x86_tlsdesc_dynamic
> > +# else
> > +PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic
> > +# endif
> > +# ifndef PROCINFO_DECL
> > += NULL
> > +# endif
> > +# ifdef PROCINFO_DECL
> > +;
> > +# else
> > +,
> > +# endif
> > +#endif
> > diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/features-offsets.sym
> > similarity index 89%
> > rename from sysdeps/x86_64/features-offsets.sym
> > rename to sysdeps/x86/features-offsets.sym
> > index 9e4be3393a..77e990c705 100644
> > --- a/sysdeps/x86_64/features-offsets.sym
> > +++ b/sysdeps/x86/features-offsets.sym
> > @@ -3,4 +3,6 @@
> >  #include
> >
> >  RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_global_ro, _dl_x86_cpu_features)
> > +#ifdef __x86_64__
> >  RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _dl_x86_feature_1)
> > +#endif
> > diff --git a/sysdeps/x86/malloc-for-test.c b/sysdeps/x86/malloc-for-test.c
> > new file mode 100644
> > index 0000000000..02f4dead5d
> > --- /dev/null
> > +++ b/sysdeps/x86/malloc-for-test.c
> > @@ -0,0 +1,33 @@
> > +/* A malloc for intercept test. x86 version.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   .
*/
> > +
> > +
> > +/* Clear XMM0...XMM7 */
> > +#define PREPARE_MALLOC() \
> > +{ \
> > +  asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \
> > +  asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \
> > +  asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \
> > +  asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \
> > +  asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \
> > +  asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \
> > +  asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \
> > +  asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \
> > +}
> > +
> > +#include
> > diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
> > index 837fd28734..485cad9c02 100644
> > --- a/sysdeps/x86/sysdep.h
> > +++ b/sysdeps/x86/sysdep.h
> > @@ -70,6 +70,12 @@
> >     | (1 << X86_XSTATE_ZMM_H_ID))
> >  #endif
> >
> > +/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL.
> > +   Compiler assumes that all registers, including x87 FPU stack registers,
> > +   are unchanged after CALL, except for EFLAGS and RAX/EAX. */
> > +#define TLSDESC_CALL_STATE_SAVE_MASK \
> > +  (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID))
> > +
> >  /* Constants for bits in __x86_string_control: */
> >
> >  /* Avoid short distance REP MOVSB.
*/
> > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
> > index 145f25e7f6..9337e95093 100644
> > --- a/sysdeps/x86_64/Makefile
> > +++ b/sysdeps/x86_64/Makefile
> > @@ -10,7 +10,7 @@ LDFLAGS-rtld += -Wl,-z,nomark-plt
> >  endif
> >
> >  ifeq ($(subdir),csu)
> > -gen-as-const-headers += features-offsets.sym link-defines.sym
> > +gen-as-const-headers += link-defines.sym
> >  endif
> >
> >  ifeq ($(subdir),gmon)
> > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
> > index 6d605d0d32..ff5d45f7cb 100644
> > --- a/sysdeps/x86_64/dl-machine.h
> > +++ b/sysdeps/x86_64/dl-machine.h
> > @@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> >                             int lazy, int profile)
> >  {
> >    Elf64_Addr *got;
> > -  extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidden;
> > -  extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidden;
> > -  extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidden;
> >    extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden;
> >    extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden;
> >    extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidden;
> > @@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> >        /* Identify this shared object. */
> >        *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
> >
> > -      const struct cpu_features* cpu_features = __get_cpu_features ();
> > -
> >  #ifdef SHARED
> >        /* The got[2] entry contains the address of a function which gets
> >           called to get the address of a so far unresolved function and
> > @@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> >           end in this function.
*/
> >        if (__glibc_unlikely (profile))
> >          {
> > +          const struct cpu_features* cpu_features = __get_cpu_features ();
> >            if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
> >              *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
> >            else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
> > @@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
> >            /* This function will get called to fix up the GOT entry
> >               indicated by the offset on the stack, and then jump to
> >               the resolved address. */
> > -          if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
> > -              || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
> > -            *(ElfW(Addr) *) (got + 2)
> > -              = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
> > -                 ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
> > -                 : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
> > -          else
> > -            *(ElfW(Addr) *) (got + 2)
> > -              = (ElfW(Addr)) &_dl_runtime_resolve_fxsave;
> > +          *(ElfW(Addr) *) (got + 2)
> > +            = (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve);
> >          }
> >      }
> >
> > @@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.\n",
> >            {
> >              td->arg = _dl_make_tlsdesc_dynamic
> >                (sym_map, sym->st_value + reloc->r_addend);
> > -            td->entry = _dl_tlsdesc_dynamic;
> > +            td->entry = GLRO(dl_x86_tlsdesc_dynamic);
> >            }
> >          else
> >  # endif
> > diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinfo.c
> > index 4d1d790fbb..06637a8154 100644
> > --- a/sysdeps/x86_64/dl-procinfo.c
> > +++ b/sysdeps/x86_64/dl-procinfo.c
> > @@ -41,5 +41,21 @@
> >
> >  #include
> >
> > +#if !IS_IN (ldconfig)
> > +# if !defined PROCINFO_DECL && defined SHARED
> > +  ._dl_x86_64_runtime_resolve
> > +# else
> > +PROCINFO_CLASS void * _dl_x86_64_runtime_resolve
> > +# endif
> > +# ifndef PROCINFO_DECL
> > += NULL
> > +# endif
> > +# if !defined SHARED || defined PROCINFO_DECL
> > +;
> > +# else
> > +,
> > +# endif
> > +#endif
> > +
> >  #undef PROCINFO_DECL
> >  #undef PROCINFO_CLASS
> > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > new file mode 100644
> > index 0000000000..ce0bc094ec
> > --- /dev/null
> > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > @@ -0,0 +1,166 @@
> > +/* Thread-local storage handling in the ELF dynamic linker. x86_64 version.
> > +   Copyright (C) 2004-2024 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   . */
> > +
> > +#ifndef SECTION
> > +# define SECTION(p) p
> > +#endif
> > +
> > +#undef REGISTER_SAVE_AREA
> > +#undef LOCAL_STORAGE_AREA
> > +#undef BASE
> > +
> > +#include "dl-trampoline-state.h"
> > +
> > +        .section SECTION(.text),"ax",@progbits
> > +
> > +        .hidden _dl_tlsdesc_dynamic
> > +        .global _dl_tlsdesc_dynamic
> > +        .type _dl_tlsdesc_dynamic,@function
> > +
> > +     /* %rax points to the TLS descriptor, such that 0(%rax) points to
> > +        _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct
> > +        tlsdesc_dynamic_arg object. It must return in %rax the offset
> > +        between the thread pointer and the object denoted by the
> > +        argument, without clobbering any registers.
> > +
> > +        The assembly code that follows is a rendition of the following
> > +        C code, hand-optimized a little bit.
> > +
> > +ptrdiff_t
> > +_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax"))
> > +{
> > +  struct tlsdesc_dynamic_arg *td = tdp->arg;
> > +  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > +  if (__builtin_expect (td->gen_count <= dtv[0].counter
> > +                        && (dtv[td->tlsinfo.ti_module].pointer.val
> > +                            != TLS_DTV_UNALLOCATED),
> > +                        1))
> > +    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > +      - __thread_pointer;
> > +
> > +  return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer;
> > +}
> > +*/
> basically same comments for x86 version as for i386.

I will leave the original code alone and only change comments
to /* ... */.

--
H.J.