From: Noah Goldstein
Date: Wed, 14 Feb 2024 23:57:20 +0000
Subject: Re: [PATCH v4 2/2] x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
To: "H.J. Lu"
Cc: libc-alpha@sourceware.org
References: <20240213041501.2494232-1-hjl.tools@gmail.com> <20240213041501.2494232-3-hjl.tools@gmail.com>

On Wed, Feb 14, 2024 at 11:21 PM H.J. Lu wrote:
>
> On Wed, Feb 14, 2024 at 10:44:20PM +0000, Noah Goldstein wrote:
> > On Tue, Feb 13, 2024 at 4:15 AM H.J. Lu wrote:
> > >
> > > Compiler generates the following instruction sequence for GNU2 dynamic
> > > TLS access:
> > >
> > >         leaq tls_var@TLSDESC(%rip), %rax
> > >         call *tls_var@TLSCALL(%rax)
> > >
> > > or
> > >
> > >         leal tls_var@TLSDESC(%ebx), %eax
> > >         call *tls_var@TLSCALL(%eax)
> > >
> > > CALL instruction is transparent to compiler which assumes all registers,
> > > except for EFLAGS and RAX/EAX, are unchanged after CALL.  When
> > > _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
> > > path.  __tls_get_addr is a normal function which doesn't preserve any
> > > caller-saved registers.  _dl_tlsdesc_dynamic saved and restored integer
> > > caller-saved registers, but didn't preserve any other caller-saved
> > > registers.  Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
> > > XSAVE and XSAVEC to save and restore all caller-saved registers.  This
> > > fixes BZ #31372.
> > >
> > > Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
> > > to optimize elf_machine_runtime_setup.
> > > --- > > > elf/Makefile | 19 ++ > > > elf/malloc-for-test.c | 32 ++++ > > > elf/malloc-for-test.map | 6 + > > > elf/tst-gnu2-tls2.c | 97 ++++++++++ > > > elf/tst-gnu2-tls2.h | 26 +++ > > > elf/tst-gnu2-tls2mod0.c | 28 +++ > > > elf/tst-gnu2-tls2mod1.c | 28 +++ > > > elf/tst-gnu2-tls2mod2.c | 28 +++ > > > sysdeps/i386/dl-machine.h | 2 +- > > > sysdeps/i386/dl-tlsdesc-dynamic.h | 187 +++++++++++++++++= ++ > > > sysdeps/i386/dl-tlsdesc.S | 115 +++++------- > > > sysdeps/i386/tst-gnu2-tls2.c | 5 + > > > sysdeps/x86/Makefile | 7 +- > > > sysdeps/x86/cpu-features.c | 56 +++++- > > > sysdeps/x86/dl-procinfo.c | 16 ++ > > > sysdeps/{x86_64 =3D> x86}/features-offsets.sym | 2 + > > > sysdeps/x86/malloc-for-test.c | 33 ++++ > > > sysdeps/x86/sysdep.h | 6 + > > > sysdeps/x86_64/Makefile | 2 +- > > > sysdeps/x86_64/dl-machine.h | 19 +- > > > sysdeps/x86_64/dl-procinfo.c | 16 ++ > > > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 166 ++++++++++++++++ > > > sysdeps/x86_64/dl-tlsdesc.S | 108 ++++------- > > > sysdeps/x86_64/dl-trampoline-save.h | 34 ++++ > > > sysdeps/x86_64/dl-trampoline-state.h | 51 +++++ > > > sysdeps/x86_64/dl-trampoline.S | 20 +- > > > sysdeps/x86_64/dl-trampoline.h | 34 +--- > > > 27 files changed, 930 insertions(+), 213 deletions(-) > > > create mode 100644 elf/malloc-for-test.c > > > create mode 100644 elf/malloc-for-test.map > > > create mode 100644 elf/tst-gnu2-tls2.c > > > create mode 100644 elf/tst-gnu2-tls2.h > > > create mode 100644 elf/tst-gnu2-tls2mod0.c > > > create mode 100644 elf/tst-gnu2-tls2mod1.c > > > create mode 100644 elf/tst-gnu2-tls2mod2.c > > > create mode 100644 sysdeps/i386/dl-tlsdesc-dynamic.h > > > create mode 100644 sysdeps/i386/tst-gnu2-tls2.c > > > rename sysdeps/{x86_64 =3D> x86}/features-offsets.sym (89%) > > > create mode 100644 sysdeps/x86/malloc-for-test.c > > > create mode 100644 sysdeps/x86_64/dl-tlsdesc-dynamic.h > > > create mode 100644 sysdeps/x86_64/dl-trampoline-save.h > > > create mode 100644 
sysdeps/x86_64/dl-trampoline-state.h > > > > > > diff --git a/elf/Makefile b/elf/Makefile > > > index 5d78b659ce..e0665d2007 100644 > > > --- a/elf/Makefile > > > +++ b/elf/Makefile > > > @@ -424,6 +424,7 @@ tests +=3D \ > > > tst-glibc-hwcaps-prepend \ > > > tst-global1 \ > > > tst-global2 \ > > > + tst-gnu2-tls2 \ > > > tst-initfinilazyfail \ > > > tst-initorder \ > > > tst-initorder2 \ > > > @@ -699,6 +700,7 @@ modules-names +=3D \ > > > libtracemod5-1 \ > > > ltglobmod1 \ > > > ltglobmod2 \ > > > + malloc-for-test \ > > > neededobj1 \ > > > neededobj2 \ > > > neededobj3 \ > > > @@ -846,6 +848,9 @@ modules-names +=3D \ > > > tst-filterobj-flt \ > > > tst-finilazyfailmod \ > > > tst-globalmod2 \ > > > + tst-gnu2-tls2mod0 \ > > > + tst-gnu2-tls2mod1 \ > > > + tst-gnu2-tls2mod2 \ > > > tst-initlazyfailmod \ > > > tst-initorder2a \ > > > tst-initorder2b \ > > > @@ -3044,8 +3049,22 @@ $(objpfx)tst-tlsgap.out: \ > > > $(objpfx)tst-tlsgap-mod0.so \ > > > $(objpfx)tst-tlsgap-mod1.so \ > > > $(objpfx)tst-tlsgap-mod2.so > > > + > > > +$(objpfx)tst-gnu2-tls2: \ > > > + $(shared-thread-library) \ > > > + $(objpfx)malloc-for-test.so > > > +$(objpfx)tst-gnu2-tls2.out: \ > > > + $(objpfx)tst-gnu2-tls2mod0.so \ > > > + $(objpfx)tst-gnu2-tls2mod1.so \ > > > + $(objpfx)tst-gnu2-tls2mod2.so > > > + > > > +LDFLAGS-malloc-for-test.so +=3D -Wl,--version-script=3Dmalloc-for-te= st.map > > > + > > > ifeq (yes,$(have-mtls-dialect-gnu2)) > > > CFLAGS-tst-tlsgap-mod0.c +=3D -mtls-dialect=3Dgnu2 > > > CFLAGS-tst-tlsgap-mod1.c +=3D -mtls-dialect=3Dgnu2 > > > CFLAGS-tst-tlsgap-mod2.c +=3D -mtls-dialect=3Dgnu2 > > > +CFLAGS-tst-gnu2-tls2mod0.c +=3D -mtls-dialect=3Dgnu2 > > > +CFLAGS-tst-gnu2-tls2mod1.c +=3D -mtls-dialect=3Dgnu2 > > > +CFLAGS-tst-gnu2-tls2mod2.c +=3D -mtls-dialect=3Dgnu2 > > > endif > > > diff --git a/elf/malloc-for-test.c b/elf/malloc-for-test.c > > > new file mode 100644 > > > index 0000000000..1bec69eda7 > > > --- /dev/null > > > +++ b/elf/malloc-for-test.c > > > @@ -0,0 
+1,32 @@ > > > +/* A malloc for intercept test. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#include > > > + > > > +extern void * __libc_malloc (size_t); > > > + > > > +#ifndef PREPARE_MALLOC > > > +# define PREPARE_MALLOC() > > > +#endif > > > + > > > +void * > > > +malloc (size_t n) > > > +{ > > > + PREPARE_MALLOC (); > > > + return __libc_malloc (n); > > > +} > > > diff --git a/elf/malloc-for-test.map b/elf/malloc-for-test.map > > > new file mode 100644 > > > index 0000000000..8437cf4346 > > > --- /dev/null > > > +++ b/elf/malloc-for-test.map > > > @@ -0,0 +1,6 @@ > > > +GLIBC_2.0 { > > > + global: > > > + malloc; > > > + local: > > > + *; > > > +}; > > > diff --git a/elf/tst-gnu2-tls2.c b/elf/tst-gnu2-tls2.c > > > new file mode 100644 > > > index 0000000000..34427f9a0f > > > --- /dev/null > > > +++ b/elf/tst-gnu2-tls2.c > > > @@ -0,0 +1,97 @@ > > > +/* Test TLSDESC relocation. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. 
> > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include "tst-gnu2-tls2.h" > > > + > > > +#ifndef IS_SUPPORTED > > > +# define IS_SUPPORTED() true > > > +#endif > > > + > > > +static void *mod[3]; > > > +#define MOD(i) "tst-gnu2-tls2mod" #i ".so" > > > +static const char *modname[3] =3D { MOD(0), MOD(1), MOD(2) }; > > > +#undef MOD > > > + > > > +static void > > > +open_mod (int i) > > > +{ > > > + mod[i] =3D xdlopen (modname[i], RTLD_LAZY); > > > + printf ("open %s\n", modname[i]); > > > +} > > > + > > > +static void > > > +close_mod (int i) > > > +{ > > > + xdlclose (mod[i]); > > > + mod[i] =3D NULL; > > > + printf ("close %s\n", modname[i]); > > > +} > > > + > > > +static void > > > +access_mod (int i, const char *sym) > > > +{ > > > + struct tls var =3D { -1, -1, -1, -1 }; > > > + struct tls *(*f) (struct tls *) =3D xdlsym (mod[i], sym); > > > + struct tls *p =3D f (&var); > > > + printf ("access %s: %s() =3D %p\n", modname[i], sym, p); > > > + TEST_VERIFY_EXIT (memcmp (p, &var, sizeof (var)) =3D=3D 0); > > > + ++(p->a); > > > +} > > > + > > > +static void * > > > +start (void *arg) > > > +{ > > > + /* The DTV generation is at the last dlopen of 
mod0 and the > > > + entry for mod1 is NULL. */ > > > + > > > + open_mod (1); /* Reuse modid of mod1. Uses dynamic TLS. */ > > > + > > > + /* Force the slow path in GNU2 TLS descriptor call. */ > > > + access_mod (1, "apply_tls"); > > > + > > > + return arg; > > > +} > > > + > > > +static int > > > +do_test (void) > > > +{ > > > + if (!IS_SUPPORTED ()) > > > + return EXIT_UNSUPPORTED; > > > + > > > + open_mod (0); > > > + open_mod (1); > > > + open_mod (2); > > > + close_mod (0); > > > + close_mod (1); /* Create modid gap at mod1. */ > > > + open_mod (0); /* Reuse modid of mod0, bump generation count. */ > > > + > > > + /* Create a thread where DTV of mod1 is NULL. */ > > > + pthread_t t =3D xpthread_create (NULL, start, NULL); > > > + xpthread_join (t); > > > + return 0; > > > +} > > > + > > > +#include > > > diff --git a/elf/tst-gnu2-tls2.h b/elf/tst-gnu2-tls2.h > > > new file mode 100644 > > > index 0000000000..e33f4dbe27 > > > --- /dev/null > > > +++ b/elf/tst-gnu2-tls2.h > > > @@ -0,0 +1,26 @@ > > > +/* Test TLSDESC relocation. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . 
*/ > > > + > > > +#include > > > + > > > +struct tls > > > +{ > > > + int64_t a, b, c, d; > > > +}; > > > + > > > +extern struct tls *apply_tls (struct tls *); > > > diff --git a/elf/tst-gnu2-tls2mod0.c b/elf/tst-gnu2-tls2mod0.c > > > new file mode 100644 > > > index 0000000000..67dc0d464d > > > --- /dev/null > > > +++ b/elf/tst-gnu2-tls2mod0.c > > > @@ -0,0 +1,28 @@ > > > +/* DSO used by tst-gnu2-tls2. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#include "tst-gnu2-tls2.h" > > > + > > > +__thread struct tls tls_var0 __attribute__ ((visibility ("hidden")))= ; > > > + > > > +struct tls * > > > +apply_tls (struct tls *p) > > > +{ > > > + tls_var0 =3D *p; > > > + return &tls_var0; > > > +} > > > diff --git a/elf/tst-gnu2-tls2mod1.c b/elf/tst-gnu2-tls2mod1.c > > > new file mode 100644 > > > index 0000000000..a4ae6db24f > > > --- /dev/null > > > +++ b/elf/tst-gnu2-tls2mod1.c > > > @@ -0,0 +1,28 @@ > > > +/* DSO used by tst-gnu2-tls2. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. 
> > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#include "tst-gnu2-tls2.h" > > > + > > > +__thread struct tls tls_var1[100] __attribute__ ((visibility ("hidde= n"))); > > > + > > > +struct tls * > > > +apply_tls (struct tls *p) > > > +{ > > > + tls_var1[1] =3D *p; > > > + return &tls_var1[1]; > > > +} > > > diff --git a/elf/tst-gnu2-tls2mod2.c b/elf/tst-gnu2-tls2mod2.c > > > new file mode 100644 > > > index 0000000000..2d13921717 > > > --- /dev/null > > > +++ b/elf/tst-gnu2-tls2mod2.c > > > @@ -0,0 +1,28 @@ > > > +/* DSO used by tst-gnu2-tls2. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. 
> > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#include "tst-gnu2-tls2.h" > > > + > > > +__thread struct tls tls_var2 __attribute__ ((visibility ("hidden")))= ; > > > + > > > +struct tls * > > > +apply_tls (struct tls *p) > > > +{ > > > + tls_var2 =3D *p; > > > + return &tls_var2; > > > +} > > > diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h > > > index fc1ef96587..50d74fe6e9 100644 > > > --- a/sysdeps/i386/dl-machine.h > > > +++ b/sysdeps/i386/dl-machine.h > > > @@ -347,7 +347,7 @@ and creates an unsatisfiable circular dependency.= \n", > > > { > > > td->arg =3D _dl_make_tlsdesc_dynamic > > > (sym_map, sym->st_value + (ElfW(Word))td->arg); > > > - td->entry =3D _dl_tlsdesc_dynamic; > > > + td->entry =3D GLRO(dl_x86_tlsdesc_dynamic); > > > } > > > else > > > # endif > > > diff --git a/sysdeps/i386/dl-tlsdesc-dynamic.h b/sysdeps/i386/dl-tlsd= esc-dynamic.h > > > new file mode 100644 > > > index 0000000000..675e56d32d > > > --- /dev/null > > > +++ b/sysdeps/i386/dl-tlsdesc-dynamic.h > > > @@ -0,0 +1,187 @@ > > > +/* Thread-local storage handling in the ELF dynamic linker. i386 ve= rsion. > > > + Copyright (C) 2004-2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. 
> > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#undef REGISTER_SAVE_AREA > > > + > > > +#if !defined USE_FNSAVE && (STATE_SAVE_ALIGNMENT % 16) !=3D 0 > > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16 > > > +#endif > > > + > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK > > > +# ifdef USE_FNSAVE > > > +# error USE_FNSAVE shouldn't be defined > > > +# endif > > > +# ifdef USE_FXSAVE > > > +/* Use fxsave to save all registers. */ > > > +# define REGISTER_SAVE_AREA 512 > > > +# endif > > > +#else > > > +# ifdef USE_FNSAVE > > > +/* Use fnsave to save x87 FPU stack registers. */ > > > +# define REGISTER_SAVE_AREA 108 > > > +# else > > > +# ifndef USE_FXSAVE > > > +# error USE_FXSAVE must be defined > > > +# endif > > > +/* Use fxsave to save all registers. Add 12 bytes to align the stac= k > > > + to 16 bytes. */ > > > +# define REGISTER_SAVE_AREA (512 + 12) > > > +# endif > > > +#endif > > > + > > > + .hidden _dl_tlsdesc_dynamic > > > + .global _dl_tlsdesc_dynamic > > > + .type _dl_tlsdesc_dynamic,@function > > > + > > > + /* This function is used for symbols that need dynamic TLS. > > > + > > > + %eax points to the TLS descriptor, such that 0(%eax) points t= o > > > + _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct > > > + tlsdesc_dynamic_arg object. It must return in %eax the offse= t > > > + between the thread pointer and the object denoted by the > > > + argument, without clobbering any registers. > > > + > > > + The assembly code that follows is a rendition of the followin= g > > > + C code, hand-optimized a little bit. 
> > > +
> > > +ptrdiff_t
> > > +__attribute__ ((__regparm__ (1)))
> > > +_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
> > > +{
> > > +  struct tlsdesc_dynamic_arg *td = tdp->arg;
> > > +  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
> > > +  if (__builtin_expect (td->gen_count <= dtv[0].counter
> > > +                        && (dtv[td->tlsinfo.ti_module].pointer.val
> > > +                            != TLS_DTV_UNALLOCATED),
> > > +                        1))
> > > +    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
> > > +      - __thread_pointer;
> > > +
> > > +  return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
> > > +}
> > > +*/
> > > +	cfi_startproc
> > > +	.align 16
> > > +_dl_tlsdesc_dynamic:
> > > +	/* Like all TLS resolvers, preserve call-clobbered registers.
> > > +	   We need two scratch regs anyway.  */
> > > +	subl $32, %esp
> > > +	cfi_adjust_cfa_offset (32)
> > > +	movl %ecx, 20(%esp)
> > > +	movl %edx, 24(%esp)
> > > +	movl TLSDESC_ARG(%eax), %eax
> > > +	movl %gs:DTV_OFFSET, %edx
> > > +	movl TLSDESC_GEN_COUNT(%eax), %ecx
> > > +	cmpl (%edx), %ecx
> > > +	ja 2f
> > > +	movl TLSDESC_MODID(%eax), %ecx
> > > +	movl (%edx,%ecx,8), %edx
> > > +	cmpl $-1, %edx
> > > +	je 2f
> > > +	movl TLSDESC_MODOFF(%eax), %eax
> > > +	addl %edx, %eax
> > > +1:
> > > +	movl 20(%esp), %ecx
> > > +	subl %gs:0, %eax
> > > +	movl 24(%esp), %edx
> > > +	addl $32, %esp
> > > +	cfi_adjust_cfa_offset (-32)
> > > +	ret
> > > +	.p2align 4,,7
> > > +2:
> > > +	cfi_adjust_cfa_offset (32)
> > Extraneous AFAICT.
>
> This was in the existing code.  The label 2 can only be reached by
> a jump.  When the label 2 is reached, this CFA adjustment is to tell
> the debugger that the CFA isn't changed by the CFA directive above.
>
> >
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > +	movl %ebx, -28(%esp)
> > > +	movl %esp, %ebx
> > > +	cfi_def_cfa_register(%ebx)
> > > +	and $-STATE_SAVE_ALIGNMENT, %esp
> > > +#endif
> > > +#ifdef REGISTER_SAVE_AREA
> > > +	subl $REGISTER_SAVE_AREA, %esp
> > > +# if !DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > +	cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
> > > +# endif
> > > +#else
> > > +	# Allocate stack space of the required size to save the state.
> > > +	LOAD_PIC_REG (cx)
> > > +	subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
> > > +#endif
> > > +#ifdef USE_FNSAVE
> > > +	fnsave (%esp)
> > > +#elif defined USE_FXSAVE
> > > +	fxsave (%esp)
> > > +#else
> > > +	# Save the argument for ___tls_get_addr in EAX.
> > > +	movl %eax, %ecx
> > > +	movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > +	xorl %edx, %edx
> > > +	# Clear the XSAVE Header.
> > > +# ifdef USE_XSAVE
> > > +	movl %edx, (512)(%esp)
> > > +	movl %edx, (512 + 4 * 1)(%esp)
> > > +	movl %edx, (512 + 4 * 2)(%esp)
> > > +	movl %edx, (512 + 4 * 3)(%esp)
> > > +# endif
> > > +	movl %edx, (512 + 4 * 4)(%esp)
> > > +	movl %edx, (512 + 4 * 5)(%esp)
> > > +	movl %edx, (512 + 4 * 6)(%esp)
> > > +	movl %edx, (512 + 4 * 7)(%esp)
> > > +	movl %edx, (512 + 4 * 8)(%esp)
> > > +	movl %edx, (512 + 4 * 9)(%esp)
> > > +	movl %edx, (512 + 4 * 10)(%esp)
> > > +	movl %edx, (512 + 4 * 11)(%esp)
> > > +	movl %edx, (512 + 4 * 12)(%esp)
> > > +	movl %edx, (512 + 4 * 13)(%esp)
> > > +	movl %edx, (512 + 4 * 14)(%esp)
> > > +	movl %edx, (512 + 4 * 15)(%esp)
> > > +# ifdef USE_XSAVE
> > > +	xsave (%esp)
> > > +# else
> > > +	xsavec (%esp)
> > > +# endif
> > > +	# Restore the argument for ___tls_get_addr in EAX.
> > > +	movl %ecx, %eax
> > > +#endif
> > > +	call HIDDEN_JUMPTARGET (___tls_get_addr)
> > > +	# Get register content back.
> > > +#ifdef USE_FNSAVE
> > > +	frstor (%esp)
> > > +#elif defined USE_FXSAVE
> > > +	fxrstor (%esp)
> > > +#else
> > > +	/* Save and retore ___tls_get_addr return value stored in EAX.  */
> > > +	movl %eax, %ecx
> > > +	movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > +	xorl %edx, %edx
> > > +	xrstor (%esp)
> > > +	movl %ecx, %eax
> > > +#endif
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > +	mov %ebx, %esp
> > > +	cfi_def_cfa_register(%esp)
> > > +	movl -28(%esp), %ebx
> > > +	cfi_restore(%ebx)
> > > +#else
> > > +	addl $REGISTER_SAVE_AREA, %esp
> > > +	cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > The use of `REGISTER_SAVE_AREA` above is guarded by an
> > `#ifdef REGISTER_SAVE_AREA`
> > and uses
> > `_rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip)`
> > otherwise.
> > Would expect same here?
>
> REGISTER_SAVE_AREA is only used by fnsave and fxsave which
> expect the fixed area.
>
> _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip)
> is used by xsave and xsavec whose saved area size depends on
> the enabled features.
>
> 2 things are different.

My point is that we set up the stack above with an ifdef, i.e.
```
#ifdef REGISTER_SAVE_AREA
        subl $REGISTER_SAVE_AREA, %esp
#else
        subl RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET+_rtld_local_ro@GOTOFF(%ecx), %esp
#endif
```
Shouldn't you have the same ifdef for restoring?
> > > > +#endif > > > + jmp 1b > > > + cfi_endproc > > > + .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic > > > + > > > +#undef STATE_SAVE_ALIGNMENT > > > diff --git a/sysdeps/i386/dl-tlsdesc.S b/sysdeps/i386/dl-tlsdesc.S > > > index 90d93caa0c..f002feee56 100644 > > > --- a/sysdeps/i386/dl-tlsdesc.S > > > +++ b/sysdeps/i386/dl-tlsdesc.S > > > @@ -18,8 +18,27 @@ > > > > > > #include > > > #include > > > +#include > > > +#include > > > #include "tlsdesc.h" > > > > > > +#ifndef DL_STACK_ALIGNMENT > > > +/* Due to GCC bug: > > > + > > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D58066 > > > + > > > + __tls_get_addr may be called with 4-byte stack alignment. Althou= gh > > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume > > > + that stack will be always aligned at 16 bytes. */ > > > +# define DL_STACK_ALIGNMENT 4 > > > +#endif > > > + > > > +/* True if _dl_tlsdesc_dynamic should align stack for STATE_SAVE or = align > > > + stack to MINIMUM_ALIGNMENT bytes before calling ___tls_get_addr. = */ > > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \ > > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \ > > > + || MINIMUM_ALIGNMENT > DL_STACK_ALIGNMENT) > > > + > > > .text > > > > > > /* This function is used to compute the TP offset for symbols i= n > > > @@ -65,69 +84,35 @@ _dl_tlsdesc_undefweak: > > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak > > > > > > #ifdef SHARED > > > - .hidden _dl_tlsdesc_dynamic > > > - .global _dl_tlsdesc_dynamic > > > - .type _dl_tlsdesc_dynamic,@function > > > - > > > - /* This function is used for symbols that need dynamic TLS. > > > - > > > - %eax points to the TLS descriptor, such that 0(%eax) points t= o > > > - _dl_tlsdesc_dynamic itself, and 4(%eax) points to a struct > > > - tlsdesc_dynamic_arg object. It must return in %eax the offse= t > > > - between the thread pointer and the object denoted by the > > > - argument, without clobbering any registers. 
> > > - > > > - The assembly code that follows is a rendition of the followin= g > > > - C code, hand-optimized a little bit. > > > - > > > -ptrdiff_t > > > -__attribute__ ((__regparm__ (1))) > > > -_dl_tlsdesc_dynamic (struct tlsdesc *tdp) > > > -{ > > > - struct tlsdesc_dynamic_arg *td =3D tdp->arg; > > > - dtv_t *dtv =3D *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET); > > > - if (__builtin_expect (td->gen_count <=3D dtv[0].counter > > > - && (dtv[td->tlsinfo.ti_module].pointer.val > > > - !=3D TLS_DTV_UNALLOCATED), > > > - 1)) > > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_o= ffset > > > - - __thread_pointer; > > > - > > > - return ___tls_get_addr (&td->tlsinfo) - __thread_pointer; > > > -} > > > -*/ > > > - cfi_startproc > > > - .align 16 > > > -_dl_tlsdesc_dynamic: > > > - /* Like all TLS resolvers, preserve call-clobbered registers. > > > - We need two scratch regs anyway. */ > > > - subl $28, %esp > > > - cfi_adjust_cfa_offset (28) > > > - movl %ecx, 20(%esp) > > > - movl %edx, 24(%esp) > > > - movl TLSDESC_ARG(%eax), %eax > > > - movl %gs:DTV_OFFSET, %edx > > > - movl TLSDESC_GEN_COUNT(%eax), %ecx > > > - cmpl (%edx), %ecx > > > - ja .Lslow > > > - movl TLSDESC_MODID(%eax), %ecx > > > - movl (%edx,%ecx,8), %edx > > > - cmpl $-1, %edx > > > - je .Lslow > > > - movl TLSDESC_MODOFF(%eax), %eax > > > - addl %edx, %eax > > > -.Lret: > > > - movl 20(%esp), %ecx > > > - subl %gs:0, %eax > > > - movl 24(%esp), %edx > > > - addl $28, %esp > > > - cfi_adjust_cfa_offset (-28) > > > - ret > > > - .p2align 4,,7 > > > -.Lslow: > > > - cfi_adjust_cfa_offset (28) > > > - call HIDDEN_JUMPTARGET (___tls_get_addr) > > > - jmp .Lret > > > - cfi_endproc > > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic > > > +# define USE_FNSAVE > > > +# define MINIMUM_ALIGNMENT 4 > > > +# define STATE_SAVE_ALIGNMENT 4 > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fnsave > > > +# include "dl-tlsdesc-dynamic.h" > > > +# undef _dl_tlsdesc_dynamic > > > 
+# undef MINIMUM_ALIGNMENT > > > +# undef USE_FNSAVE > > > + > > > +# define MINIMUM_ALIGNMENT 16 > > > + > > > +# define USE_FXSAVE > > > +# define STATE_SAVE_ALIGNMENT 16 > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave > > > +# include "dl-tlsdesc-dynamic.h" > > > +# undef _dl_tlsdesc_dynamic > > > +# undef USE_FXSAVE > > > + > > > +# define USE_XSAVE > > > +# define STATE_SAVE_ALIGNMENT 64 > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave > > > +# include "dl-tlsdesc-dynamic.h" > > > +# undef _dl_tlsdesc_dynamic > > > +# undef USE_XSAVE > > > + > > > +# define USE_XSAVEC > > > +# define STATE_SAVE_ALIGNMENT 64 > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec > > > +# include "dl-tlsdesc-dynamic.h" > > > +# undef _dl_tlsdesc_dynamic > > > +# undef USE_XSAVEC > > > #endif /* SHARED */ > > > diff --git a/sysdeps/i386/tst-gnu2-tls2.c b/sysdeps/i386/tst-gnu2-tls= 2.c > > > new file mode 100644 > > > index 0000000000..92e7fbff89 > > > --- /dev/null > > > +++ b/sysdeps/i386/tst-gnu2-tls2.c > > > @@ -0,0 +1,5 @@ > > > +#include > > > + > > > +#define IS_SUPPORTED() CPU_FEATURE_ACTIVE (SSE2) > > > + > > > +#include > > > diff --git a/sysdeps/x86/Makefile b/sysdeps/x86/Makefile > > > index 4d50b327b5..bc357f0099 100644 > > > --- a/sysdeps/x86/Makefile > > > +++ b/sysdeps/x86/Makefile > > > @@ -1,5 +1,5 @@ > > > ifeq ($(subdir),csu) > > > -gen-as-const-headers +=3D cpu-features-offsets.sym > > > +gen-as-const-headers +=3D cpu-features-offsets.sym features-offsets.= sym > > > endif > > > > > > ifeq ($(subdir),elf) > > > @@ -86,6 +86,11 @@ endif > > > tst-ifunc-isa-2-ENV =3D GLIBC_TUNABLES=3Dglibc.cpu.hwcaps=3D-SSE4_2,= -AVX,-AVX2,-AVX512F > > > tst-ifunc-isa-2-static-ENV =3D $(tst-ifunc-isa-2-ENV) > > > tst-hwcap-tunables-ARGS =3D -- $(host-test-program-cmd) > > > + > > > +CFLAGS-malloc-for-test.c +=3D -msse2 > > > +CFLAGS-tst-gnu2-tls2mod0.c +=3D -msse2 -mtune=3Dhaswell > > > +CFLAGS-tst-gnu2-tls2mod1.c +=3D -msse2 -mtune=3Dhaswell 
> > > +CFLAGS-tst-gnu2-tls2mod2.c +=3D -msse2 -mtune=3Dhaswell > > > endif > > > > > > ifeq ($(subdir),math) > > > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c > > > index 25e6622a79..835113b42f 100644 > > > --- a/sysdeps/x86/cpu-features.c > > > +++ b/sysdeps/x86/cpu-features.c > > > @@ -27,8 +27,13 @@ > > > extern void TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *) > > > attribute_hidden; > > > > > > -#if defined SHARED && defined __x86_64__ > > > -# include > > > +#if defined SHARED > > > +extern void _dl_tlsdesc_dynamic_fxsave (void) attribute_hidden; > > > +extern void _dl_tlsdesc_dynamic_xsave (void) attribute_hidden; > > > +extern void _dl_tlsdesc_dynamic_xsavec (void) attribute_hidden; > > > + > > > +# ifdef __x86_64__ > > > +# include > > > > > > static void > > > TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t *valp) > > > @@ -47,6 +52,15 @@ TUNABLE_CALLBACK (set_plt_rewrite) (tunable_val_t = *valp) > > > : plt_rewrite_jmp); > > > } > > > } > > > +# else > > > +extern void _dl_tlsdesc_dynamic_fnsave (void) attribute_hidden; > > > +# endif > > > +#endif > > > + > > > +#ifdef __x86_64__ > > > +extern void _dl_runtime_resolve_fxsave (void) attribute_hidden; > > > +extern void _dl_runtime_resolve_xsave (void) attribute_hidden; > > > +extern void _dl_runtime_resolve_xsavec (void) attribute_hidden; > > > #endif > > > > > > #ifdef __LP64__ > > > @@ -1130,6 +1144,44 @@ no_cpuid: > > > TUNABLE_CALLBACK (set_x86_shstk)); > > > #endif > > > > > > + if (GLRO(dl_x86_cpu_features).xsave_state_size !=3D 0) > > > + { > > > + if (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)) > > > + { > > > +#ifdef __x86_64__ > > > + GLRO(dl_x86_64_runtime_resolve) =3D _dl_runtime_resolve_xsa= vec; > > > +#endif > > > +#ifdef SHARED > > > + GLRO(dl_x86_tlsdesc_dynamic) =3D _dl_tlsdesc_dynamic_xsavec= ; > > > +#endif > > > + } > > > + else > > > + { > > > +#ifdef __x86_64__ > > > + GLRO(dl_x86_64_runtime_resolve) =3D _dl_runtime_resolve_xsa= ve; > > > +#endif > > 
> +#ifdef SHARED > > > + GLRO(dl_x86_tlsdesc_dynamic) =3D _dl_tlsdesc_dynamic_xsave; > > > +#endif > > > + } > > > + } > > > + else > > > + { > > > +#ifdef __x86_64__ > > > + GLRO(dl_x86_64_runtime_resolve) =3D _dl_runtime_resolve_fxsave= ; > > > +# ifdef SHARED > > > + GLRO(dl_x86_tlsdesc_dynamic) =3D _dl_tlsdesc_dynamic_fxsave; > > > +# endif > > > +#else > > > +# ifdef SHARED > > > + if (CPU_FEATURE_USABLE_P (cpu_features, FXSR)) > > > + GLRO(dl_x86_tlsdesc_dynamic) =3D _dl_tlsdesc_dynamic_fxsave; > > > + else > > > + GLRO(dl_x86_tlsdesc_dynamic) =3D _dl_tlsdesc_dynamic_fnsave; > > > +# endif > > > +#endif > > > + } > > > + > > > #ifdef SHARED > > > # ifdef __x86_64__ > > > TUNABLE_GET (plt_rewrite, tunable_val_t *, > > > diff --git a/sysdeps/x86/dl-procinfo.c b/sysdeps/x86/dl-procinfo.c > > > index ee957b4d70..5920d4b320 100644 > > > --- a/sysdeps/x86/dl-procinfo.c > > > +++ b/sysdeps/x86/dl-procinfo.c > > > @@ -86,3 +86,19 @@ PROCINFO_CLASS const char _dl_x86_platforms[4][9] > > > #else > > > , > > > #endif > > > + > > > +#if defined SHARED && !IS_IN (ldconfig) > > > +# if !defined PROCINFO_DECL > > > + ._dl_x86_tlsdesc_dynamic > > > +# else > > > +PROCINFO_CLASS void * _dl_x86_tlsdesc_dynamic > > > +# endif > > > +# ifndef PROCINFO_DECL > > > +=3D NULL > > > +# endif > > > +# ifdef PROCINFO_DECL > > > +; > > > +# else > > > +, > > > +# endif > > > +#endif > > > diff --git a/sysdeps/x86_64/features-offsets.sym b/sysdeps/x86/featur= es-offsets.sym > > > similarity index 89% > > > rename from sysdeps/x86_64/features-offsets.sym > > > rename to sysdeps/x86/features-offsets.sym > > > index 9e4be3393a..77e990c705 100644 > > > --- a/sysdeps/x86_64/features-offsets.sym > > > +++ b/sysdeps/x86/features-offsets.sym > > > @@ -3,4 +3,6 @@ > > > #include > > > > > > RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET offsetof (struct rtld_glob= al_ro, _dl_x86_cpu_features) > > > +#ifdef __x86_64__ > > > RTLD_GLOBAL_DL_X86_FEATURE_1_OFFSET offsetof (struct rtld_global, _d= 
l_x86_feature_1) > > > +#endif > > > diff --git a/sysdeps/x86/malloc-for-test.c b/sysdeps/x86/malloc-for-t= est.c > > > new file mode 100644 > > > index 0000000000..02f4dead5d > > > --- /dev/null > > > +++ b/sysdeps/x86/malloc-for-test.c > > > @@ -0,0 +1,33 @@ > > > +/* A malloc for intercept test. x86 version. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . 
*/ > > > + > > > + > > > +/* Clear XMM0...XMM7 */ > > > +#define PREPARE_MALLOC() \ > > > +{ \ > > > + asm volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0" ); \ > > > + asm volatile ("xorps %%xmm1, %%xmm1" : : : "xmm1" ); \ > > > + asm volatile ("xorps %%xmm2, %%xmm2" : : : "xmm2" ); \ > > > + asm volatile ("xorps %%xmm3, %%xmm3" : : : "xmm3" ); \ > > > + asm volatile ("xorps %%xmm4, %%xmm4" : : : "xmm4" ); \ > > > + asm volatile ("xorps %%xmm5, %%xmm5" : : : "xmm5" ); \ > > > + asm volatile ("xorps %%xmm6, %%xmm6" : : : "xmm6" ); \ > > > + asm volatile ("xorps %%xmm7, %%xmm7" : : : "xmm7" ); \ > > > +} > > > + > > > +#include > > > diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h > > > index 5c1f0bcf53..792e2ea5ed 100644 > > > --- a/sysdeps/x86/sysdep.h > > > +++ b/sysdeps/x86/sysdep.h > > > @@ -68,6 +68,12 @@ > > > | (1 << X86_XSTATE_ZMM_H_ID)) > > > #endif > > > > > > +/* States which should be saved for TLSDESC_CALL and TLS_DESC_CALL. > > > + Compiler assumes that all registers, including x87 FPU stack regi= sters, > > > + are unchanged after CALL, except for EFLAGS and RAX/EAX. */ > > > +#define TLSDESC_CALL_STATE_SAVE_MASK \ > > > + (STATE_SAVE_MASK | (1 << X86_XSTATE_X87_ID)) > > > + > > > /* Constants for bits in __x86_string_control: */ > > > > > > /* Avoid short distance REP MOVSB. 
*/ > > > diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile > > > index 90f4ecfd26..e8babc9a4e 100644 > > > --- a/sysdeps/x86_64/Makefile > > > +++ b/sysdeps/x86_64/Makefile > > > @@ -10,7 +10,7 @@ LDFLAGS-rtld +=3D -Wl,-z,nomark-plt > > > endif > > > > > > ifeq ($(subdir),csu) > > > -gen-as-const-headers +=3D features-offsets.sym link-defines.sym > > > +gen-as-const-headers +=3D link-defines.sym > > > endif > > > > > > ifeq ($(subdir),gmon) > > > diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.= h > > > index 6d605d0d32..ff5d45f7cb 100644 > > > --- a/sysdeps/x86_64/dl-machine.h > > > +++ b/sysdeps/x86_64/dl-machine.h > > > @@ -71,9 +71,6 @@ elf_machine_runtime_setup (struct link_map *l, stru= ct r_scope_elem *scope[], > > > int lazy, int profile) > > > { > > > Elf64_Addr *got; > > > - extern void _dl_runtime_resolve_fxsave (ElfW(Word)) attribute_hidd= en; > > > - extern void _dl_runtime_resolve_xsave (ElfW(Word)) attribute_hidde= n; > > > - extern void _dl_runtime_resolve_xsavec (ElfW(Word)) attribute_hidd= en; > > > extern void _dl_runtime_profile_sse (ElfW(Word)) attribute_hidden; > > > extern void _dl_runtime_profile_avx (ElfW(Word)) attribute_hidden; > > > extern void _dl_runtime_profile_avx512 (ElfW(Word)) attribute_hidd= en; > > > @@ -96,8 +93,6 @@ elf_machine_runtime_setup (struct link_map *l, stru= ct r_scope_elem *scope[], > > > /* Identify this shared object. */ > > > *(ElfW(Addr) *) (got + 1) =3D (ElfW(Addr)) l; > > > > > > - const struct cpu_features* cpu_features =3D __get_cpu_features= (); > > > - > > > #ifdef SHARED > > > /* The got[2] entry contains the address of a function which g= ets > > > called to get the address of a so far unresolved function an= d > > > @@ -107,6 +102,7 @@ elf_machine_runtime_setup (struct link_map *l, st= ruct r_scope_elem *scope[], > > > end in this function. 
*/ > > > if (__glibc_unlikely (profile)) > > > { > > > + const struct cpu_features* cpu_features =3D __get_cpu_featu= res (); > > > if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F)) > > > *(ElfW(Addr) *) (got + 2) =3D (ElfW(Addr)) &_dl_runtime_p= rofile_avx512; > > > else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX)) > > > @@ -126,15 +122,8 @@ elf_machine_runtime_setup (struct link_map *l, s= truct r_scope_elem *scope[], > > > /* This function will get called to fix up the GOT entry > > > indicated by the offset on the stack, and then jump to > > > the resolved address. */ > > > - if (MINIMUM_X86_ISA_LEVEL >=3D AVX_X86_ISA_LEVEL > > > - || GLRO(dl_x86_cpu_features).xsave_state_size !=3D 0) > > > - *(ElfW(Addr) *) (got + 2) > > > - =3D (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC) > > > - ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec > > > - : (ElfW(Addr)) &_dl_runtime_resolve_xsave); > > > - else > > > - *(ElfW(Addr) *) (got + 2) > > > - =3D (ElfW(Addr)) &_dl_runtime_resolve_fxsave; > > > + *(ElfW(Addr) *) (got + 2) > > > + =3D (ElfW(Addr)) GLRO(dl_x86_64_runtime_resolve); > > > } > > > } > > > > > > @@ -383,7 +372,7 @@ and creates an unsatisfiable circular dependency.= \n", > > > { > > > td->arg =3D _dl_make_tlsdesc_dynamic > > > (sym_map, sym->st_value + reloc->r_addend); > > > - td->entry =3D _dl_tlsdesc_dynamic; > > > + td->entry =3D GLRO(dl_x86_tlsdesc_dynamic); > > > } > > > else > > > # endif > > > diff --git a/sysdeps/x86_64/dl-procinfo.c b/sysdeps/x86_64/dl-procinf= o.c > > > index 4d1d790fbb..06637a8154 100644 > > > --- a/sysdeps/x86_64/dl-procinfo.c > > > +++ b/sysdeps/x86_64/dl-procinfo.c > > > @@ -41,5 +41,21 @@ > > > > > > #include > > > > > > +#if !IS_IN (ldconfig) > > > +# if !defined PROCINFO_DECL && defined SHARED > > > + ._dl_x86_64_runtime_resolve > > > +# else > > > +PROCINFO_CLASS void * _dl_x86_64_runtime_resolve > > > +# endif > > > +# ifndef PROCINFO_DECL > > > +=3D NULL > > > +# endif > > > +# if !defined SHARED || defined 
PROCINFO_DECL > > > +; > > > +# else > > > +, > > > +# endif > > > +#endif > > > + > > > #undef PROCINFO_DECL > > > #undef PROCINFO_CLASS > > > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-= tlsdesc-dynamic.h > > > new file mode 100644 > > > index 0000000000..ce0bc094ec > > > --- /dev/null > > > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h > > > @@ -0,0 +1,166 @@ > > > +/* Thread-local storage handling in the ELF dynamic linker. x86_64 = version. > > > + Copyright (C) 2004-2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#ifndef SECTION > > > +# define SECTION(p) p > > > +#endif > > > + > > > +#undef REGISTER_SAVE_AREA > > > +#undef LOCAL_STORAGE_AREA > > > +#undef BASE > > > + > > > +#include "dl-trampoline-state.h" > > > + > > > + .section SECTION(.text),"ax",@progbits > > > + > > > + .hidden _dl_tlsdesc_dynamic > > > + .global _dl_tlsdesc_dynamic > > > + .type _dl_tlsdesc_dynamic,@function > > > + > > > + /* %rax points to the TLS descriptor, such that 0(%rax) points = to > > > + _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct > > > + tlsdesc_dynamic_arg object. 
It must return in %rax the offse= t > > > + between the thread pointer and the object denoted by the > > > + argument, without clobbering any registers. > > > + > > > + The assembly code that follows is a rendition of the followin= g > > > + C code, hand-optimized a little bit. > > > + > > > +ptrdiff_t > > > +_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax")) > > > +{ > > > + struct tlsdesc_dynamic_arg *td =3D tdp->arg; > > > + dtv_t *dtv =3D *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET); > > > + if (__builtin_expect (td->gen_count <=3D dtv[0].counter > > > + && (dtv[td->tlsinfo.ti_module].pointer.val > > > + !=3D TLS_DTV_UNALLOCATED), > > > + 1)) > > > + return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_o= ffset > > > + - __thread_pointer; > > > + > > > + return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer; > > > +} > > > +*/ > > > + cfi_startproc > > > + .align 16 > > > +_dl_tlsdesc_dynamic: > > > + _CET_ENDBR > > > + /* Preserve call-clobbered registers that we modify. > > > + We need two scratch regs anyway. */ > > > + movq %rsi, -16(%rsp) > > > + mov %fs:DTV_OFFSET, %RSI_LP > > > + movq %rdi, -8(%rsp) > > > + movq TLSDESC_ARG(%rax), %rdi > > > + movq (%rsi), %rax > > > + cmpq %rax, TLSDESC_GEN_COUNT(%rdi) > > > + ja 2f > > > + movq TLSDESC_MODID(%rdi), %rax > > > + salq $4, %rax > > > + movq (%rax,%rsi), %rax > > > + cmpq $-1, %rax > > > + je 2f > > > + addq TLSDESC_MODOFF(%rdi), %rax > > > +1: > > > + movq -16(%rsp), %rsi > > > + sub %fs:0, %RAX_LP > > > + movq -8(%rsp), %rdi > > > + ret > > > +2: > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK > > > + movq %rbx, -24(%rsp) > > > + mov %RSP_LP, %RBX_LP > > > + cfi_def_cfa_register(%rbx) > > > + and $-STATE_SAVE_ALIGNMENT, %RSP_LP > > > +#endif > > > +#ifdef REGISTER_SAVE_AREA > > > +# if DL_RUNTIME_RESOLVE_REALIGN_STACK > > > + # STATE_SAVE_OFFSET has space for 8 integer registers. 
But w= e > > > + # need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plu= s > > > + # RBX above. > > > + sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP > > > +# else > > > + sub $REGISTER_SAVE_AREA, %RSP_LP > > > + cfi_adjust_cfa_offset(REGISTER_SAVE_AREA) > > > +# endif > > > +#else > > > + # Allocate stack space of the required size to save the state= . > > > + sub _rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFF= SET+XSAVE_STATE_SIZE_OFFSET(%rip), %RSP_LP > > > +#endif > > > + /* Besides rdi and rsi, saved above, save rcx, rdx, r8, r9, > > > + r10 and r11. */ > > > + movq %rcx, REGISTER_SAVE_RCX(%rsp) > > > + movq %rdx, REGISTER_SAVE_RDX(%rsp) > > > + movq %r8, REGISTER_SAVE_R8(%rsp) > > > + movq %r9, REGISTER_SAVE_R9(%rsp) > > > + movq %r10, REGISTER_SAVE_R10(%rsp) > > > + movq %r11, REGISTER_SAVE_R11(%rsp) > > > +#ifdef USE_FXSAVE > > > + fxsave STATE_SAVE_OFFSET(%rsp) > > > +#else > > > + movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax > > > + xorl %edx, %edx > > > + # Clear the XSAVE Header. > > > +# ifdef USE_XSAVE > > > + movq %rdx, (STATE_SAVE_OFFSET + 512)(%rsp) > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8)(%rsp) > > > +# endif > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 2)(%rsp) > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 3)(%rsp) > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 4)(%rsp) > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 5)(%rsp) > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 6)(%rsp) > > > + movq %rdx, (STATE_SAVE_OFFSET + 512 + 8 * 7)(%rsp) > > > +# ifdef USE_XSAVE > > > + xsave STATE_SAVE_OFFSET(%rsp) > > > +# else > > > + xsavec STATE_SAVE_OFFSET(%rsp) > > > +# endif > > > +#endif > > > + /* %rdi already points to the tlsinfo data structure. */ > > > + call HIDDEN_JUMPTARGET (__tls_get_addr) > > > + # Get register content back. 
> > > +#ifdef USE_FXSAVE
> > > +	fxrstor STATE_SAVE_OFFSET(%rsp)
> > > +#else
> > > +	/* Save and restore __tls_get_addr return value stored in RAX.  */
> > > +	mov %RAX_LP, %RCX_LP
> > > +	movl $TLSDESC_CALL_STATE_SAVE_MASK, %eax
> > > +	xorl %edx, %edx
> > > +	xrstor STATE_SAVE_OFFSET(%rsp)
> > > +	mov %RCX_LP, %RAX_LP
> > > +#endif
> > > +	movq REGISTER_SAVE_R11(%rsp), %r11
> > > +	movq REGISTER_SAVE_R10(%rsp), %r10
> > > +	movq REGISTER_SAVE_R9(%rsp), %r9
> > > +	movq REGISTER_SAVE_R8(%rsp), %r8
> > > +	movq REGISTER_SAVE_RDX(%rsp), %rdx
> > > +	movq REGISTER_SAVE_RCX(%rsp), %rcx
> > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > +	mov %RBX_LP, %RSP_LP
> > > +	cfi_def_cfa_register(%rsp)
> > > +	movq -24(%rsp), %rbx
> > > +	cfi_restore(%rbx)
> > > +#else
> > > +	add $REGISTER_SAVE_AREA, %RSP_LP
> > > +	cfi_adjust_cfa_offset(-REGISTER_SAVE_AREA)
> > The use of `REGISTER_SAVE_AREA` above is guarded by an
> > `#ifdef REGISTER_SAVE_AREA`
> > and uses
> > `_rtld_local_ro+RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET+XSAVE_STATE_SIZE_OFFSET(%rip)`
> > otherwise.
> > Would expect the same here?
> > > +#endif
> > > +	jmp 1b
> > > +	cfi_endproc
> > > +	.size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
> > > +
> > Seems to me the x32 and x64 implementations could be merged with
> Did you mean i386?  x32 uses the same implementation as x64.
>
> > a few defines for the registers/register width
> > + the extra GPR saving in x64.
>
> i386 ISA is quite different from x86-64.  Merging them will
> only make the code hard to follow.
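On the `REGISTER_SAVE_AREA` question above: going by the quoted `dl-trampoline-save.h` and `dl-trampoline-state.h`, the `#else` branch of the restore path looks reachable only when `REGISTER_SAVE_AREA` is defined, because the non-realigning configuration requires `USE_FXSAVE`. The C model below is only a sketch of that preprocessor logic. The function and enum names are made up for illustration and are not part of glibc, and the `DL_STACK_ALIGNMENT` value of 16 is an assumed override.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; this only models the macros quoted in the patch.  */
enum save_area { SAVE_AREA_DEFINED, SAVE_AREA_UNDEFINED, SAVE_AREA_BUILD_ERROR };

/* Models DL_RUNTIME_RESOLVE_REALIGN_STACK from dl-trampoline-save.h.  */
bool realign_stack(int state_save_alignment, int dl_stack_alignment)
{
    return state_save_alignment > dl_stack_alignment
           || 16 > dl_stack_alignment;
}

/* Models whether dl-trampoline-state.h ends up defining REGISTER_SAVE_AREA.  */
enum save_area register_save_area(bool realign, bool use_fxsave)
{
    if (realign)
        return use_fxsave ? SAVE_AREA_DEFINED : SAVE_AREA_UNDEFINED;
    /* Without realignment the header hits #error unless USE_FXSAVE is set.  */
    return use_fxsave ? SAVE_AREA_DEFINED : SAVE_AREA_BUILD_ERROR;
}

/* Walks the variants for the default DL_STACK_ALIGNMENT of 8 and for an
   assumed override to 16.  */
void check_model(void)
{
    /* Default alignment 8: fxsave (16) and xsave/xsavec (64) all realign.  */
    assert(realign_stack(16, 8));
    assert(realign_stack(64, 8));
    assert(register_save_area(true, true) == SAVE_AREA_DEFINED);
    assert(register_save_area(true, false) == SAVE_AREA_UNDEFINED);

    /* Assumed override to 16: only the fxsave variant skips realignment,
       and in that case REGISTER_SAVE_AREA is defined.  */
    assert(!realign_stack(16, 16));
    assert(realign_stack(64, 16));
    assert(register_save_area(false, true) == SAVE_AREA_DEFINED);
}
```

If this model is right, every variant realigns with the default `DL_STACK_ALIGNMENT` of 8 and restores `%rsp` from `%rbx`, so the restore path never needs the size. The plain `add $REGISTER_SAVE_AREA` would only be assembled for an fxsave build on a 16-byte-aligned stack.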
> > > > +#undef STATE_SAVE_ALIGNMENT > > > diff --git a/sysdeps/x86_64/dl-tlsdesc.S b/sysdeps/x86_64/dl-tlsdesc.= S > > > index f748af2ece..ea69f5223a 100644 > > > --- a/sysdeps/x86_64/dl-tlsdesc.S > > > +++ b/sysdeps/x86_64/dl-tlsdesc.S > > > @@ -18,7 +18,19 @@ > > > > > > #include > > > #include > > > +#include > > > +#include > > > #include "tlsdesc.h" > > > +#include "dl-trampoline-save.h" > > > + > > > +/* Area on stack to save and restore registers used for parameter > > > + passing when calling _dl_tlsdesc_dynamic. */ > > > +#define REGISTER_SAVE_RCX 0 > > > +#define REGISTER_SAVE_RDX (REGISTER_SAVE_RCX + 8) > > > +#define REGISTER_SAVE_R8 (REGISTER_SAVE_RDX + 8) > > > +#define REGISTER_SAVE_R9 (REGISTER_SAVE_R8 + 8) > > > +#define REGISTER_SAVE_R10 (REGISTER_SAVE_R9 + 8) > > > +#define REGISTER_SAVE_R11 (REGISTER_SAVE_R10 + 8) > > > > > > .text > > > > > > @@ -67,80 +79,24 @@ _dl_tlsdesc_undefweak: > > > .size _dl_tlsdesc_undefweak, .-_dl_tlsdesc_undefweak > > > > > > #ifdef SHARED > > > - .hidden _dl_tlsdesc_dynamic > > > - .global _dl_tlsdesc_dynamic > > > - .type _dl_tlsdesc_dynamic,@function > > > - > > > - /* %rax points to the TLS descriptor, such that 0(%rax) points = to > > > - _dl_tlsdesc_dynamic itself, and 8(%rax) points to a struct > > > - tlsdesc_dynamic_arg object. It must return in %rax the offse= t > > > - between the thread pointer and the object denoted by the > > > - argument, without clobbering any registers. > > > - > > > - The assembly code that follows is a rendition of the followin= g > > > - C code, hand-optimized a little bit. 
> > > - > > > -ptrdiff_t > > > -_dl_tlsdesc_dynamic (register struct tlsdesc *tdp asm ("%rax")) > > > -{ > > > - struct tlsdesc_dynamic_arg *td =3D tdp->arg; > > > - dtv_t *dtv =3D *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET); > > > - if (__builtin_expect (td->gen_count <=3D dtv[0].counter > > > - && (dtv[td->tlsinfo.ti_module].pointer.val > > > - !=3D TLS_DTV_UNALLOCATED), > > > - 1)) > > > - return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_o= ffset > > > - - __thread_pointer; > > > - > > > - return __tls_get_addr_internal (&td->tlsinfo) - __thread_pointer; > > > -} > > > -*/ > > > - cfi_startproc > > > - .align 16 > > > -_dl_tlsdesc_dynamic: > > > - _CET_ENDBR > > > - /* Preserve call-clobbered registers that we modify. > > > - We need two scratch regs anyway. */ > > > - movq %rsi, -16(%rsp) > > > - mov %fs:DTV_OFFSET, %RSI_LP > > > - movq %rdi, -8(%rsp) > > > - movq TLSDESC_ARG(%rax), %rdi > > > - movq (%rsi), %rax > > > - cmpq %rax, TLSDESC_GEN_COUNT(%rdi) > > > - ja .Lslow > > > - movq TLSDESC_MODID(%rdi), %rax > > > - salq $4, %rax > > > - movq (%rax,%rsi), %rax > > > - cmpq $-1, %rax > > > - je .Lslow > > > - addq TLSDESC_MODOFF(%rdi), %rax > > > -.Lret: > > > - movq -16(%rsp), %rsi > > > - sub %fs:0, %RAX_LP > > > - movq -8(%rsp), %rdi > > > - ret > > > -.Lslow: > > > - /* Besides rdi and rsi, saved above, save rdx, rcx, r8, r9, > > > - r10 and r11. Also, align the stack, that's off by 8 bytes= . */ > > > - subq $72, %rsp > > > - cfi_adjust_cfa_offset (72) > > > - movq %rdx, 8(%rsp) > > > - movq %rcx, 16(%rsp) > > > - movq %r8, 24(%rsp) > > > - movq %r9, 32(%rsp) > > > - movq %r10, 40(%rsp) > > > - movq %r11, 48(%rsp) > > > - /* %rdi already points to the tlsinfo data structure. 
*/ > > > - call HIDDEN_JUMPTARGET (__tls_get_addr) > > > - movq 8(%rsp), %rdx > > > - movq 16(%rsp), %rcx > > > - movq 24(%rsp), %r8 > > > - movq 32(%rsp), %r9 > > > - movq 40(%rsp), %r10 > > > - movq 48(%rsp), %r11 > > > - addq $72, %rsp > > > - cfi_adjust_cfa_offset (-72) > > > - jmp .Lret > > > - cfi_endproc > > > - .size _dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic > > > +# define USE_FXSAVE > > > +# define STATE_SAVE_ALIGNMENT 16 > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_fxsave > > > +# include "dl-tlsdesc-dynamic.h" > > > +# undef _dl_tlsdesc_dynamic > > > +# undef USE_FXSAVE > > > + > > > +# define USE_XSAVE > > > +# define STATE_SAVE_ALIGNMENT 64 > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsave > > > +# include "dl-tlsdesc-dynamic.h" > > > +# undef _dl_tlsdesc_dynamic > > > +# undef USE_XSAVE > > > + > > > +# define USE_XSAVEC > > > +# define STATE_SAVE_ALIGNMENT 64 > > > +# define _dl_tlsdesc_dynamic _dl_tlsdesc_dynamic_xsavec > > > +# include "dl-tlsdesc-dynamic.h" > > > +# undef _dl_tlsdesc_dynamic > > > +# undef USE_XSAVEC > > > #endif /* SHARED */ > > > diff --git a/sysdeps/x86_64/dl-trampoline-save.h b/sysdeps/x86_64/dl-= trampoline-save.h > > > new file mode 100644 > > > index 0000000000..84eac4a8ac > > > --- /dev/null > > > +++ b/sysdeps/x86_64/dl-trampoline-save.h > > > @@ -0,0 +1,34 @@ > > > +/* x86-64 PLT trampoline register save macros. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . 
> > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#ifndef DL_STACK_ALIGNMENT > > > +/* Due to GCC bug: > > > + > > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D58066 > > > + > > > + __tls_get_addr may be called with 8-byte stack alignment. Althou= gh > > > + this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume > > > + that stack will be always aligned at 16 bytes. */ > > > +# define DL_STACK_ALIGNMENT 8 > > > +#endif > > > + > > > +/* True if _dl_runtime_resolve should align stack for STATE_SAVE or = align > > > + stack to 16 bytes before calling _dl_fixup. */ > > > +#define DL_RUNTIME_RESOLVE_REALIGN_STACK \ > > > + (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \ > > > + || 16 > DL_STACK_ALIGNMENT) > > > diff --git a/sysdeps/x86_64/dl-trampoline-state.h b/sysdeps/x86_64/dl= -trampoline-state.h > > > new file mode 100644 > > > index 0000000000..575f120797 > > > --- /dev/null > > > +++ b/sysdeps/x86_64/dl-trampoline-state.h > > > @@ -0,0 +1,51 @@ > > > +/* x86-64 PLT dl-trampoline state macros. > > > + Copyright (C) 2024 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . 
> > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#if (STATE_SAVE_ALIGNMENT % 16) !=3D 0 > > > +# error STATE_SAVE_ALIGNMENT must be multiple of 16 > > > +#endif > > > + > > > +#if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) !=3D 0 > > > +# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT > > > +#endif > > > + > > > +#if DL_RUNTIME_RESOLVE_REALIGN_STACK > > > +/* Local stack area before jumping to function address: RBX. */ > > > +# define LOCAL_STORAGE_AREA 8 > > > +# define BASE rbx > > > +# ifdef USE_FXSAVE > > > +/* Use fxsave to save XMM registers. */ > > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET) > > > +# if (REGISTER_SAVE_AREA % 16) !=3D 0 > > > +# error REGISTER_SAVE_AREA must be multiple of 16 > > > +# endif > > > +# endif > > > +#else > > > +# ifndef USE_FXSAVE > > > +# error USE_FXSAVE must be defined > > > +# endif > > > +/* Use fxsave to save XMM registers. */ > > > +# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8) > > > +/* Local stack area before jumping to function address: All saved > > > + registers. 
*/ > > > +# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA > > > +# define BASE rsp > > > +# if (REGISTER_SAVE_AREA % 16) !=3D 8 > > > +# error REGISTER_SAVE_AREA must be odd multiple of 8 > > > +# endif > > > +#endif > > > diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-tramp= oline.S > > > index b2e7e0f69b..87c5137837 100644 > > > --- a/sysdeps/x86_64/dl-trampoline.S > > > +++ b/sysdeps/x86_64/dl-trampoline.S > > > @@ -22,25 +22,7 @@ > > > #include > > > #include > > > #include > > > - > > > -#ifndef DL_STACK_ALIGNMENT > > > -/* Due to GCC bug: > > > - > > > - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D58066 > > > - > > > - __tls_get_addr may be called with 8-byte stack alignment. Althou= gh > > > - this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume > > > - that stack will be always aligned at 16 bytes. We use unaligned > > > - 16-byte move to load and store SSE registers, which has no penalt= y > > > - on modern processors if stack is 16-byte aligned. */ > > > -# define DL_STACK_ALIGNMENT 8 > > > -#endif > > > - > > > -/* True if _dl_runtime_resolve should align stack for STATE_SAVE or = align > > > - stack to 16 bytes before calling _dl_fixup. */ > > > -#define DL_RUNTIME_RESOLVE_REALIGN_STACK \ > > > - (STATE_SAVE_ALIGNMENT > DL_STACK_ALIGNMENT \ > > > - || 16 > DL_STACK_ALIGNMENT) > > > +#include "dl-trampoline-save.h" > > > > > > /* Area on stack to save and restore registers used for parameter > > > passing when calling _dl_fixup. 
*/
> > > diff --git a/sysdeps/x86_64/dl-trampoline.h b/sysdeps/x86_64/dl-trampoline.h
> > > index f55c6ea040..d9ccfb40d4 100644
> > > --- a/sysdeps/x86_64/dl-trampoline.h
> > > +++ b/sysdeps/x86_64/dl-trampoline.h
> > > @@ -27,39 +27,7 @@
> > >  # undef LOCAL_STORAGE_AREA
> > >  # undef BASE
> > >
> > > -# if (STATE_SAVE_ALIGNMENT % 16) != 0
> > > -# error STATE_SAVE_ALIGNMENT must be multiple of 16
> > > -# endif
> > > -
> > > -# if (STATE_SAVE_OFFSET % STATE_SAVE_ALIGNMENT) != 0
> > > -# error STATE_SAVE_OFFSET must be multiple of STATE_SAVE_ALIGNMENT
> > > -# endif
> > > -
> > > -# if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > -/* Local stack area before jumping to function address: RBX. */
> > > -# define LOCAL_STORAGE_AREA 8
> > > -# define BASE rbx
> > > -# ifdef USE_FXSAVE
> > > -/* Use fxsave to save XMM registers. */
> > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET)
> > > -# if (REGISTER_SAVE_AREA % 16) != 0
> > > -# error REGISTER_SAVE_AREA must be multiple of 16
> > > -# endif
> > > -# endif
> > > -# else
> > > -# ifndef USE_FXSAVE
> > > -# error USE_FXSAVE must be defined
> > > -# endif
> > > -/* Use fxsave to save XMM registers. */
> > > -# define REGISTER_SAVE_AREA (512 + STATE_SAVE_OFFSET + 8)
> > > -/* Local stack area before jumping to function address: All saved
> > > - registers. */
> > > -# define LOCAL_STORAGE_AREA REGISTER_SAVE_AREA
> > > -# define BASE rsp
> > > -# if (REGISTER_SAVE_AREA % 16) != 8
> > > -# error REGISTER_SAVE_AREA must be odd multiple of 8
> > > -# endif
> > > -# endif
> > > +# include "dl-trampoline-state.h"
> > >
> > >  	.globl _dl_runtime_resolve
> > >  	.hidden _dl_runtime_resolve
> > > --
> > > 2.43.0
> > >
>
> H.J.
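
For reference, the variant selection that the quoted `cpu-features.c` hunk performs for `GLRO(dl_x86_tlsdesc_dynamic)` can be summarized with the sketch below. It is a simplified model only: `struct cpu_model`, its fields, and the function names are hypothetical stand-ins for `xsave_state_size` and the `CPU_FEATURE_USABLE_P` checks, and the returned strings are just the symbol names from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CPU state the patch consults.  */
struct cpu_model {
    size_t xsave_state_size;  /* models xsave_state_size; 0 means no XSAVE area */
    bool has_xsavec;          /* models CPU_FEATURE_USABLE_P (..., XSAVEC) */
    bool has_fxsr;            /* models CPU_FEATURE_USABLE_P (..., FXSR) */
    bool is_x86_64;           /* models #ifdef __x86_64__ */
};

/* Returns the name of the _dl_tlsdesc_dynamic variant that would be chosen.  */
const char *tlsdesc_dynamic_variant(const struct cpu_model *cpu)
{
    if (cpu->xsave_state_size != 0)
        return cpu->has_xsavec ? "_dl_tlsdesc_dynamic_xsavec"
                               : "_dl_tlsdesc_dynamic_xsave";
    /* No XSAVE area: x86-64 always has FXSR, i386 must check for it.  */
    if (cpu->is_x86_64 || cpu->has_fxsr)
        return "_dl_tlsdesc_dynamic_fxsave";
    return "_dl_tlsdesc_dynamic_fnsave";
}

/* Convenience comparison used when exercising the model.  */
bool variant_is(const struct cpu_model *cpu, const char *name)
{
    return strcmp(tlsdesc_dynamic_variant(cpu), name) == 0;
}

/* Exercises one configuration per branch; the sizes are arbitrary nonzero values.  */
void check_variants(void)
{
    struct cpu_model xsavec_cpu = { 832, true, true, true };
    struct cpu_model xsave_cpu = { 832, false, true, true };
    struct cpu_model old_i386 = { 0, false, false, false };

    assert(variant_is(&xsavec_cpu, "_dl_tlsdesc_dynamic_xsavec"));
    assert(variant_is(&xsave_cpu, "_dl_tlsdesc_dynamic_xsave"));
    assert(variant_is(&old_i386, "_dl_tlsdesc_dynamic_fnsave"));
}
```

The fnsave fallback exists only on i386 in the patch; the model folds that into the `is_x86_64` flag instead of a preprocessor conditional.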