From: "H.J. Lu"
Date: Sun, 17 Mar 2024 06:02:19 -0700
Subject: Re: [PATCH v4] x86-64: Allocate state buffer space for RDI, RSI and RBX
To: libc-alpha@sourceware.org
Cc: fweimer@redhat.com
In-Reply-To: <20240317125541.799962-1-hjl.tools@gmail.com>

On Sun, Mar 17, 2024 at 5:55 AM H.J. Lu wrote:
>
> _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
> After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11.  Define
> TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
> to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
> STATE_SAVE_OFFSET(%rsp).
>
>    +==================+<- stack frame start aligned at 8 or 16 bytes
>    |                  |<- RDI
>    |                  |<- RSI
>    |                  |<- RBX
>    |                  |<- paddings from stack realignment of 64 bytes
>    |------------------|<- xsave buffer end aligned at 64 bytes
>    |                  |<-
>    |                  |<-
>    |                  |<-
>    |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
>    |                  |<- 8-byte padding
>    |                  |<- 8-byte padding
>    |                  |<- R11
>    |                  |<- R10
>    |                  |<- R9
>    |                  |<- R8
>    |                  |<- RDX
>    |                  |<- RCX
>    +==================+<- State buffer start aligned at 64 bytes
>
> This fixes BZ #31501.
> ---
>  sysdeps/x86/cpu-features.c         | 11 ++--
>  sysdeps/x86/sysdep.h               | 29 ++++++++++
>  sysdeps/x86_64/tst-gnu2-tls2mod1.S | 87 ++++++++++++++++++++++++++++++
>  3 files changed, 123 insertions(+), 4 deletions(-)
>  create mode 100644 sysdeps/x86_64/tst-gnu2-tls2mod1.S
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 4ea373dffa..3d7c2819d7 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -311,7 +311,7 @@ update_active (struct cpu_features *cpu_features)
>            /* NB: On AMX capable processors, ebx always includes AMX
>               states.  */
>            unsigned int xsave_state_full_size
> -            = ALIGN_UP (ebx + STATE_SAVE_OFFSET, 64);
> +            = ALIGN_UP (ebx + TLSDESC_CALL_REGISTER_SAVE_AREA, 64);
>
>            cpu_features->xsave_state_size
>              = xsave_state_full_size;
> @@ -401,8 +401,10 @@ update_active (struct cpu_features *cpu_features)
>                    unsigned int amx_size
>                      = (xstate_amx_comp_offsets[31]
>                         + xstate_amx_comp_sizes[31]);
> -                  amx_size = ALIGN_UP (amx_size + STATE_SAVE_OFFSET,
> -                                       64);
> +                  amx_size
> +                    = ALIGN_UP ((amx_size
> +                                 + TLSDESC_CALL_REGISTER_SAVE_AREA),
> +                                64);
>                    /* Set xsave_state_full_size to the compact AMX
>                       state size for XSAVEC.  NB: xsave_state_full_size
>                       is only used in _dl_tlsdesc_dynamic_xsave and
> @@ -410,7 +412,8 @@ update_active (struct cpu_features *cpu_features)
>                    cpu_features->xsave_state_full_size = amx_size;
>  #endif
>                    cpu_features->xsave_state_size
> -                    = ALIGN_UP (size + STATE_SAVE_OFFSET, 64);
> +                    = ALIGN_UP (size + TLSDESC_CALL_REGISTER_SAVE_AREA,
> +                                64);
>                    CPU_FEATURE_SET (cpu_features, XSAVEC);
>                  }
>              }
> diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
> index db8e576e91..46fcd27345 100644
> --- a/sysdeps/x86/sysdep.h
> +++ b/sysdeps/x86/sysdep.h
> @@ -46,6 +46,34 @@
>     red-zone into account.  */
>  # define STATE_SAVE_OFFSET (8 * 7 + 8)
>
> +/* _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning
> +   stack.  After realigning stack, it saves RCX, RDX, R8, R9, R10 and
> +   R11.  Allocate space for RDI, RSI and RBX to avoid clobbering saved
> +   RDI, RSI and RBX values on stack by xsave.
> +
> +   +==================+<- stack frame start aligned at 8 or 16 bytes
> +   |                  |<- RDI
> +   |                  |<- RSI
> +   |                  |<- RBX
> +   |                  |<- paddings from stack realignment of 64 bytes
> +   |------------------|<- xsave buffer end aligned at 64 bytes
> +   |                  |<-
> +   |                  |<-
> +   |                  |<-
> +   |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
> +   |                  |<- 8-byte padding
> +   |                  |<- 8-byte padding
> +   |                  |<- R11
> +   |                  |<- R10
> +   |                  |<- R9
> +   |                  |<- R8
> +   |                  |<- RDX
> +   |                  |<- RCX
> +   +==================+<- State buffer start aligned at 64 bytes
> +
> +*/
> +# define TLSDESC_CALL_REGISTER_SAVE_AREA (STATE_SAVE_OFFSET + 24)
> +
>  /* Save SSE, AVX, AVX512, mask, bound and APX registers.  Bound and APX
>     registers are mutually exclusive.  */
>  # define STATE_SAVE_MASK \
> @@ -68,6 +96,7 @@
>  /* Offset for fxsave/xsave area used by _dl_tlsdesc_dynamic.  Since i386
>     doesn't have red-zone, use 0 here.  */
>  # define STATE_SAVE_OFFSET 0
> +# define TLSDESC_CALL_REGISTER_SAVE_AREA 0
>
>  /* Save SSE, AVX, AXV512, mask and bound registers.  */
>  # define STATE_SAVE_MASK \
> diff --git a/sysdeps/x86_64/tst-gnu2-tls2mod1.S b/sysdeps/x86_64/tst-gnu2-tls2mod1.S
> new file mode 100644
> index 0000000000..449ddd5c9d
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2mod1.S
> @@ -0,0 +1,87 @@
> +/* Check if TLSDESC relocation preserves %rdi, %rsi and %rbx.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#include
> +
> +/* On AVX512 machines, OFFSET == 104 caused _dl_tlsdesc_dynamic_xsavec
> +   to clobber %rdi, %rsi and %rbx.  On Intel AVX CPUs, the state size
> +   is 960 bytes and this test didn't fail.  It may be due to the unused
> +   last 128 bytes.  On AMD AVX CPUs, the state size is 832 bytes and
> +   this test might fail without the fix.  */
> +#ifndef OFFSET
> +# define OFFSET 104
> +#endif
> +
> +        .text
> +        .p2align 4
> +        .globl apply_tls
> +        .type apply_tls, @function
> +apply_tls:
> +        cfi_startproc
> +        _CET_ENDBR
> +        pushq %rbp
> +        cfi_def_cfa_offset (16)
> +        cfi_offset (6, -16)
> +        movdqu (%RDI_LP), %xmm0
> +        lea tls_var1@TLSDESC(%rip), %RAX_LP
> +        mov %RSP_LP, %RBP_LP
> +        cfi_def_cfa_register (6)
> +        /* Align stack to 64 bytes.  */
> +        and $-64, %RSP_LP
> +        sub $OFFSET, %RSP_LP
> +        pushq %rbx
> +        /* Set %ebx to 0xbadbeef.  */
> +        movl $0xbadbeef, %ebx
> +        movl $0xbadbeef, %esi
> +        movq %rdi, saved_rdi(%rip)
> +        movq %rsi, saved_rsi(%rip)
> +        call *tls_var1@TLSCALL(%RAX_LP)
> +        /* Check if _dl_tlsdesc_dynamic preserves %rdi, %rsi and %rbx.  */
> +        cmpq saved_rdi(%rip), %rdi
> +        jne L(hlt)
> +        cmpq saved_rsi(%rip), %rsi
> +        jne L(hlt)
> +        cmpl $0xbadbeef, %ebx
> +        jne L(hlt)
> +        add %fs:0, %RAX_LP
> +        movups %xmm0, 32(%RAX_LP)
> +        movdqu 16(%RDI_LP), %xmm1
> +        mov %RAX_LP, %RBX_LP
> +        movups %xmm1, 48(%RAX_LP)
> +        lea 32(%RBX_LP), %RAX_LP
> +        pop %rbx
> +        leave
> +        cfi_def_cfa (7, 8)
> +        ret
> +L(hlt):
> +        hlt
> +        cfi_endproc
> +        .size apply_tls, .-apply_tls
> +        .hidden tls_var1
> +        .globl tls_var1
> +        .section .tbss,"awT",@nobits
> +        .align 16
> +        .type tls_var1, @object
> +        .size tls_var1, 3200
> +tls_var1:
> +        .zero 3200
> +        .local saved_rdi
> +        .comm saved_rdi,8,8
> +        .local saved_rsi
> +        .comm saved_rsi,8,8
> +        .section .note.GNU-stack,"",@progbits
> --
> 2.44.0
>

I need to adjust the assembly code.

-- 
H.J.
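
For reference, the size arithmetic changed by the cpu-features.c hunks can be sketched in a few lines of C. This is only an illustration under stated assumptions: the two macro values are copied from the quoted sysdep.h change (x86-64 branch), while align_up, old_state_size, new_state_size and check_examples are hypothetical helper names that do not exist in glibc (glibc uses its own ALIGN_UP macro). The 832- and 960-byte inputs are borrowed from the test comment purely as sample sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the quoted sysdeps/x86/sysdep.h hunk (x86-64).  */
#define STATE_SAVE_OFFSET (8 * 7 + 8)
#define TLSDESC_CALL_REGISTER_SAVE_AREA (STATE_SAVE_OFFSET + 24)

/* 6 saved registers + 2 paddings = 64 bytes; RDI, RSI, RBX add 24.  */
_Static_assert (STATE_SAVE_OFFSET == 64, "register save area below xsave");
_Static_assert (TLSDESC_CALL_REGISTER_SAVE_AREA == 88, "plus RDI/RSI/RBX");

/* Hypothetical stand-in for glibc's ALIGN_UP; ALIGN must be a power
   of two.  */
static inline size_t
align_up (size_t value, size_t align)
{
  return (value + align - 1) & ~(align - 1);
}

/* Buffer size as computed by the removed ("-") lines.  */
static inline size_t
old_state_size (size_t xsave_size)
{
  return align_up (xsave_size + STATE_SAVE_OFFSET, 64);
}

/* Buffer size as computed by the added ("+") lines.  */
static inline size_t
new_state_size (size_t xsave_size)
{
  return align_up (xsave_size + TLSDESC_CALL_REGISTER_SAVE_AREA, 64);
}

/* Spot checks with the sample sizes from the test comment.  */
static inline void
check_examples (void)
{
  assert (old_state_size (832) == 896);
  assert (new_state_size (832) == 960);
  assert (old_state_size (960) == 1024);
  assert (new_state_size (960) == 1088);
}
```

With these sample inputs the buffer grows by one 64-byte step in both cases. More generally, whenever the raw state size is itself a multiple of 64, the extra 24 bytes round up to a full additional 64 bytes.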