* [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: Florian Weimer @ 2024-03-16 14:32 UTC
To: libc-alpha
In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
This realignment does not take into account that the function has
already used part of the red zone at this point, thus clobbering
the initially saved register values located there if the stack
alignment inherited from the caller is unfortunate.

(Note: I do not know how to write a good test case for this in the existing
framework. We saw this as a random LTO plugin crash when building GCC
with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
with this change here.)
---
sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
index 9f02cfc3eb..8e49e7eece 100644
--- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
+++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
@@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
2:
#if DL_RUNTIME_RESOLVE_REALIGN_STACK
movq %rbx, -24(%rsp)
+ subq $24, %rsp
+ cfi_adjust_cfa_offset(24)
mov %RSP_LP, %RBX_LP
cfi_def_cfa_register(%rbx)
and $-STATE_SAVE_ALIGNMENT, %RSP_LP
@@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
#if DL_RUNTIME_RESOLVE_REALIGN_STACK
mov %RBX_LP, %RSP_LP
cfi_def_cfa_register(%rsp)
+ addq $24, %rsp
+ cfi_adjust_cfa_offset(-24)
movq -24(%rsp), %rbx
cfi_restore(%rbx)
#else
base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 14:37 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> This realignment does not take into account that the function has
> already used part of the red zone at this point, thus clobbering
> the initially saved register values located there if the stack
> alignment inherited from the caller is unfortunate.
>
> (Note: I do not know how to write a good test case for this in the existing
> framework. We saw this as a random LTO plugin crash when building GCC
> with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
> with this change here.)
Will a different STATE_SAVE_OFFSET for TLS descriptor work?
> ---
> sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> index 9f02cfc3eb..8e49e7eece 100644
> --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
> 2:
> #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> movq %rbx, -24(%rsp)
> + subq $24, %rsp
> + cfi_adjust_cfa_offset(24)
> mov %RSP_LP, %RBX_LP
> cfi_def_cfa_register(%rbx)
> and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
> #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> mov %RBX_LP, %RSP_LP
> cfi_def_cfa_register(%rsp)
> + addq $24, %rsp
> + cfi_adjust_cfa_offset(-24)
> movq -24(%rsp), %rbx
> cfi_restore(%rbx)
> #else
>
> base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
>
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 14:43 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > This realignment does not take into account that the function has
> > already used part of the red zone at this point, thus clobbering
> > the initially saved register values located there if the stack
> > alignment inherited from the caller is unfortunate.
> >
> > (Note: I do not know how to write a good test case for this in the existing
> > framework. We saw this as a random LTO plugin crash when building GCC
> > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
> > with this change here.)
>
> Will a different STATE_SAVE_OFFSET for TLS descriptor work?
Correction. REGISTER_SAVE_AREA is for this purpose. Will a different
value for TLS descriptor work?
> > ---
> > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > index 9f02cfc3eb..8e49e7eece 100644
> > --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
> > 2:
> > #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > movq %rbx, -24(%rsp)
> > + subq $24, %rsp
> > + cfi_adjust_cfa_offset(24)
> > mov %RSP_LP, %RBX_LP
> > cfi_def_cfa_register(%rbx)
> > and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
> > #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > mov %RBX_LP, %RSP_LP
> > cfi_def_cfa_register(%rsp)
> > + addq $24, %rsp
> > + cfi_adjust_cfa_offset(-24)
> > movq -24(%rsp), %rbx
> > cfi_restore(%rbx)
> > #else
> >
> > base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
> >
>
>
> --
> H.J.
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 14:47 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > > This realignment does not take into account that the function has
> > > already used part of the red zone at this point, thus clobbering
> > > the initially saved register values located there if the stack
> > > alignment inherited from the caller is unfortunate.
> > >
> > > (Note: I do not know how to write a good test case for this in the existing
> > > framework. We saw this as a random LTO plugin crash when building GCC
> > > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
> > > with this change here.)
We should try to find a testcase. Can you provide a backtrace when it
happens? It should be possible to write a testcase with the backtrace.
> > Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction. REGISTER_SAVE_AREA is for this purpose. Will a different
> value for TLS descriptor work?
>
> > > ---
> > > sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > index 9f02cfc3eb..8e49e7eece 100644
> > > --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
> > > 2:
> > > #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > movq %rbx, -24(%rsp)
> > > + subq $24, %rsp
> > > + cfi_adjust_cfa_offset(24)
> > > mov %RSP_LP, %RBX_LP
> > > cfi_def_cfa_register(%rbx)
> > > and $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > > @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
> > > #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > > mov %RBX_LP, %RSP_LP
> > > cfi_def_cfa_register(%rsp)
> > > + addq $24, %rsp
> > > + cfi_adjust_cfa_offset(-24)
> > > movq -24(%rsp), %rbx
> > > cfi_restore(%rbx)
> > > #else
> > >
> > > base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
> > >
> >
> >
> > --
> > H.J.
>
>
>
> --
> H.J.
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: Florian Weimer @ 2024-03-16 14:57 UTC
To: H.J. Lu; +Cc: libc-alpha
* H. J. Lu:
> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>> >
>> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
>> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
>> > This realignment does not take into account that the function has
>> > already used part of the red zone at this point, thus clobbering
>> > the initially saved register values located there if the stack
>> > alignment inherited from the caller is unfortunate.
>> >
>> > (Note: I do not know how to write a good test case for this in the existing
>> > framework. We saw this as a random LTO plugin crash when building GCC
>> > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
>> > with this change here.)
>>
>> Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction. REGISTER_SAVE_AREA is for this purpose. Will a different
> value for TLS descriptor work?
I think REGISTER_SAVE_AREA is for the later register saves?
This use of the red zone is specific to the TLS trampoline. The lazy
binding trampoline doesn't do that. REGISTER_SAVE_AREA is used by both.
Thanks,
Florian
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 14:59 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 7:57 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> >
> >> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> >> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> >> > This realignment does not take into account that the function has
> >> > already used part of the red zone at this point, thus clobbering
> >> > the initially saved register values located there if the stack
> >> > alignment inherited from the caller is unfortunate.
> >> >
> >> > (Note: I do not know how to write a good test case for this in the existing
> >> > framework. We saw this as a random LTO plugin crash when building GCC
> >> > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
> >> > with this change here.)
> >>
> >> Will a different STATE_SAVE_OFFSET for TLS descriptor work?
> >
> > Correction. REGISTER_SAVE_AREA is for this purpose. Will a different
> > value for TLS descriptor work?
>
> I think REGISTER_SAVE_AREA is for the later register saves?
>
> This use of the red zone is specific to the TLS trampoline. The lazy
> binding trampoline doesn't do that. REGISTER_SAVE_AREA is used by both.
>
# if DL_RUNTIME_RESOLVE_REALIGN_STACK
/* STATE_SAVE_OFFSET has space for 8 integer registers. But we
need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
RBX above. */
sub $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
# else
sub $REGISTER_SAVE_AREA, %RSP_LP
cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
# endif
Let's find a testcase first.
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: Florian Weimer @ 2024-03-16 15:04 UTC
To: H.J. Lu; +Cc: libc-alpha
* H. J. Lu:
> On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>> > >
>> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
>> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
>> > > This realignment does not take into account that the function has
>> > > already used part of the red zone at this point, thus clobbering
>> > > the initially saved register values located there if the stack
>> > > alignment inherited from the caller is unfortunate.
>> > >
>> > > (Note: I do not know how to write a good test case for this in the existing
>> > > framework. We saw this as a random LTO plugin crash when building GCC
>> > > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
>> > > with this change here.)
>
> We should try to find a testcase. Can you provide a backtrace when it
> happens? It should be possible to write a testcase with the backtrace.
In my reproducer, when %rbx is about to be clobbered, I see
(%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
The %rbx register does not get clobbered if (%rsp % 64) == 56.
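For reference, here is a quick stand-alone sketch of the address
arithmetic (my own illustration, not part of any patch; it assumes
STATE_SAVE_ALIGNMENT == 64 and the movq %rbx, -24(%rsp) spill from the
current code):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  const uintptr_t align = 64;		/* STATE_SAVE_ALIGNMENT.  */
  for (uintptr_t mod = 0; mod < align; mod += 8)
    {
      uintptr_t rsp = 0x7ffffffff000 + mod;	/* Incoming %rsp.  */
      uintptr_t slot = rsp - 24;		/* movq %rbx, -24(%rsp).  */
      uintptr_t old_realign = rsp & -align;	   /* Without the fix.  */
      uintptr_t new_realign = (rsp - 24) & -align; /* With subq $24 first.  */
      /* The spill is clobbered whenever it ends up below the realigned
	 stack pointer, i.e. inside the area used for the later saves.  */
      printf ("%%rsp %% 64 == %2u: before: %s  after: %s\n",
	      (unsigned int) mod,
	      slot < old_realign ? "clobbered" : "preserved",
	      slot < new_realign ? "clobbered" : "preserved");
    }
  return 0;
}

With %rsp % 64 == 8 the spill slot ends up 16 bytes below the realigned
stack pointer, with %rsp % 64 == 56 it stays at or above it, which
matches what I see in the reproducer.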
Does this help?
Thanks,
Florian
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 15:18 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >
> >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> > >
> >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> >> > > This realignment does not take into account that the function has
> >> > > already used part of the red zone at this point, thus clobbering
> >> > > the initially saved register values located there if the stack
> >> > > alignment inherited from the caller is unfortunate.
> >> > >
> >> > > (Note: I do not know how to write a good test case for this in the existing
> >> > > framework. We saw this as a random LTO plugin crash when building GCC
> >> > > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
> >> > > with this change here.)
> >
> > We should try to find a testcase. Can you provide a backtrace when it
> > happens? It should be possible to write a testcase with the backtrace.
>
> In my reproducer, when %rbx is about to be clobbered, I see
> (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
>
> The %rbx register does not get clobbered if (%rsp % 64) == 56.
>
> Does this help?
>
Yes. I am working on a testcase.
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 16:32 UTC
To: Florian Weimer; +Cc: libc-alpha
[-- Attachment #1: Type: text/plain, Size: 1731 bytes --]
On Sat, Mar 16, 2024 at 8:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu:
> >
> > > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >>
> > >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >> >
> > >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >> > >
> > >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > >> > > This realignment does not take into account that the function has
> > >> > > already used part of the red zone at this point, thus clobbering
> > >> > > the initially saved register values located there if the stack
> > >> > > alignment inherited from the caller is unfortunate.
> > >> > >
> > >> > > (Note: I do not know how to write a good test case for this in the existing
> > >> > > framework. We saw this as a random LTO plugin crash when building GCC
> > >> > > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
> > >> > > with this change here.)
> > >
> > > We should try to find a testcase. Can you provide a backtrace when it
> > > happens? It should be possible to write a testcase with the backtrace.
> >
> > In my reproducer, when %rbx is about to be clobbered, I see
> > (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
> >
> > The %rbx register does not get clobbered if (%rsp % 64) == 56.
> >
> > Does this help?
> >
>
> Yes. I am working on a testcase.
Hi Florian,
Please verify if this is the right testcase.
Thanks.
--
H.J.
[-- Attachment #2: 0001-Add-a-test-for-BZ-31501.patch --]
[-- Type: text/x-patch, Size: 8698 bytes --]
From 8c167bd4d9faddf9580f55182542e8a05a25fd61 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 16 Mar 2024 08:42:27 -0700
Subject: [PATCH] Add a test for BZ #31501
---
sysdeps/x86_64/Makefile | 16 +++++
sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S | 57 +++++++++++++++++
sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S | 74 ++++++++++++++++++++++
sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S | 57 +++++++++++++++++
sysdeps/x86_64/tst-gnu2-tls2-x86-64.c | 21 ++++++
5 files changed, 225 insertions(+)
create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S
create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64.c
diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 66b21954f3..f020773e56 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -217,6 +217,22 @@ valgrind-suppressions-tst-valgrind-smoke = \
--suppressions=$(..)sysdeps/x86_64/tst-valgrind-smoke.supp
endif
+tests += \
+ tst-gnu2-tls2-x86-64 \
+# tests
+
+modules-names += \
+ tst-gnu2-tls2-x86-64-mod0 \
+ tst-gnu2-tls2-x86-64-mod1 \
+ tst-gnu2-tls2-x86-64-mod2 \
+# modules-names
+
+$(objpfx)tst-gnu2-tls2-x86-64: $(shared-thread-library)
+$(objpfx)tst-gnu2-tls2-x86-64.out: \
+ $(objpfx)tst-gnu2-tls2-x86-64-mod0.so \
+ $(objpfx)tst-gnu2-tls2-x86-64-mod1.so \
+ $(objpfx)tst-gnu2-tls2-x86-64-mod2.so
+
endif # $(subdir) == elf
ifeq ($(subdir),csu)
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
new file mode 100644
index 0000000000..8129b28061
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
@@ -0,0 +1,57 @@
+/* Test TLSDESC relocation with x86-64.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+ .text
+ .p2align 4
+ .globl apply_tls
+ .type apply_tls, @function
+apply_tls:
+ .cfi_startproc
+ subq $24, %rsp
+ .cfi_def_cfa_offset 32
+ movdqu (%rdi), %xmm0
+ movq %fs:40, %rax
+ movq %rax, 8(%rsp)
+ xorl %eax, %eax
+ leaq tls_var0@TLSDESC(%rip), %rax
+ call *tls_var0@TLSCALL(%rax)
+ addq %fs:0, %rax
+ movups %xmm0, (%rax)
+ movdqu 16(%rdi), %xmm1
+ movups %xmm1, 16(%rax)
+ movq 8(%rsp), %rdx
+ subq %fs:40, %rdx
+ jne .L5
+ addq $24, %rsp
+ .cfi_remember_state
+ .cfi_def_cfa_offset 8
+ ret
+.L5:
+ .cfi_restore_state
+ call __stack_chk_fail@PLT
+ .cfi_endproc
+ .size apply_tls, .-apply_tls
+ .hidden tls_var0
+ .globl tls_var0
+ .section .tbss,"awT",@nobits
+ .align 8
+ .type tls_var0, @object
+ .size tls_var0, 32
+tls_var0:
+ .zero 32
+ .section .note.GNU-stack,"",@progbits
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
new file mode 100644
index 0000000000..af4b7ca761
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
@@ -0,0 +1,74 @@
+/* Test TLSDESC relocation with x86-64.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
+ clobber %rbx. */
+#define OFFSET (56 + 16 + 16 + 16)
+
+ .text
+ .p2align 4
+ .globl apply_tls
+ .type apply_tls, @function
+apply_tls:
+ .cfi_startproc
+ pushq %rbp
+ .cfi_def_cfa_offset 16
+ .cfi_offset 6, -16
+ movq %rsp, %rbp
+ .cfi_def_cfa_register 6
+ /* Align stack to 64 bytes. */
+ andq $-64, %rsp
+ pushq %rbx
+ subq $OFFSET, %rsp
+ movdqu (%rdi), %xmm0
+ movq %fs:40, %rax
+ movq %rax, 56(%rsp)
+ xorl %eax, %eax
+ leaq tls_var1@TLSDESC(%rip), %rax
+ call *tls_var1@TLSCALL(%rax)
+ addq %fs:0, %rax
+ movups %xmm0, 32(%rax)
+ movdqu 16(%rdi), %xmm1
+ addq $32, %rax
+ movups %xmm1, 16(%rax)
+ movq 56(%rsp), %rdx
+ subq %fs:40, %rdx
+ jne .L5
+ /* All versions of _dl_tlsdesc_dynamic should preserve %rbx. */
+ cmpq OFFSET(%rsp), %rbx
+ jne .Lhlt
+ leave
+ .cfi_remember_state
+ .cfi_def_cfa 7, 8
+ ret
+.Lhlt:
+ hlt
+.L5:
+ .cfi_restore_state
+ call __stack_chk_fail@PLT
+ .cfi_endproc
+ .size apply_tls, .-apply_tls
+ .hidden tls_var1
+ .globl tls_var1
+ .section .tbss,"awT",@nobits
+ .align 16
+ .type tls_var1, @object
+ .size tls_var1, 3200
+tls_var1:
+ .zero 3200
+ .section .note.GNU-stack,"",@progbits
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S
new file mode 100644
index 0000000000..d82f5f029b
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S
@@ -0,0 +1,57 @@
+/* Test TLSDESC relocation with x86-64.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+ .text
+ .p2align 4
+ .globl apply_tls
+ .type apply_tls, @function
+apply_tls:
+ .cfi_startproc
+ subq $24, %rsp
+ .cfi_def_cfa_offset 32
+ movdqu (%rdi), %xmm0
+ movq %fs:40, %rax
+ movq %rax, 8(%rsp)
+ xorl %eax, %eax
+ leaq tls_var2@TLSDESC(%rip), %rax
+ call *tls_var2@TLSCALL(%rax)
+ addq %fs:0, %rax
+ movups %xmm0, (%rax)
+ movdqu 16(%rdi), %xmm1
+ movups %xmm1, 16(%rax)
+ movq 8(%rsp), %rdx
+ subq %fs:40, %rdx
+ jne .L5
+ addq $24, %rsp
+ .cfi_remember_state
+ .cfi_def_cfa_offset 8
+ ret
+.L5:
+ .cfi_restore_state
+ call __stack_chk_fail@PLT
+ .cfi_endproc
+ .size apply_tls, .-apply_tls
+ .hidden tls_var2
+ .globl tls_var2
+ .section .tbss,"awT",@nobits
+ .align 8
+ .type tls_var2, @object
+ .size tls_var2, 32
+tls_var2:
+ .zero 32
+ .section .note.GNU-stack,"",@progbits
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c b/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c
new file mode 100644
index 0000000000..787181545b
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c
@@ -0,0 +1,21 @@
+/* Test TLSDESC relocation with x86-64.
+ Copyright (C) 2024 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#define MOD(i) "tst-gnu2-tls2-x86-64-mod" #i ".so"
+
+#include <elf/tst-gnu2-tls2.c>
--
2.44.0
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 16:37 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 9:32 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 8:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu:
> > >
> > > > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >>
> > > >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >> >
> > > >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >> > >
> > > >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > > >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > > >> > > This realignment does not take into account that the function has
> > > >> > > already used part of the red zone at this point, thus clobbering
> > > >> > > the initially saved register values located there if the stack
> > > >> > > alignment inherited from the caller is unfortunate.
> > > >> > >
> > > >> > > (Note: I do not know how to write a good test case for this in the existing
> > > >> > > framework. We saw this as a random LTO plugin crash when building GCC
> > > >> > > with -mtls-dialect=gnu2. The existing tests pass on x86_64-linux-gnu
> > > >> > > with this change here.)
> > > >
> > > > We should try to find a testcase. Can you provide a backtrace when it
> > > > happens? It should be possible to write a testcase with the backtrace.
> > >
> > > In my reproducer, when %rbx is about to be clobbered, I see
> > > (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
> > >
> > > The %rbx register does not get clobbered if (%rsp % 64) == 56.
> > >
> > > Does this help?
> > >
> >
> > Yes. I am working on a testcase.
>
> Hi Florian,
>
> Please verify if this is the right testcase.
This test fails only on AVX512 machines.
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: Florian Weimer @ 2024-03-16 17:42 UTC
To: H.J. Lu; +Cc: libc-alpha
* H. J. Lu:
> Please verify if this is the right testcase.
Test case works (fails without my fix, succeeds with my fix). Some
comments below.
> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> new file mode 100644
> index 0000000000..8129b28061
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> @@ -0,0 +1,57 @@
> + .text
> + .p2align 4
> + .globl apply_tls
> + .type apply_tls, @function
> +apply_tls:
> + .cfi_startproc
Missing CET marker.
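Something along these lines at the entry point would do, I think (just a
sketch; whichever endbr marker or macro the rest of the tree uses for
test .S files is fine):

apply_tls:
	.cfi_startproc
	endbr64			/* IBT landing pad for the indirect call.  */
	subq	$24, %rsp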
> + subq $24, %rsp
> + .cfi_def_cfa_offset 32
> + movdqu (%rdi), %xmm0
> + movq %fs:40, %rax
> + movq %rax, 8(%rsp)
> + xorl %eax, %eax
> + leaq tls_var0@TLSDESC(%rip), %rax
> + call *tls_var0@TLSCALL(%rax)
> + addq %fs:0, %rax
> + movups %xmm0, (%rax)
> + movdqu 16(%rdi), %xmm1
> + movups %xmm1, 16(%rax)
> + movq 8(%rsp), %rdx
> + subq %fs:40, %rdx
> + jne .L5
> + addq $24, %rsp
> + .cfi_remember_state
> + .cfi_def_cfa_offset 8
> + ret
> +.L5:
> + .cfi_restore_state
> + call __stack_chk_fail@PLT
Not sure if we need this?
Maybe add a comment about what exactly this subtest is exercising?
These are present in the other TLS modules as well.
> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> new file mode 100644
> index 0000000000..af4b7ca761
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> + clobber %rbx. */
> +#define OFFSET (56 + 16 + 16 + 16)
> +
> + .text
> + .p2align 4
> + .globl apply_tls
> + .type apply_tls, @function
> +apply_tls:
> + .cfi_startproc
> + pushq %rbp
> + .cfi_def_cfa_offset 16
> + .cfi_offset 6, -16
> + movq %rsp, %rbp
> + .cfi_def_cfa_register 6
> + /* Align stack to 64 bytes. */
> + andq $-64, %rsp
> + pushq %rbx
> + subq $OFFSET, %rsp
The offset could be loaded from a global variable or something like
that. We should exercise a wide range of stack alignments—the
individual tests are cheap. And maybe check extra registers.
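Roughly like this, for example (hypothetical sketch; stack_offset would
be a new 8-byte global that the test driver sets before each call):

	/* Align the stack to 64 bytes, then subtract a driver-controlled
	   amount so each call sees a different alignment.  */
	andq	$-64, %rsp
	pushq	%rbx
	subq	stack_offset(%rip), %rsp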
Thanks,
Florian
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 17:51 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > Please verify if this is the right testcase.
>
> Test case works (fails without my fix, succeeds with my fix). Some
> comments below.
>
> > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > new file mode 100644
> > index 0000000000..8129b28061
> > --- /dev/null
> > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > @@ -0,0 +1,57 @@
>
> > + .text
> > + .p2align 4
> > + .globl apply_tls
> > + .type apply_tls, @function
> > +apply_tls:
> > + .cfi_startproc
>
> Missing CET marker.
>
> > + subq $24, %rsp
> > + .cfi_def_cfa_offset 32
> > + movdqu (%rdi), %xmm0
> > + movq %fs:40, %rax
> > + movq %rax, 8(%rsp)
> > + xorl %eax, %eax
> > + leaq tls_var0@TLSDESC(%rip), %rax
> > + call *tls_var0@TLSCALL(%rax)
> > + addq %fs:0, %rax
> > + movups %xmm0, (%rax)
> > + movdqu 16(%rdi), %xmm1
> > + movups %xmm1, 16(%rax)
> > + movq 8(%rsp), %rdx
> > + subq %fs:40, %rdx
> > + jne .L5
> > + addq $24, %rsp
> > + .cfi_remember_state
> > + .cfi_def_cfa_offset 8
> > + ret
> > +.L5:
> > + .cfi_restore_state
> > + call __stack_chk_fail@PLT
>
> Not sure if we need this?
>
> Maybe add a comment about what exactly this subtest is exercising?
>
> These are present in the other TLS modules as well.
>
> > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > new file mode 100644
> > index 0000000000..af4b7ca761
> > --- /dev/null
> > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
>
> > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > + clobber %rbx. */
> > +#define OFFSET (56 + 16 + 16 + 16)
> > +
> > + .text
> > + .p2align 4
> > + .globl apply_tls
> > + .type apply_tls, @function
> > +apply_tls:
> > + .cfi_startproc
> > + pushq %rbp
> > + .cfi_def_cfa_offset 16
> > + .cfi_offset 6, -16
> > + movq %rsp, %rbp
> > + .cfi_def_cfa_register 6
> > + /* Align stack to 64 bytes. */
> > + andq $-64, %rsp
> > + pushq %rbx
> > + subq $OFFSET, %rsp
>
> The offset could be loaded from a global variable or something like
> that. We should exercise a wide range of stack alignments—the
> individual tests are cheap. And maybe check extra registers.
I will clean it up with a different fix.
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-16 22:05 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu:
> >
> > > Please verify if this is the right testcase.
> >
> > Test case works (fails without my fix, succeeds with my fix). Some
> > comments below.
> >
> > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > new file mode 100644
> > > index 0000000000..8129b28061
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > @@ -0,0 +1,57 @@
> >
> > > + .text
> > > + .p2align 4
> > > + .globl apply_tls
> > > + .type apply_tls, @function
> > > +apply_tls:
> > > + .cfi_startproc
> >
> > Missing CET marker.
> >
> > > + subq $24, %rsp
> > > + .cfi_def_cfa_offset 32
> > > + movdqu (%rdi), %xmm0
> > > + movq %fs:40, %rax
> > > + movq %rax, 8(%rsp)
> > > + xorl %eax, %eax
> > > + leaq tls_var0@TLSDESC(%rip), %rax
> > > + call *tls_var0@TLSCALL(%rax)
> > > + addq %fs:0, %rax
> > > + movups %xmm0, (%rax)
> > > + movdqu 16(%rdi), %xmm1
> > > + movups %xmm1, 16(%rax)
> > > + movq 8(%rsp), %rdx
> > > + subq %fs:40, %rdx
> > > + jne .L5
> > > + addq $24, %rsp
> > > + .cfi_remember_state
> > > + .cfi_def_cfa_offset 8
> > > + ret
> > > +.L5:
> > > + .cfi_restore_state
> > > + call __stack_chk_fail@PLT
> >
> > Not sure if we need this?
> >
> > Maybe add a comment about what exactly this subtest is exercising?
> >
> > These are present in the other TLS modules as well.
> >
> > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > new file mode 100644
> > > index 0000000000..af4b7ca761
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> >
> > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > + clobber %rbx. */
> > > +#define OFFSET (56 + 16 + 16 + 16)
> > > +
> > > + .text
> > > + .p2align 4
> > > + .globl apply_tls
> > > + .type apply_tls, @function
> > > +apply_tls:
> > > + .cfi_startproc
> > > + pushq %rbp
> > > + .cfi_def_cfa_offset 16
> > > + .cfi_offset 6, -16
> > > + movq %rsp, %rbp
> > > + .cfi_def_cfa_register 6
> > > + /* Align stack to 64 bytes. */
> > > + andq $-64, %rsp
> > > + pushq %rbx
> > > + subq $OFFSET, %rsp
> >
> > The offset could be loaded from a global variable or something like
> > that. We should exercise a wide range of stack alignments—the
> > individual tests are cheap. And maybe check extra registers.
>
> I will clean it up with a different fix.
>
I submitted a patch with a testcase:
https://patchwork.sourceware.org/project/glibc/list/?series=31963
My patch allocates 64 more bytes to avoid clobbering saved RDI,
RSI and RBX values on stack by xsave. It avoids 2 stack
adjustments. Either my fix or Florian's fix should fix the issue.
I don't have a strong preference as long as my testcase is
included.
Thanks.
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-17 1:19 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu:
> > >
> > > > Please verify if this is the right testcase.
> > >
> > > Test case works (fails without my fix, succeeds with my fix). Some
> > > comments below.
> > >
> > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > new file mode 100644
> > > > index 0000000000..8129b28061
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > @@ -0,0 +1,57 @@
> > >
> > > > + .text
> > > > + .p2align 4
> > > > + .globl apply_tls
> > > > + .type apply_tls, @function
> > > > +apply_tls:
> > > > + .cfi_startproc
> > >
> > > Missing CET marker.
> > >
> > > > + subq $24, %rsp
> > > > + .cfi_def_cfa_offset 32
> > > > + movdqu (%rdi), %xmm0
> > > > + movq %fs:40, %rax
> > > > + movq %rax, 8(%rsp)
> > > > + xorl %eax, %eax
> > > > + leaq tls_var0@TLSDESC(%rip), %rax
> > > > + call *tls_var0@TLSCALL(%rax)
> > > > + addq %fs:0, %rax
> > > > + movups %xmm0, (%rax)
> > > > + movdqu 16(%rdi), %xmm1
> > > > + movups %xmm1, 16(%rax)
> > > > + movq 8(%rsp), %rdx
> > > > + subq %fs:40, %rdx
> > > > + jne .L5
> > > > + addq $24, %rsp
> > > > + .cfi_remember_state
> > > > + .cfi_def_cfa_offset 8
> > > > + ret
> > > > +.L5:
> > > > + .cfi_restore_state
> > > > + call __stack_chk_fail@PLT
> > >
> > > Not sure if we need this?
> > >
> > > Maybe add a comment about what exactly this subtest is exercising?
> > >
> > > These are present in the other TLS modules as well.
> > >
> > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > > new file mode 100644
> > > > index 0000000000..af4b7ca761
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > >
> > > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > > + clobber %rbx. */
> > > > +#define OFFSET (56 + 16 + 16 + 16)
> > > > +
> > > > + .text
> > > > + .p2align 4
> > > > + .globl apply_tls
> > > > + .type apply_tls, @function
> > > > +apply_tls:
> > > > + .cfi_startproc
> > > > + pushq %rbp
> > > > + .cfi_def_cfa_offset 16
> > > > + .cfi_offset 6, -16
> > > > + movq %rsp, %rbp
> > > > + .cfi_def_cfa_register 6
> > > > + /* Align stack to 64 bytes. */
> > > > + andq $-64, %rsp
> > > > + pushq %rbx
> > > > + subq $OFFSET, %rsp
> > >
> > > The offset could be loaded from a global variable or something like
> > > that. We should exercise a wide range of stack alignments—the
> > > individual tests are cheap. And maybe check extra registers.
> >
> > I will clean it up with a different fix.
> >
>
> I submitted a patch with a testcase:
>
> https://patchwork.sourceware.org/project/glibc/list/?series=31963
>
> My patch allocates 64 more bytes to avoid clobbering saved RDI,
> RSI and RBX values on stack by xsave. It avoids 2 stack
> adjustments. Either my fix or Florian's fix should fix the issue.
> I don't have a strong preference as long as my testcase is
> included.
>
>
I think my testcase may fail on AMD AVX CPUs without the
fix. On Intel AVX CPUs, the state size is 960 bytes. But the
last 128 bytes may be unused.
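For reference, the XSAVE area size for the currently enabled state
components can be read from CPUID leaf 0xD; a quick stand-alone check
(my own sketch, not part of the patch):

#include <cpuid.h>
#include <stdio.h>

int
main (void)
{
  unsigned int eax, ebx, ecx, edx;
  /* CPUID.(EAX=0DH, ECX=0): EBX is the size in bytes of the XSAVE area
     needed for the state components currently enabled in XCR0.  */
  if (!__get_cpuid_count (0xd, 0, &eax, &ebx, &ecx, &edx))
    return 1;
  printf ("xsave state size: %u bytes\n", ebx);
  return 0;
}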
--
H.J.
* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
From: H.J. Lu @ 2024-03-17 3:14 UTC
To: Florian Weimer; +Cc: libc-alpha
On Sat, Mar 16, 2024 at 6:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > * H. J. Lu:
> > > >
> > > > > Please verify if this is the right testcase.
> > > >
> > > > Test case works (fails without my fix, succeeds with my fix). Some
> > > > comments below.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > new file mode 100644
> > > > > index 0000000000..8129b28061
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > @@ -0,0 +1,57 @@
> > > >
> > > > > + .text
> > > > > + .p2align 4
> > > > > + .globl apply_tls
> > > > > + .type apply_tls, @function
> > > > > +apply_tls:
> > > > > + .cfi_startproc
> > > >
> > > > Missing CET marker.
> > > >
> > > > > + subq $24, %rsp
> > > > > + .cfi_def_cfa_offset 32
> > > > > + movdqu (%rdi), %xmm0
> > > > > + movq %fs:40, %rax
> > > > > + movq %rax, 8(%rsp)
> > > > > + xorl %eax, %eax
> > > > > + leaq tls_var0@TLSDESC(%rip), %rax
> > > > > + call *tls_var0@TLSCALL(%rax)
> > > > > + addq %fs:0, %rax
> > > > > + movups %xmm0, (%rax)
> > > > > + movdqu 16(%rdi), %xmm1
> > > > > + movups %xmm1, 16(%rax)
> > > > > + movq 8(%rsp), %rdx
> > > > > + subq %fs:40, %rdx
> > > > > + jne .L5
> > > > > + addq $24, %rsp
> > > > > + .cfi_remember_state
> > > > > + .cfi_def_cfa_offset 8
> > > > > + ret
> > > > > +.L5:
> > > > > + .cfi_restore_state
> > > > > + call __stack_chk_fail@PLT
> > > >
> > > > Not sure if we need this?
> > > >
> > > > Maybe add a comment about what exactly this subtest is exercising?
> > > >
> > > > These are present in the other TLS modules as well.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > > > new file mode 100644
> > > > > index 0000000000..af4b7ca761
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > >
> > > > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > > > + clobber %rbx. */
> > > > > +#define OFFSET (56 + 16 + 16 + 16)
> > > > > +
> > > > > + .text
> > > > > + .p2align 4
> > > > > + .globl apply_tls
> > > > > + .type apply_tls, @function
> > > > > +apply_tls:
> > > > > + .cfi_startproc
> > > > > + pushq %rbp
> > > > > + .cfi_def_cfa_offset 16
> > > > > + .cfi_offset 6, -16
> > > > > + movq %rsp, %rbp
> > > > > + .cfi_def_cfa_register 6
> > > > > + /* Align stack to 64 bytes. */
> > > > > + andq $-64, %rsp
> > > > > + pushq %rbx
> > > > > + subq $OFFSET, %rsp
> > > >
> > > > The offset could be loaded from a global variable or something like
> > > > that. We should exercise a wide range of stack alignments—the
> > > > individual tests are cheap. And maybe check extra registers.
> > >
> > > I will clean it up with a different fix.
> > >
> >
> > I submitted a patch with a testcase:
> >
> > https://patchwork.sourceware.org/project/glibc/list/?series=31963
> >
> > My patch allocates 64 more bytes to avoid clobbering saved RDI,
> > RSI and RBX values on stack by xsave. It avoids 2 stack
> > adjustments. Either my fix or Florian's fix should fix the issue.
> > I don't have a strong preference as long as my testcase is
> > included.
> >
> >
> I think my testcase may fail on AMD AVX CPUs without the
> fix. On Intel AVX CPUs, the state size is 960 bytes. But the
> last 128 bytes may be unused.
>
I sent out the v2 patch:
https://patchwork.sourceware.org/project/glibc/list/?series=31966
to simplify the testcase.
--
H.J.