public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
@ 2024-03-16 14:32 Florian Weimer
  2024-03-16 14:37 ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2024-03-16 14:32 UTC (permalink / raw)
  To: libc-alpha

In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
This realignment does not take into account that the function has
already used part of the red zone at this point, thus clobbering
the initally saved register values located there if the stack
alignment inherited from the caller is unfortunate.

(Note: I do not know to write a good test case for this in the existing
framework.  We saw this as a random LTO plugin crash when building GCC
with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
with this change here.)

---
 sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
index 9f02cfc3eb..8e49e7eece 100644
--- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
+++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
@@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
 2:
 #if DL_RUNTIME_RESOLVE_REALIGN_STACK
 	movq	%rbx, -24(%rsp)
+	subq    $24, %rsp
+	cfi_adjust_cfa_offset(24)
 	mov	%RSP_LP, %RBX_LP
 	cfi_def_cfa_register(%rbx)
 	and	$-STATE_SAVE_ALIGNMENT, %RSP_LP
@@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
 #if DL_RUNTIME_RESOLVE_REALIGN_STACK
 	mov	%RBX_LP, %RSP_LP
 	cfi_def_cfa_register(%rsp)
+	addq    $24, %rsp
+	cfi_adjust_cfa_offset(-24)
 	movq	-24(%rsp), %rbx
 	cfi_restore(%rbx)
 #else

base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 14:32 [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501) Florian Weimer
@ 2024-03-16 14:37 ` H.J. Lu
  2024-03-16 14:43   ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 14:37 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> This realignment does not take into account that the function has
> already used part of the red zone at this point, thus clobbering
> the initally saved register values located there if the stack
> alignment inherited from the caller is unfortunate.
>
> (Note: I do not know to write a good test case for this in the existing
> framework.  We saw this as a random LTO plugin crash when building GCC
> with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
> with this change here.)

Will a different STATE_SAVE_OFFSET for TLS descriptor work?

> ---
>  sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> index 9f02cfc3eb..8e49e7eece 100644
> --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
>  2:
>  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
>         movq    %rbx, -24(%rsp)
> +       subq    $24, %rsp
> +       cfi_adjust_cfa_offset(24)
>         mov     %RSP_LP, %RBX_LP
>         cfi_def_cfa_register(%rbx)
>         and     $-STATE_SAVE_ALIGNMENT, %RSP_LP
> @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
>  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
>         mov     %RBX_LP, %RSP_LP
>         cfi_def_cfa_register(%rsp)
> +       addq    $24, %rsp
> +       cfi_adjust_cfa_offset(-24)
>         movq    -24(%rsp), %rbx
>         cfi_restore(%rbx)
>  #else
>
> base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 14:37 ` H.J. Lu
@ 2024-03-16 14:43   ` H.J. Lu
  2024-03-16 14:47     ` H.J. Lu
  2024-03-16 14:57     ` Florian Weimer
  0 siblings, 2 replies; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 14:43 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > This realignment does not take into account that the function has
> > already used part of the red zone at this point, thus clobbering
> > the initally saved register values located there if the stack
> > alignment inherited from the caller is unfortunate.
> >
> > (Note: I do not know to write a good test case for this in the existing
> > framework.  We saw this as a random LTO plugin crash when building GCC
> > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
> > with this change here.)
>
> Will a different STATE_SAVE_OFFSET for TLS descriptor work?

Correction.  REGISTER_SAVE_AREA is for this purpose.   Will a different
value for TLS descriptor work?

> > ---
> >  sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > index 9f02cfc3eb..8e49e7eece 100644
> > --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
> >  2:
> >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> >         movq    %rbx, -24(%rsp)
> > +       subq    $24, %rsp
> > +       cfi_adjust_cfa_offset(24)
> >         mov     %RSP_LP, %RBX_LP
> >         cfi_def_cfa_register(%rbx)
> >         and     $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
> >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> >         mov     %RBX_LP, %RSP_LP
> >         cfi_def_cfa_register(%rsp)
> > +       addq    $24, %rsp
> > +       cfi_adjust_cfa_offset(-24)
> >         movq    -24(%rsp), %rbx
> >         cfi_restore(%rbx)
> >  #else
> >
> > base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
> >
>
>
> --
> H.J.



-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 14:43   ` H.J. Lu
@ 2024-03-16 14:47     ` H.J. Lu
  2024-03-16 15:04       ` Florian Weimer
  2024-03-16 14:57     ` Florian Weimer
  1 sibling, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 14:47 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > > This realignment does not take into account that the function has
> > > already used part of the red zone at this point, thus clobbering
> > > the initally saved register values located there if the stack
> > > alignment inherited from the caller is unfortunate.
> > >
> > > (Note: I do not know to write a good test case for this in the existing
> > > framework.  We saw this as a random LTO plugin crash when building GCC
> > > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
> > > with this change here.)

We should try to find a testcase.  Can you provide a backtrace when it
happens?  It should be possible to write a testcase with the backtrace.


> > Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction.  REGISTER_SAVE_AREA is for this purpose.   Will a different
> value for TLS descriptor work?
>
> > > ---
> > >  sysdeps/x86_64/dl-tlsdesc-dynamic.h | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/sysdeps/x86_64/dl-tlsdesc-dynamic.h b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > index 9f02cfc3eb..8e49e7eece 100644
> > > --- a/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > +++ b/sysdeps/x86_64/dl-tlsdesc-dynamic.h
> > > @@ -83,6 +83,8 @@ _dl_tlsdesc_dynamic:
> > >  2:
> > >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > >         movq    %rbx, -24(%rsp)
> > > +       subq    $24, %rsp
> > > +       cfi_adjust_cfa_offset(24)
> > >         mov     %RSP_LP, %RBX_LP
> > >         cfi_def_cfa_register(%rbx)
> > >         and     $-STATE_SAVE_ALIGNMENT, %RSP_LP
> > > @@ -153,6 +155,8 @@ _dl_tlsdesc_dynamic:
> > >  #if DL_RUNTIME_RESOLVE_REALIGN_STACK
> > >         mov     %RBX_LP, %RSP_LP
> > >         cfi_def_cfa_register(%rsp)
> > > +       addq    $24, %rsp
> > > +       cfi_adjust_cfa_offset(-24)
> > >         movq    -24(%rsp), %rbx
> > >         cfi_restore(%rbx)
> > >  #else
> > >
> > > base-commit: 5ebc24f785dc0dff494a93ca82a369497c3cdc68
> > >
> >
> >
> > --
> > H.J.
>
>
>
> --
> H.J.



--
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 14:43   ` H.J. Lu
  2024-03-16 14:47     ` H.J. Lu
@ 2024-03-16 14:57     ` Florian Weimer
  2024-03-16 14:59       ` H.J. Lu
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2024-03-16 14:57 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

* H. J. Lu:

> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>> >
>> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
>> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
>> > This realignment does not take into account that the function has
>> > already used part of the red zone at this point, thus clobbering
>> > the initally saved register values located there if the stack
>> > alignment inherited from the caller is unfortunate.
>> >
>> > (Note: I do not know to write a good test case for this in the existing
>> > framework.  We saw this as a random LTO plugin crash when building GCC
>> > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
>> > with this change here.)
>>
>> Will a different STATE_SAVE_OFFSET for TLS descriptor work?
>
> Correction.  REGISTER_SAVE_AREA is for this purpose.   Will a different
> value for TLS descriptor work?

I think REGISTER_SAVE_AREA is for the later register saves?

This use of the red zone is specific to to the TLS trampoline.  The lazy
binding trampoline doesn't do that.  REGISTER_SAVE_AREA is used by both.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 14:57     ` Florian Weimer
@ 2024-03-16 14:59       ` H.J. Lu
  0 siblings, 0 replies; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 14:59 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 7:57 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> >
> >> > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> >> > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> >> > This realignment does not take into account that the function has
> >> > already used part of the red zone at this point, thus clobbering
> >> > the initally saved register values located there if the stack
> >> > alignment inherited from the caller is unfortunate.
> >> >
> >> > (Note: I do not know to write a good test case for this in the existing
> >> > framework.  We saw this as a random LTO plugin crash when building GCC
> >> > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
> >> > with this change here.)
> >>
> >> Will a different STATE_SAVE_OFFSET for TLS descriptor work?
> >
> > Correction.  REGISTER_SAVE_AREA is for this purpose.   Will a different
> > value for TLS descriptor work?
>
> I think REGISTER_SAVE_AREA is for the later register saves?
>
> This use of the red zone is specific to to the TLS trampoline.  The lazy
> binding trampoline doesn't do that.  REGISTER_SAVE_AREA is used by both.
>

# if DL_RUNTIME_RESOLVE_REALIGN_STACK
        /* STATE_SAVE_OFFSET has space for 8 integer registers.  But we
           need space for RCX, RDX, RSI, RDI, R8, R9, R10 and R11, plus
           RBX above.  */
        sub     $(REGISTER_SAVE_AREA + STATE_SAVE_ALIGNMENT), %RSP_LP
# else
        sub     $REGISTER_SAVE_AREA, %RSP_LP
        cfi_adjust_cfa_offset(REGISTER_SAVE_AREA)
# endif

Let's find a testcase first.

-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 14:47     ` H.J. Lu
@ 2024-03-16 15:04       ` Florian Weimer
  2024-03-16 15:18         ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2024-03-16 15:04 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

* H. J. Lu:

> On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>> > >
>> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
>> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
>> > > This realignment does not take into account that the function has
>> > > already used part of the red zone at this point, thus clobbering
>> > > the initally saved register values located there if the stack
>> > > alignment inherited from the caller is unfortunate.
>> > >
>> > > (Note: I do not know to write a good test case for this in the existing
>> > > framework.  We saw this as a random LTO plugin crash when building GCC
>> > > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
>> > > with this change here.)
>
> We should try to find a testcase.  Can you provide a backtrace when it
> happens?  It should be possible to write a testcase with the backtrace.

In my reproducer, when %rbx is about to be clobbered, I see
(%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.

The %rbx register does not get clobbered if (%rsp % 64) == 56.

Does this help?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 15:04       ` Florian Weimer
@ 2024-03-16 15:18         ` H.J. Lu
  2024-03-16 16:32           ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 15:18 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> >
> >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> > >
> >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> >> > > This realignment does not take into account that the function has
> >> > > already used part of the red zone at this point, thus clobbering
> >> > > the initally saved register values located there if the stack
> >> > > alignment inherited from the caller is unfortunate.
> >> > >
> >> > > (Note: I do not know to write a good test case for this in the existing
> >> > > framework.  We saw this as a random LTO plugin crash when building GCC
> >> > > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
> >> > > with this change here.)
> >
> > We should try to find a testcase.  Can you provide a backtrace when it
> > happens?  It should be possible to write a testcase with the backtrace.
>
> In my reproducer, when %rbx is about to be clobbered, I see
> (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
>
> The %rbx register does not get clobbered if (%rsp % 64) == 56.
>
> Does this help?
>

Yes.  I am working on a testcase.


-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 15:18         ` H.J. Lu
@ 2024-03-16 16:32           ` H.J. Lu
  2024-03-16 16:37             ` H.J. Lu
  2024-03-16 17:42             ` Florian Weimer
  0 siblings, 2 replies; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 16:32 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1731 bytes --]

On Sat, Mar 16, 2024 at 8:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu:
> >
> > > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >>
> > >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >> >
> > >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >> > >
> > >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > >> > > This realignment does not take into account that the function has
> > >> > > already used part of the red zone at this point, thus clobbering
> > >> > > the initally saved register values located there if the stack
> > >> > > alignment inherited from the caller is unfortunate.
> > >> > >
> > >> > > (Note: I do not know to write a good test case for this in the existing
> > >> > > framework.  We saw this as a random LTO plugin crash when building GCC
> > >> > > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
> > >> > > with this change here.)
> > >
> > > We should try to find a testcase.  Can you provide a backtrace when it
> > > happens?  It should be possible to write a testcase with the backtrace.
> >
> > In my reproducer, when %rbx is about to be clobbered, I see
> > (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
> >
> > The %rbx register does not get clobbered if (%rsp % 64) == 56.
> >
> > Does this help?
> >
>
> Yes.  I am working on a testcase.

Hi Florian,

Please verify if this is the right testcase.

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-Add-a-test-for-BZ-31501.patch --]
[-- Type: text/x-patch, Size: 8698 bytes --]

From 8c167bd4d9faddf9580f55182542e8a05a25fd61 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 16 Mar 2024 08:42:27 -0700
Subject: [PATCH] Add a test for BZ #31501

---
 sysdeps/x86_64/Makefile                    | 16 +++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S | 57 +++++++++++++++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S | 74 ++++++++++++++++++++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S | 57 +++++++++++++++++
 sysdeps/x86_64/tst-gnu2-tls2-x86-64.c      | 21 ++++++
 5 files changed, 225 insertions(+)
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S
 create mode 100644 sysdeps/x86_64/tst-gnu2-tls2-x86-64.c

diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile
index 66b21954f3..f020773e56 100644
--- a/sysdeps/x86_64/Makefile
+++ b/sysdeps/x86_64/Makefile
@@ -217,6 +217,22 @@ valgrind-suppressions-tst-valgrind-smoke = \
   --suppressions=$(..)sysdeps/x86_64/tst-valgrind-smoke.supp
 endif
 
+tests += \
+  tst-gnu2-tls2-x86-64 \
+# tests
+
+modules-names += \
+  tst-gnu2-tls2-x86-64-mod0 \
+  tst-gnu2-tls2-x86-64-mod1 \
+  tst-gnu2-tls2-x86-64-mod2 \
+# modules-names
+
+$(objpfx)tst-gnu2-tls2-x86-64: $(shared-thread-library)
+$(objpfx)tst-gnu2-tls2-x86-64.out: \
+  $(objpfx)tst-gnu2-tls2-x86-64-mod0.so \
+  $(objpfx)tst-gnu2-tls2-x86-64-mod1.so \
+  $(objpfx)tst-gnu2-tls2-x86-64-mod2.so
+
 endif # $(subdir) == elf
 
 ifeq ($(subdir),csu)
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
new file mode 100644
index 0000000000..8129b28061
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
@@ -0,0 +1,57 @@
+/* Test TLSDESC relocation with x86-64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+	.text
+	.p2align 4
+	.globl	apply_tls
+	.type	apply_tls, @function
+apply_tls:
+	.cfi_startproc
+	subq	$24, %rsp
+	.cfi_def_cfa_offset 32
+	movdqu	(%rdi), %xmm0
+	movq	%fs:40, %rax
+	movq	%rax, 8(%rsp)
+	xorl	%eax, %eax
+	leaq	tls_var0@TLSDESC(%rip), %rax
+	call	*tls_var0@TLSCALL(%rax)
+	addq	%fs:0, %rax
+	movups	%xmm0, (%rax)
+	movdqu	16(%rdi), %xmm1
+	movups	%xmm1, 16(%rax)
+	movq	8(%rsp), %rdx
+	subq	%fs:40, %rdx
+	jne	.L5
+	addq	$24, %rsp
+	.cfi_remember_state
+	.cfi_def_cfa_offset 8
+	ret
+.L5:
+	.cfi_restore_state
+	call	__stack_chk_fail@PLT
+	.cfi_endproc
+	.size	apply_tls, .-apply_tls
+	.hidden	tls_var0
+	.globl	tls_var0
+	.section	.tbss,"awT",@nobits
+	.align 8
+	.type	tls_var0, @object
+	.size	tls_var0, 32
+tls_var0:
+	.zero	32
+	.section	.note.GNU-stack,"",@progbits
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
new file mode 100644
index 0000000000..af4b7ca761
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
@@ -0,0 +1,74 @@
+/* Test TLSDESC relocation with x86-64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
+   clobber %rbx.  */
+#define OFFSET (56 + 16 + 16 + 16)
+
+	.text
+	.p2align 4
+	.globl	apply_tls
+	.type	apply_tls, @function
+apply_tls:
+	.cfi_startproc
+	pushq	%rbp
+	.cfi_def_cfa_offset 16
+	.cfi_offset 6, -16
+	movq	%rsp, %rbp
+	.cfi_def_cfa_register 6
+	/* Align stack to 64 bytes.  */
+	andq	$-64, %rsp
+	pushq	%rbx
+	subq	$OFFSET, %rsp
+	movdqu	(%rdi), %xmm0
+	movq	%fs:40, %rax
+	movq	%rax, 56(%rsp)
+	xorl	%eax, %eax
+	leaq	tls_var1@TLSDESC(%rip), %rax
+	call	*tls_var1@TLSCALL(%rax)
+	addq	%fs:0, %rax
+	movups	%xmm0, 32(%rax)
+	movdqu	16(%rdi), %xmm1
+	addq	$32, %rax
+	movups	%xmm1, 16(%rax)
+	movq	56(%rsp), %rdx
+	subq	%fs:40, %rdx
+	jne	.L5
+	/* All versions of _dl_tlsdesc_dynamic should preserve %rbx.  */
+	cmpq	OFFSET(%rsp), %rbx
+	jne	.Lhlt
+	leave
+	.cfi_remember_state
+	.cfi_def_cfa 7, 8
+	ret
+.Lhlt:
+	hlt
+.L5:
+	.cfi_restore_state
+	call	__stack_chk_fail@PLT
+	.cfi_endproc
+	.size	apply_tls, .-apply_tls
+	.hidden	tls_var1
+	.globl	tls_var1
+	.section	.tbss,"awT",@nobits
+	.align 16
+	.type	tls_var1, @object
+	.size	tls_var1, 3200
+tls_var1:
+	.zero	3200
+	.section	.note.GNU-stack,"",@progbits
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S
new file mode 100644
index 0000000000..d82f5f029b
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod2.S
@@ -0,0 +1,57 @@
+/* Test TLSDESC relocation with x86-64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+	.text
+	.p2align 4
+	.globl	apply_tls
+	.type	apply_tls, @function
+apply_tls:
+	.cfi_startproc
+	subq	$24, %rsp
+	.cfi_def_cfa_offset 32
+	movdqu	(%rdi), %xmm0
+	movq	%fs:40, %rax
+	movq	%rax, 8(%rsp)
+	xorl	%eax, %eax
+	leaq	tls_var2@TLSDESC(%rip), %rax
+	call	*tls_var2@TLSCALL(%rax)
+	addq	%fs:0, %rax
+	movups	%xmm0, (%rax)
+	movdqu	16(%rdi), %xmm1
+	movups	%xmm1, 16(%rax)
+	movq	8(%rsp), %rdx
+	subq	%fs:40, %rdx
+	jne	.L5
+	addq	$24, %rsp
+	.cfi_remember_state
+	.cfi_def_cfa_offset 8
+	ret
+.L5:
+	.cfi_restore_state
+	call	__stack_chk_fail@PLT
+	.cfi_endproc
+	.size	apply_tls, .-apply_tls
+	.hidden	tls_var2
+	.globl	tls_var2
+	.section	.tbss,"awT",@nobits
+	.align 8
+	.type	tls_var2, @object
+	.size	tls_var2, 32
+tls_var2:
+	.zero	32
+	.section	.note.GNU-stack,"",@progbits
diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c b/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c
new file mode 100644
index 0000000000..787181545b
--- /dev/null
+++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64.c
@@ -0,0 +1,21 @@
+/* Test TLSDESC relocation with x86-64.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define MOD(i) "tst-gnu2-tls2-x86-64-mod" #i ".so"
+
+#include <elf/tst-gnu2-tls2.c>
-- 
2.44.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 16:32           ` H.J. Lu
@ 2024-03-16 16:37             ` H.J. Lu
  2024-03-16 17:42             ` Florian Weimer
  1 sibling, 0 replies; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 16:37 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 9:32 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 8:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 8:04 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu:
> > >
> > > > On Sat, Mar 16, 2024 at 7:43 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >>
> > > >> On Sat, Mar 16, 2024 at 7:37 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >> >
> > > >> > On Sat, Mar 16, 2024 at 7:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >> > >
> > > >> > > In sysdeps/x86_64/dl-tlsdesc-dynamic.h, the stack pointer is
> > > >> > > realigned for some variants (notably _dl_tlsdesc_dynamic_xsavec).
> > > >> > > This realignment does not take into account that the function has
> > > >> > > already used part of the red zone at this point, thus clobbering
> > > >> > > the initally saved register values located there if the stack
> > > >> > > alignment inherited from the caller is unfortunate.
> > > >> > >
> > > >> > > (Note: I do not know to write a good test case for this in the existing
> > > >> > > framework.  We saw this as a random LTO plugin crash when building GCC
> > > >> > > with -mtls-dialect=gnu2.  The existing tests on pass on x86_64-linux-gnu
> > > >> > > with this change here.)
> > > >
> > > > We should try to find a testcase.  Can you provide a backtrace when it
> > > > happens?  It should be possible to write a testcase with the backtrace.
> > >
> > > In my reproducer, when %rbx is about to be clobbered, I see
> > > (%rsp % 64) == 8 at the start of _dl_tlsdesc_dynamic_xsavec.
> > >
> > > The %rbx register does not get clobbered if (%rsp % 64) == 56.
> > >
> > > Does this help?
> > >
> >
> > Yes.  I am working on a testcase.
>
> Hi Florian,
>
> Please verify if this is the right testcase.

This test fails only on AVX512 machines.


-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 16:32           ` H.J. Lu
  2024-03-16 16:37             ` H.J. Lu
@ 2024-03-16 17:42             ` Florian Weimer
  2024-03-16 17:51               ` H.J. Lu
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2024-03-16 17:42 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

* H. J. Lu:

> Please verify if this is the right testcase.

Test case works (fails without my fix, succeeds with my fix).  Some
comments below.

> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> new file mode 100644
> index 0000000000..8129b28061
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> @@ -0,0 +1,57 @@

> +	.text
> +	.p2align 4
> +	.globl	apply_tls
> +	.type	apply_tls, @function
> +apply_tls:
> +	.cfi_startproc

Missing CET marker.

> +	subq	$24, %rsp
> +	.cfi_def_cfa_offset 32
> +	movdqu	(%rdi), %xmm0
> +	movq	%fs:40, %rax
> +	movq	%rax, 8(%rsp)
> +	xorl	%eax, %eax
> +	leaq	tls_var0@TLSDESC(%rip), %rax
> +	call	*tls_var0@TLSCALL(%rax)
> +	addq	%fs:0, %rax
> +	movups	%xmm0, (%rax)
> +	movdqu	16(%rdi), %xmm1
> +	movups	%xmm1, 16(%rax)
> +	movq	8(%rsp), %rdx
> +	subq	%fs:40, %rdx
> +	jne	.L5
> +	addq	$24, %rsp
> +	.cfi_remember_state
> +	.cfi_def_cfa_offset 8
> +	ret
> +.L5:
> +	.cfi_restore_state
> +	call	__stack_chk_fail@PLT

Not sure if we need this?

Maybe add some comment what exactly this subtest is exercising?

These are present in the other TLS modules as well.

> diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> new file mode 100644
> index 0000000000..af4b7ca761
> --- /dev/null
> +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S

> +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> +   clobber %rbx.  */
> +#define OFFSET (56 + 16 + 16 + 16)
> +
> +	.text
> +	.p2align 4
> +	.globl	apply_tls
> +	.type	apply_tls, @function
> +apply_tls:
> +	.cfi_startproc
> +	pushq	%rbp
> +	.cfi_def_cfa_offset 16
> +	.cfi_offset 6, -16
> +	movq	%rsp, %rbp
> +	.cfi_def_cfa_register 6
> +	/* Align stack to 64 bytes.  */
> +	andq	$-64, %rsp
> +	pushq	%rbx
> +	subq	$OFFSET, %rsp

The offset could be loaded from a global variable or something like
that.  We should exercise a wide range of stack alignments—the
individual tests are cheap.  And maybe check extra registers.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 17:42             ` Florian Weimer
@ 2024-03-16 17:51               ` H.J. Lu
  2024-03-16 22:05                 ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 17:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > Please verify if this is the right testcase.
>
> Test case works (fails without my fix, succeeds with my fix).  Some
> comments below.
>
> > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > new file mode 100644
> > index 0000000000..8129b28061
> > --- /dev/null
> > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > @@ -0,0 +1,57 @@
>
> > +     .text
> > +     .p2align 4
> > +     .globl  apply_tls
> > +     .type   apply_tls, @function
> > +apply_tls:
> > +     .cfi_startproc
>
> Missing CET marker.
>
> > +     subq    $24, %rsp
> > +     .cfi_def_cfa_offset 32
> > +     movdqu  (%rdi), %xmm0
> > +     movq    %fs:40, %rax
> > +     movq    %rax, 8(%rsp)
> > +     xorl    %eax, %eax
> > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > +     call    *tls_var0@TLSCALL(%rax)
> > +     addq    %fs:0, %rax
> > +     movups  %xmm0, (%rax)
> > +     movdqu  16(%rdi), %xmm1
> > +     movups  %xmm1, 16(%rax)
> > +     movq    8(%rsp), %rdx
> > +     subq    %fs:40, %rdx
> > +     jne     .L5
> > +     addq    $24, %rsp
> > +     .cfi_remember_state
> > +     .cfi_def_cfa_offset 8
> > +     ret
> > +.L5:
> > +     .cfi_restore_state
> > +     call    __stack_chk_fail@PLT
>
> Not sure if we need this?
>
> Maybe add some comment what exactly this subtest is exercising?
>
> These are present in the other TLS modules as well.
>
> > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > new file mode 100644
> > index 0000000000..af4b7ca761
> > --- /dev/null
> > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
>
> > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > +   clobber %rbx.  */
> > +#define OFFSET (56 + 16 + 16 + 16)
> > +
> > +     .text
> > +     .p2align 4
> > +     .globl  apply_tls
> > +     .type   apply_tls, @function
> > +apply_tls:
> > +     .cfi_startproc
> > +     pushq   %rbp
> > +     .cfi_def_cfa_offset 16
> > +     .cfi_offset 6, -16
> > +     movq    %rsp, %rbp
> > +     .cfi_def_cfa_register 6
> > +     /* Align stack to 64 bytes.  */
> > +     andq    $-64, %rsp
> > +     pushq   %rbx
> > +     subq    $OFFSET, %rsp
>
> The offset could be loaded from a global variable or something like
> that.  We should exercise a wide range of stack alignments—the
> individual tests are cheap.  And maybe check extra registers.

I will clean it up with a different fix.

-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 17:51               ` H.J. Lu
@ 2024-03-16 22:05                 ` H.J. Lu
  2024-03-17  1:19                   ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2024-03-16 22:05 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu:
> >
> > > Please verify if this is the right testcase.
> >
> > Test case works (fails without my fix, succeeds with my fix).  Some
> > comments below.
> >
> > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > new file mode 100644
> > > index 0000000000..8129b28061
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > @@ -0,0 +1,57 @@
> >
> > > +     .text
> > > +     .p2align 4
> > > +     .globl  apply_tls
> > > +     .type   apply_tls, @function
> > > +apply_tls:
> > > +     .cfi_startproc
> >
> > Missing CET marker.
> >
> > > +     subq    $24, %rsp
> > > +     .cfi_def_cfa_offset 32
> > > +     movdqu  (%rdi), %xmm0
> > > +     movq    %fs:40, %rax
> > > +     movq    %rax, 8(%rsp)
> > > +     xorl    %eax, %eax
> > > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > > +     call    *tls_var0@TLSCALL(%rax)
> > > +     addq    %fs:0, %rax
> > > +     movups  %xmm0, (%rax)
> > > +     movdqu  16(%rdi), %xmm1
> > > +     movups  %xmm1, 16(%rax)
> > > +     movq    8(%rsp), %rdx
> > > +     subq    %fs:40, %rdx
> > > +     jne     .L5
> > > +     addq    $24, %rsp
> > > +     .cfi_remember_state
> > > +     .cfi_def_cfa_offset 8
> > > +     ret
> > > +.L5:
> > > +     .cfi_restore_state
> > > +     call    __stack_chk_fail@PLT
> >
> > Not sure if we need this?
> >
> > Maybe add some comment what exactly this subtest is exercising?
> >
> > These are present in the other TLS modules as well.
> >
> > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > new file mode 100644
> > > index 0000000000..af4b7ca761
> > > --- /dev/null
> > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> >
> > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > +   clobber %rbx.  */
> > > +#define OFFSET (56 + 16 + 16 + 16)
> > > +
> > > +     .text
> > > +     .p2align 4
> > > +     .globl  apply_tls
> > > +     .type   apply_tls, @function
> > > +apply_tls:
> > > +     .cfi_startproc
> > > +     pushq   %rbp
> > > +     .cfi_def_cfa_offset 16
> > > +     .cfi_offset 6, -16
> > > +     movq    %rsp, %rbp
> > > +     .cfi_def_cfa_register 6
> > > +     /* Align stack to 64 bytes.  */
> > > +     andq    $-64, %rsp
> > > +     pushq   %rbx
> > > +     subq    $OFFSET, %rsp
> >
> > The offset could be loaded from a global variable or something like
> > that.  We should exercise a wide range of stack alignments—the
> > individual tests are cheap.  And maybe check extra registers.
>
> I will clean it up with a different fix.
>

I submitted a patch with a testase:

https://patchwork.sourceware.org/project/glibc/list/?series=31963

My patch allocates 64 more bytes to avoid clobbering saved RDI,
RSI and RBX values on stack by xsave.  It avoids 2 stack
adjustments.  Either my fix or Florian's fix should fix the issue.
I don't have a strong preference as long as my testcase is
included.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-16 22:05                 ` H.J. Lu
@ 2024-03-17  1:19                   ` H.J. Lu
  2024-03-17  3:14                     ` H.J. Lu
  0 siblings, 1 reply; 15+ messages in thread
From: H.J. Lu @ 2024-03-17  1:19 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu:
> > >
> > > > Please verify if this is the right testcase.
> > >
> > > Test case works (fails without my fix, succeeds with my fix).  Some
> > > comments below.
> > >
> > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > new file mode 100644
> > > > index 0000000000..8129b28061
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > @@ -0,0 +1,57 @@
> > >
> > > > +     .text
> > > > +     .p2align 4
> > > > +     .globl  apply_tls
> > > > +     .type   apply_tls, @function
> > > > +apply_tls:
> > > > +     .cfi_startproc
> > >
> > > Missing CET marker.
> > >
> > > > +     subq    $24, %rsp
> > > > +     .cfi_def_cfa_offset 32
> > > > +     movdqu  (%rdi), %xmm0
> > > > +     movq    %fs:40, %rax
> > > > +     movq    %rax, 8(%rsp)
> > > > +     xorl    %eax, %eax
> > > > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > > > +     call    *tls_var0@TLSCALL(%rax)
> > > > +     addq    %fs:0, %rax
> > > > +     movups  %xmm0, (%rax)
> > > > +     movdqu  16(%rdi), %xmm1
> > > > +     movups  %xmm1, 16(%rax)
> > > > +     movq    8(%rsp), %rdx
> > > > +     subq    %fs:40, %rdx
> > > > +     jne     .L5
> > > > +     addq    $24, %rsp
> > > > +     .cfi_remember_state
> > > > +     .cfi_def_cfa_offset 8
> > > > +     ret
> > > > +.L5:
> > > > +     .cfi_restore_state
> > > > +     call    __stack_chk_fail@PLT
> > >
> > > Not sure if we need this?
> > >
> > > Maybe add some comment what exactly this subtest is exercising?
> > >
> > > These are present in the other TLS modules as well.
> > >
> > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > > new file mode 100644
> > > > index 0000000000..af4b7ca761
> > > > --- /dev/null
> > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > >
> > > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > > +   clobber %rbx.  */
> > > > +#define OFFSET (56 + 16 + 16 + 16)
> > > > +
> > > > +     .text
> > > > +     .p2align 4
> > > > +     .globl  apply_tls
> > > > +     .type   apply_tls, @function
> > > > +apply_tls:
> > > > +     .cfi_startproc
> > > > +     pushq   %rbp
> > > > +     .cfi_def_cfa_offset 16
> > > > +     .cfi_offset 6, -16
> > > > +     movq    %rsp, %rbp
> > > > +     .cfi_def_cfa_register 6
> > > > +     /* Align stack to 64 bytes.  */
> > > > +     andq    $-64, %rsp
> > > > +     pushq   %rbx
> > > > +     subq    $OFFSET, %rsp
> > >
> > > The offset could be loaded from a global variable or something like
> > > that.  We should exercise a wide range of stack alignments—the
> > > individual tests are cheap.  And maybe check extra registers.
> >
> > I will clean it up with a different fix.
> >
>
> I submitted a patch with a testase:
>
> https://patchwork.sourceware.org/project/glibc/list/?series=31963
>
> My patch allocates 64 more bytes to avoid clobbering saved RDI,
> RSI and RBX values on stack by xsave.  It avoids 2 stack
> adjustments.  Either my fix or Florian's fix should fix the issue.
> I don't have a strong preference as long as my testcase is
> included.
>
>
I think my testcase may fail on AMD AVX CPUs without the
fix.  On Intel AVX CPUs, the state size is 960 bytes.  But the
last 128 bytes may be unused.

-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501)
  2024-03-17  1:19                   ` H.J. Lu
@ 2024-03-17  3:14                     ` H.J. Lu
  0 siblings, 0 replies; 15+ messages in thread
From: H.J. Lu @ 2024-03-17  3:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Sat, Mar 16, 2024 at 6:19 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 3:05 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 10:51 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sat, Mar 16, 2024 at 10:42 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > * H. J. Lu:
> > > >
> > > > > Please verify if this is the right testcase.
> > > >
> > > > Test case works (fails without my fix, succeeds with my fix).  Some
> > > > comments below.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > new file mode 100644
> > > > > index 0000000000..8129b28061
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod0.S
> > > > > @@ -0,0 +1,57 @@
> > > >
> > > > > +     .text
> > > > > +     .p2align 4
> > > > > +     .globl  apply_tls
> > > > > +     .type   apply_tls, @function
> > > > > +apply_tls:
> > > > > +     .cfi_startproc
> > > >
> > > > Missing CET marker.
> > > >
> > > > > +     subq    $24, %rsp
> > > > > +     .cfi_def_cfa_offset 32
> > > > > +     movdqu  (%rdi), %xmm0
> > > > > +     movq    %fs:40, %rax
> > > > > +     movq    %rax, 8(%rsp)
> > > > > +     xorl    %eax, %eax
> > > > > +     leaq    tls_var0@TLSDESC(%rip), %rax
> > > > > +     call    *tls_var0@TLSCALL(%rax)
> > > > > +     addq    %fs:0, %rax
> > > > > +     movups  %xmm0, (%rax)
> > > > > +     movdqu  16(%rdi), %xmm1
> > > > > +     movups  %xmm1, 16(%rax)
> > > > > +     movq    8(%rsp), %rdx
> > > > > +     subq    %fs:40, %rdx
> > > > > +     jne     .L5
> > > > > +     addq    $24, %rsp
> > > > > +     .cfi_remember_state
> > > > > +     .cfi_def_cfa_offset 8
> > > > > +     ret
> > > > > +.L5:
> > > > > +     .cfi_restore_state
> > > > > +     call    __stack_chk_fail@PLT
> > > >
> > > > Not sure if we need this?
> > > >
> > > > Maybe add some comment what exactly this subtest is exercising?
> > > >
> > > > These are present in the other TLS modules as well.
> > > >
> > > > > diff --git a/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > > > new file mode 100644
> > > > > index 0000000000..af4b7ca761
> > > > > --- /dev/null
> > > > > +++ b/sysdeps/x86_64/tst-gnu2-tls2-x86-64-mod1.S
> > > >
> > > > > +/* Select an offset which will cause _dl_tlsdesc_dynamic_xsavec to
> > > > > +   clobber %rbx.  */
> > > > > +#define OFFSET (56 + 16 + 16 + 16)
> > > > > +
> > > > > +     .text
> > > > > +     .p2align 4
> > > > > +     .globl  apply_tls
> > > > > +     .type   apply_tls, @function
> > > > > +apply_tls:
> > > > > +     .cfi_startproc
> > > > > +     pushq   %rbp
> > > > > +     .cfi_def_cfa_offset 16
> > > > > +     .cfi_offset 6, -16
> > > > > +     movq    %rsp, %rbp
> > > > > +     .cfi_def_cfa_register 6
> > > > > +     /* Align stack to 64 bytes.  */
> > > > > +     andq    $-64, %rsp
> > > > > +     pushq   %rbx
> > > > > +     subq    $OFFSET, %rsp
> > > >
> > > > The offset could be loaded from a global variable or something like
> > > > that.  We should exercise a wide range of stack alignments—the
> > > > individual tests are cheap.  And maybe check extra registers.
> > >
> > > I will clean it up with a different fix.
> > >
> >
> > I submitted a patch with a testase:
> >
> > https://patchwork.sourceware.org/project/glibc/list/?series=31963
> >
> > My patch allocates 64 more bytes to avoid clobbering saved RDI,
> > RSI and RBX values on stack by xsave.  It avoids 2 stack
> > adjustments.  Either my fix or Florian's fix should fix the issue.
> > I don't have a strong preference as long as my testcase is
> > included.
> >
> >
> I think my testcase may fail on AMD AVX CPUs without the
> fix.  On Intel AVX CPUs, the state size is 960 bytes.  But the
> last 128 bytes may be unused.
>

I sent out the v2 patch:

https://patchwork.sourceware.org/project/glibc/list/?series=31966

to simplify the testcase.

-- 
H.J.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-03-17  3:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-16 14:32 [PATCH] x86-64: Stack alignment in _dl_tlsdesc_dynamic and red zone usage (bug 31501) Florian Weimer
2024-03-16 14:37 ` H.J. Lu
2024-03-16 14:43   ` H.J. Lu
2024-03-16 14:47     ` H.J. Lu
2024-03-16 15:04       ` Florian Weimer
2024-03-16 15:18         ` H.J. Lu
2024-03-16 16:32           ` H.J. Lu
2024-03-16 16:37             ` H.J. Lu
2024-03-16 17:42             ` Florian Weimer
2024-03-16 17:51               ` H.J. Lu
2024-03-16 22:05                 ` H.J. Lu
2024-03-17  1:19                   ` H.J. Lu
2024-03-17  3:14                     ` H.J. Lu
2024-03-16 14:57     ` Florian Weimer
2024-03-16 14:59       ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).