public inbox for libc-alpha@sourceware.org
* [PATCH v16 0/8] Add rseq extensible ABI support
@ 2025-01-09 16:32 Michael Jeanson
  2025-01-09 16:32 ` [PATCH v16 1/8] nptl: Add rseq auxvals Michael Jeanson
                   ` (10 more replies)
  0 siblings, 11 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Introduced in Linux v6.3, the rseq extensible ABI [1] allows adding rseq
features past the initial 32 bytes of the original ABI.

While the rseq features in the latest kernel still fit within the
original ABI size, only 4 bytes currently remain. It is thus a good time
to add support for the extensible ABI so that new features become
available to GNU libc users as soon as they are added.

We use the ELF auxiliary vector to query the kernel for the size and
alignment of the rseq area. If this fails, we default to the original
fixed size and alignment of 32 bytes, which the kernel accepts as a
compatibility mode with the original ABI.

This makes the size of the rseq area variable and thus requires
relocating it out of 'struct pthread'. We chose to place it after (in
block allocation order) the last TLS block inside the static TLS block
allocation. This required a fairly small modification to the TLS block
allocator and does not interfere with the main executable's TLS block,
which must always be the first block relative to the thread pointer.

[1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/

Cc: Florian Weimer <fweimer@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: DJ Delorie <dj@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
Changes since v15:
- Improve comments around size and alignment calculations of the extra
  TLS block
- Simplify tls_blocks_size calculations
- Make sure we don't roundup to zero alignment
- Rename VOLATILE macros to ONCE
- Clarify comments in tests regarding the rseq offset and tls offset
- Check the rseq offset against a TLS variable in the tests
Changes since v14:
- Add TLS_TP_OFFSET to the extra TLS block offset calculation
- Style nits
Changes since v13:
- Ensure that the VOLATILE variants of the RSEQ_ accessors static
  assert on 64-bit types on 32-bit architectures
- Move rtld_hidden_proto rseq symbols to a separate patch
Changes since v12:
- Set _rseq_size to 0 on registration failure
- Split RSEQ_SET/GETMEM from THREAD_SET/GETMEM
- Rename rseq_get_area() to RSEQ_SELF()
- Add rtld_hidden_proto to __rseq_size and __rseq_offset
- Add comment and variable array member to 'struct rseq_area'
- Style nits
Changes since v11:
- Removed _dl_rseq_feature_size, use __rseq_size instead
- Replace GLRO(dl_rseq_align) with a hidden global variable _rseq_align
- __rseq_size is now set directly in _dl_parse_auxv, set it to 0 when
  the main thread registration fails or is disabled by tunable
Changes since v10:
- Split the patchset in smaller patches
- Rebased on 'Make __rseq_size useful for feature detection'
- Remove 'rseq' from the generic TLS code, add 'extra TLS'
- Add thread_pointer.h for all Linux architectures
- Fix build on the Hurd
Changes since v8:
- Fix copyright year in sysdeps/generic/dl-rseq.h
- Clarify the tcb math comments
- Add a comment to clarify what enforces the alignment requirements of a
  pointer calculated from the rseq_offset
- Remove nonsensical test in tst-rseq-disable
- Add comments to clarify why the rseq size is 0 when registration fails
  or is disabled
- Add comments to explain why we allocate an rseq area block even when
  the registration is disabled by tunable
- Rename 'rseq_size' -> 'rseq_alloc_size' and 'dl_tls_rseq_size' ->
  'dl_tls_rseq_alloc_size' to clarify the distinction between the
  allocated rseq size and the size reported to application code in
  '__rseq_size'
Changes since v6:
- Fix tst-rseq for feature size over 32 bytes
- Rebased on 'nptl: fix potential merge of __rseq_* relro symbols'
Changes since v5:
- Fix TLS_DTV_AT_TP rseq offset with statically linked executables
Changes since RFC v4:
- Move dynamic linker defines to a header file
- Fix alignment when tls block align is smaller than rseq align with
  statically linked executables
- Add statically linked rseq tests
- Revert: Set __rseq_size even when the registration fails
- Use minimum size when rseq is disabled by tunable
Changes since RFC v3:
- Fix RSEQ_SETMEM for rseq disabled
- Replace sys/auxv.h usage with dl-parse_auxv.h
- Fix offset for TLS_TCB_AT_TP with statically linked executables
- Zero the rseq area before registration
Changes since RFC v2:
- Set __rseq_size even when the registration fails
- Adjust rseq tests to the new ABI
- Added support for statically linked executables
Changes since RFC v1:
- Insert the rseq area after the last TLS block
- Add proper support for TLS_TCB_AT_TP variant

---
Michael Jeanson (8):
  nptl: Add rseq auxvals
  Add generic 'extra TLS'
  Add Linux 'extra TLS'
  nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset
  nptl: Introduce <rseq-access.h> for RSEQ_* accessors
  nptl: Move the rseq area to the 'extra TLS' block
  nptl: Remove the rseq area from 'struct pthread'
  Linux: Update internal copy of '<sys/rseq.h>'

 csu/libc-tls.c                                | 82 ++++++++++++++--
 elf/dl-tls.c                                  | 72 ++++++++++++++
 nptl/descr.h                                  | 22 +----
 nptl/pthread_create.c                         |  2 +-
 sysdeps/generic/dl-extra_tls.h                | 46 +++++++++
 sysdeps/i386/nptl/rseq-access.h               | 98 +++++++++++++++++++
 sysdeps/nptl/dl-tls_init_tp.c                 | 23 ++---
 sysdeps/nptl/rseq-access.h                    | 56 +++++++++++
 sysdeps/unix/sysv/linux/Makefile              | 10 ++
 sysdeps/unix/sysv/linux/dl-extra_tls.h        | 71 ++++++++++++++
 sysdeps/unix/sysv/linux/dl-parse_auxv.h       |  7 ++
 sysdeps/unix/sysv/linux/dl-rseq-symbols.S     | 27 +++--
 sysdeps/unix/sysv/linux/rseq-internal.h       | 95 ++++++++++++++----
 sysdeps/unix/sysv/linux/sched_getcpu.c        |  3 +-
 sysdeps/unix/sysv/linux/sys/rseq.h            | 11 +++
 .../unix/sysv/linux/tst-rseq-disable-static.c |  1 +
 sysdeps/unix/sysv/linux/tst-rseq-disable.c    | 77 +++++++++++++--
 .../unix/sysv/linux/tst-rseq-nptl-static.c    |  1 +
 sysdeps/unix/sysv/linux/tst-rseq-static.c     |  1 +
 sysdeps/unix/sysv/linux/tst-rseq.c            | 97 +++++++++++++++---
 sysdeps/unix/sysv/linux/tst-rseq.h            |  3 +-
 sysdeps/x86_64/nptl/rseq-access.h             | 77 +++++++++++++++
 22 files changed, 785 insertions(+), 97 deletions(-)
 create mode 100644 sysdeps/generic/dl-extra_tls.h
 create mode 100644 sysdeps/i386/nptl/rseq-access.h
 create mode 100644 sysdeps/nptl/rseq-access.h
 create mode 100644 sysdeps/unix/sysv/linux/dl-extra_tls.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-static.c
 create mode 100644 sysdeps/x86_64/nptl/rseq-access.h

-- 
2.39.5


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v16 1/8] nptl: Add rseq auxvals
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-09 16:32 ` [PATCH v16 2/8] Add generic 'extra TLS' Michael Jeanson
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Get the rseq feature size and alignment requirement from the auxiliary
vector for use inside the dynamic loader. Use '__rseq_size' directly to
store the feature size. If the main thread registration fails or is
disabled by tunable, reset the value to 0.

This will be used in the TLS block allocator to compute the size and
alignment of the rseq area block for the extended ABI support.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
---
Changes since v13:
- Fix nits and typos in comments and commit message
Changes since v12:
- Set _rseq_size to 0 on registration failure
Changes since v11:
- Removed _dl_rseq_feature_size, use __rseq_size instead
- Replace GLRO(dl_rseq_align) with a hidden global variable _rseq_align
---
 sysdeps/nptl/dl-tls_init_tp.c           | 14 ++++++++++----
 sysdeps/unix/sysv/linux/dl-parse_auxv.h | 13 +++++++++++++
 sysdeps/unix/sysv/linux/rseq-internal.h | 24 +++++++++++++++++++++---
 sysdeps/unix/sysv/linux/tst-rseq.c      |  6 ++++--
 sysdeps/unix/sysv/linux/tst-rseq.h      |  1 +
 5 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index c57738e9f3..46478324fc 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -46,6 +46,8 @@ rtld_mutex_dummy (pthread_mutex_t *lock)
 
 const unsigned int __rseq_flags;
 
+size_t _rseq_align attribute_hidden;
+
 void
 __tls_pre_init_tp (void)
 {
@@ -99,10 +101,14 @@ __tls_init_tp (void)
   }
 
   {
-    bool do_rseq = true;
-    do_rseq = TUNABLE_GET (rseq, int, NULL);
-    if (rseq_register_current_thread (pd, do_rseq))
-      _rseq_size = RSEQ_AREA_SIZE_INITIAL_USED;
+    /* If the registration fails or is disabled by tunable, the public
+       '__rseq_size' will be set to '0' regardless of the feature size of the
+       allocated rseq area.  An rseq area of at least 32 bytes is always
+       allocated since application code is allowed to check the status of the
+       rseq registration by reading the content of the 'cpu_id' field.  */
+    bool do_rseq = TUNABLE_GET (rseq, int, NULL);
+    if (!rseq_register_current_thread (pd, do_rseq))
+      _rseq_size = 0;
 
 #ifdef RSEQ_SIG
     /* This should be a compile-time constant, but the current
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
index 7b4427d0a7..2d4243734c 100644
--- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
+++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
@@ -21,6 +21,7 @@
 #include <fpu_control.h>
 #include <ldsodefs.h>
 #include <link.h>
+#include <rseq-internal.h>
 
 typedef ElfW(Addr) dl_parse_auxv_t[AT_MINSIGSTKSZ + 1];
 
@@ -59,5 +60,17 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
     GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
 #endif
 
+  /* Get the rseq feature size, with a minimum of RSEQ_AREA_SIZE_INITIAL_USED
+     (20) for kernels that don't have AT_RSEQ_FEATURE_SIZE.  Limit the feature
+     size to RSEQ_AREA_SIZE_MAX_USED (28) which fits the rseq area in 'struct
+     pthread' and represents the maximum feature size of currently released
+     kernels.  Since no kernels currently cross the 32 bytes of the original
+     ABI, the semantics of a feature size of 32 or more are still undetermined.
+     */
+  _rseq_size = MIN (MAX (auxv_values[AT_RSEQ_FEATURE_SIZE],
+                         RSEQ_AREA_SIZE_INITIAL_USED),
+		    RSEQ_AREA_SIZE_MAX_USED);
+  _rseq_align = MAX (auxv_values[AT_RSEQ_ALIGN], RSEQ_MIN_ALIGN);
+
   DL_PLATFORM_AUXV
 }
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index a0c244b0cf..6f565581c1 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -25,13 +25,31 @@
 #include <stdio.h>
 #include <sys/rseq.h>
 
-/* 32 is the initially required value for the area size.  The
-   actually used rseq size may be less (20 bytes initially).  */
+/* Minimum size of the rseq area allocation required by the syscall.  The
+   actually used rseq feature size may be less (20 bytes initially).  */
 #define RSEQ_AREA_SIZE_INITIAL 32
+
+/* Minimum used feature size of the rseq area.  */
 #define RSEQ_AREA_SIZE_INITIAL_USED 20
 
-/* The variables are in .data.relro but are not yet write-protected.  */
+/* Maximum currently used feature size of the rseq area.  */
+#define RSEQ_AREA_SIZE_MAX_USED 28
+
+/* Minimum alignment of the rseq area.  */
+#define RSEQ_MIN_ALIGN 32
+
+/* Alignment requirement of the rseq area.
+   Populated from the auxiliary vector with a minimum of '32'.
+   In .data.relro but not yet write-protected.  */
+extern size_t _rseq_align attribute_hidden;
+
+/* Size of the active features in the rseq area.
+   Populated from the auxiliary vector with a minimum of '20'.
+   In .data.relro but not yet write-protected.  */
 extern unsigned int _rseq_size attribute_hidden;
+
+/* Offset from the thread pointer to the rseq area.
+   In .data.relro but not yet write-protected.  */
 extern ptrdiff_t _rseq_offset attribute_hidden;
 
 #ifdef RSEQ_SIG
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
index aced4bd66c..b152280eff 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq.c
@@ -38,13 +38,15 @@ static void
 do_rseq_main_test (void)
 {
   struct pthread *pd = THREAD_SELF;
+  size_t rseq_feature_size = MIN (MAX (getauxval (AT_RSEQ_FEATURE_SIZE),
+                                       RSEQ_AREA_SIZE_INITIAL_USED),
+                                  RSEQ_AREA_SIZE_MAX_USED);
 
   TEST_VERIFY_EXIT (rseq_thread_registered ());
   TEST_COMPARE (__rseq_flags, 0);
   TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
                == (char *) &pd->rseq_area);
-  /* The current implementation only supports the initial size.  */
-  TEST_COMPARE (__rseq_size, 20);
+  TEST_COMPARE (__rseq_size, rseq_feature_size);
 }
 
 static void
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.h b/sysdeps/unix/sysv/linux/tst-rseq.h
index f216d3f7d7..15f512a983 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.h
+++ b/sysdeps/unix/sysv/linux/tst-rseq.h
@@ -23,6 +23,7 @@
 #include <syscall.h>
 #include <sys/rseq.h>
 #include <tls.h>
+#include <rseq-internal.h>
 
 static inline bool
 rseq_thread_registered (void)
-- 
2.39.5



* [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
  2025-01-09 16:32 ` [PATCH v16 1/8] nptl: Add rseq auxvals Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-10 19:03   ` Florian Weimer
  2025-01-11 14:51   ` Florian Weimer
  2025-01-09 16:32 ` [PATCH v16 3/8] Add Linux " Michael Jeanson
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Add the logic to append an 'extra TLS' block in the TLS block allocator
with a generic stub implementation. The duplicated code in
'csu/libc-tls.c' and 'elf/dl-tls.c' is to handle both statically linked
applications and the ELF dynamic loader.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
Changes since v15:
- Improve comments around size and alignment calculations
- Simplify tls_blocks_size calculations
- Make sure we don't roundup to zero alignment
Changes since v14:
- Add TLS_TP_OFFSET to the extra TLS block offset calculation
- Style nits
- Update copyright year to 2025
Changes since v13:
- Improve commit message
---
 csu/libc-tls.c                 | 82 ++++++++++++++++++++++++++++++----
 elf/dl-tls.c                   | 72 +++++++++++++++++++++++++++++
 sysdeps/generic/dl-extra_tls.h | 46 +++++++++++++++++++
 3 files changed, 191 insertions(+), 9 deletions(-)
 create mode 100644 sysdeps/generic/dl-extra_tls.h

diff --git a/csu/libc-tls.c b/csu/libc-tls.c
index 15f470aa87..02855d0b54 100644
--- a/csu/libc-tls.c
+++ b/csu/libc-tls.c
@@ -20,12 +20,14 @@
 #include <errno.h>
 #include <ldsodefs.h>
 #include <tls.h>
+#include <dl-tls.h>
 #include <unistd.h>
 #include <stdio.h>
 #include <sys/param.h>
 #include <array_length.h>
 #include <pthreadP.h>
 #include <dl-call_tls_init_tp.h>
+#include <dl-extra_tls.h>
 
 #ifdef SHARED
  #error makefile bug, this file is for static only
@@ -110,6 +112,7 @@ __libc_setup_tls (void)
   size_t filesz = 0;
   void *initimage = NULL;
   size_t align = 0;
+  size_t tls_blocks_size = 0;
   size_t max_align = TCB_ALIGNMENT;
   size_t tcb_offset;
   const ElfW(Phdr) *phdr;
@@ -135,22 +138,89 @@ __libc_setup_tls (void)
   /* Calculate the size of the static TLS surplus, with 0 auditors.  */
   _dl_tls_static_surplus_init (0);
 
+  /* Extra TLS block for internal usage to append at the end of the TLS blocks
+     (in allocation order).  The address at which the block is allocated must
+     be aligned to 'extra_tls_align'.  The size of the block as returned by
+     '_dl_extra_tls_get_size ()' is always a multiple of the alignment.
+
+     On Linux systems this is where the rseq area will be allocated.  On other
+     systems it is currently unused and both values will be '0'.  */
+  size_t extra_tls_size = _dl_extra_tls_get_size ();
+  size_t extra_tls_align = _dl_extra_tls_get_align ();
+
+  /* Increase the maximum alignment with the extra TLS alignment requirements
+     if necessary.  */
+  max_align = MAX (max_align, extra_tls_align);
+
   /* We have to set up the TCB block which also (possibly) contains
      'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
      Instead we use 'sbrk' which would only uses 'errno' if it fails.
      In this case we are right away out of memory and the user gets
      what she/he deserves.  */
 #if TLS_TCB_AT_TP
+  /* In this layout the TLS blocks are located before the thread pointer. */
+
+  /* Record the size of the combined TLS blocks.
+
+     First reserve space for 'memsz' while respecting both its alignment
+     requirements and those of the extra TLS blocks.  Then add the size of
+     the extra TLS block.  Both values respect the extra TLS alignment
+     requirements and so does the resulting size and the offset that will
+     be derived from it.  */
+  tls_blocks_size = roundup (memsz, MAX (align, extra_tls_align) ?: 1)
+		  + extra_tls_size;
+
+ /* Record the extra TLS block offset from the thread pointer.
+
+    With TLS_TCB_AT_TP the TLS blocks are allocated before the thread pointer
+    in reverse order.  Our block is added last which results in it being the
+    first in the static TLS block, thus record the most negative offset.
+
+    The alignment requirements of the pointer resulting from this offset and
+    the thread pointer are enforced by 'max_align' which is used to align the
+    tcb_offset.  */
+  _dl_extra_tls_set_offset (-tls_blocks_size);
+
   /* Align the TCB offset to the maximum alignment, as
      _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
      and dl_tls_static_align.  */
-  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);
+  tcb_offset = roundup (tls_blocks_size + GLRO(dl_tls_static_surplus), max_align);
   tlsblock = _dl_early_allocate (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
   if (tlsblock == NULL)
     _startup_fatal_tls_error ();
 #elif TLS_DTV_AT_TP
+  /* In this layout the TLS blocks are located after the thread pointer. */
+
+  /* Record the tcb_offset including the alignment requirements of 'memsz'
+     that comes after it.  */
   tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
-  tlsblock = _dl_early_allocate (tcb_offset + memsz + max_align
+
+  /* Record the size of the combined TLS blocks.
+
+     First reserve space for TLS_INIT_TCB_SIZE and 'memsz' while respecting
+     both its alignment requirements and those of the extra TLS blocks.  Then
+     add the size of the extra TLS block.  Both values respect the extra TLS
+     alignment requirements and so does the resulting size and the offset that
+     will be derived from it.  */
+  tls_blocks_size = roundup (TLS_INIT_TCB_SIZE + memsz,
+		  MAX (align, extra_tls_align) ?: 1) + extra_tls_size;
+
+ /* Record the extra TLS block offset from the thread pointer.
+
+    With TLS_DTV_AT_TP the TLS blocks are allocated after the thread pointer in
+    order. Our block is added last which results in it being the last in the
+    static TLS block, thus record the offset as the size of the static TLS
+    block minus the size of our block.
+
+    On some architectures the TLS blocks are offset from the thread pointer,
+    include this offset in the extra TLS block offset.
+
+    The alignment requirements of the pointer resulting from this offset and
+    the thread pointer are enforced by 'max_align' which is used to align the
+    tcb_offset.  */
+  _dl_extra_tls_set_offset (tls_blocks_size - extra_tls_size - TLS_TP_OFFSET);
+
+  tlsblock = _dl_early_allocate (tls_blocks_size + max_align
 				 + TLS_PRE_TCB_SIZE
 				 + GLRO(dl_tls_static_surplus));
   if (tlsblock == NULL)
@@ -209,11 +279,5 @@ __libc_setup_tls (void)
   /* static_slotinfo.slotinfo[1].gen = 0; -- Already zero.  */
   static_slotinfo.slotinfo[1].map = main_map;
 
-  memsz = roundup (memsz, align ?: 1);
-
-#if TLS_DTV_AT_TP
-  memsz += tcb_offset;
-#endif
-
-  init_static_tls (memsz, MAX (TCB_ALIGNMENT, max_align));
+  init_static_tls (tls_blocks_size, MAX (TCB_ALIGNMENT, max_align));
 }
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index c2d17265fb..45ea0588c3 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -36,6 +36,8 @@
 #define TUNABLE_NAMESPACE rtld
 #include <dl-tunables.h>
 
+#include <dl-extra_tls.h>
+
 /* Surplus static TLS, GLRO(dl_tls_static_surplus), is used for
 
    - IE TLS in libc.so for all dlmopen namespaces except in the initial
@@ -323,6 +325,39 @@ _dl_determine_tlsoffset (void)
       slotinfo[cnt].map->l_tls_offset = off;
     }
 
+  /* Insert the extra TLS block after the last TLS block.  */
+
+  /* Extra TLS block for internal usage to append at the end of the TLS blocks
+     (in allocation order).  The address at which the block is allocated must
+     be aligned to 'extra_tls_align'.  The size of the block as returned by
+     '_dl_extra_tls_get_size ()' is always a multiple of the alignment.
+
+     On Linux systems this is where the rseq area will be allocated.  On other
+     systems it is currently unused and both values will be '0'.  */
+  size_t extra_tls_size = _dl_extra_tls_get_size ();
+  size_t extra_tls_align = _dl_extra_tls_get_align ();
+
+  /* Increase the maximum alignment with the extra TLS alignment requirements
+     if necessary.  */
+  max_align = MAX (max_align, extra_tls_align);
+
+  /* Add the extra TLS block to the global offset.  To ensure proper alignment,
+     first align the current global offset to the extra TLS block requirements
+     and then add the extra TLS block size.  Both values respect the extra TLS
+     alignment requirements and so does the resulting offset.  */
+  offset = roundup (offset, extra_tls_align ?: 1) + extra_tls_size;
+
+ /* Record the extra TLS offset.
+
+    With TLS_TCB_AT_TP the TLS blocks are allocated before the thread pointer
+    in reverse order.  Our block is added last which results in it being the
+    first in the static TLS block, thus record the most negative offset.
+
+    The alignment requirements of the pointer resulting from this offset and
+    the thread pointer are enforced by 'max_align' which is used to align the
+    tcb_offset.  */
+  _dl_extra_tls_set_offset (-offset);
+
   GL(dl_tls_static_used) = offset;
   GLRO (dl_tls_static_size) = (roundup (offset + GLRO(dl_tls_static_surplus),
 					max_align)
@@ -368,6 +403,43 @@ _dl_determine_tlsoffset (void)
       offset = off + slotinfo[cnt].map->l_tls_blocksize - firstbyte;
     }
 
+  /* Insert the extra TLS block after the last TLS block.  */
+
+  /* Extra TLS block for internal usage to append at the end of the TLS blocks
+     (in allocation order).  The address at which the block is allocated must
+     be aligned to 'extra_tls_align'.  The size of the block as returned by
+     '_dl_extra_tls_get_size ()' is always a multiple of the alignment.
+
+     On Linux systems this is where the rseq area will be allocated.  On other
+     systems it is currently unused and both values will be '0'.  */
+  size_t extra_tls_size = _dl_extra_tls_get_size ();
+  size_t extra_tls_align = _dl_extra_tls_get_align ();
+
+  /* Increase the maximum alignment with the extra TLS alignment requirements
+     if necessary.  */
+  max_align = MAX (max_align, extra_tls_align);
+
+  /* Align the global offset to the beginning of the extra TLS block.  */
+  offset = roundup (offset, extra_tls_align ?: 1);
+
+ /* Record the extra TLS offset.
+
+    With TLS_DTV_AT_TP the TLS blocks are allocated after the thread pointer in
+    order.  Our block is added last which results in it being the last in the
+    static TLS block, thus record the offset as the size of the static TLS
+    block minus the size of our block.
+
+    On some architectures the TLS blocks are offset from the thread pointer,
+    include this offset in the extra TLS block offset.
+
+    The alignment requirements of the pointer resulting from this offset and
+    the thread pointer are enforced by 'max_align' which is used to align the
+    tcb_offset.  */
+  _dl_extra_tls_set_offset (offset - TLS_TP_OFFSET);
+
+  /* Add the extra TLS block to the global offset.  */
+  offset += extra_tls_size;
+
   GL(dl_tls_static_used) = offset;
   GLRO (dl_tls_static_size) = roundup (offset + GLRO(dl_tls_static_surplus),
 				       TCB_ALIGNMENT);
diff --git a/sysdeps/generic/dl-extra_tls.h b/sysdeps/generic/dl-extra_tls.h
new file mode 100644
index 0000000000..7f5b10d3e8
--- /dev/null
+++ b/sysdeps/generic/dl-extra_tls.h
@@ -0,0 +1,46 @@
+/* Extra TLS utils for the dynamic linker.  Generic stub version.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _DL_EXTRA_TLS_H
+#define _DL_EXTRA_TLS_H 1
+#include <stddef.h>
+
+/* In this generic version, the extra TLS block is unused.  */
+
+/* Returns the size of the extra TLS block, it must always be a multiple of the
+   alignment.  */
+static inline size_t
+_dl_extra_tls_get_size (void)
+{
+  return 0;
+}
+
+/* Returns the alignment requirements of the extra TLS block.  */
+static inline size_t
+_dl_extra_tls_get_align (void)
+{
+  return 0;
+}
+
+/* Record the offset of the extra TLS block from the thread pointer.  */
+static inline void
+_dl_extra_tls_set_offset (ptrdiff_t tls_offset __attribute__ ((unused)))
+{
+}
+
+#endif
-- 
2.39.5



* [PATCH v16 3/8] Add Linux 'extra TLS'
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
  2025-01-09 16:32 ` [PATCH v16 1/8] nptl: Add rseq auxvals Michael Jeanson
  2025-01-09 16:32 ` [PATCH v16 2/8] Add generic 'extra TLS' Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-09 16:32 ` [PATCH v16 4/8] nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset Michael Jeanson
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Add the Linux implementation of 'extra TLS' which will allocate space
for the rseq area at the end of the TLS blocks in allocation order.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
---
Changes since v14:
- Update copyright year to 2025
---
 sysdeps/unix/sysv/linux/dl-extra_tls.h | 71 ++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/dl-extra_tls.h

diff --git a/sysdeps/unix/sysv/linux/dl-extra_tls.h b/sysdeps/unix/sysv/linux/dl-extra_tls.h
new file mode 100644
index 0000000000..d48a5e935d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/dl-extra_tls.h
@@ -0,0 +1,71 @@
+/* Extra TLS block utils for the dynamic linker.  Linux version.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _DL_EXTRA_TLS_H
+#define _DL_EXTRA_TLS_H 1
+
+#include <stddef.h>
+#include <sys/rseq.h>
+#include <rseq-internal.h>
+#include <elf/dl-tunables.h>
+
+/* Returns the size of the extra TLS block, it must always be a multiple of the
+   alignment.  */
+static inline size_t
+_dl_extra_tls_get_size (void)
+{
+  bool do_rseq = TUNABLE_GET_FULL (glibc, pthread, rseq, int, NULL);
+  if (do_rseq)
+    {
+      /* Make sure the rseq area size is at least the minimum ABI size and a
+         multiple of the requested alignment.  */
+      return roundup (MAX (_rseq_size, RSEQ_AREA_SIZE_INITIAL), _rseq_align);
+    }
+
+  /* Even when disabled by tunable, an rseq area will be allocated to allow
+     application code to test the registration status with 'rseq->cpu_id >= 0'.
+     Default to the rseq ABI minimum size, this will ensure we don't use more
+     TLS than necessary.  */
+  return RSEQ_AREA_SIZE_INITIAL;
+}
+
+/* Returns the alignment requirements of the extra TLS block.  */
+static inline size_t
+_dl_extra_tls_get_align (void)
+{
+  bool do_rseq = TUNABLE_GET_FULL (glibc, pthread, rseq, int, NULL);
+  if (do_rseq)
+    {
+      return _rseq_align;
+    }
+
+  /* Even when disabled by tunable, an rseq area will be allocated to allow
+     application code to test the registration status with 'rseq->cpu_id >= 0'.
+     Default to the rseq ABI minimum alignment, this will ensure we don't use
+     more TLS than necessary.  */
+  return RSEQ_MIN_ALIGN;
+}
+
+/* Record the offset of the extra TLS block from the thread pointer.  */
+static inline void
+_dl_extra_tls_set_offset (ptrdiff_t tls_offset)
+{
+  _rseq_offset = tls_offset;
+}
+
+#endif
-- 
2.39.5



* [PATCH v16 4/8] nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (2 preceding siblings ...)
  2025-01-09 16:32 ` [PATCH v16 3/8] Add Linux " Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-10 19:05   ` Florian Weimer
  2025-01-09 16:32 ` [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors Michael Jeanson
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

This allows accessing the internal aliases of __rseq_size and
__rseq_offset from ld.so without ifdefs and avoids dynamic symbol
binding at run time for both variables.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 sysdeps/unix/sysv/linux/dl-rseq-symbols.S | 27 ++++++++++++++++-------
 sysdeps/unix/sysv/linux/rseq-internal.h   | 16 ++++++++------
 2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/dl-rseq-symbols.S b/sysdeps/unix/sysv/linux/dl-rseq-symbols.S
index 353bf657b9..2502a76e50 100644
--- a/sysdeps/unix/sysv/linux/dl-rseq-symbols.S
+++ b/sysdeps/unix/sysv/linux/dl-rseq-symbols.S
@@ -27,14 +27,18 @@
 /* Some targets define a macro to denote the zero register.  */
 #undef zero
 
-/* Define 2 symbols: '__rseq_size' is public const and '_rseq_size' (an
-   alias of '__rseq_size') is hidden and writable for internal use by the
-   dynamic linker which will initialize the value both symbols point to
-   before copy relocations take place. */
+/* Define 3 symbols: '__rseq_size' is public const, while '_rseq_size' and
+   '__GI___rseq_size' (both aliases of '__rseq_size') are hidden; '_rseq_size'
+   is writable for internal use by the dynamic linker, which will initialize
+   the value the symbols point to before copy relocations take place.  */
 
 	.globl	__rseq_size
 	.type	__rseq_size, %object
 	.size	__rseq_size, 4
+	.hidden __GI___rseq_size
+	.globl	__GI___rseq_size
+	.type	__GI___rseq_size, %object
+	.size	__GI___rseq_size, 4
 	.hidden _rseq_size
 	.globl	_rseq_size
 	.type	_rseq_size, %object
@@ -42,17 +46,23 @@
 	.section .data.rel.ro
 	.balign 4
 __rseq_size:
+__GI___rseq_size:
 _rseq_size:
 	.zero	4
 
-/* Define 2 symbols: '__rseq_offset' is public const and '_rseq_offset' (an
-   alias of '__rseq_offset') is hidden and writable for internal use by the
-   dynamic linker which will initialize the value both symbols point to
-   before copy relocations take place. */
+/* Define 3 symbols: '__rseq_offset' is public const, while '_rseq_offset'
+   and '__GI___rseq_offset' (both aliases of '__rseq_offset') are hidden;
+   '_rseq_offset' is writable for internal use by the dynamic linker, which
+   will initialize the value the symbols point to before copy relocations
+   take place.  */
 
 	.globl	__rseq_offset
 	.type	__rseq_offset, %object
 	.size	__rseq_offset, RSEQ_OFFSET_SIZE
+	.hidden __GI___rseq_offset
+	.globl	__GI___rseq_offset
+	.type	__GI___rseq_offset, %object
+	.size	__GI___rseq_offset, RSEQ_OFFSET_SIZE
 	.hidden _rseq_offset
 	.globl	_rseq_offset
 	.type	_rseq_offset, %object
@@ -60,5 +70,6 @@ _rseq_size:
 	.section .data.rel.ro
 	.balign RSEQ_OFFSET_SIZE
 __rseq_offset:
+__GI___rseq_offset:
 _rseq_offset:
 	.zero	RSEQ_OFFSET_SIZE
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 6f565581c1..3993431707 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -24,6 +24,7 @@
 #include <stdbool.h>
 #include <stdio.h>
 #include <sys/rseq.h>
+#include <ldsodefs.h>
 
 /* Minimum size of the rseq area allocation required by the syscall.  The
    actually used rseq feature size may be less (20 bytes initially).  */
@@ -52,19 +53,20 @@ extern unsigned int _rseq_size attribute_hidden;
    In .data.relro but not yet write-protected.  */
 extern ptrdiff_t _rseq_offset attribute_hidden;
 
+/* We want to use rtld_hidden_proto in order to call the internal aliases
+   of __rseq_size and __rseq_offset from ld.so.  This avoids dynamic symbol
+   binding at run time for both variables.  */
+rtld_hidden_proto (__rseq_size)
+rtld_hidden_proto (__rseq_offset)
+
 #ifdef RSEQ_SIG
 static inline bool
 rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
   if (do_rseq)
     {
-      unsigned int size;
-#if IS_IN (rtld)
-      /* Use the hidden symbol in ld.so.  */
-      size = _rseq_size;
-#else
-      size = __rseq_size;
-#endif
+      unsigned int size = __rseq_size;
+
       if (size < RSEQ_AREA_SIZE_INITIAL)
         /* The initial implementation used only 20 bytes out of 32,
            but still expected size 32.  */
-- 
2.39.5


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (3 preceding siblings ...)
  2025-01-09 16:32 ` [PATCH v16 4/8] nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-10 19:08   ` Florian Weimer
  2025-01-13 10:49   ` Frank Scheiner
  2025-01-09 16:32 ` [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block Michael Jeanson
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

In preparation to move the rseq area to the 'extra TLS' block, we need
accessors based on the thread pointer and the rseq offset. The ONCE
variant of the accessors ensures single-copy atomicity for loads and
stores which is required for all fields once the registration is active.

A separate header is required to allow including <atomic.h> which
results in an include loop when added to <tcb-access.h>.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
Changes since v15:
- Rename VOLATILE macros to ONCE
Changes since v14:
- Update copyright year to 2025
Changes since v13:
- Ensure that the VOLATILE variant static assert on 64bit types
  on 32bit architectures
- Split to separate header to allow including 'atomic.h' without
  an include loop
- Move rtld_hidden_proto rseq symbols to a separate patch
Changes since v12:
- Split RSEQ_SET/GETMEM from THREAD_SET/GETMEM
- Rename rseq_get_area() to RSEQ_SELF()
- Add rtld_hidden_proto to __rseq_size and __rseq_offset
---
 sysdeps/i386/nptl/rseq-access.h         | 98 +++++++++++++++++++++++++
 sysdeps/nptl/rseq-access.h              | 56 ++++++++++++++
 sysdeps/unix/sysv/linux/rseq-internal.h |  8 ++
 sysdeps/x86_64/nptl/rseq-access.h       | 77 +++++++++++++++++++
 4 files changed, 239 insertions(+)
 create mode 100644 sysdeps/i386/nptl/rseq-access.h
 create mode 100644 sysdeps/nptl/rseq-access.h
 create mode 100644 sysdeps/x86_64/nptl/rseq-access.h

diff --git a/sysdeps/i386/nptl/rseq-access.h b/sysdeps/i386/nptl/rseq-access.h
new file mode 100644
index 0000000000..5e7e09d494
--- /dev/null
+++ b/sysdeps/i386/nptl/rseq-access.h
@@ -0,0 +1,98 @@
+/* RSEQ_* accessors.  i386 version.
+   Copyright (C) 2002-2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#define __RSEQ_GETMEM(member) \
+  ({ __typeof (RSEQ_SELF()->member) __value;				      \
+     if (sizeof (__value) == 1)						      \
+       asm volatile ("movb %%gs:%P2(%3),%b0"				      \
+		     : "=q" (__value)					      \
+		     : "0" (0), "i" (offsetof (struct rseq_area, member)),   \
+		     "r" (__rseq_offset));				      \
+     else if (sizeof (__value) == 4)					      \
+       asm volatile ("movl %%gs:%P1(%2),%0"				      \
+		     : "=r" (__value)					      \
+		     : "i" (offsetof (struct rseq_area, member)),	      \
+		       "r" (__rseq_offset));				      \
+     else /* 8 */							      \
+       {								      \
+	 asm volatile  ("movl %%gs:%P1(%2),%%eax\n\t"			      \
+			"movl %%gs:4+%P1(%2),%%edx"			      \
+			: "=&A" (__value)				      \
+			: "i" (offsetof (struct rseq_area, member)),	      \
+			  "r" (__rseq_offset));				      \
+       }								      \
+     __value; })
+
+/* Read member of the RSEQ area directly.  */
+#define RSEQ_GETMEM(member) \
+  ({									      \
+     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
+		     || sizeof (RSEQ_SELF()->member) == 4		      \
+		     || sizeof (RSEQ_SELF()->member) == 8,		      \
+		     "size of rseq data");				      \
+     __RSEQ_GETMEM(member); })
+
+/* Read member of the RSEQ area directly, with single-copy atomicity semantics.
+   Static assert for types >= 64 bits since they can't be loaded atomically on
+   x86-32.  */
+#define RSEQ_GETMEM_ONCE(member) \
+  ({									      \
+     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
+		     || sizeof (RSEQ_SELF()->member) == 4,		      \
+		     "size of rseq data");				      \
+     __RSEQ_GETMEM(member); })
+
+#define __RSEQ_SETMEM(member, value) \
+  ({									      \
+     if (sizeof (RSEQ_SELF()->member) == 1)				      \
+       asm volatile ("movb %b0,%%gs:%P1(%2)" :				      \
+		     : "iq" (value),					      \
+		       "i" (offsetof (struct rseq_area, member)),	      \
+		       "r" (__rseq_offset));				      \
+     else if (sizeof (RSEQ_SELF()->member) == 4)			      \
+       asm volatile ("movl %0,%%gs:%P1(%2)" :				      \
+		     : "ir" (value),					      \
+		       "i" (offsetof (struct rseq_area, member)),	      \
+		       "r" (__rseq_offset));				      \
+     else /* 8 */							      \
+       {								      \
+	 asm volatile ("movl %%eax,%%gs:%P1(%2)\n\t"			      \
+		       "movl %%edx,%%gs:4+%P1(%2)" :			      \
+		       : "A" ((uint64_t) cast_to_integer (value)),	      \
+			 "i" (offsetof (struct rseq_area, member)),	      \
+			 "r" (__rseq_offset));				      \
+       }})
+
+/* Set member of the RSEQ area directly.  */
+#define RSEQ_SETMEM(member, value) \
+  ({									      \
+     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
+		     || sizeof (RSEQ_SELF()->member) == 4		      \
+		     || sizeof (RSEQ_SELF()->member) == 8,		      \
+		     "size of rseq data");				      \
+     __RSEQ_SETMEM(member, value); })
+
+/* Set member of the RSEQ area directly, with single-copy atomicity semantics.
+   Static assert for types >= 64 bits since they can't be stored atomically on
+   x86-32.  */
+#define RSEQ_SETMEM_ONCE(member, value) \
+  ({									      \
+     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
+		     || sizeof (RSEQ_SELF()->member) == 4,		      \
+		     "size of rseq data");				      \
+     __RSEQ_SETMEM(member, value); })
diff --git a/sysdeps/nptl/rseq-access.h b/sysdeps/nptl/rseq-access.h
new file mode 100644
index 0000000000..450f2dcca3
--- /dev/null
+++ b/sysdeps/nptl/rseq-access.h
@@ -0,0 +1,56 @@
+/* RSEQ_* accessors.  Generic version.
+   Copyright (C) 2002-2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <atomic.h>
+
+/* Read member of the RSEQ area directly.  */
+#define RSEQ_GETMEM(member) \
+  RSEQ_SELF()->member
+
+/* Set member of the RSEQ area directly.  */
+#define RSEQ_SETMEM(member, value) \
+  RSEQ_SELF()->member = (value)
+
+/* Static assert for types that can't be loaded/stored atomically on the
+   current architecture.  */
+#if __HAVE_64B_ATOMICS
+#define __RSEQ_ASSERT_ATOMIC(member) \
+   _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
+		   || sizeof (RSEQ_SELF()->member) == 4			      \
+		   || sizeof (RSEQ_SELF()->member) == 8,		      \
+		   "size of rseq data")
+#else
+#define __RSEQ_ASSERT_ATOMIC(member) \
+   _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
+		   || sizeof (RSEQ_SELF()->member) == 4,		      \
+		   "size of rseq data")
+#endif
+
+/* Read member of the RSEQ area directly, with single-copy atomicity semantics.  */
+#define RSEQ_GETMEM_ONCE(member) \
+  ({									      \
+     __RSEQ_ASSERT_ATOMIC(member);					      \
+     (*(volatile __typeof (RSEQ_SELF()->member) *)&RSEQ_SELF()->member);      \
+  })
+
+/* Set member of the RSEQ area directly, with single-copy atomicity semantics.  */
+#define RSEQ_SETMEM_ONCE(member, value) \
+  ({									      \
+     __RSEQ_ASSERT_ATOMIC(member);					      \
+     (*(volatile __typeof (RSEQ_SELF()->member) *)&RSEQ_SELF()->member = (value)); \
+  })
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 3993431707..00be15cfc8 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -25,6 +25,7 @@
 #include <stdio.h>
 #include <sys/rseq.h>
 #include <ldsodefs.h>
+#include <thread_pointer.h>
 
 /* Minimum size of the rseq area allocation required by the syscall.  The
    actually used rseq feature size may be less (20 bytes initially).  */
@@ -59,6 +60,13 @@ extern ptrdiff_t _rseq_offset attribute_hidden;
 rtld_hidden_proto (__rseq_size)
 rtld_hidden_proto (__rseq_offset)
 
+/* Returns a pointer to the current thread rseq area.  */
+static inline struct rseq_area *
+RSEQ_SELF (void)
+{
+  return (struct rseq_area *) ((char *) __thread_pointer () + __rseq_offset);
+}
+
 #ifdef RSEQ_SIG
 static inline bool
 rseq_register_current_thread (struct pthread *self, bool do_rseq)
diff --git a/sysdeps/x86_64/nptl/rseq-access.h b/sysdeps/x86_64/nptl/rseq-access.h
new file mode 100644
index 0000000000..535e36281f
--- /dev/null
+++ b/sysdeps/x86_64/nptl/rseq-access.h
@@ -0,0 +1,77 @@
+/* RSEQ_* accessors.  x86_64 version.
+   Copyright (C) 2002-2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Read member of the RSEQ area directly, with single-copy atomicity semantics.  */
+#define RSEQ_GETMEM_ONCE(member) \
+  ({ __typeof (RSEQ_SELF()->member) __value;				      \
+     _Static_assert (sizeof (__value) == 1				      \
+		     || sizeof (__value) == 4				      \
+		     || sizeof (__value) == 8,				      \
+		     "size of rseq data");			      \
+     if (sizeof (__value) == 1)						      \
+       asm volatile ("movb %%fs:%P2(%q3),%b0"				      \
+		     : "=q" (__value)					      \
+		     : "0" (0), "i" (offsetof (struct rseq_area, member)),    \
+		       "r" (__rseq_offset));				      \
+     else if (sizeof (__value) == 4)					      \
+       asm volatile ("movl %%fs:%P1(%q2),%0"				      \
+		     : "=r" (__value)					      \
+		     : "i" (offsetof (struct rseq_area, member)),	      \
+		       "r" (__rseq_offset));				      \
+     else /* 8 */							      \
+       {								      \
+	 asm volatile ("movq %%fs:%P1(%q2),%q0"				      \
+		       : "=r" (__value)					      \
+		       : "i" (offsetof (struct rseq_area, member)),	      \
+			 "r" (__rseq_offset));				      \
+       }								      \
+     __value; })
+
+/* Read member of the RSEQ area directly.  */
+#define RSEQ_GETMEM(member) RSEQ_GETMEM_ONCE(member)
+
+/* Set member of the RSEQ area directly, with single-copy atomicity semantics.  */
+#define RSEQ_SETMEM_ONCE(member, value) \
+  ({									      \
+     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
+		     || sizeof (RSEQ_SELF()->member) == 4		      \
+		     || sizeof (RSEQ_SELF()->member) == 8,		      \
+		     "size of rseq data");			      \
+     if (sizeof (RSEQ_SELF()->member) == 1)				      \
+       asm volatile ("movb %b0,%%fs:%P1(%q2)" :				      \
+		     : "iq" (value),					      \
+		       "i" (offsetof (struct rseq_area, member)),	      \
+		       "r" (__rseq_offset));				      \
+     else if (sizeof (RSEQ_SELF()->member) == 4)			      \
+       asm volatile ("movl %0,%%fs:%P1(%q2)" :				      \
+		     : IMM_MODE (value),				      \
+		       "i" (offsetof (struct rseq_area, member)),	      \
+		       "r" (__rseq_offset));				      \
+     else /* 8 */							      \
+       {								      \
+	 /* Since movq takes a signed 32-bit immediate or a register source   \
+	    operand, use "er" constraint for 32-bit signed integer constant   \
+	    or register.  */						      \
+	 asm volatile ("movq %q0,%%fs:%P1(%q2)" :			      \
+		       : "er" ((uint64_t) cast_to_integer (value)),	      \
+			 "i" (offsetof (struct rseq_area, member)),	      \
+			 "r" (__rseq_offset));				      \
+       }})
+
+/* Set member of the RSEQ area directly.  */
+#define RSEQ_SETMEM(member, value) RSEQ_SETMEM_ONCE(member, value)
-- 
2.39.5


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (4 preceding siblings ...)
  2025-01-09 16:32 ` [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-10 19:39   ` Florian Weimer
  2025-01-11  2:53   ` H.J. Lu
  2025-01-09 16:32 ` [PATCH v16 7/8] nptl: Remove the rseq area from 'struct pthread' Michael Jeanson
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Move the rseq area to the newly added 'extra TLS' block; this is the
last step in adding support for the rseq extensible ABI. The size of the
rseq area is now dynamic and depends on the rseq features reported by
the kernel through the ELF auxiliary vector. This will allow
applications to use rseq features past the 32 bytes of the original rseq
ABI as they become available in future kernels.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
Changes since v15:
- Clarify comments in tests regarding the rseq offset and tls offset
- Check the rseq offset against a TLS variable in the tests
- Renamed RSEQ_GETMEM_VOLATILE to RSEQ_GETMEM_ONCE
Changes since v14:
- Add TLS_TP_OFFSET to the rseq offset tests
- Style nits
Changes since v12:
- Style nits, missing spaces before '(' in function macro/calls
- Add comment and variable array member to 'struct rseq_area'
- Comment and style nits in tst-rseq-disable.c
Changes since v11:
- __rseq_size is now set directly in _dl_parse_auxv, set it to 0 when
  the main thread registration fails or is disabled by tunable
---
 nptl/pthread_create.c                         |  2 +-
 sysdeps/nptl/dl-tls_init_tp.c                 |  9 --
 sysdeps/unix/sysv/linux/Makefile              | 10 ++
 sysdeps/unix/sysv/linux/dl-parse_auxv.h       | 12 +--
 sysdeps/unix/sysv/linux/rseq-internal.h       | 49 +++++++---
 sysdeps/unix/sysv/linux/sched_getcpu.c        |  3 +-
 .../unix/sysv/linux/tst-rseq-disable-static.c |  1 +
 sysdeps/unix/sysv/linux/tst-rseq-disable.c    | 77 +++++++++++++--
 .../unix/sysv/linux/tst-rseq-nptl-static.c    |  1 +
 sysdeps/unix/sysv/linux/tst-rseq-static.c     |  1 +
 sysdeps/unix/sysv/linux/tst-rseq.c            | 97 ++++++++++++++++---
 sysdeps/unix/sysv/linux/tst-rseq.h            |  2 +-
 12 files changed, 206 insertions(+), 58 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-static.c

diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 01e8a86980..9ae5423df0 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -696,7 +696,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 
   /* Inherit rseq registration state.  Without seccomp filters, rseq
      registration will either always fail or always succeed.  */
-  if ((int) THREAD_GETMEM_VOLATILE (self, rseq_area.cpu_id) >= 0)
+  if ((int) RSEQ_GETMEM_ONCE (cpu_id) >= 0)
     pd->flags |= ATTR_FLAG_DO_RSEQ;
 
   /* Initialize the field for the ID of the thread which is waiting
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index 46478324fc..f487bfb66e 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -109,15 +109,6 @@ __tls_init_tp (void)
     bool do_rseq = TUNABLE_GET (rseq, int, NULL);
     if (!rseq_register_current_thread (pd, do_rseq))
       _rseq_size = 0;
-
-#ifdef RSEQ_SIG
-    /* This should be a compile-time constant, but the current
-       infrastructure makes it difficult to determine its value.  Not
-       all targets support __thread_pointer, so set __rseq_offset only
-       if the rseq registration may have happened because RSEQ_SIG is
-       defined.  */
-    _rseq_offset = (char *) &pd->rseq_area - (char *) __thread_pointer ();
-#endif
   }
 
   /* Set initial thread's stack block from 0 up to __libc_stack_end.
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 4e7044f12f..c2acf18689 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -267,6 +267,11 @@ tests-internal += \
   tst-rseq-disable \
   # tests-internal
 
+tests-static += \
+  tst-rseq-disable-static \
+  tst-rseq-static \
+  # tests-static
+
 tests-time64 += \
   tst-adjtimex-time64 \
   tst-clock_adjtime-time64 \
@@ -410,6 +415,7 @@ $(objpfx)tst-sched-consts.out: ../sysdeps/unix/sysv/linux/tst-sched-consts.py
 $(objpfx)tst-sched-consts.out: $(sysdeps-linux-python-deps)
 
 tst-rseq-disable-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0
+tst-rseq-disable-static-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0
 
 endif # $(subdir) == misc
 
@@ -684,4 +690,8 @@ tests += \
 tests-internal += \
   tst-rseq-nptl \
   # tests-internal
+
+tests-static += \
+  tst-rseq-nptl-static \
+  # tests-static
 endif
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
index 2d4243734c..41250c96d0 100644
--- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
+++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
@@ -61,15 +61,9 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
 #endif
 
   /* Get the rseq feature size, with a minimum of RSEQ_AREA_SIZE_INITIAL_USED
-     (20) for kernels that don't have AT_RSEQ_FEATURE_SIZE.  Limit the feature
-     size to RSEQ_AREA_SIZE_MAX_USED (28) which fits the rseq area in 'struct
-     pthread' and represents the maximum feature size of currently released
-     kernels.  Since no kernels currently cross the 32 bytes of the original
-     ABI, the semantics of a feature size of 32 or more are still undetermined.
-     */
-  _rseq_size = MIN (MAX (auxv_values[AT_RSEQ_FEATURE_SIZE],
-                         RSEQ_AREA_SIZE_INITIAL_USED),
-		    RSEQ_AREA_SIZE_MAX_USED);
+     (20) for kernels that don't have AT_RSEQ_FEATURE_SIZE.  */
+  _rseq_size = MAX (auxv_values[AT_RSEQ_FEATURE_SIZE],
+                    RSEQ_AREA_SIZE_INITIAL_USED);
   _rseq_align = MAX (auxv_values[AT_RSEQ_ALIGN], RSEQ_MIN_ALIGN);
 
   DL_PLATFORM_AUXV
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 00be15cfc8..f89e784243 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -26,6 +26,30 @@
 #include <sys/rseq.h>
 #include <ldsodefs.h>
 #include <thread_pointer.h>
+#include <rseq-access.h>
+
+/* rseq area registered with the kernel.  Use a custom definition here to
+   isolate from the system provided header which could lack some fields of the
+   Extended ABI.
+
+   This is only used to get the field offsets and sizes, it should never be
+   used for direct object allocations.
+
+   Access to fields of the Extended ABI beyond the 20 bytes of the original ABI
+   (after 'flags') must be gated by a check of the feature size.  */
+struct rseq_area
+{
+  /* Original ABI.  */
+  uint32_t cpu_id_start;
+  uint32_t cpu_id;
+  uint64_t rseq_cs;
+  uint32_t flags;
+  /* Extended ABI.  */
+  uint32_t node_id;
+  uint32_t mm_cid;
+  /* Flexible array member to discourage direct object allocations.  */
+  char end[];
+};
 
 /* Minimum size of the rseq area allocation required by the syscall.  The
    actually used rseq feature size may be less (20 bytes initially).  */
@@ -47,10 +71,12 @@ extern size_t _rseq_align attribute_hidden;
 
 /* Size of the active features in the rseq area.
    Populated from the auxiliary vector with a minimum of '20'.
+   Set to '0' on registration failure of the main thread.
    In .data.relro but not yet write-protected.  */
 extern unsigned int _rseq_size attribute_hidden;
 
-/* Offset from the thread pointer to the rseq area.
+/* Offset from the thread pointer to the rseq area, always set to allow
+   checking the registration status by reading the 'cpu_id' field.
    In .data.relro but not yet write-protected.  */
 extern ptrdiff_t _rseq_offset attribute_hidden;
 
@@ -75,34 +101,35 @@ rseq_register_current_thread (struct pthread *self, bool do_rseq)
     {
       unsigned int size =  __rseq_size;
 
+      /* The feature size can be smaller than the minimum rseq area size of
+         32 bytes accepted by the syscall; if so, bump the size of the
+         registration to the minimum.  The 'extra TLS' block is always at
+         least 32 bytes.  */
       if (size < RSEQ_AREA_SIZE_INITIAL)
-        /* The initial implementation used only 20 bytes out of 32,
-           but still expected size 32.  */
         size = RSEQ_AREA_SIZE_INITIAL;
 
       /* Initialize the rseq fields that are read by the kernel on
          registration, there is no guarantee that struct pthread is
          cleared on all architectures.  */
-      THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_UNINITIALIZED);
-      THREAD_SETMEM (self, rseq_area.cpu_id_start, 0);
-      THREAD_SETMEM (self, rseq_area.rseq_cs, 0);
-      THREAD_SETMEM (self, rseq_area.flags, 0);
+      RSEQ_SETMEM (cpu_id, RSEQ_CPU_ID_UNINITIALIZED);
+      RSEQ_SETMEM (cpu_id_start, 0);
+      RSEQ_SETMEM (rseq_cs, 0);
+      RSEQ_SETMEM (flags, 0);
 
-      int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
-                                       size, 0, RSEQ_SIG);
+      int ret = INTERNAL_SYSCALL_CALL (rseq, RSEQ_SELF (), size, 0, RSEQ_SIG);
       if (!INTERNAL_SYSCALL_ERROR_P (ret))
         return true;
     }
   /* When rseq is disabled by tunables or the registration fails, inform
      userspace by setting 'cpu_id' to RSEQ_CPU_ID_REGISTRATION_FAILED.  */
-  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  RSEQ_SETMEM (cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
   return false;
 }
 #else /* RSEQ_SIG */
 static inline bool
 rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
-  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  RSEQ_SETMEM (cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
   return false;
 }
 #endif /* RSEQ_SIG */
diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index 9a2e2a5663..828b651320 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -19,6 +19,7 @@
 #include <sched.h>
 #include <sysdep.h>
 #include <sysdep-vdso.h>
+#include <rseq-internal.h>
 
 static int
 vsyscall_sched_getcpu (void)
@@ -36,6 +37,6 @@ vsyscall_sched_getcpu (void)
 int
 sched_getcpu (void)
 {
-  int cpu_id = THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id);
+  int cpu_id = RSEQ_GETMEM_ONCE (cpu_id);
   return __glibc_likely (cpu_id >= 0) ? cpu_id : vsyscall_sched_getcpu ();
 }
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c b/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
new file mode 100644
index 0000000000..2687d13d3d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
@@ -0,0 +1 @@
+#include "tst-rseq-disable.c"
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable.c b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
index 0e191ffeb9..8e9c11f653 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq-disable.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
@@ -26,32 +26,82 @@
 #include <unistd.h>
 
 #ifdef RSEQ_SIG
+# include <sys/auxv.h>
+# include <dl-tls.h>
+# include "tst-rseq.h"
+
+/* Used to test private registration with the rseq system call because glibc
+   rseq is disabled.  */
+static __thread struct rseq local_rseq = {
+  .cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED,
+};
+
+/* Used to check if the address of the rseq area comes before or after the TLS
+   blocks depending on the TLS model.  */
+static __thread char tls_var __attribute__ ((tls_model ("initial-exec")));
 
 /* Check that rseq can be registered and has not been taken by glibc.  */
 static void
 check_rseq_disabled (void)
 {
-  struct pthread *pd = THREAD_SELF;
+  struct rseq *rseq_abi = (struct rseq *) ((char *) __thread_pointer () +
+		           __rseq_offset);
+
+#if TLS_TCB_AT_TP
+  /* The rseq area block should come before the thread pointer and be at least
+     32 bytes.  */
+  TEST_VERIFY (__rseq_offset <= -RSEQ_AREA_SIZE_INITIAL);
+
+  /* The rseq area block should come before TLS variables.  */
+  TEST_VERIFY ((intptr_t) rseq_abi < (intptr_t) &tls_var);
+#elif TLS_DTV_AT_TP
+  /* The rseq area block should come after the TCB, add the TLS block offset to
+     the rseq offset to get a value relative to the TCB and test that it's
+     non-negative.  */
+  TEST_VERIFY (__rseq_offset + TLS_TP_OFFSET >= 0);
+
+  /* The rseq area block should come after TLS variables.  */
+  TEST_VERIFY ((intptr_t) rseq_abi > (intptr_t) &tls_var);
+#else
+# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
+#endif
 
+  /* __rseq_flags is unused and should always be '0'.  */
   TEST_COMPARE (__rseq_flags, 0);
-  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
-               == (char *) &pd->rseq_area);
+
+  /* When rseq is not registered, __rseq_size should always be '0'.  */
   TEST_COMPARE (__rseq_size, 0);
-  TEST_COMPARE ((int) pd->rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
 
-  int ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
-                     0, RSEQ_SIG);
+  /* When rseq is not registered, the 'cpu_id' field should be set to
+     RSEQ_CPU_ID_REGISTRATION_FAILED.  */
+  TEST_COMPARE ((int) rseq_abi->cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+
+  /* Test a rseq registration which should succeed since the internal
+     registration is disabled.  */
+  int ret = syscall (__NR_rseq, &local_rseq, RSEQ_AREA_SIZE_INITIAL, 0, RSEQ_SIG);
   if (ret == 0)
     {
-      ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
+      /* A successful registration should set the cpu id.  */
+      TEST_VERIFY (local_rseq.cpu_id >= 0);
+
+      /* Test we can also unregister rseq.  */
+      ret = syscall (__NR_rseq, &local_rseq, RSEQ_AREA_SIZE_INITIAL,
                      RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
       TEST_COMPARE (ret, 0);
-      pd->rseq_area.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
     }
   else
     {
-      TEST_VERIFY (errno != -EINVAL);
-      TEST_VERIFY (errno != -EBUSY);
+      /* Check if we failed with EINVAL, which would mean invalid rseq flags,
+         a misaligned rseq area address or an incorrect rseq size.  */
+      TEST_VERIFY (errno != EINVAL);
+
+      /* Check if we failed with EBUSY, which would mean an existing rseq
+         registration.  */
+      TEST_VERIFY (errno != EBUSY);
+
+      /* Check if we failed with EFAULT which means an invalid rseq area
+         address.  */
+      TEST_VERIFY (errno != EFAULT);
     }
 }
 
@@ -71,6 +121,13 @@ proc_func (void *ignored)
 static int
 do_test (void)
 {
+  printf ("info: __rseq_size: %u\n", __rseq_size);
+  printf ("info: __rseq_offset: %td\n", __rseq_offset);
+  printf ("info: __rseq_flags: %u\n", __rseq_flags);
+  printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n",
+          getauxval (AT_RSEQ_FEATURE_SIZE));
+  printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN));
+
   puts ("info: checking main thread");
   check_rseq_disabled ();
 
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
new file mode 100644
index 0000000000..6e2c923bb9
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
@@ -0,0 +1 @@
+#include "tst-rseq-nptl.c"
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-static.c b/sysdeps/unix/sysv/linux/tst-rseq-static.c
new file mode 100644
index 0000000000..1d97f3bd3d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-static.c
@@ -0,0 +1 @@
+#include "tst-rseq.c"
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
index b152280eff..00181cfefb 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq.c
@@ -19,6 +19,8 @@
    not linked against libpthread.  */
 
 #include <support/check.h>
+#include <support/namespace.h>
+#include <support/xthread.h>
 #include <stdio.h>
 #include <sys/rseq.h>
 #include <unistd.h>
@@ -32,25 +34,81 @@
 # include <sys/auxv.h>
 # include <thread_pointer.h>
 # include <tls.h>
+# include <dl-tls.h>
 # include "tst-rseq.h"
 
+/* Used to check if the address of the rseq area comes before or after the TLS
+   blocks depending on the TLS model.  */
+static __thread char tls_var __attribute__ ((tls_model ("initial-exec")));
+
 static void
 do_rseq_main_test (void)
 {
-  struct pthread *pd = THREAD_SELF;
-  size_t rseq_feature_size = MIN (MAX (getauxval (AT_RSEQ_FEATURE_SIZE),
-                                       RSEQ_AREA_SIZE_INITIAL_USED),
-                                  RSEQ_AREA_SIZE_MAX_USED);
+  size_t rseq_align = MAX (getauxval (AT_RSEQ_ALIGN), RSEQ_MIN_ALIGN);
+  size_t rseq_feature_size = MAX (getauxval (AT_RSEQ_FEATURE_SIZE),
+                                  RSEQ_AREA_SIZE_INITIAL_USED);
+  size_t rseq_alloc_size = roundup (MAX (rseq_feature_size,
+                                    RSEQ_AREA_SIZE_INITIAL_USED), rseq_align);
+  struct rseq *rseq_abi = __thread_pointer () + __rseq_offset;
 
   TEST_VERIFY_EXIT (rseq_thread_registered ());
+
+  /* __rseq_flags is unused and should always be '0'.  */
   TEST_COMPARE (__rseq_flags, 0);
-  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
-               == (char *) &pd->rseq_area);
+
+  /* When rseq is registered, __rseq_size should report the feature size.  */
   TEST_COMPARE (__rseq_size, rseq_feature_size);
+
+  /* When rseq is registered, the 'cpu_id' field should be set to a valid cpu
+     number.  */
+  TEST_VERIFY ((int32_t) rseq_abi->cpu_id >= 0);
+
+  /* The rseq area address must be aligned.  */
+  TEST_VERIFY (((unsigned long) rseq_abi % rseq_align) == 0);
+
+#if TLS_TCB_AT_TP
+  /* The rseq area block should come before the thread pointer and be at least
+     32 bytes.  */
+  TEST_VERIFY (__rseq_offset <= -RSEQ_AREA_SIZE_INITIAL);
+
+  /* The rseq area block should come before TLS variables.  */
+  TEST_VERIFY ((intptr_t) rseq_abi < (intptr_t) &tls_var);
+#elif TLS_DTV_AT_TP
+  /* The rseq area block should come after the TCB; add the TLS block offset
+     to the rseq offset to get a value relative to the TCB and test that it's
+     non-negative.  */
+  TEST_VERIFY (__rseq_offset + TLS_TP_OFFSET >= 0);
+
+  /* The rseq area block should come after TLS variables.  */
+  TEST_VERIFY ((intptr_t) rseq_abi > (intptr_t) &tls_var);
+#else
+# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
+#endif
+
+  /* Test a rseq registration with the same arguments as the internal
+     registration which should fail with errno == EBUSY.  */
+  TEST_VERIFY (((unsigned long) rseq_abi % rseq_align) == 0);
+  TEST_VERIFY (__rseq_size <= rseq_alloc_size);
+  int ret = syscall (__NR_rseq, rseq_abi, rseq_alloc_size, 0, RSEQ_SIG);
+  TEST_VERIFY (ret != 0);
+  TEST_COMPARE (errno, EBUSY);
+}
+
+static void *
+thread_func (void *ignored)
+{
+  do_rseq_main_test ();
+  return NULL;
 }
 
 static void
-do_rseq_test (void)
+proc_func (void *ignored)
+{
+  do_rseq_main_test ();
+}
+
+static int
+do_test (void)
 {
   if (!rseq_available ())
     {
@@ -62,21 +121,27 @@ do_rseq_test (void)
   printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n",
           getauxval (AT_RSEQ_FEATURE_SIZE));
   printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN));
+
+  puts ("info: checking main thread");
+  do_rseq_main_test ();
+
+  puts ("info: checking main thread (2)");
   do_rseq_main_test ();
+
+  puts ("info: checking new thread");
+  xpthread_join (xpthread_create (NULL, thread_func, NULL));
+
+  puts ("info: checking subprocess");
+  support_isolate_in_subprocess (proc_func, NULL);
+
+  return 0;
 }
 #else /* RSEQ_SIG */
-static void
-do_rseq_test (void)
-{
-  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
-}
-#endif /* RSEQ_SIG */
-
 static int
 do_test (void)
 {
-  do_rseq_test ();
-  return 0;
+  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
 }
+#endif /* RSEQ_SIG */
 
 #include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.h b/sysdeps/unix/sysv/linux/tst-rseq.h
index 15f512a983..812accc8f9 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.h
+++ b/sysdeps/unix/sysv/linux/tst-rseq.h
@@ -28,7 +28,7 @@
 static inline bool
 rseq_thread_registered (void)
 {
-  return THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id) >= 0;
+  return RSEQ_GETMEM_ONCE (cpu_id) >= 0;
 }
 
 static inline int
-- 
2.39.5


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v16 7/8] nptl: Remove the rseq area from 'struct pthread'
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (5 preceding siblings ...)
  2025-01-09 16:32 ` [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-09 16:32 ` [PATCH v16 8/8] Linux: Update internal copy of '<sys/rseq.h>' Michael Jeanson
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

The rseq extensible ABI implementation moved the rseq area to the 'extra
TLS' block, remove the unused 'rseq_area' member of 'struct pthread'.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
---
 nptl/descr.h | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/nptl/descr.h b/nptl/descr.h
index d0d30929e2..c60ca137d4 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -407,27 +407,11 @@ struct pthread
   /* getrandom vDSO per-thread opaque state.  */
   void *getrandom_buf;
 
-  /* rseq area registered with the kernel.  Use a custom definition
-     here to isolate from kernel struct rseq changes.  The
-     implementation of sched_getcpu needs acccess to the cpu_id field;
-     the other fields are unused and not included here.  */
-  union
-  {
-    struct
-    {
-      uint32_t cpu_id_start;
-      uint32_t cpu_id;
-      uint64_t rseq_cs;
-      uint32_t flags;
-    };
-    char pad[32];		/* Original rseq area size.  */
-  } rseq_area __attribute__ ((aligned (32)));
-
   /* Amount of end padding, if any, in this structure.
-     This definition relies on rseq_area being last.  */
+     This definition relies on getrandom_buf being last.  */
 #define PTHREAD_STRUCT_END_PADDING \
-  (sizeof (struct pthread) - offsetof (struct pthread, rseq_area) \
-   + sizeof ((struct pthread) {}.rseq_area))
+  (sizeof (struct pthread) - offsetof (struct pthread, getrandom_buf) \
+   + sizeof ((struct pthread) {}.getrandom_buf))
 } __attribute ((aligned (TCB_ALIGNMENT)));
 
 static inline bool
-- 
2.39.5


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v16 8/8] Linux: Update internal copy of '<sys/rseq.h>'
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (6 preceding siblings ...)
  2025-01-09 16:32 ` [PATCH v16 7/8] nptl: Remove the rseq area from 'struct pthread' Michael Jeanson
@ 2025-01-09 16:32 ` Michael Jeanson
  2025-01-10 15:39 ` [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-09 16:32 UTC (permalink / raw)
  To: libc-alpha
  Cc: Michael Jeanson, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Sync the internal copy of '<sys/rseq.h>' with the latest Linux kernel
'include/uapi/linux/rseq.h'.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
---
 sysdeps/unix/sysv/linux/sys/rseq.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
index 0a6cc0b54c..38fdb27b11 100644
--- a/sysdeps/unix/sysv/linux/sys/rseq.h
+++ b/sysdeps/unix/sysv/linux/sys/rseq.h
@@ -152,6 +152,17 @@ struct rseq
            Inhibit instruction sequence block restart on migration for
            this thread.  */
     uint32_t flags;
+    /* Restartable sequences node_id field.  Updated by the kernel.  Read by
+       user-space with single-copy atomicity semantics.  This field should only
+       be read by the thread which registered this data structure.  Aligned on
+       32-bit.  Contains the current NUMA node ID.  */
+    uint32_t node_id;
+    /* Restartable sequences mm_cid field.  Updated by the kernel.  Read by
+       user-space with single-copy atomicity semantics.  This field should only
+       be read by the thread which registered this data structure.  Aligned on
+       32-bit.  Contains the current thread's concurrency ID (allocated
+       uniquely within a memory map).  */
+    uint32_t mm_cid;
   } __attribute__ ((__aligned__ (32)));
 
 #endif /* __GLIBC_HAVE_KERNEL_RSEQ */
-- 
2.39.5


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 0/8] Add rseq extensible ABI support
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (7 preceding siblings ...)
  2025-01-09 16:32 ` [PATCH v16 8/8] Linux: Update internal copy of '<sys/rseq.h>' Michael Jeanson
@ 2025-01-10 15:39 ` Michael Jeanson
  2025-01-10 16:32   ` Adhemerval Zanella Netto
  2025-01-10 18:08 ` Florian Weimer
  2025-01-10 19:40 ` Florian Weimer
  10 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-10 15:39 UTC (permalink / raw)
  To: libc-alpha
  Cc: Florian Weimer, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

On 2025-01-09 11:32, Michael Jeanson wrote:
> Introduced in Linux v6.3, the rseq extensible ABI [1] will allow adding
> rseq features past the initial 32 bytes of the original ABI.

Is there a way to re-trigger the pre-commit CI without sending a whole
new patchset?

I was a bit trigger-happy and sent this before Florian pushed his
TLS_TP_OFFSET patches.

Thanks,

Michael

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 0/8] Add rseq extensible ABI support
  2025-01-10 15:39 ` [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
@ 2025-01-10 16:32   ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 39+ messages in thread
From: Adhemerval Zanella Netto @ 2025-01-10 16:32 UTC (permalink / raw)
  To: Michael Jeanson, libc-alpha
  Cc: Florian Weimer, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers



On 10/01/25 12:39, Michael Jeanson wrote:
> On 2025-01-09 11:32, Michael Jeanson wrote:
>> Introduced in Linux v6.3, the rseq extensible ABI [1] will allow adding
>> rseq features past the initial 32 bytes of the original ABI.
> 
> Is there a way to re-trigger the pre-commit CI without sending a whole
> new patchset?
> 
> I was a bit trigger happy and sent this before Florian pushed is 
> TLS_TP_OFFSET patches.

I triggered a rebuild [1]; it is now trying against c3d1dac96bdd10250aa37bb367d5ef8334a093a1

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 0/8] Add rseq extensible ABI support
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (8 preceding siblings ...)
  2025-01-10 15:39 ` [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
@ 2025-01-10 18:08 ` Florian Weimer
  2025-01-10 19:17   ` Michael Jeanson
  2025-01-10 19:40 ` Florian Weimer
  10 siblings, 1 reply; 39+ messages in thread
From: Florian Weimer @ 2025-01-10 18:08 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> Introduced in Linux v6.3, the rseq extensible ABI [1] will allow adding
> rseq features past the initial 32 bytes of the original ABI.
>
> While the rseq features in the latest kernel still fit within the
> original ABI size, there are currently only 4 bytes left. It would thus
> be a good time to add support for the extensible ABI so that when new
> features are added, they are immediately available to GNU libc users.
>
> We use the ELF auxiliary vector to query the kernel for the size and
> alignment of the rseq area, if this fails we default to the original
> fixed size and alignment of '32' which the kernel will accept as a
> compatibility mode with the original ABI.
>
> This makes the size of the rseq area variable and thus requires
> relocating it out of 'struct pthread'. We chose to move it after (in block
> allocation order) the last TLS block inside the static TLS block
> allocation. It required a fairly small modification to the TLS block
> allocator and did not interfere with the main executable TLS block which
> must always be the first block relative to the thread pointer.
>
> [1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/

I've just sent another patch that is a prerequisite for this series now
(on m68k, mips, riscv):

  [PATCH] Add missing include guards to <dl-tls.h>
  <https://inbox.sourceware.org/libc-alpha/875xmmtukv.fsf@oldenburg.str.redhat.com/>

I think you saw similar problems before?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-09 16:32 ` [PATCH v16 2/8] Add generic 'extra TLS' Michael Jeanson
@ 2025-01-10 19:03   ` Florian Weimer
  2025-01-10 19:22     ` Michael Jeanson
  2025-01-11 14:51   ` Florian Weimer
  1 sibling, 1 reply; 39+ messages in thread
From: Florian Weimer @ 2025-01-10 19:03 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> +  /* In this layout the TLS blocks are located before the thread pointer. */

> +  /* In this layout the TLS blocks are located after the thread pointer. */

Minor style nit.  Missing two spaces after '.' at the end of the comment.

Please fix before pushing.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 4/8] nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset
  2025-01-09 16:32 ` [PATCH v16 4/8] nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset Michael Jeanson
@ 2025-01-10 19:05   ` Florian Weimer
  2025-01-10 19:26     ` Michael Jeanson
  0 siblings, 1 reply; 39+ messages in thread
From: Florian Weimer @ 2025-01-10 19:05 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> This allows accessing the internal aliases of __rseq_size and
> __rseq_offset from ld.so without ifdefs and avoids dynamic symbol
> binding at run time for both variables.
>
> Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
>  sysdeps/unix/sysv/linux/dl-rseq-symbols.S | 27 ++++++++++++++++-------
>  sysdeps/unix/sysv/linux/rseq-internal.h   | 16 ++++++++------
>  2 files changed, 28 insertions(+), 15 deletions(-)

I thought I had already acked this?  Anyway.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors
  2025-01-09 16:32 ` [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors Michael Jeanson
@ 2025-01-10 19:08   ` Florian Weimer
  2025-01-13 10:49   ` Frank Scheiner
  1 sibling, 0 replies; 39+ messages in thread
From: Florian Weimer @ 2025-01-10 19:08 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> In preparation to move the rseq area to the 'extra TLS' block, we need
> accessors based on the thread pointer and the rseq offset. The ONCE
> variant of the accessors ensures single-copy atomicity for loads and
> stores which is required for all fields once the registration is active.
>
> A separate header is required to allow including <atomic.h> which
> results in an include loop when added to <tcb-access.h>.
>
> Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> Changes since v15:
> - Rename VOLATILE macros to ONCE
> Changes since v14:
> - Update copyright year to 2025
> Changes since v13:
> - Ensure that the VOLATILE variant static assert on 64bit types
>   on 32bit architectures
> - Split to separate header to allow including 'atomic.h' without
>   an include loop
> - Move rtld_hidden_proto rseq symbols to a separate patch
> Changes since v12:
> - Split RSEQ_SET/GETMEM from THREAD_SET/GETMEM
> - Rename rseq_get_area() to RSEQ_SELF()
> - Add rtld_hidden_proto to __rseq_size and __rseq_offset
> ---
>  sysdeps/i386/nptl/rseq-access.h         | 98 +++++++++++++++++++++++++
>  sysdeps/nptl/rseq-access.h              | 56 ++++++++++++++
>  sysdeps/unix/sysv/linux/rseq-internal.h |  8 ++
>  sysdeps/x86_64/nptl/rseq-access.h       | 77 +++++++++++++++++++
>  4 files changed, 239 insertions(+)
>  create mode 100644 sysdeps/i386/nptl/rseq-access.h
>  create mode 100644 sysdeps/nptl/rseq-access.h
>  create mode 100644 sysdeps/x86_64/nptl/rseq-access.h

Looks okay to me.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 0/8] Add rseq extensible ABI support
  2025-01-10 18:08 ` Florian Weimer
@ 2025-01-10 19:17   ` Michael Jeanson
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-10 19:17 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

On 2025-01-10 13:08, Florian Weimer wrote:
> I've just sent another patch that is a prerequisite for this series now
> (on m68k, mips, riscv):
> 
>   [PATCH] Add missing include guards to <dl-tls.h>
>   <https://inbox.sourceware.org/libc-alpha/875xmmtukv.fsf@oldenburg.str.redhat.com/>
> 
> I think you saw similar problems before?

I encountered this issue only when building for mips-n32, but I couldn't
pinpoint why it didn't affect other architectures that also lacked
the include guards.

Adding the include guards to all the variants of <dl-tls.h> does seem
like the best solution.

Michael

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-10 19:03   ` Florian Weimer
@ 2025-01-10 19:22     ` Michael Jeanson
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-10 19:22 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

On 2025-01-10 14:03, Florian Weimer wrote:
> * Michael Jeanson:
> 
>> +  /* In this layout the TLS blocks are located before the thread pointer. */
> 
>> +  /* In this layout the TLS blocks are located after the thread pointer. */
> 
> Minor style nit.  Missing two spaces after '.' at the end of the comment.

Sorry about that, I thought I had fixed them all.

> 
> Please fix before pushing.

Will do.

Thanks,

Michael

> 
> Reviewed-by: Florian Weimer <fweimer@redhat.com>
> 
> Thanks,
> Florian
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 4/8] nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset
  2025-01-10 19:05   ` Florian Weimer
@ 2025-01-10 19:26     ` Michael Jeanson
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Jeanson @ 2025-01-10 19:26 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

On 2025-01-10 14:05, Florian Weimer wrote:
> * Michael Jeanson:
> 
>> This allows accessing the internal aliases of __rseq_size and
>> __rseq_offset from ld.so without ifdefs and avoids dynamic symbol
>> binding at run time for both variables.
>>
>> Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
>> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> ---
>>  sysdeps/unix/sysv/linux/dl-rseq-symbols.S | 27 ++++++++++++++++-------
>>  sysdeps/unix/sysv/linux/rseq-internal.h   | 16 ++++++++------
>>  2 files changed, 28 insertions(+), 15 deletions(-)
> 
> I thought I had already acked this?  Anyway.

I reworded the comments following your advice but forgot to add
a changelog to this patch commit message.

> 
> Reviewed-by: Florian Weimer <fweimer@redhat.com>
> 
> Thanks,
> Florian
> 

Thanks,

Michael


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-09 16:32 ` [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block Michael Jeanson
@ 2025-01-10 19:39   ` Florian Weimer
  2025-01-16 13:24     ` Stefan Liebler
  2025-01-11  2:53   ` H.J. Lu
  1 sibling, 1 reply; 39+ messages in thread
From: Florian Weimer @ 2025-01-10 19:39 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> +#if TLS_TCB_AT_TP
> +  /* The rseq area block should come before the thread pointer and be at least
> +     32 bytes. */
> +  TEST_VERIFY (__rseq_offset <= -RSEQ_AREA_SIZE_INITIAL);
> +
> +  /* The rseq area block should come before TLS variables.  */
> +  TEST_VERIFY ((intptr_t) rseq_abi < (intptr_t) &tls_var);
> +#elif TLS_DTV_AT_TP
> +  /* The rseq area block should come after the TCB, add the TLS block offset to
> +     the rseq offset to get a value relative to the TCB and test that it's
> +     non-negative.  */
> +  TEST_VERIFY (__rseq_offset + TLS_TP_OFFSET >= 0);
> +
> +  /* The rseq area block should come after TLS variables.  */
> +  TEST_VERIFY ((intptr_t) rseq_abi > (intptr_t) &tls_var);
> +#else
> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> +#endif

Looks good now.  Rest is okay as well.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 0/8] Add rseq extensible ABI support
  2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
                   ` (9 preceding siblings ...)
  2025-01-10 18:08 ` Florian Weimer
@ 2025-01-10 19:40 ` Florian Weimer
  2025-01-10 19:51   ` Michael Jeanson
  10 siblings, 1 reply; 39+ messages in thread
From: Florian Weimer @ 2025-01-10 19:40 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> Introduced in Linux v6.3, the rseq extensible ABI [1] will allow adding
> rseq features past the initial 32 bytes of the original ABI.
>
> While the rseq features in the latest kernel still fit within the
> original ABI size, there are currently only 4 bytes left. It would thus
> be a good time to add support for the extensible ABI so that when new
> features are added, they are immediately available to GNU libc users.
>
> We use the ELF auxiliary vector to query the kernel for the size and
> alignment of the rseq area, if this fails we default to the original
> fixed size and alignment of '32' which the kernel will accept as a
> compatibility mode with the original ABI.
>
> This makes the size of the rseq area variable and thus requires
> relocating it out of 'struct pthread'. We chose to move it after (in block
> allocation order) the last TLS block inside the static TLS block
> allocation. It required a fairly small modification to the TLS block
> allocator and did not interfere with the main executable TLS block which
> must always be the first block relative to the thread pointer.
>
> [1] https://lore.kernel.org/all/20221122203932.231377-4-mathieu.desnoyers@efficios.com/
>
> Cc: Florian Weimer <fweimer@redhat.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Cc: DJ Delorie <dj@redhat.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

I've pushed my <dl-tls.h> cleanup, and posted my reviews.

I think this is now ready to go in.  Thank you for being persistent, I
know it took a long time.

Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 0/8] Add rseq extensible ABI support
  2025-01-10 19:40 ` Florian Weimer
@ 2025-01-10 19:51   ` Michael Jeanson
  2025-01-10 19:58     ` Florian Weimer
  0 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-10 19:51 UTC (permalink / raw)
  To: Florian Weimer
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

On 2025-01-10 14:40, Florian Weimer wrote:
> 
> I've pushed my <dl-tls.h> cleanup, and posted my reviews.
> 
> I think this is now ready to go in.  Thank you for being persistent, I
> know it took a long time.

Thanks for taking the time to get me up to speed with how things are
done in the project and for the reviews of course.

Just to make sure I don't step on anyone's toes, this means I have the
go ahead to push this to master?

Thanks again,

Michael


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 0/8] Add rseq extensible ABI support
  2025-01-10 19:51   ` Michael Jeanson
@ 2025-01-10 19:58     ` Florian Weimer
  0 siblings, 0 replies; 39+ messages in thread
From: Florian Weimer @ 2025-01-10 19:58 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> Just to make sure I don't step on anyone's toes, this means I have the
> go ahead to push this to master?

I think so.  Andreas provided RM ack on Wednesday.

Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-09 16:32 ` [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block Michael Jeanson
  2025-01-10 19:39   ` Florian Weimer
@ 2025-01-11  2:53   ` H.J. Lu
  1 sibling, 0 replies; 39+ messages in thread
From: H.J. Lu @ 2025-01-11  2:53 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

On Fri, Jan 10, 2025 at 12:38 AM Michael Jeanson <mjeanson@efficios.com> wrote:
>
> Move the rseq area to the newly added 'extra TLS' block, this is the
> last step in adding support for the rseq extended ABI. The size of the
> rseq area is now dynamic and depends on the rseq features reported by
> the kernel through the elf auxiliary vector. This will allow
> applications to use rseq features past the 32 bytes of the original rseq
> ABI as they become available in future kernels.

This caused:

https://sourceware.org/bugzilla/show_bug.cgi?id=32543

> Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> Changes since v15:
> - Clarify comments in tests regarding the rseq offset and tls offset
> - Check the rseq offset against a TLS variable in the tests
> - Renamed RSEQ_GETMEM_VOLATILE to RSEQ_GETMEM_ONCE
> Changes since v14:
> - Add TLS_TP_OFFSET to the rseq offset tests
> - Style nits
> Changes since v12:
> - Style nits, missing spaces before '(' in function macro/calls
> - Add comment and variable array member to 'struct rseq_area'
> - Comment and style nits in tst-rseq-disable.c
> Changes since v11:
> - __rseq_size is now set directly in _dl_parse_auxv, set it to 0 when
>   the main thread registration fails or is disabled by tunable
> ---
>  nptl/pthread_create.c                         |  2 +-
>  sysdeps/nptl/dl-tls_init_tp.c                 |  9 --
>  sysdeps/unix/sysv/linux/Makefile              | 10 ++
>  sysdeps/unix/sysv/linux/dl-parse_auxv.h       | 12 +--
>  sysdeps/unix/sysv/linux/rseq-internal.h       | 49 +++++++---
>  sysdeps/unix/sysv/linux/sched_getcpu.c        |  3 +-
>  .../unix/sysv/linux/tst-rseq-disable-static.c |  1 +
>  sysdeps/unix/sysv/linux/tst-rseq-disable.c    | 77 +++++++++++++--
>  .../unix/sysv/linux/tst-rseq-nptl-static.c    |  1 +
>  sysdeps/unix/sysv/linux/tst-rseq-static.c     |  1 +
>  sysdeps/unix/sysv/linux/tst-rseq.c            | 97 ++++++++++++++++---
>  sysdeps/unix/sysv/linux/tst-rseq.h            |  2 +-
>  12 files changed, 206 insertions(+), 58 deletions(-)
>  create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-static.c
>
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 01e8a86980..9ae5423df0 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -696,7 +696,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>
>    /* Inherit rseq registration state.  Without seccomp filters, rseq
>       registration will either always fail or always succeed.  */
> -  if ((int) THREAD_GETMEM_VOLATILE (self, rseq_area.cpu_id) >= 0)
> +  if ((int) RSEQ_GETMEM_ONCE (cpu_id) >= 0)
>      pd->flags |= ATTR_FLAG_DO_RSEQ;
>
>    /* Initialize the field for the ID of the thread which is waiting
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index 46478324fc..f487bfb66e 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -109,15 +109,6 @@ __tls_init_tp (void)
>      bool do_rseq = TUNABLE_GET (rseq, int, NULL);
>      if (!rseq_register_current_thread (pd, do_rseq))
>        _rseq_size = 0;
> -
> -#ifdef RSEQ_SIG
> -    /* This should be a compile-time constant, but the current
> -       infrastructure makes it difficult to determine its value.  Not
> -       all targets support __thread_pointer, so set __rseq_offset only
> -       if the rseq registration may have happened because RSEQ_SIG is
> -       defined.  */
> -    _rseq_offset = (char *) &pd->rseq_area - (char *) __thread_pointer ();
> -#endif
>    }
>
>    /* Set initial thread's stack block from 0 up to __libc_stack_end.
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 4e7044f12f..c2acf18689 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -267,6 +267,11 @@ tests-internal += \
>    tst-rseq-disable \
>    # tests-internal
>
> +tests-static += \
> +  tst-rseq-disable-static \
> +  tst-rseq-static \
> +  # tests-static
> +
>  tests-time64 += \
>    tst-adjtimex-time64 \
>    tst-clock_adjtime-time64 \
> @@ -410,6 +415,7 @@ $(objpfx)tst-sched-consts.out: ../sysdeps/unix/sysv/linux/tst-sched-consts.py
>  $(objpfx)tst-sched-consts.out: $(sysdeps-linux-python-deps)
>
>  tst-rseq-disable-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0
> +tst-rseq-disable-static-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0
>
>  endif # $(subdir) == misc
>
> @@ -684,4 +690,8 @@ tests += \
>  tests-internal += \
>    tst-rseq-nptl \
>    # tests-internal
> +
> +tests-static += \
> +  tst-rseq-nptl-static \
> +  # tests-static
>  endif
> diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
> index 2d4243734c..41250c96d0 100644
> --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
> +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
> @@ -61,15 +61,9 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
>  #endif
>
>    /* Get the rseq feature size, with a minimum of RSEQ_AREA_SIZE_INITIAL_USED
> -     (20) for kernels that don't have AT_RSEQ_FEATURE_SIZE.  Limit the feature
> -     size to RSEQ_AREA_SIZE_MAX_USED (28) which fits the rseq area in 'struct
> -     pthread' and represents the maximum feature size of currently released
> -     kernels.  Since no kernels currently cross the 32 bytes of the original
> -     ABI, the semantics of a feature size of 32 or more are still undetermined.
> -     */
> -  _rseq_size = MIN (MAX (auxv_values[AT_RSEQ_FEATURE_SIZE],
> -                         RSEQ_AREA_SIZE_INITIAL_USED),
> -                   RSEQ_AREA_SIZE_MAX_USED);
> +     (20) for kernels that don't have AT_RSEQ_FEATURE_SIZE.  */
> +  _rseq_size = MAX (auxv_values[AT_RSEQ_FEATURE_SIZE],
> +                    RSEQ_AREA_SIZE_INITIAL_USED);
>    _rseq_align = MAX (auxv_values[AT_RSEQ_ALIGN], RSEQ_MIN_ALIGN);
>
>    DL_PLATFORM_AUXV
> diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
> index 00be15cfc8..f89e784243 100644
> --- a/sysdeps/unix/sysv/linux/rseq-internal.h
> +++ b/sysdeps/unix/sysv/linux/rseq-internal.h
> @@ -26,6 +26,30 @@
>  #include <sys/rseq.h>
>  #include <ldsodefs.h>
>  #include <thread_pointer.h>
> +#include <rseq-access.h>
> +
> +/* rseq area registered with the kernel.  Use a custom definition here to
> +   isolate from the system provided header which could lack some fields of the
> +   Extended ABI.
> +
> +   This is only used to get the field offsets and sizes, it should never be
> +   used for direct object allocations.
> +
> +   Access to fields of the Extended ABI beyond the 20 bytes of the original ABI
> +   (after 'flags') must be gated by a check of the feature size.  */
> +struct rseq_area
> +{
> +  /* Original ABI.  */
> +  uint32_t cpu_id_start;
> +  uint32_t cpu_id;
> +  uint64_t rseq_cs;
> +  uint32_t flags;
> +  /* Extended ABI.  */
> +  uint32_t node_id;
> +  uint32_t mm_cid;
> +  /* Flexible array member to discourage direct object allocations.  */
> +  char end[];
> +};
>
>  /* Minimum size of the rseq area allocation required by the syscall.  The
>     actually used rseq feature size may be less (20 bytes initially).  */
> @@ -47,10 +71,12 @@ extern size_t _rseq_align attribute_hidden;
>
>  /* Size of the active features in the rseq area.
>     Populated from the auxiliary vector with a minimum of '20'.
> +   Set to '0' on registration failure of the main thread.
>     In .data.relro but not yet write-protected.  */
>  extern unsigned int _rseq_size attribute_hidden;
>
> -/* Offset from the thread pointer to the rseq area.
> +/* Offset from the thread pointer to the rseq area, always set to allow
> +   checking the registration status by reading the 'cpu_id' field.
>     In .data.relro but not yet write-protected.  */
>  extern ptrdiff_t _rseq_offset attribute_hidden;
>
> @@ -75,34 +101,35 @@ rseq_register_current_thread (struct pthread *self, bool do_rseq)
>      {
>        unsigned int size =  __rseq_size;
>
> +      /* The feature size can be smaller than the minimum rseq area size of 32
> +         bytes accepted by the syscall; if this is the case, bump the size of
> +         the registration to the minimum.  The 'extra TLS' block is always at
> +         least 32 bytes.  */
>        if (size < RSEQ_AREA_SIZE_INITIAL)
> -        /* The initial implementation used only 20 bytes out of 32,
> -           but still expected size 32.  */
>          size = RSEQ_AREA_SIZE_INITIAL;
>
>        /* Initialize the rseq fields that are read by the kernel on
>           registration, there is no guarantee that struct pthread is
>           cleared on all architectures.  */
> -      THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_UNINITIALIZED);
> -      THREAD_SETMEM (self, rseq_area.cpu_id_start, 0);
> -      THREAD_SETMEM (self, rseq_area.rseq_cs, 0);
> -      THREAD_SETMEM (self, rseq_area.flags, 0);
> +      RSEQ_SETMEM (cpu_id, RSEQ_CPU_ID_UNINITIALIZED);
> +      RSEQ_SETMEM (cpu_id_start, 0);
> +      RSEQ_SETMEM (rseq_cs, 0);
> +      RSEQ_SETMEM (flags, 0);
>
> -      int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
> -                                       size, 0, RSEQ_SIG);
> +      int ret = INTERNAL_SYSCALL_CALL (rseq, RSEQ_SELF (), size, 0, RSEQ_SIG);
>        if (!INTERNAL_SYSCALL_ERROR_P (ret))
>          return true;
>      }
>    /* When rseq is disabled by tunables or the registration fails, inform
>       userspace by setting 'cpu_id' to RSEQ_CPU_ID_REGISTRATION_FAILED.  */
> -  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
> +  RSEQ_SETMEM (cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>    return false;
>  }
>  #else /* RSEQ_SIG */
>  static inline bool
>  rseq_register_current_thread (struct pthread *self, bool do_rseq)
>  {
> -  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
> +  RSEQ_SETMEM (cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>    return false;
>  }
>  #endif /* RSEQ_SIG */
> diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
> index 9a2e2a5663..828b651320 100644
> --- a/sysdeps/unix/sysv/linux/sched_getcpu.c
> +++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
> @@ -19,6 +19,7 @@
>  #include <sched.h>
>  #include <sysdep.h>
>  #include <sysdep-vdso.h>
> +#include <rseq-internal.h>
>
>  static int
>  vsyscall_sched_getcpu (void)
> @@ -36,6 +37,6 @@ vsyscall_sched_getcpu (void)
>  int
>  sched_getcpu (void)
>  {
> -  int cpu_id = THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id);
> +  int cpu_id = RSEQ_GETMEM_ONCE (cpu_id);
>    return __glibc_likely (cpu_id >= 0) ? cpu_id : vsyscall_sched_getcpu ();
>  }
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c b/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
> new file mode 100644
> index 0000000000..2687d13d3d
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-rseq-disable-static.c
> @@ -0,0 +1 @@
> +#include "tst-rseq-disable.c"
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable.c b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
> index 0e191ffeb9..8e9c11f653 100644
> --- a/sysdeps/unix/sysv/linux/tst-rseq-disable.c
> +++ b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
> @@ -26,32 +26,82 @@
>  #include <unistd.h>
>
>  #ifdef RSEQ_SIG
> +# include <sys/auxv.h>
> +# include <dl-tls.h>
> +# include "tst-rseq.h"
> +
> +/* Used to test private registration with the rseq system call because glibc
> +   rseq is disabled.  */
> +static __thread struct rseq local_rseq = {
> +  .cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED,
> +};
> +
> +/* Used to check if the address of the rseq area comes before or after the TLS
> +   blocks depending on the TLS model.  */
> +static __thread char tls_var __attribute__ ((tls_model ("initial-exec")));
>
>  /* Check that rseq can be registered and has not been taken by glibc.  */
>  static void
>  check_rseq_disabled (void)
>  {
> -  struct pthread *pd = THREAD_SELF;
> +  struct rseq *rseq_abi = (struct rseq *) ((char *) __thread_pointer () +
> +                          __rseq_offset);
> +
> +#if TLS_TCB_AT_TP
> +  /* The rseq area block should come before the thread pointer and be at least
> +     32 bytes. */
> +  TEST_VERIFY (__rseq_offset <= -RSEQ_AREA_SIZE_INITIAL);
> +
> +  /* The rseq area block should come before TLS variables.  */
> +  TEST_VERIFY ((intptr_t) rseq_abi < (intptr_t) &tls_var);
> +#elif TLS_DTV_AT_TP
> +  /* The rseq area block should come after the TCB, add the TLS block offset to
> +     the rseq offset to get a value relative to the TCB and test that it's
> +     non-negative.  */
> +  TEST_VERIFY (__rseq_offset + TLS_TP_OFFSET >= 0);
> +
> +  /* The rseq area block should come after TLS variables.  */
> +  TEST_VERIFY ((intptr_t) rseq_abi > (intptr_t) &tls_var);
> +#else
> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> +#endif
>
> +  /* __rseq_flags is unused and should always be '0'.  */
>    TEST_COMPARE (__rseq_flags, 0);
> -  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
> -               == (char *) &pd->rseq_area);
> +
> +  /* When rseq is not registered, __rseq_size should always be '0'.  */
>    TEST_COMPARE (__rseq_size, 0);
> -  TEST_COMPARE ((int) pd->rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>
> -  int ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
> -                     0, RSEQ_SIG);
> +  /* When rseq is not registered, the 'cpu_id' field should be set to
> +     RSEQ_CPU_ID_REGISTRATION_FAILED.  */
> +  TEST_COMPARE ((int) rseq_abi->cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
> +
> +  /* Test a rseq registration which should succeed since the internal
> +     registration is disabled.  */
> +  int ret = syscall (__NR_rseq, &local_rseq, RSEQ_AREA_SIZE_INITIAL, 0, RSEQ_SIG);
>    if (ret == 0)
>      {
> -      ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
> +      /* A successful registration should set the cpu id.  */
> +      TEST_VERIFY (local_rseq.cpu_id >= 0);
> +
> +      /* Test we can also unregister rseq.  */
> +      ret = syscall (__NR_rseq, &local_rseq, RSEQ_AREA_SIZE_INITIAL,
>                       RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
>        TEST_COMPARE (ret, 0);
> -      pd->rseq_area.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
>      }
>    else
>      {
> -      TEST_VERIFY (errno != -EINVAL);
> -      TEST_VERIFY (errno != -EBUSY);
> +      /* Check if we failed with EINVAL which would mean invalid rseq flags,
> +         a mis-aligned rseq area address or an incorrect rseq size.  */
> +      TEST_VERIFY (errno != EINVAL);
> +
> +      /* Check if we failed with EBUSY which means an existing rseq
> +         registration. */
> +      TEST_VERIFY (errno != EBUSY);
> +
> +      /* Check if we failed with EFAULT which means an invalid rseq area
> +         address.  */
> +      TEST_VERIFY (errno != EFAULT);
>      }
>  }
>
> @@ -71,6 +121,13 @@ proc_func (void *ignored)
>  static int
>  do_test (void)
>  {
> +  printf ("info: __rseq_size: %u\n", __rseq_size);
> +  printf ("info: __rseq_offset: %td\n", __rseq_offset);
> +  printf ("info: __rseq_flags: %u\n", __rseq_flags);
> +  printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n",
> +          getauxval (AT_RSEQ_FEATURE_SIZE));
> +  printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN));
> +
>    puts ("info: checking main thread");
>    check_rseq_disabled ();
>
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
> new file mode 100644
> index 0000000000..6e2c923bb9
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl-static.c
> @@ -0,0 +1 @@
> +#include "tst-rseq-nptl.c"
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq-static.c b/sysdeps/unix/sysv/linux/tst-rseq-static.c
> new file mode 100644
> index 0000000000..1d97f3bd3d
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-rseq-static.c
> @@ -0,0 +1 @@
> +#include "tst-rseq.c"
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
> index b152280eff..00181cfefb 100644
> --- a/sysdeps/unix/sysv/linux/tst-rseq.c
> +++ b/sysdeps/unix/sysv/linux/tst-rseq.c
> @@ -19,6 +19,8 @@
>     not linked against libpthread.  */
>
>  #include <support/check.h>
> +#include <support/namespace.h>
> +#include <support/xthread.h>
>  #include <stdio.h>
>  #include <sys/rseq.h>
>  #include <unistd.h>
> @@ -32,25 +34,82 @@
>  # include <sys/auxv.h>
>  # include <thread_pointer.h>
>  # include <tls.h>
> +# include <dl-tls.h>
> +# include <sys/auxv.h>
>  # include "tst-rseq.h"
>
> +/* Used to check if the address of the rseq area comes before or after the TLS
> +   blocks depending on the TLS model.  */
> +static __thread char tls_var __attribute__ ((tls_model ("initial-exec")));
> +
>  static void
>  do_rseq_main_test (void)
>  {
> -  struct pthread *pd = THREAD_SELF;
> -  size_t rseq_feature_size = MIN (MAX (getauxval (AT_RSEQ_FEATURE_SIZE),
> -                                       RSEQ_AREA_SIZE_INITIAL_USED),
> -                                  RSEQ_AREA_SIZE_MAX_USED);
> +  size_t rseq_align = MAX (getauxval (AT_RSEQ_ALIGN), RSEQ_MIN_ALIGN);
> +  size_t rseq_feature_size = MAX (getauxval (AT_RSEQ_FEATURE_SIZE),
> +                                  RSEQ_AREA_SIZE_INITIAL_USED);
> +  size_t rseq_alloc_size = roundup (MAX (rseq_feature_size,
> +                                    RSEQ_AREA_SIZE_INITIAL_USED), rseq_align);
> +  struct rseq *rseq_abi = __thread_pointer () + __rseq_offset;
>
>    TEST_VERIFY_EXIT (rseq_thread_registered ());
> +
> +  /* __rseq_flags is unused and should always be '0'.  */
>    TEST_COMPARE (__rseq_flags, 0);
> -  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
> -               == (char *) &pd->rseq_area);
> +
> +  /* When rseq is registered, __rseq_size should report the feature size.  */
>    TEST_COMPARE (__rseq_size, rseq_feature_size);
> +
> +  /* When rseq is registered, the 'cpu_id' field should be set to a valid cpu
> +     number.  */
> +  TEST_VERIFY ((int32_t) rseq_abi->cpu_id >= 0);
> +
> +  /* The rseq area address must be aligned.  */
> +  TEST_VERIFY (((unsigned long) rseq_abi % rseq_align) == 0);
> +
> +#if TLS_TCB_AT_TP
> +  /* The rseq area block should come before the thread pointer and be at least
> +     32 bytes. */
> +  TEST_VERIFY (__rseq_offset <= -RSEQ_AREA_SIZE_INITIAL);
> +
> +  /* The rseq area block should come before TLS variables.  */
> +  TEST_VERIFY ((intptr_t) rseq_abi < (intptr_t) &tls_var);
> +#elif TLS_DTV_AT_TP
> +  /* The rseq area block should come after the TCB, add the TLS block offset to
> +     the rseq offset to get a value relative to the TCB and test that it's
> +     non-negative.  */
> +  TEST_VERIFY (__rseq_offset + TLS_TP_OFFSET >= 0);
> +
> +  /* The rseq area block should come after TLS variables.  */
> +  TEST_VERIFY ((intptr_t) rseq_abi > (intptr_t) &tls_var);
> +#else
> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> +#endif
> +
> +  /* Test a rseq registration with the same arguments as the internal
> +     registration which should fail with errno == EBUSY.  */
> +  TEST_VERIFY (((unsigned long) rseq_abi % rseq_align) == 0);
> +  TEST_VERIFY (__rseq_size <= rseq_alloc_size);
> +  int ret = syscall (__NR_rseq, rseq_abi, rseq_alloc_size, 0, RSEQ_SIG);
> +  TEST_VERIFY (ret != 0);
> +  TEST_COMPARE (errno, EBUSY);
> +}
> +
> +static void *
> +thread_func (void *ignored)
> +{
> +  do_rseq_main_test ();
> +  return NULL;
>  }
>
>  static void
> -do_rseq_test (void)
> +proc_func (void *ignored)
> +{
> +  do_rseq_main_test ();
> +}
> +
> +static int
> +do_test (void)
>  {
>    if (!rseq_available ())
>      {
> @@ -62,21 +121,27 @@ do_rseq_test (void)
>    printf ("info: getauxval (AT_RSEQ_FEATURE_SIZE): %ld\n",
>            getauxval (AT_RSEQ_FEATURE_SIZE));
>    printf ("info: getauxval (AT_RSEQ_ALIGN): %ld\n", getauxval (AT_RSEQ_ALIGN));
> +
> +  puts ("info: checking main thread");
> +  do_rseq_main_test ();
> +
> +  puts ("info: checking main thread (2)");
>    do_rseq_main_test ();
> +
> +  puts ("info: checking new thread");
> +  xpthread_join (xpthread_create (NULL, thread_func, NULL));
> +
> +  puts ("info: checking subprocess");
> +  support_isolate_in_subprocess (proc_func, NULL);
> +
> +  return 0;
>  }
>  #else /* RSEQ_SIG */
> -static void
> -do_rseq_test (void)
> -{
> -  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
> -}
> -#endif /* RSEQ_SIG */
> -
>  static int
>  do_test (void)
>  {
> -  do_rseq_test ();
> -  return 0;
> +  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
>  }
> +#endif /* RSEQ_SIG */
>
>  #include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.h b/sysdeps/unix/sysv/linux/tst-rseq.h
> index 15f512a983..812accc8f9 100644
> --- a/sysdeps/unix/sysv/linux/tst-rseq.h
> +++ b/sysdeps/unix/sysv/linux/tst-rseq.h
> @@ -28,7 +28,7 @@
>  static inline bool
>  rseq_thread_registered (void)
>  {
> -  return THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id) >= 0;
> +  return RSEQ_GETMEM_ONCE (cpu_id) >= 0;
>  }
>
>  static inline int
> --
> 2.39.5
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-09 16:32 ` [PATCH v16 2/8] Add generic 'extra TLS' Michael Jeanson
  2025-01-10 19:03   ` Florian Weimer
@ 2025-01-11 14:51   ` Florian Weimer
  2025-01-11 15:36     ` Mathieu Desnoyers
                       ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Florian Weimer @ 2025-01-11 14:51 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Michael Jeanson:

> diff --git a/csu/libc-tls.c b/csu/libc-tls.c
> index 15f470aa87..02855d0b54 100644
> --- a/csu/libc-tls.c
> +++ b/csu/libc-tls.c
> @@ -20,12 +20,14 @@
>  #include <errno.h>
>  #include <ldsodefs.h>
>  #include <tls.h>
> +#include <dl-tls.h>
>  #include <unistd.h>
>  #include <stdio.h>
>  #include <sys/param.h>
>  #include <array_length.h>
>  #include <pthreadP.h>
>  #include <dl-call_tls_init_tp.h>
> +#include <dl-extra_tls.h>
>  
>  #ifdef SHARED
>   #error makefile bug, this file is for static only
> @@ -110,6 +112,7 @@ __libc_setup_tls (void)
>    size_t filesz = 0;
>    void *initimage = NULL;
>    size_t align = 0;
> +  size_t tls_blocks_size = 0;
>    size_t max_align = TCB_ALIGNMENT;
>    size_t tcb_offset;
>    const ElfW(Phdr) *phdr;
> @@ -135,22 +138,89 @@ __libc_setup_tls (void)
>    /* Calculate the size of the static TLS surplus, with 0 auditors.  */
>    _dl_tls_static_surplus_init (0);
>  
> +  /* Extra TLS block for internal usage to append at the end of the TLS blocks
> +     (in allocation order).  The address at which the block is allocated must
> +     be aligned to 'extra_tls_align'.  The size of the block as returned by
> +     '_dl_extra_tls_get_size ()' is always a multiple of the alignment.
> +
> +     On Linux systems this is where the rseq area will be allocated.  On other
> +     systems it is currently unused and both values will be '0'.  */
> +  size_t extra_tls_size = _dl_extra_tls_get_size ();
> +  size_t extra_tls_align = _dl_extra_tls_get_align ();
> +
> +  /* Increase the maximum alignment with the extra TLS alignment requirements
> +     if necessary.  */
> +  max_align = MAX (max_align, extra_tls_align);
> +
>    /* We have to set up the TCB block which also (possibly) contains
>       'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
>       Instead we use 'sbrk' which would only uses 'errno' if it fails.
>       In this case we are right away out of memory and the user gets
>       what she/he deserves.  */
>  #if TLS_TCB_AT_TP
> +  /* In this layout the TLS blocks are located before the thread pointer. */
> +
> +  /* Record the size of the combined TLS blocks.
> +
> +     First reserve space for 'memsz' while respecting both its alignment
> +     requirements and those of the extra TLS blocks.  Then add the size of
> +     the extra TLS block.  Both values respect the extra TLS alignment
> +     requirements and so does the resulting size and the offset that will
> +     be derived from it.  */
> +  tls_blocks_size = roundup (memsz, MAX (align, extra_tls_align) ?: 1)
> +		  + extra_tls_size;
> +
> + /* Record the extra TLS block offset from the thread pointer.
> +
> +    With TLS_TCB_AT_TP the TLS blocks are allocated before the thread pointer
> +    in reverse order.  Our block is added last which results in it being the
> +    first in the static TLS block, thus record the most negative offset.
> +
> +    The alignment requirements of the pointer resulting from this offset and
> +    the thread pointer are enforced by 'max_align' which is used to align the
> +    tcb_offset.  */
> +  _dl_extra_tls_set_offset (-tls_blocks_size);
> +
>    /* Align the TCB offset to the maximum alignment, as
>       _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
>       and dl_tls_static_align.  */
> -  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);
> +  tcb_offset = roundup (tls_blocks_size + GLRO(dl_tls_static_surplus), max_align);
>    tlsblock = _dl_early_allocate (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
>    if (tlsblock == NULL)
>      _startup_fatal_tls_error ();
>  #elif TLS_DTV_AT_TP
> +  /* In this layout the TLS blocks are located after the thread pointer. */
> +
> +  /* Record the tcb_offset including the alignment requirements of 'memsz'
> +     that comes after it.  */
>    tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
> -  tlsblock = _dl_early_allocate (tcb_offset + memsz + max_align
> +
> +  /* Record the size of the combined TLS blocks.
> +
> +     First reserve space for TLS_INIT_TCB_SIZE and 'memsz' while respecting
> +     both its alignment requirements and those of the extra TLS blocks.  Then
> +     add the size of the extra TLS block.  Both values respect the extra TLS
> +     alignment requirements and so does the resulting size and the offset that
> +     will be derived from it.  */
> +  tls_blocks_size = roundup (TLS_INIT_TCB_SIZE + memsz,
> +		  MAX (align, extra_tls_align) ?: 1) + extra_tls_size;
> +
> + /* Record the extra TLS block offset from the thread pointer.
> +
> +    With TLS_DTV_AT_TP the TLS blocks are allocated after the thread pointer in
> +    order. Our block is added last which results in it being the last in the
> +    static TLS block, thus record the offset as the size of the static TLS
> +    block minus the size of our block.
> +
> +    On some architectures the TLS blocks are offset from the thread pointer,
> +    include this offset in the extra TLS block offset.
> +
> +    The alignment requirements of the pointer resulting from this offset and
> +    the thread pointer are enforced by 'max_align' which is used to align the
> +    tcb_offset.  */
> +  _dl_extra_tls_set_offset (tls_blocks_size - extra_tls_size - TLS_TP_OFFSET);
> +
> +  tlsblock = _dl_early_allocate (tls_blocks_size + max_align
>  				 + TLS_PRE_TCB_SIZE
>  				 + GLRO(dl_tls_static_surplus));
>    if (tlsblock == NULL)
> @@ -209,11 +279,5 @@ __libc_setup_tls (void)
>    /* static_slotinfo.slotinfo[1].gen = 0; -- Already zero.  */
>    static_slotinfo.slotinfo[1].map = main_map;
>  
> -  memsz = roundup (memsz, align ?: 1);
> -
> -#if TLS_DTV_AT_TP
> -  memsz += tcb_offset;
> -#endif
> -
> -  init_static_tls (memsz, MAX (TCB_ALIGNMENT, max_align));
> +  init_static_tls (tls_blocks_size, MAX (TCB_ALIGNMENT, max_align));
>  }

This goes wrong on aarch64, resulting in an elf/tst-tlsalign-static
failure:

tdata1 TLS address 0xaaaac60e9014 % 4 = 0
tdata2 TLS address 0xaaaac60e9010 % 16 = 0
tdata3 TLS address 0xaaaac60e9000 % 4096 = 0
tbss1 TLS address 0xaaaac60ea014 % 4 = 0
tbss2 TLS address 0xaaaac60ea010 % 16 = 0
tbss3 TLS address 0xaaaac60ea000 % 4096 = 0
tbss3 value 24 should be 0

Or crash with segmentation fault because rseq_cs is wrong.

What seems to happen is that the TLS block (including the rseq area)
overlaps with the start of the sbrk heap.  So we are either allocating
too little memory for the TLS block, or the alignment procedure moves
things too far away from the start of the allocation.  But I've been staring at
it for a while, and I just don't see the bug. 8-(

Thanks,
Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-11 14:51   ` Florian Weimer
@ 2025-01-11 15:36     ` Mathieu Desnoyers
  2025-01-11 16:42       ` Florian Weimer
  2025-01-11 17:06     ` Mark Wielaard
  2025-01-12 10:05     ` Florian Weimer
  2 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2025-01-11 15:36 UTC (permalink / raw)
  To: Florian Weimer, Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie

On 2025-01-11 09:51, Florian Weimer wrote:
> * Michael Jeanson:
> 
>> diff --git a/csu/libc-tls.c b/csu/libc-tls.c
>> index 15f470aa87..02855d0b54 100644
>> --- a/csu/libc-tls.c
>> +++ b/csu/libc-tls.c
>> @@ -20,12 +20,14 @@
>>   #include <errno.h>
>>   #include <ldsodefs.h>
>>   #include <tls.h>
>> +#include <dl-tls.h>
>>   #include <unistd.h>
>>   #include <stdio.h>
>>   #include <sys/param.h>
>>   #include <array_length.h>
>>   #include <pthreadP.h>
>>   #include <dl-call_tls_init_tp.h>
>> +#include <dl-extra_tls.h>
>>   
>>   #ifdef SHARED
>>    #error makefile bug, this file is for static only
>> @@ -110,6 +112,7 @@ __libc_setup_tls (void)
>>     size_t filesz = 0;
>>     void *initimage = NULL;
>>     size_t align = 0;
>> +  size_t tls_blocks_size = 0;
>>     size_t max_align = TCB_ALIGNMENT;
>>     size_t tcb_offset;
>>     const ElfW(Phdr) *phdr;
>> @@ -135,22 +138,89 @@ __libc_setup_tls (void)
>>     /* Calculate the size of the static TLS surplus, with 0 auditors.  */
>>     _dl_tls_static_surplus_init (0);
>>   
>> +  /* Extra TLS block for internal usage to append at the end of the TLS blocks
>> +     (in allocation order).  The address at which the block is allocated must
>> +     be aligned to 'extra_tls_align'.  The size of the block as returned by
>> +     '_dl_extra_tls_get_size ()' is always a multiple of the alignment.
>> +
>> +     On Linux systems this is where the rseq area will be allocated.  On other
>> +     systems it is currently unused and both values will be '0'.  */
>> +  size_t extra_tls_size = _dl_extra_tls_get_size ();
>> +  size_t extra_tls_align = _dl_extra_tls_get_align ();
>> +
>> +  /* Increase the maximum alignment with the extra TLS alignment requirements
>> +     if necessary.  */
>> +  max_align = MAX (max_align, extra_tls_align);
>> +
>>     /* We have to set up the TCB block which also (possibly) contains
>>        'errno'.  Therefore we avoid 'malloc' which might touch 'errno'.
>>        Instead we use 'sbrk' which would only uses 'errno' if it fails.
>>        In this case we are right away out of memory and the user gets
>>        what she/he deserves.  */
>>   #if TLS_TCB_AT_TP
>> +  /* In this layout the TLS blocks are located before the thread pointer. */
>> +
>> +  /* Record the size of the combined TLS blocks.
>> +
>> +     First reserve space for 'memsz' while respecting both its alignment
>> +     requirements and those of the extra TLS blocks.  Then add the size of
>> +     the extra TLS block.  Both values respect the extra TLS alignment
>> +     requirements and so does the resulting size and the offset that will
>> +     be derived from it.  */
>> +  tls_blocks_size = roundup (memsz, MAX (align, extra_tls_align) ?: 1)
>> +		  + extra_tls_size;
>> +
>> + /* Record the extra TLS block offset from the thread pointer.
>> +
>> +    With TLS_TCB_AT_TP the TLS blocks are allocated before the thread pointer
>> +    in reverse order.  Our block is added last which results in it being the
>> +    first in the static TLS block, thus record the most negative offset.
>> +
>> +    The alignment requirements of the pointer resulting from this offset and
>> +    the thread pointer are enforced by 'max_align' which is used to align the
>> +    tcb_offset.  */
>> +  _dl_extra_tls_set_offset (-tls_blocks_size);
>> +
>>     /* Align the TCB offset to the maximum alignment, as
>>        _dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
>>        and dl_tls_static_align.  */
>> -  tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);
>> +  tcb_offset = roundup (tls_blocks_size + GLRO(dl_tls_static_surplus), max_align);
>>     tlsblock = _dl_early_allocate (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
>>     if (tlsblock == NULL)
>>       _startup_fatal_tls_error ();
>>   #elif TLS_DTV_AT_TP
>> +  /* In this layout the TLS blocks are located after the thread pointer. */
>> +
>> +  /* Record the tcb_offset including the alignment requirements of 'memsz'
>> +     that comes after it.  */
>>     tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
>> -  tlsblock = _dl_early_allocate (tcb_offset + memsz + max_align
>> +
>> +  /* Record the size of the combined TLS blocks.
>> +
>> +     First reserve space for TLS_INIT_TCB_SIZE and 'memsz' while respecting
>> +     both its alignment requirements and those of the extra TLS blocks.  Then
>> +     add the size of the extra TLS block.  Both values respect the extra TLS
>> +     alignment requirements and so does the resulting size and the offset that
>> +     will be derived from it.  */
>> +  tls_blocks_size = roundup (TLS_INIT_TCB_SIZE + memsz,
>> +		  MAX (align, extra_tls_align) ?: 1) + extra_tls_size;
>> +
>> + /* Record the extra TLS block offset from the thread pointer.
>> +
>> +    With TLS_DTV_AT_TP the TLS blocks are allocated after the thread pointer in
>> +    order. Our block is added last which results in it being the last in the
>> +    static TLS block, thus record the offset as the size of the static TLS
>> +    block minus the size of our block.
>> +
>> +    On some architectures the TLS blocks are offset from the thread pointer,
>> +    include this offset in the extra TLS block offset.
>> +
>> +    The alignment requirements of the pointer resulting from this offset and
>> +    the thread pointer are enforced by 'max_align' which is used to align the
>> +    tcb_offset.  */
>> +  _dl_extra_tls_set_offset (tls_blocks_size - extra_tls_size - TLS_TP_OFFSET);
>> +
>> +  tlsblock = _dl_early_allocate (tls_blocks_size + max_align
>>   				 + TLS_PRE_TCB_SIZE
>>   				 + GLRO(dl_tls_static_surplus));
>>     if (tlsblock == NULL)
>> @@ -209,11 +279,5 @@ __libc_setup_tls (void)
>>     /* static_slotinfo.slotinfo[1].gen = 0; -- Already zero.  */
>>     static_slotinfo.slotinfo[1].map = main_map;
>>   
>> -  memsz = roundup (memsz, align ?: 1);
>> -
>> -#if TLS_DTV_AT_TP
>> -  memsz += tcb_offset;
>> -#endif
>> -
>> -  init_static_tls (memsz, MAX (TCB_ALIGNMENT, max_align));
>> +  init_static_tls (tls_blocks_size, MAX (TCB_ALIGNMENT, max_align));
>>   }
> 
> This goes wrong on aarch64, resulting in a elf/tst-tlsalign-static
> failure:
> 
> tdata1 TLS address 0xaaaac60e9014 % 4 = 0
> tdata2 TLS address 0xaaaac60e9010 % 16 = 0
> tdata3 TLS address 0xaaaac60e9000 % 4096 = 0
> tbss1 TLS address 0xaaaac60ea014 % 4 = 0
> tbss2 TLS address 0xaaaac60ea010 % 16 = 0
> tbss3 TLS address 0xaaaac60ea000 % 4096 = 0
> tbss3 value 24 should be 0
> 
> Or crash with segmentation fault because rseq_cs is wrong.
> 
> What seems to happen is that the TLS block (including the rseq area)
> overlaps with the start of the sbrk heap.  So we are either allocating
> too little memory for the TLS block, or the alignment procedure moves things
> too far away from the start of the allocation.  But I've been staring at
> it for a while, and I just don't see the bug. 8-(

There is a significant allocation layout change in this patch between
v15:

https://sourceware.org/pipermail/libc-alpha/2025-January/163619.html

and v16 (last version committed):

https://sourceware.org/pipermail/libc-alpha/2025-January/163718.html

As explained by the second item in the Changes:

Changes since v15:
- Improve comments around size and alignment calculations
- Simplify tls_blocks_size calculations
- Make sure we don't roundup to zero alignment

I suspect something may be wrong with the "Simplify tls_blocks_size
calculations" changes introduced in v16.

Thanks,

Mathieu

> 
> Thanks,
> Florian
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-11 15:36     ` Mathieu Desnoyers
@ 2025-01-11 16:42       ` Florian Weimer
  0 siblings, 0 replies; 39+ messages in thread
From: Florian Weimer @ 2025-01-11 16:42 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Michael Jeanson, libc-alpha, Carlos O'Donell, DJ Delorie

* Mathieu Desnoyers:

> On 2025-01-11 09:51, Florian Weimer wrote:
>> This goes wrong on aarch64, resulting in a elf/tst-tlsalign-static
>> failure:
>> tdata1 TLS address 0xaaaac60e9014 % 4 = 0
>> tdata2 TLS address 0xaaaac60e9010 % 16 = 0
>> tdata3 TLS address 0xaaaac60e9000 % 4096 = 0
>> tbss1 TLS address 0xaaaac60ea014 % 4 = 0
>> tbss2 TLS address 0xaaaac60ea010 % 16 = 0
>> tbss3 TLS address 0xaaaac60ea000 % 4096 = 0
>> tbss3 value 24 should be 0
>> Or crash with segmentation fault because rseq_cs is wrong.
>> What seems to happen is that the TLS block (including the rseq area)
>> overlaps with the start of the sbrk heap.  So we are either allocating
>> too little memory for the TLS block, or the alignment procedure moves things
>> too far away from the start of the allocation.  But I've been staring at
>> it for a while, and I just don't see the bug. 8-(
>
> There is a significant allocation layout change in this patch between
> v15:
>
> https://sourceware.org/pipermail/libc-alpha/2025-January/163619.html
>
> and v16 (last version committed):
>
> https://sourceware.org/pipermail/libc-alpha/2025-January/163718.html
>
> As explained by the second item in the Changes:
>
> Changes since v15:
> - Improve comments around size and alignment calculations
> - Simplify tls_blocks_size calculations
> - Make sure we don't roundup to zero alignment
>
> I suspect something may be wrong with the "Simplify tls_blocks_size
> calculations" changes introduced in v16.

Not sure yet.

I was super-confused for a while about this part:

  /* Record the tcb_offset including the aligment requirements of 'memsz'
     that comes after it.  */
  tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);

It's been there before, so I was wondering if the rseq changes merely
exposed a pre-existing bug.  But that doesn't seem to be the case here:
the link editor actually has code to round up its version of
TLS_INIT_TCB_SIZE to the p_align value, and the instructions in the test
binary have the right offset (>= 4096) in the test binary.  This is
quite odd and a bit wasteful, but it's not the bug we are looking for.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-11 14:51   ` Florian Weimer
  2025-01-11 15:36     ` Mathieu Desnoyers
@ 2025-01-11 17:06     ` Mark Wielaard
  2025-01-11 19:11       ` Andreas K. Huettel
  2025-01-12 10:05     ` Florian Weimer
  2 siblings, 1 reply; 39+ messages in thread
From: Mark Wielaard @ 2025-01-11 17:06 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Michael Jeanson, libc-alpha, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Hi,

On Sat, Jan 11, 2025 at 03:51:53PM +0100, Florian Weimer wrote:
> > -#endif
> > -
> > -  init_static_tls (memsz, MAX (TCB_ALIGNMENT, max_align));
> > +  init_static_tls (tls_blocks_size, MAX (TCB_ALIGNMENT, max_align));
> >  }
> 
> This goes wrong on aarch64, resulting in a elf/tst-tlsalign-static
> failure:
> 
> tdata1 TLS address 0xaaaac60e9014 % 4 = 0
> tdata2 TLS address 0xaaaac60e9010 % 16 = 0
> tdata3 TLS address 0xaaaac60e9000 % 4096 = 0
> tbss1 TLS address 0xaaaac60ea014 % 4 = 0
> tbss2 TLS address 0xaaaac60ea010 % 16 = 0
> tbss3 TLS address 0xaaaac60ea000 % 4096 = 0
> tbss3 value 24 should be 0
> 
> Or crash with segmentation fault because rseq_cs is wrong.
> 
> What seems to happen is that the TLS block (including the rseq area)
> overlaps with the start of the sbrk heap.  So we are either allocating
> too little memory for the TLS block, or the alignment procedure moves things
> too far away from the start of the allocation.  But I've been staring at
> it for a while, and I just don't see the bug. 8-(

It doesn't really help because there isn't much information in the
buildbot logs. But after the rseq patches went in we are seeing two
new failures on arm64:

 FAIL: elf/tst-tlsalign-extern-static
 FAIL: elf/tst-tlsalign-static

https://builder.sourceware.org/buildbot/#/builders/265/builds/921

Might of course be coincidence. Sadly the logs seem empty; they just
say original exit status 139:

../scripts/evaluate-test.sh elf/tst-tlsalign-static $? false false > /var/lib/buildbot/worker/glibc-debian-arm64/glibc-build/elf/tst-tlsalign-static.test-result
Segmentation fault

Cheers,

Mark

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-11 17:06     ` Mark Wielaard
@ 2025-01-11 19:11       ` Andreas K. Huettel
  0 siblings, 0 replies; 39+ messages in thread
From: Andreas K. Huettel @ 2025-01-11 19:11 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha
  Cc: Michael Jeanson, libc-alpha, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers, Mark Wielaard

[-- Attachment #1: Type: text/plain, Size: 866 bytes --]

> 
> It doesn't really help because there isn't much information in the
> buildbot logs. But after the rseq patches went in we are seeing two
> new failures on arm64:
> 
>  FAIL: elf/tst-tlsalign-extern-static
>  FAIL: elf/tst-tlsalign-static
> 
> https://builder.sourceware.org/buildbot/#/builders/265/builds/921
> 
> Might of course be coincidence. Sadly the logs seem empty; they just
> say original exit status 139:
> 
> ../scripts/evaluate-test.sh elf/tst-tlsalign-static $? false false > /var/lib/buildbot/worker/glibc-debian-arm64/glibc-build/elf/tst-tlsalign-static.test-result
> Segmentation fault

FWIW, I can confirm the two new failures on aarch64. No useful output here either.

> 
> Cheers,
> 
> Mark
> 


-- 
Andreas K. Hüttel
dilfridge@gentoo.org
Gentoo Linux developer
(council, toolchain, base-system, perl, libreoffice)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 981 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 2/8] Add generic 'extra TLS'
  2025-01-11 14:51   ` Florian Weimer
  2025-01-11 15:36     ` Mathieu Desnoyers
  2025-01-11 17:06     ` Mark Wielaard
@ 2025-01-12 10:05     ` Florian Weimer
  2 siblings, 0 replies; 39+ messages in thread
From: Florian Weimer @ 2025-01-12 10:05 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

* Florian Weimer:

> This goes wrong on aarch64, resulting in a elf/tst-tlsalign-static
> failure:
>
> tdata1 TLS address 0xaaaac60e9014 % 4 = 0
> tdata2 TLS address 0xaaaac60e9010 % 16 = 0
> tdata3 TLS address 0xaaaac60e9000 % 4096 = 0
> tbss1 TLS address 0xaaaac60ea014 % 4 = 0
> tbss2 TLS address 0xaaaac60ea010 % 16 = 0
> tbss3 TLS address 0xaaaac60ea000 % 4096 = 0
> tbss3 value 24 should be 0
>
> Or crash with segmentation fault because rseq_cs is wrong.

I've got a new test case which reproduces this issue reliably even under
GDB, with ASLR disabled.

info: rseq area size: 28
info: checking address ranges with initially loaded modules
info: adding TLS range for "" (4160 bytes)
info: 4 ranges found
error: overlap between address ranges
  TLS for main: [0xfffff7fd8000, 0xfffff7fd903f)
  rseq area: [0xfffff7fd9000, 0xfffff7fd901b)

It's statically linked and has a TLS variable with 4096 bytes.  So
clearly something goes wrong when tcb_offset is very large in
csu/libc-tls.c.

Hopefully it's not going to be too far from this test to the solution …

Thanks,
Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors
  2025-01-09 16:32 ` [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors Michael Jeanson
  2025-01-10 19:08   ` Florian Weimer
@ 2025-01-13 10:49   ` Frank Scheiner
  2025-01-13 19:11     ` Michael Jeanson
  1 sibling, 1 reply; 39+ messages in thread
From: Frank Scheiner @ 2025-01-13 10:49 UTC (permalink / raw)
  To: Michael Jeanson, Florian Weimer
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

[-- Attachment #1: Type: text/plain, Size: 15980 bytes --]

Dear Michael, Florian,

On 09.01.25 17:32, Michael Jeanson wrote:
> In preparation to move the rseq area to the 'extra TLS' block, we need
> accessors based on the thread pointer and the rseq offset. The ONCE
> variant of the accessors ensures single-copy atomicity for loads and
> stores which is required for all fields once the registration is active.
>
> A separate header is required to allow including <atomic.h> which
> results in an include loop when added to <tcb-access.h>.

The changes in 494d651 ([1]) break our ia64 toolchain builds, as ia64
looks like one of the (or the only) arch(es) w/o
__builtin_thread_pointer() support in GCC:

```
In function '__thread_pointer',
    inlined from 'RSEQ_SELF' at ../sysdeps/unix/sysv/linux/rseq-internal.h:93:41,
    inlined from '__pthread_create_2_1' at pthread_create.c:699:13:
../sysdeps/generic/thread_pointer.h:25:10: error: '__builtin_thread_pointer' is not supported on this target
   25 |   return __builtin_thread_pointer ();
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
```

[1]: https://sourceware.org/git/?p=glibc.git;a=commit;h=494d65129ed5ae1154b75cc189bbdde5e9ecf1df

Looking into Florian's recent changes regarding <thread_pointer.h> ([2])
and spotting the similarities between:

* sysdeps/loongarch/thread_pointer.h
* sysdeps/powerpc/thread_pointer.h

...and sysdeps/ia64/nptl/tls.h ([3]), specifically:

```
register struct pthread *__thread_self __asm__("r13");
```

...plus getting confirmation from the "IA-64 Software Conventions and
Runtime Architecture Guide" (245358-002) and taking [4] into account,
I came up with the attached patch (based on the powerpc version) to
introduce __thread_pointer() also for ia64.

[2]: https://sourceware.org/git/?p=glibc.git;a=commit;h=7a3e2e877a70153a6d1b786925b34f3b396e20f1

[3]: https://github.com/linux-ia64/glibc-ia64/blob/4b5c3b3de53b138cc509d58d47148917013fead4/sysdeps/ia64/nptl/tls.h

[4]: https://sourceware.org/git/?p=glibc.git;a=commit;h=cb976fba4c51ede7bf8cee5035888527c308dfbc

This indeed seems to "fix" the ia64 build for me; I just don't really know
if it is fully correct to use __thread_self like that:

```
register struct pthread *__thread_self __asm__("r13");

static inline void *
__thread_pointer (void)
{
  return __thread_self;
}
```

But I guess a pointer is a pointer and the compiler seems to be happy,
too.

Cheers,
Frank


> Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> Changes since v15:
> - Rename VOLATILE macros to ONCE
> Changes since v14:
> - Update copyright year to 2025
> Changes since v13:
> - Ensure that the VOLATILE variant static assert on 64bit types
>   on 32bit architectures
> - Split to separate header to allow including 'atomic.h' without
>   an include loop
> - Move rtld_hidden_proto rseq symbols to a separate patch
> Changes since v12:
> - Split RSEQ_SET/GETMEM from THREAD_SET/GETMEM
> - Rename rseq_get_area() to RSEQ_SELF()
> - Add rtld_hidden_proto to __rseq_size and __rseq_offset
> ---
>  sysdeps/i386/nptl/rseq-access.h         | 98 +++++++++++++++++++++++++
>  sysdeps/nptl/rseq-access.h              | 56 ++++++++++++++
>  sysdeps/unix/sysv/linux/rseq-internal.h |  8 ++
>  sysdeps/x86_64/nptl/rseq-access.h       | 77 +++++++++++++++++++
>  4 files changed, 239 insertions(+)
>  create mode 100644 sysdeps/i386/nptl/rseq-access.h
>  create mode 100644 sysdeps/nptl/rseq-access.h
>  create mode 100644 sysdeps/x86_64/nptl/rseq-access.h
>
> diff --git a/sysdeps/i386/nptl/rseq-access.h b/sysdeps/i386/nptl/rseq-access.h
> new file mode 100644
> index 0000000000..5e7e09d494
> --- /dev/null
> +++ b/sysdeps/i386/nptl/rseq-access.h
> @@ -0,0 +1,98 @@
> +/* RSEQ_* accessors.  i386 version.
> +   Copyright (C) 2002-2025 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#define __RSEQ_GETMEM(member) \
> +  ({ __typeof (RSEQ_SELF()->member) __value;				      \
> +     if (sizeof (__value) == 1)						      \
> +       asm volatile ("movb %%gs:%P2(%3),%b0"				      \
> +		     : "=q" (__value)					      \
> +		     : "0" (0), "i" (offsetof (struct rseq_area, member)),   \
> +		     "r" (__rseq_offset));				      \
> +     else if (sizeof (__value) == 4)					      \
> +       asm volatile ("movl %%gs:%P1(%2),%0"				      \
> +		     : "=r" (__value)					      \
> +		     : "i" (offsetof (struct rseq_area, member)),	      \
> +		       "r" (__rseq_offset));				      \
> +     else /* 8 */							      \
> +       {								      \
> +	 asm volatile  ("movl %%gs:%P1(%2),%%eax\n\t"			      \
> +			"movl %%gs:4+%P1(%2),%%edx"			      \
> +			: "=&A" (__value)				      \
> +			: "i" (offsetof (struct rseq_area, member)),	      \
> +			  "r" (__rseq_offset));				      \
> +       }								      \
> +     __value; })
> +
> +/* Read member of the RSEQ area directly.  */
> +#define RSEQ_GETMEM(member) \
> +  ({									      \
> +     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
> +		     || sizeof (RSEQ_SELF()->member) == 4		      \
> +		     || sizeof (RSEQ_SELF()->member) == 8,		      \
> +		     "size of rseq data");				      \
> +     __RSEQ_GETMEM(member); })
> +
> +/* Read member of the RSEQ area directly, with single-copy atomicity semantics.
> +   Static assert for types >= 64 bits since they can't be loaded atomically on
> +   x86-32.  */
> +#define RSEQ_GETMEM_ONCE(member) \
> +  ({									      \
> +     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
> +		     || sizeof (RSEQ_SELF()->member) == 4,		      \
> +		     "size of rseq data");				      \
> +     __RSEQ_GETMEM(member); })
> +
> +#define __RSEQ_SETMEM(member, value) \
> +  ({									      \
> +     if (sizeof (RSEQ_SELF()->member) == 1)				      \
> +       asm volatile ("movb %b0,%%gs:%P1(%2)" :				      \
> +		     : "iq" (value),					      \
> +		       "i" (offsetof (struct rseq_area, member)),	      \
> +		       "r" (__rseq_offset));				      \
> +     else if (sizeof (RSEQ_SELF()->member) == 4)			      \
> +       asm volatile ("movl %0,%%gs:%P1(%2)" :				      \
> +		     : "ir" (value),					      \
> +		       "i" (offsetof (struct rseq_area, member)),	      \
> +		       "r" (__rseq_offset));				      \
> +     else /* 8 */							      \
> +       {								      \
> +	 asm volatile ("movl %%eax,%%gs:%P1(%2)\n\t"			      \
> +		       "movl %%edx,%%gs:4+%P1(%2)" :			      \
> +		       : "A" ((uint64_t) cast_to_integer (value)),	      \
> +			 "i" (offsetof (struct rseq_area, member)),	      \
> +			 "r" (__rseq_offset));				      \
> +       }})
> +
> +/* Set member of the RSEQ area directly.  */
> +#define RSEQ_SETMEM(member, value) \
> +  ({									      \
> +     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
> +		     || sizeof (RSEQ_SELF()->member) == 4		      \
> +		     || sizeof (RSEQ_SELF()->member) == 8,		      \
> +		     "size of rseq data");				      \
> +     __RSEQ_SETMEM(member, value); })
> +
> +/* Set member of the RSEQ area directly, with single-copy atomicity semantics.
> +   Static assert for types >= 64 bits since they can't be stored atomically on
> +   x86-32.  */
> +#define RSEQ_SETMEM_ONCE(member, value) \
> +  ({									      \
> +     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
> +		     || sizeof (RSEQ_SELF()->member) == 4,		      \
> +		     "size of rseq data");				      \
> +     __RSEQ_SETMEM(member, value); })
> diff --git a/sysdeps/nptl/rseq-access.h b/sysdeps/nptl/rseq-access.h
> new file mode 100644
> index 0000000000..450f2dcca3
> --- /dev/null
> +++ b/sysdeps/nptl/rseq-access.h
> @@ -0,0 +1,56 @@
> +/* RSEQ_* accessors.  Generic version.
> +   Copyright (C) 2002-2025 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <atomic.h>
> +
> +/* Read member of the RSEQ area directly.  */
> +#define RSEQ_GETMEM(member) \
> +  RSEQ_SELF()->member
> +
> +/* Set member of the RSEQ area directly.  */
> +#define RSEQ_SETMEM(member, value) \
> +  RSEQ_SELF()->member = (value)
> +
> +/* Static assert for types that can't be loaded/stored atomically on the
> +   current architecture.  */
> +#if __HAVE_64B_ATOMICS
> +#define __RSEQ_ASSERT_ATOMIC(member) \
> +   _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
> +		   || sizeof (RSEQ_SELF()->member) == 4			      \
> +		   || sizeof (RSEQ_SELF()->member) == 8,		      \
> +		   "size of rseq data")
> +#else
> +#define __RSEQ_ASSERT_ATOMIC(member) \
> +   _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
> +		   || sizeof (RSEQ_SELF()->member) == 4,		      \
> +		   "size of rseq data")
> +#endif
> +
> +/* Read member of the RSEQ area directly, with single-copy atomicity semantics.  */
> +#define RSEQ_GETMEM_ONCE(member) \
> +  ({									      \
> +     __RSEQ_ASSERT_ATOMIC(member);					      \
> +     (*(volatile __typeof (RSEQ_SELF()->member) *)&RSEQ_SELF()->member);      \
> +  })
> +
> +/* Set member of the RSEQ area directly, with single-copy atomicity semantics.  */
> +#define RSEQ_SETMEM_ONCE(member, value) \
> +  ({									      \
> +     __RSEQ_ASSERT_ATOMIC(member);					      \
> +     (*(volatile __typeof (RSEQ_SELF()->member) *)&RSEQ_SELF()->member = (value)); \
> +  })
> diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
> index 3993431707..00be15cfc8 100644
> --- a/sysdeps/unix/sysv/linux/rseq-internal.h
> +++ b/sysdeps/unix/sysv/linux/rseq-internal.h
> @@ -25,6 +25,7 @@
>  #include <stdio.h>
>  #include <sys/rseq.h>
>  #include <ldsodefs.h>
> +#include <thread_pointer.h>
>
>  /* Minimum size of the rseq area allocation required by the syscall.  The
>     actually used rseq feature size may be less (20 bytes initially).  */
> @@ -59,6 +60,13 @@ extern ptrdiff_t _rseq_offset attribute_hidden;
>  rtld_hidden_proto (__rseq_size)
>  rtld_hidden_proto (__rseq_offset)
>
> +/* Returns a pointer to the current thread rseq area.  */
> +static inline struct rseq_area *
> +RSEQ_SELF (void)
> +{
> +  return (struct rseq_area *) ((char *) __thread_pointer () + __rseq_offset);
> +}
> +
>  #ifdef RSEQ_SIG
>  static inline bool
>  rseq_register_current_thread (struct pthread *self, bool do_rseq)
> diff --git a/sysdeps/x86_64/nptl/rseq-access.h b/sysdeps/x86_64/nptl/rseq-access.h
> new file mode 100644
> index 0000000000..535e36281f
> --- /dev/null
> +++ b/sysdeps/x86_64/nptl/rseq-access.h
> @@ -0,0 +1,77 @@
> +/* RSEQ_* accessors.  x86_64 version.
> +   Copyright (C) 2002-2025 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* Read member of the RSEQ area directly, with single-copy atomicity semantics.  */
> +#define RSEQ_GETMEM_ONCE(member) \
> +  ({ __typeof (RSEQ_SELF()->member) __value;				      \
> +     _Static_assert (sizeof (__value) == 1				      \
> +		     || sizeof (__value) == 4				      \
> +		     || sizeof (__value) == 8,				      \
> +		     "size of rseq data");			      \
> +     if (sizeof (__value) == 1)						      \
> +       asm volatile ("movb %%fs:%P2(%q3),%b0"				      \
> +		     : "=q" (__value)					      \
> +		     : "0" (0), "i" (offsetof (struct rseq_area, member)),    \
> +		       "r" (__rseq_offset));				      \
> +     else if (sizeof (__value) == 4)					      \
> +       asm volatile ("movl %%fs:%P1(%q2),%0"				      \
> +		     : "=r" (__value)					      \
> +		     : "i" (offsetof (struct rseq_area, member)),	      \
> +		       "r" (__rseq_offset));				      \
> +     else /* 8 */							      \
> +       {								      \
> +	 asm volatile ("movq %%fs:%P1(%q2),%q0"				      \
> +		       : "=r" (__value)					      \
> +		       : "i" (offsetof (struct rseq_area, member)),	      \
> +			 "r" (__rseq_offset));				      \
> +       }								      \
> +     __value; })
> +
> +/* Read member of the RSEQ area directly.  */
> +#define RSEQ_GETMEM(member) RSEQ_GETMEM_ONCE(member)
> +
> +/* Set member of the RSEQ area directly, with single-copy atomicity semantics.  */
> +#define RSEQ_SETMEM_ONCE(member, value) \
> +  ({									      \
> +     _Static_assert (sizeof (RSEQ_SELF()->member) == 1			      \
> +		     || sizeof (RSEQ_SELF()->member) == 4		      \
> +		     || sizeof (RSEQ_SELF()->member) == 8,		      \
> +		     "size of rseq data");			      \
> +     if (sizeof (RSEQ_SELF()->member) == 1)				      \
> +       asm volatile ("movb %b0,%%fs:%P1(%q2)" :				      \
> +		     : "iq" (value),					      \
> +		       "i" (offsetof (struct rseq_area, member)),	      \
> +		       "r" (__rseq_offset));				      \
> +     else if (sizeof (RSEQ_SELF()->member) == 4)			      \
> +       asm volatile ("movl %0,%%fs:%P1(%q2)" :				      \
> +		     : IMM_MODE (value),				      \
> +		       "i" (offsetof (struct rseq_area, member)),	      \
> +		       "r" (__rseq_offset));				      \
> +     else /* 8 */							      \
> +       {								      \
> +	 /* Since movq takes a signed 32-bit immediate or a register source   \
> +	    operand, use "er" constraint for 32-bit signed integer constant   \
> +	    or register.  */						      \
> +	 asm volatile ("movq %q0,%%fs:%P1(%q2)" :			      \
> +		       : "er" ((uint64_t) cast_to_integer (value)),	      \
> +			 "i" (offsetof (struct rseq_area, member)),	      \
> +			 "r" (__rseq_offset));				      \
> +       }})
> +
> +/* Set member of the RSEQ area directly.  */
> +#define RSEQ_SETMEM(member, value) RSEQ_SETMEM_ONCE(member, value)

[-- Attachment #2: 0001-Introduce-__thread_pointer-also-for-ia64.patch --]
[-- Type: text/x-patch, Size: 2723 bytes --]

From d72fe1091a501310a131a6becb705a9526581d7c Mon Sep 17 00:00:00 2001
From: Johnny Mnemonic <jm@machine-hall.org>
Date: Sun, 12 Jan 2025 21:30:13 +0100
Subject: [PATCH] Introduce __thread_pointer() also for ia64

Related to "Move <thread_pointer.h> to kernel-independent sysdeps
directories"

See 7a3e2e877a70153a6d1b786925b34f3b396e20f1 for reference.
---
 sysdeps/ia64/nptl/tls.h       |  6 +++---
 sysdeps/ia64/thread_pointer.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 sysdeps/ia64/thread_pointer.h

diff --git a/sysdeps/ia64/nptl/tls.h b/sysdeps/ia64/nptl/tls.h
index dbfb85a7f8..f70aabc699 100644
--- a/sysdeps/ia64/nptl/tls.h
+++ b/sysdeps/ia64/nptl/tls.h
@@ -1,5 +1,5 @@
 /* Definition for thread-local data handling.  nptl/IA-64 version.
-   Copyright (C) 2003-2024 Free Software Foundation, Inc.
+   Copyright (C) 2003-2025 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -27,6 +27,8 @@
 # include <stdlib.h>
 # include <list.h>
 # include <dl-dtv.h>
+// for __thread_self
+# include <thread_pointer.h>
 
 typedef struct
 {
@@ -34,8 +36,6 @@ typedef struct
   void *__private;
 } tcbhead_t;
 
-register struct pthread *__thread_self __asm__("r13");
-
 # define TLS_MULTIPLE_THREADS_IN_TCB 1
 
 #else /* __ASSEMBLER__ */
diff --git a/sysdeps/ia64/thread_pointer.h b/sysdeps/ia64/thread_pointer.h
new file mode 100644
index 0000000000..a7e15734cd
--- /dev/null
+++ b/sysdeps/ia64/thread_pointer.h
@@ -0,0 +1,31 @@
+/* __thread_pointer definition.  ia64 version.
+   Based on powerpc version.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_THREAD_POINTER_H
+#define _SYS_THREAD_POINTER_H
+
+register struct pthread *__thread_self __asm__("r13");
+
+static inline void *
+__thread_pointer (void)
+{
+  return __thread_self;
+}
+
+#endif /* _SYS_THREAD_POINTER_H */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors
  2025-01-13 10:49   ` Frank Scheiner
@ 2025-01-13 19:11     ` Michael Jeanson
  2025-01-14 20:49       ` Frank Scheiner
  0 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-13 19:11 UTC (permalink / raw)
  To: Frank Scheiner, Florian Weimer
  Cc: libc-alpha, Carlos O'Donell, DJ Delorie, Mathieu Desnoyers

On 2025-01-13 05:49, Frank Scheiner wrote: 
> The changes in 494d651 ([1]) break our ia64 toolchain builds, as ia64
> looks like one of the (or the only) arch(es) w/o
> __builtin_thread_pointer() support in GCC:
> 
> ```
> In function '__thread_pointer',
>     inlined from 'RSEQ_SELF' at ../sysdeps/unix/sysv/linux/rseq-internal.h:93:41,
>     inlined from '__pthread_create_2_1' at pthread_create.c:699:13:
> ../sysdeps/generic/thread_pointer.h:25:10: error: '__builtin_thread_pointer' is not supported on this target
>    25 |   return __builtin_thread_pointer ();
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> ```
> 
> [1]: https://sourceware.org/git/?p=glibc.git;a=commit;h=494d65129ed5ae1154b75cc189bbdde5e9ecf1df

Do you maintain a private ia64 port in a separate branch/repo?

> 
> This indeed seems to "fix" the ia64 build for me; I just don't really know
> if it is fully correct to use __thread_self like that:
> 
> ```
> register struct pthread *__thread_self __asm__("r13");
> 
> static inline void *
> __thread_pointer (void)
> {
>   return __thread_self;
> }
> ```
> 
> But I guess a pointer is a pointer and the compiler seems to be happy,
> too.

Your patch should work as-is; some compilers might warn on the implicit
cast to 'void *', but you can just add the explicit cast in __thread_pointer().

Thanks,

Michael


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors
  2025-01-13 19:11     ` Michael Jeanson
@ 2025-01-14 20:49       ` Frank Scheiner
  0 siblings, 0 replies; 39+ messages in thread
From: Frank Scheiner @ 2025-01-14 20:49 UTC (permalink / raw)
  To: Michael Jeanson
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, DJ Delorie,
	Mathieu Desnoyers

Dear Michael,

On 13.01.25 20:11, Michael Jeanson wrote:
> On 2025-01-13 05:49, Frank Scheiner wrote:
>> The changes in 494d651 ([1]) break our ia64 toolchain builds, as ia64
>> looks like one of the (or the only) arch(es) w/o
>> __builtin_thread_pointer() support in GCC:
>>
>> ```
>> In function '__thread_pointer',
>>     inlined from 'RSEQ_SELF' at ../sysdeps/unix/sysv/linux/rseq-internal.h:93:41,
>>     inlined from '__pthread_create_2_1' at pthread_create.c:699:13:
>> ../sysdeps/generic/thread_pointer.h:25:10: error: '__builtin_thread_pointer' is not supported on this target
>>    25 |   return __builtin_thread_pointer ();
>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ```
>>
>> [1]: https://sourceware.org/git/?p=glibc.git;a=commit;h=494d65129ed5ae1154b75cc189bbdde5e9ecf1df
>
> Do you maintain a private ia64 port in a separate branch/repo?

Yeah, sort of. Mainly trying... ;-)

You can find the repo here:

https://github.com/linux-ia64/glibc-ia64/

We're actually a group of people, and we also maintain Linux for ia64 out
of tree - as a matter of fact, both for over a year now.

Well, we can't do w/o a kernel and a libc for the two distributions that
still support ia64, can we? (-:

More details about the whole effort can be found on:

http://epic-linux.org/

>> This seems to indeed "fix" the ia64 build for me, I just don't really know if
>> it is fully correct to use __thread_self like that:
>>
>> ```
>> register struct pthread *__thread_self __asm__("r13");
>> static inline void *
>> __thread_pointer (void)
>> {
>>   return __thread_self;
>> }
>> ```
>>
>> But I guess a pointer is a pointer and the compiler seems to be happy,
>> too.
>
> Your patch should work as-is; some compilers might warn about the implicit
> conversion to 'void *', but you can just add an explicit cast in __thread_pointer().

Thanks for the confirmation.

I also documented this issue in the meantime on [2].

[2]: https://github.com/linux-ia64/glibc-ia64/issues/8#issuecomment-2588226464

I actually did try the explicit cast first, but the gcc used (should be
gcc-15-20250105 or [...]12) also didn't warn without it - though that was
maybe due to the gcc command line used by the T2 build system, IDK for sure.

I left the explicit cast out for now as it's done implicitly anyhow. Or
is that considered bad practice? But it can always be changed later.

Cheers,
Frank




* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-10 19:39   ` Florian Weimer
@ 2025-01-16 13:24     ` Stefan Liebler
  2025-01-16 15:45       ` Michael Jeanson
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Liebler @ 2025-01-16 13:24 UTC (permalink / raw)
  To: libc-alpha

On 10.01.25 20:39, Florian Weimer wrote:
> * Michael Jeanson:
> 
>> +#if TLS_TCB_AT_TP
>> +  /* The rseq area block should come before the thread pointer and be at least
>> +     32 bytes. */
>> +  TEST_VERIFY (__rseq_offset <= -RSEQ_AREA_SIZE_INITIAL);
>> +
>> +  /* The rseq area block should come before TLS variables.  */
>> +  TEST_VERIFY ((intptr_t) rseq_abi < (intptr_t) &tls_var);
>> +#elif TLS_DTV_AT_TP
>> +  /* The rseq area block should come after the TCB, add the TLS block offset to
>> +     the rseq offset to get a value relative to the TCB and test that it's
>> +     non-negative.  */
>> +  TEST_VERIFY (__rseq_offset + TLS_TP_OFFSET >= 0);
>> +
>> +  /* The rseq area block should come after TLS variables.  */
>> +  TEST_VERIFY ((intptr_t) rseq_abi > (intptr_t) &tls_var);
>> +#else
>> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
>> +#endif
> 
> Looks good now.  Rest is okay as well.
> 
> Reviewed-by: Florian Weimer <fweimer@redhat.com>
> 
> Thanks,
> Florian
> 

Hi,

I've run kernel-next next-20250114 commit
dab2734f8e9ecba609d66d1dd087a392a7774c04 on s390x and get
FAIL: misc/tst-rseq-disable

original exit status 1
info: __rseq_size: 0
info: __rseq_offset: -256
info: __rseq_flags: 0
info: getauxval (AT_RSEQ_FEATURE_SIZE): 28
info: getauxval (AT_RSEQ_ALIGN): 32
info: checking main thread

../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
   left: -1 (0xffffffff); from: ret
  right: 0 (0x0); from: 0
info: checking main thread (2)
error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
errno != EBUSY
info: checking new thread
../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
   left: -1 (0xffffffff); from: ret
  right: 0 (0x0); from: 0
info: checking subprocess
error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
errno != EBUSY
error: 4 test failures

Unregistration fails on the main thread with EBUSY.
Is this a known issue?
Or does anybody see this FAIL on other architectures?

Bye,
Stefan


* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-16 13:24     ` Stefan Liebler
@ 2025-01-16 15:45       ` Michael Jeanson
  2025-01-16 16:26         ` Stefan Liebler
  0 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-16 15:45 UTC (permalink / raw)
  To: Stefan Liebler, libc-alpha

On 2025-01-16 08:24, Stefan Liebler wrote: 
> Hi,
> 
> I've run kernel-next next-20250114 commit
> dab2734f8e9ecba609d66d1dd087a392a7774c04 on s390x and get
> FAIL: misc/tst-rseq-disable
> 
> original exit status 1
> 
> info: __rseq_size: 0
> 
> info: __rseq_offset: -256
> 
> info: __rseq_flags: 0
> 
> info: getauxval (AT_RSEQ_FEATURE_SIZE): 28
> 
> info: getauxval (AT_RSEQ_ALIGN): 32
> 
> info: checking main thread
> 
> ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
>    left: -1 (0xffffffff); from: ret
>   right: 0 (0x0); from: 0
> info: checking main thread (2)
> error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
> errno != EBUSY
> info: checking new thread
> ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
>    left: -1 (0xffffffff); from: ret
>   right: 0 (0x0); from: 0
> info: checking subprocess
> error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
> errno != EBUSY
> error: 4 test failures
> 
> Unregistration fails on main-thread with EBUSY.
> Is this a known issue?

This test checks that the tunable 'glibc.pthread.rseq' actually disables
the internal rseq registration. That part seems to work.

The test then attempts to do its own rseq registration, which in your
case fails with EBUSY. This usually means there is already an active
registration for the current thread, which should not be the case.

Do you also see this failure on a released kernel?

The only s390x system I have access to runs an old v6.1 kernel
and the test succeeds on it.


> Or does anybody see this FAIL on other architectures?
> 
> Bye,
> Stefan



* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-16 15:45       ` Michael Jeanson
@ 2025-01-16 16:26         ` Stefan Liebler
  2025-01-16 16:43           ` Michael Jeanson
  0 siblings, 1 reply; 39+ messages in thread
From: Stefan Liebler @ 2025-01-16 16:26 UTC (permalink / raw)
  To: Michael Jeanson, libc-alpha

On 16.01.25 16:45, Michael Jeanson wrote:
> On 2025-01-16 08:24, Stefan Liebler wrote: 
>> Hi,
>>
>> I've run kernel-next next-20250114 commit
>> dab2734f8e9ecba609d66d1dd087a392a7774c04 on s390x and get
>> FAIL: misc/tst-rseq-disable
>>
>> original exit status 1
>>
>> info: __rseq_size: 0
>>
>> info: __rseq_offset: -256
>>
>> info: __rseq_flags: 0
>>
>> info: getauxval (AT_RSEQ_FEATURE_SIZE): 28
>>
>> info: getauxval (AT_RSEQ_ALIGN): 32
>>
>> info: checking main thread
>>
>> ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
>>    left: -1 (0xffffffff); from: ret
>>   right: 0 (0x0); from: 0
>> info: checking main thread (2)
>> error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
>> errno != EBUSY
>> info: checking new thread
>> ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
>>    left: -1 (0xffffffff); from: ret
>>   right: 0 (0x0); from: 0
>> info: checking subprocess
>> error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
>> errno != EBUSY
>> error: 4 test failures
>>
>> Unregistration fails on main-thread with EBUSY.
>> Is this a known issue?
> 
> This test checks that the tunable 'glibc.pthread.rseq' actually disables
> the internal rseq registration. That part seems to work.
Yes, the first registration in "main thread" works.
> 
> The test then attempts to do its own rseq registration which in your
> case fails with EBUSY. This usually means there is already an active
> registration for the current thread which should not be the case.
The syscall with RSEQ_FLAG_UNREGISTER in "main thread" fails. I've
debugged down to this syscall and it also fails with EBUSY.

> 
> Do you also see this failure on a released kernel?
No, I have not seen this fail before. Only on kernel-next (but no idea
when it started to fail there).

Any ideas how to track it down?
> 
> The only s390x system I have access to runs an old v6.1 kernel
> and the test succeeds on it.
> 
> 
>> Or does anybody see this FAIL on other architectures?
>>
>> Bye,
>> Stefan
> 



* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-16 16:26         ` Stefan Liebler
@ 2025-01-16 16:43           ` Michael Jeanson
  2025-01-16 17:04             ` Michael Jeanson
  0 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-16 16:43 UTC (permalink / raw)
  To: Stefan Liebler, libc-alpha

On 2025-01-16 11:26, Stefan Liebler wrote:
> On 16.01.25 16:45, Michael Jeanson wrote:
>> On 2025-01-16 08:24, Stefan Liebler wrote: 
>>> Hi,
>>>
>>> I've run kernel-next next-20250114 commit
>>> dab2734f8e9ecba609d66d1dd087a392a7774c04 on s390x and get
>>> FAIL: misc/tst-rseq-disable
>>>
>>> original exit status 1
>>>
>>> info: __rseq_size: 0
>>>
>>> info: __rseq_offset: -256
>>>
>>> info: __rseq_flags: 0
>>>
>>> info: getauxval (AT_RSEQ_FEATURE_SIZE): 28
>>>
>>> info: getauxval (AT_RSEQ_ALIGN): 32
>>>
>>> info: checking main thread
>>>
>>> ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
>>>    left: -1 (0xffffffff); from: ret
>>>   right: 0 (0x0); from: 0
>>> info: checking main thread (2)
>>> error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
>>> errno != EBUSY
>>> info: checking new thread
>>> ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:90: numeric comparison failure
>>>    left: -1 (0xffffffff); from: ret
>>>   right: 0 (0x0); from: 0
>>> info: checking subprocess
>>> error: ../sysdeps/unix/sysv/linux/tst-rseq-disable.c:100: not true:
>>> errno != EBUSY
>>> error: 4 test failures
>>>
>>> Unregistration fails on main-thread with EBUSY.
>>> Is this a known issue?
>>
>> This test checks that the tunable 'glibc.pthread.rseq' actually disables
>> the internal rseq registration. That part seems to work.
> Yes, the first registration in "main thread" works.
>>
>> The test then attempts to do its own rseq registration which in your
>> case fails with EBUSY. This usually means there is already an active
>> registration for the current thread which should not be the case.
> The syscall with RSEQ_FLAG_UNREGISTER in "main thread" fails. I've
> debugged to this syscall and it also fails with EBUSY.

Oh, I missed the failure on line 90. So, re-reading the log: the main
thread registration works but then fails to unregister; the second main thread
registration then fails with EBUSY since the previous registration is still
active. The same thing happens in the secondary thread.

> 
>>
>> Do you also see this failure on a released kernel?
> No, I have not seen this fail before. Only on kernel-next (but no idea
> when it started to fail there).
> 
> Any ideas how to track it down?

Can you check what the value of errno is after the failed unregistration?

I'll have a look at linux-next and check whether I can reproduce it on
another architecture.

>>
>> The only s390x system I have access to runs an old v6.1 kernel
>> and the test succeeds on it.
>>
>>
>>> Or does anybody see this FAIL on other architectures?
>>>
>>> Bye,
>>> Stefan
>>
> 



* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-16 16:43           ` Michael Jeanson
@ 2025-01-16 17:04             ` Michael Jeanson
  2025-01-16 21:28               ` Michael Jeanson
  0 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-16 17:04 UTC (permalink / raw)
  To: Stefan Liebler, libc-alpha

On 2025-01-16 11:43, Michael Jeanson wrote:
> Oh, I missed the failure on line 90. So re-reading the log, the main
> thread registration works then fails to unregister, the second main thread
> registration then fails with EBUSY since the previous registration is still
> active. Same thing happens in the secondary thread.
> 
>>
>>>
>>> Do you also see this failure on a released kernel?
>> No, I have not seen this fail before. Only on kernel-next (but no idea
>> when it started to fail there).
>>
>> Any ideas how to track it down?
> 
> Can you check what is the value of errno after the failed unregistration?
> 
> I'll have a look at linux-next and check if I can reprod on another
> architecture.

Also, does your kernel config set "CONFIG_DEBUG_RSEQ=y"? And is there
anything in dmesg?


* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-16 17:04             ` Michael Jeanson
@ 2025-01-16 21:28               ` Michael Jeanson
  2025-01-17 12:43                 ` Stefan Liebler
  0 siblings, 1 reply; 39+ messages in thread
From: Michael Jeanson @ 2025-01-16 21:28 UTC (permalink / raw)
  To: Stefan Liebler, libc-alpha

On 2025-01-16 12:04, Michael Jeanson wrote:
>> I'll have a look at linux-next and check if I can reprod on another
>> architecture.

So the issue is indeed specific to linux-next and affects all architectures;
the offending commit is:

 7d5265ffcd8b (rseq: Validate read-only fields under DEBUG_RSEQ config)

It's a simple logic inversion bug that breaks unregistration; Mathieu has
already sent an updated patch. In the meantime, you can apply this to
your kernel sources:

diff --git a/kernel/rseq.c b/kernel/rseq.c
index e04bb30a2eb8..442aba29bc4c 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -201,7 +201,7 @@ static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
 	/*
 	 * Validate read-only rseq fields.
 	 */
-	if (!rseq_validate_ro_fields(t))
+	if (rseq_validate_ro_fields(t))
 		return -EFAULT;
 	/*
 	 * Reset cpu_id_start to its initial state (0).



* Re: [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block
  2025-01-16 21:28               ` Michael Jeanson
@ 2025-01-17 12:43                 ` Stefan Liebler
  0 siblings, 0 replies; 39+ messages in thread
From: Stefan Liebler @ 2025-01-17 12:43 UTC (permalink / raw)
  To: Michael Jeanson, libc-alpha

On 16.01.25 22:28, Michael Jeanson wrote:
> On 2025-01-16 12:04, Michael Jeanson wrote:
>>> I'll have a look at linux-next and check if I can reprod on another
>>> architecture.
> 
> So the issue is indeed specific to linux-next and affects all architectures,
> the offending commit is :
> 
>  7d5265ffcd8b (rseq: Validate read-only fields under DEBUG_RSEQ config)
> 
> It's a simple logic inversion bug that breaks unregistration, Mathieu has
> already sent an updated patch. In the mean time, you can apply this to
> your kernel sources:
> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index e04bb30a2eb8..442aba29bc4c 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -201,7 +201,7 @@ static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
>  	/*
>  	 * Validate read-only rseq fields.
>  	 */
> -	if (!rseq_validate_ro_fields(t))
> +	if (rseq_validate_ro_fields(t))
>  		return -EFAULT;
>  	/*
>  	 * Reset cpu_id_start to its initial state (0).
> 

Oh. Wow. Thanks a lot for the investigation.
I've already built a new kernel-next with this fix, and unregistration
and the full testcase now succeed. (Of course tested only on s390x, but
it should fix it everywhere.)

Thanks,
Stefan


end of thread, other threads:[~2025-01-17 12:43 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-09 16:32 [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
2025-01-09 16:32 ` [PATCH v16 1/8] nptl: Add rseq auxvals Michael Jeanson
2025-01-09 16:32 ` [PATCH v16 2/8] Add generic 'extra TLS' Michael Jeanson
2025-01-10 19:03   ` Florian Weimer
2025-01-10 19:22     ` Michael Jeanson
2025-01-11 14:51   ` Florian Weimer
2025-01-11 15:36     ` Mathieu Desnoyers
2025-01-11 16:42       ` Florian Weimer
2025-01-11 17:06     ` Mark Wielaard
2025-01-11 19:11       ` Andreas K. Huettel
2025-01-12 10:05     ` Florian Weimer
2025-01-09 16:32 ` [PATCH v16 3/8] Add Linux " Michael Jeanson
2025-01-09 16:32 ` [PATCH v16 4/8] nptl: add rtld_hidden_proto to __rseq_size and __rseq_offset Michael Jeanson
2025-01-10 19:05   ` Florian Weimer
2025-01-10 19:26     ` Michael Jeanson
2025-01-09 16:32 ` [PATCH v16 5/8] nptl: Introduce <rseq-access.h> for RSEQ_* accessors Michael Jeanson
2025-01-10 19:08   ` Florian Weimer
2025-01-13 10:49   ` Frank Scheiner
2025-01-13 19:11     ` Michael Jeanson
2025-01-14 20:49       ` Frank Scheiner
2025-01-09 16:32 ` [PATCH v16 6/8] nptl: Move the rseq area to the 'extra TLS' block Michael Jeanson
2025-01-10 19:39   ` Florian Weimer
2025-01-16 13:24     ` Stefan Liebler
2025-01-16 15:45       ` Michael Jeanson
2025-01-16 16:26         ` Stefan Liebler
2025-01-16 16:43           ` Michael Jeanson
2025-01-16 17:04             ` Michael Jeanson
2025-01-16 21:28               ` Michael Jeanson
2025-01-17 12:43                 ` Stefan Liebler
2025-01-11  2:53   ` H.J. Lu
2025-01-09 16:32 ` [PATCH v16 7/8] nptl: Remove the rseq area from 'struct pthread' Michael Jeanson
2025-01-09 16:32 ` [PATCH v16 8/8] Linux: Update internal copy of '<sys/rseq.h>' Michael Jeanson
2025-01-10 15:39 ` [PATCH v16 0/8] Add rseq extensible ABI support Michael Jeanson
2025-01-10 16:32   ` Adhemerval Zanella Netto
2025-01-10 18:08 ` Florian Weimer
2025-01-10 19:17   ` Michael Jeanson
2025-01-10 19:40 ` Florian Weimer
2025-01-10 19:51   ` Michael Jeanson
2025-01-10 19:58     ` Florian Weimer
