public inbox for libc-alpha@sourceware.org
* [PATCH v2 0/8] Extensible rseq integration
@ 2021-12-07 12:59 Florian Weimer
  2021-12-07 13:00 ` [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer Florian Weimer
                   ` (8 more replies)
  0 siblings, 9 replies; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 12:59 UTC
  To: libc-alpha

This series integrates the previously posted v2 for <thread_pointer.h>.

It incorporates Mathieu's and Paul E. McKenney's suggestion to use a
volatile read for rseq_abi.cpu_id access, using a new
THREAD_GETMEM_VOLATILE macro.
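
As a rough illustration of what a volatile read means here, the sketch
below reads a structure member through a volatile-qualified lvalue, so
the compiler has to emit a fresh load each time.  This matters for a
field such as cpu_id, which the kernel may update behind the program's
back.  The names struct demo_thread, DEMO_GETMEM_VOLATILE and
demo_read_cpu_id are hypothetical and are not taken from the series;
the actual macro may well be defined differently, in particular on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a thread descriptor.  This is not the
   glibc struct pthread layout.  */
struct demo_thread
{
  uint32_t cpu_id;
};

/* Read MEMBER through a volatile-qualified lvalue.  The compiler may
   not cache the value or merge repeated reads.  Uses the GNU
   __typeof__ extension.  */
#define DEMO_GETMEM_VOLATILE(descr, member) \
  (*(volatile __typeof__ ((descr)->member) *) &(descr)->member)

/* Return the current cpu_id value, loading it from memory on every
   call.  */
static uint32_t
demo_read_cpu_id (struct demo_thread *t)
{
  assert (t != NULL);
  return DEMO_GETMEM_VOLATILE (t, cpu_id);
}
```

A plain t->cpu_id read inside a loop could legally be hoisted out of
the loop by the compiler; the volatile access rules that out.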

The last patch in the series makes rseq registration consistent across
threads.

Florian Weimer (8):
  nptl: Add <thread_pointer.h> for defining __thread_pointer
  nptl: Introduce <tcb-access.h> for THREAD_* accessors
  nptl: Introduce THREAD_GETMEM_VOLATILE
  nptl: Add rseq registration
  Linux: Use rseq to accelerate sched_getcpu
  nptl: Add glibc.pthread.rseq tunable to control rseq registration
  nptl: Add public rseq symbols and <sys/rseq.h>
  nptl: rseq failure after registration on main thread is fatal

 NEWS                                          |  11 +
 manual/threads.texi                           |  81 ++++++
 manual/tunables.texi                          |  10 +
 nptl/descr.h                                  |   4 +
 nptl/pthread_create.c                         |  22 ++
 sysdeps/aarch64/nptl/tls.h                    |  10 +-
 sysdeps/alpha/nptl/tls.h                      |  10 +-
 sysdeps/arc/nptl/tls.h                        |  10 +-
 sysdeps/arm/nptl/tls.h                        |  10 +-
 sysdeps/csky/nptl/tls.h                       |  10 +-
 sysdeps/hppa/nptl/tls.h                       |  10 +-
 sysdeps/i386/nptl/tcb-access.h                | 125 +++++++++
 sysdeps/i386/nptl/tls.h                       | 108 +-------
 sysdeps/ia64/nptl/tls.h                       |  10 +-
 sysdeps/m68k/nptl/tls.h                       |  10 +-
 sysdeps/microblaze/nptl/tls.h                 |  15 +-
 sysdeps/mips/nptl/tls.h                       |   9 +-
 sysdeps/nios2/nptl/tls.h                      |  10 +-
 sysdeps/nptl/dl-tls_init_tp.c                 |  38 ++-
 sysdeps/nptl/dl-tunables.list                 |   6 +
 sysdeps/nptl/internaltypes.h                  |   1 +
 sysdeps/nptl/tcb-access.h                     |  32 +++
 sysdeps/nptl/thread_pointer.h                 |  28 ++
 sysdeps/powerpc/nptl/thread_pointer.h         |  33 +++
 sysdeps/powerpc/nptl/tls.h                    |  15 +-
 sysdeps/riscv/nptl/tls.h                      |   9 +-
 sysdeps/s390/nptl/tls.h                       |  10 +-
 sysdeps/sh/nptl/tls.h                         |  14 +-
 sysdeps/sparc/nptl/tls.h                      |  10 +-
 sysdeps/unix/sysv/linux/Makefile              |  20 +-
 sysdeps/unix/sysv/linux/Versions              |   5 +
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   |  43 +++
 sysdeps/unix/sysv/linux/aarch64/ld.abilist    |   3 +
 sysdeps/unix/sysv/linux/alpha/ld.abilist      |   3 +
 sysdeps/unix/sysv/linux/arc/ld.abilist        |   3 +
 sysdeps/unix/sysv/linux/arm/be/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/arm/bits/rseq.h       |  83 ++++++
 sysdeps/unix/sysv/linux/arm/le/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/bits/rseq.h           |  29 ++
 sysdeps/unix/sysv/linux/csky/ld.abilist       |   3 +
 sysdeps/unix/sysv/linux/hppa/ld.abilist       |   3 +
 sysdeps/unix/sysv/linux/i386/ld.abilist       |   3 +
 sysdeps/unix/sysv/linux/ia64/ld.abilist       |   3 +
 .../unix/sysv/linux/m68k/coldfire/ld.abilist  |   3 +
 .../unix/sysv/linux/m68k/m680x0/ld.abilist    |   3 +
 sysdeps/unix/sysv/linux/microblaze/ld.abilist |   3 +
 sysdeps/unix/sysv/linux/mips/bits/rseq.h      |  62 +++++
 .../unix/sysv/linux/mips/mips32/ld.abilist    |   3 +
 .../sysv/linux/mips/mips64/n32/ld.abilist     |   3 +
 .../sysv/linux/mips/mips64/n64/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/nios2/ld.abilist      |   3 +
 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h   |  37 +++
 .../sysv/linux/powerpc/powerpc32/ld.abilist   |   3 +
 .../linux/powerpc/powerpc64/be/ld.abilist     |   3 +
 .../linux/powerpc/powerpc64/le/ld.abilist     |   3 +
 sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist |   3 +
 sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist |   3 +
 sysdeps/unix/sysv/linux/rseq-internal.h       |  52 ++++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h      |  37 +++
 .../unix/sysv/linux/s390/s390-32/ld.abilist   |   3 +
 .../unix/sysv/linux/s390/s390-64/ld.abilist   |   3 +
 sysdeps/unix/sysv/linux/sched_getcpu.c        |  19 +-
 sysdeps/unix/sysv/linux/sh/be/ld.abilist      |   3 +
 sysdeps/unix/sysv/linux/sh/le/ld.abilist      |   3 +
 .../unix/sysv/linux/sparc/sparc32/ld.abilist  |   3 +
 .../unix/sysv/linux/sparc/sparc64/ld.abilist  |   3 +
 sysdeps/unix/sysv/linux/sys/rseq.h            | 184 +++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq-disable.c    |  95 +++++++
 sysdeps/unix/sysv/linux/tst-rseq-nptl.c       | 260 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq.c            |  72 +++++
 sysdeps/unix/sysv/linux/tst-rseq.h            |  57 ++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h       |  30 ++
 sysdeps/unix/sysv/linux/x86_64/64/ld.abilist  |   3 +
 sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist |   3 +
 sysdeps/x86/nptl/thread_pointer.h             |  38 +++
 sysdeps/x86_64/nptl/tcb-access.h              | 132 +++++++++
 sysdeps/x86_64/nptl/tls.h                     | 114 +-------
 77 files changed, 1745 insertions(+), 382 deletions(-)
 create mode 100644 sysdeps/i386/nptl/tcb-access.h
 create mode 100644 sysdeps/nptl/tcb-access.h
 create mode 100644 sysdeps/nptl/thread_pointer.h
 create mode 100644 sysdeps/powerpc/nptl/thread_pointer.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h
 create mode 100644 sysdeps/x86/nptl/thread_pointer.h
 create mode 100644 sysdeps/x86_64/nptl/tcb-access.h


base-commit: 68007900beef12000ed90f38c251eaf32fbc0490
-- 
2.33.1


* [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
@ 2021-12-07 13:00 ` Florian Weimer
  2021-12-08 11:05   ` Szabolcs Nagy
  2021-12-07 13:00 ` [PATCH v2 2/8] nptl: Introduce <tcb-access.h> for THREAD_* accessors Florian Weimer
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:00 UTC
  To: libc-alpha

<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.

Only architectures for which rseq support is added are covered.
---
v2: As posted before.
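
For readers who want to try the idea outside of glibc, here is a
minimal standalone sketch along the lines of the x86 and generic
variants in this patch.  It assumes GCC or a compatible compiler on
Linux with glibc; demo_thread_pointer and demo_checked_thread_pointer
are hypothetical names, and the sketch always uses inline assembly on
x86 instead of checking the compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* Return the thread pointer of the calling thread.  On x86 the TCB
   stores a pointer to itself at offset 0 of the %fs (x86-64) or %gs
   (i386) segment; elsewhere the compiler builtin is assumed to be
   available.  */
static inline void *
demo_thread_pointer (void)
{
#if defined __x86_64__
  void *result;
  __asm__ ("mov %%fs:0, %0" : "=r" (result));
  return result;
#elif defined __i386__
  void *result;
  __asm__ ("mov %%gs:0, %0" : "=r" (result));
  return result;
#else
  return __builtin_thread_pointer ();
#endif
}

/* Same as above, but check the expectation that the thread pointer is
   non-null once the thread has been set up.  */
static void *
demo_checked_thread_pointer (void)
{
  void *tp = demo_thread_pointer ();
  assert (tp != NULL);
  return tp;
}
```

Within one thread the returned value does not change, so two calls
compare equal.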

 sysdeps/nptl/thread_pointer.h         | 28 ++++++++++++++++++++
 sysdeps/powerpc/nptl/thread_pointer.h | 33 +++++++++++++++++++++++
 sysdeps/x86/nptl/thread_pointer.h     | 38 +++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)
 create mode 100644 sysdeps/nptl/thread_pointer.h
 create mode 100644 sysdeps/powerpc/nptl/thread_pointer.h
 create mode 100644 sysdeps/x86/nptl/thread_pointer.h

diff --git a/sysdeps/nptl/thread_pointer.h b/sysdeps/nptl/thread_pointer.h
new file mode 100644
index 0000000000..92f2f3093e
--- /dev/null
+++ b/sysdeps/nptl/thread_pointer.h
@@ -0,0 +1,28 @@
+/* __thread_pointer definition.  Generic version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_THREAD_POINTER_H
+#define _SYS_THREAD_POINTER_H
+
+static inline void *
+__thread_pointer (void)
+{
+  return __builtin_thread_pointer ();
+}
+
+#endif /* _SYS_THREAD_POINTER_H */
diff --git a/sysdeps/powerpc/nptl/thread_pointer.h b/sysdeps/powerpc/nptl/thread_pointer.h
new file mode 100644
index 0000000000..8fd5ba671f
--- /dev/null
+++ b/sysdeps/powerpc/nptl/thread_pointer.h
@@ -0,0 +1,33 @@
+/* __thread_pointer definition.  powerpc version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_THREAD_POINTER_H
+#define _SYS_THREAD_POINTER_H
+
+static inline void *
+__thread_pointer (void)
+{
+#ifdef __powerpc64__
+  register void *__result asm ("r13");
+#else
+  register void *__result asm ("r2");
+#endif
+  return __result;
+}
+
+#endif /* _SYS_THREAD_POINTER_H */
diff --git a/sysdeps/x86/nptl/thread_pointer.h b/sysdeps/x86/nptl/thread_pointer.h
new file mode 100644
index 0000000000..6b71b6f7e1
--- /dev/null
+++ b/sysdeps/x86/nptl/thread_pointer.h
@@ -0,0 +1,38 @@
+/* __thread_pointer definition.  x86 version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_THREAD_POINTER_H
+#define _SYS_THREAD_POINTER_H
+
+static inline void *
+__thread_pointer (void)
+{
+#if __GNUC_PREREQ (11, 1)
+  return __builtin_thread_pointer ();
+#else
+  void *__result;
+# ifdef __x86_64__
+  __asm__ ("mov %%fs:0, %0" : "=r" (__result));
+# else
+  __asm__ ("mov %%gs:0, %0" : "=r" (__result));
+# endif
+  return __result;
+#endif /* !GCC 11 */
+}
+
+#endif /* _SYS_THREAD_POINTER_H */
-- 
2.33.1



* [PATCH v2 2/8] nptl: Introduce <tcb-access.h> for THREAD_* accessors
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
  2021-12-07 13:00 ` [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer Florian Weimer
@ 2021-12-07 13:00 ` Florian Weimer
  2021-12-08 11:09   ` Szabolcs Nagy
  2021-12-07 13:00 ` [PATCH v2 3/8] nptl: Introduce THREAD_GETMEM_VOLATILE Florian Weimer
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:00 UTC
  To: libc-alpha

The THREAD_* accessor definitions are the same on most architectures.
Only the x86 targets (i386 and x86_64) are outliers.
---
v2: New patch.
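
For illustration, the definitions removed from the non-x86 tls.h files
reduce to plain member accesses on a descriptor pointer.  The sketch
below restates them in standalone form.  The DEMO_* macros, struct
demo_descr and demo_sum_slots are hypothetical names, and the extra
parentheses around the macro arguments are an addition of this sketch,
not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NSLOTS 4

/* Hypothetical stand-in for struct pthread.  */
struct demo_descr
{
  int tid;
  int slots[DEMO_NSLOTS];
};

/* Accessors based on a descriptor pointer.  The _NC variants take an
   array index that need not be a compile-time constant.  */
#define DEMO_GETMEM(descr, member) \
  ((descr)->member)
#define DEMO_GETMEM_NC(descr, member, idx) \
  ((descr)->member[idx])
#define DEMO_SETMEM(descr, member, value) \
  ((descr)->member = (value))
#define DEMO_SETMEM_NC(descr, member, idx, value) \
  ((descr)->member[idx] = (value))

/* Add up all slots, using a run-time index with the _NC accessor.  */
static int
demo_sum_slots (struct demo_descr *d)
{
  assert (d != NULL);
  int sum = 0;
  for (size_t i = 0; i < DEMO_NSLOTS; i++)
    sum += DEMO_GETMEM_NC (d, slots, i);
  return sum;
}
```

The x86 versions cannot be written this way because they address the
descriptor relative to a segment register and ignore the pointer
argument, which is why they keep their own tcb-access.h.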

 sysdeps/aarch64/nptl/tls.h       |  10 +--
 sysdeps/alpha/nptl/tls.h         |  10 +--
 sysdeps/arc/nptl/tls.h           |  10 +--
 sysdeps/arm/nptl/tls.h           |  10 +--
 sysdeps/csky/nptl/tls.h          |  10 +--
 sysdeps/hppa/nptl/tls.h          |  10 +--
 sysdeps/i386/nptl/tcb-access.h   | 123 +++++++++++++++++++++++++++++
 sysdeps/i386/nptl/tls.h          | 108 +------------------------
 sysdeps/ia64/nptl/tls.h          |  10 +--
 sysdeps/m68k/nptl/tls.h          |  10 +--
 sysdeps/microblaze/nptl/tls.h    |  15 +---
 sysdeps/mips/nptl/tls.h          |   9 +--
 sysdeps/nios2/nptl/tls.h         |  10 +--
 sysdeps/nptl/tcb-access.h        |  30 +++++++
 sysdeps/powerpc/nptl/tls.h       |  15 +---
 sysdeps/riscv/nptl/tls.h         |   9 +--
 sysdeps/s390/nptl/tls.h          |  10 +--
 sysdeps/sh/nptl/tls.h            |  14 +---
 sysdeps/sparc/nptl/tls.h         |  10 +--
 sysdeps/x86_64/nptl/tcb-access.h | 130 +++++++++++++++++++++++++++++++
 sysdeps/x86_64/nptl/tls.h        | 114 +--------------------------
 21 files changed, 301 insertions(+), 376 deletions(-)
 create mode 100644 sysdeps/i386/nptl/tcb-access.h
 create mode 100644 sysdeps/nptl/tcb-access.h
 create mode 100644 sysdeps/x86_64/nptl/tcb-access.h

diff --git a/sysdeps/aarch64/nptl/tls.h b/sysdeps/aarch64/nptl/tls.h
index 72f22dc718..c9ae564bf2 100644
--- a/sysdeps/aarch64/nptl/tls.h
+++ b/sysdeps/aarch64/nptl/tls.h
@@ -98,15 +98,7 @@ typedef struct
 # define DB_THREAD_SELF \
   CONST_THREAD_AREA (64, sizeof (struct pthread))
 
-/* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Get and set the global scope generation counter in struct pthread.  */
 # define THREAD_GSCOPE_FLAG_UNUSED 0
diff --git a/sysdeps/alpha/nptl/tls.h b/sysdeps/alpha/nptl/tls.h
index 6328112135..eef922f268 100644
--- a/sysdeps/alpha/nptl/tls.h
+++ b/sysdeps/alpha/nptl/tls.h
@@ -92,15 +92,7 @@ typedef struct
 # define DB_THREAD_SELF \
   REGISTER (64, 64, 32 * 8, -sizeof (struct pthread))
 
-/* Access to data in the thread descriptor is easy.  */
-#define THREAD_GETMEM(descr, member) \
-  descr->member
-#define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-#define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-#define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Get and set the global scope generation counter in struct pthread.  */
 #define THREAD_GSCOPE_FLAG_UNUSED 0
diff --git a/sysdeps/arc/nptl/tls.h b/sysdeps/arc/nptl/tls.h
index e269c0a7a5..f6853867b2 100644
--- a/sysdeps/arc/nptl/tls.h
+++ b/sysdeps/arc/nptl/tls.h
@@ -100,15 +100,7 @@ typedef struct
 # define DB_THREAD_SELF \
   CONST_THREAD_AREA (32, sizeof (struct pthread))
 
-/* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Get and set the global scope generation counter in struct pthread.  */
 #define THREAD_GSCOPE_FLAG_UNUSED 0
diff --git a/sysdeps/arm/nptl/tls.h b/sysdeps/arm/nptl/tls.h
index 699c16acfb..06612b5449 100644
--- a/sysdeps/arm/nptl/tls.h
+++ b/sysdeps/arm/nptl/tls.h
@@ -89,15 +89,7 @@ typedef struct
 # define DB_THREAD_SELF \
   CONST_THREAD_AREA (32, sizeof (struct pthread))
 
-/* Access to data in the thread descriptor is easy.  */
-#define THREAD_GETMEM(descr, member) \
-  descr->member
-#define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-#define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-#define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Get and set the global scope generation counter in struct pthread.  */
 #define THREAD_GSCOPE_FLAG_UNUSED 0
diff --git a/sysdeps/csky/nptl/tls.h b/sysdeps/csky/nptl/tls.h
index b210dfcb76..39fd640459 100644
--- a/sysdeps/csky/nptl/tls.h
+++ b/sysdeps/csky/nptl/tls.h
@@ -116,15 +116,7 @@ typedef struct
 # define DB_THREAD_SELF \
   CONST_THREAD_AREA (32, sizeof (struct pthread))
 
-/* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Get and set the global scope generation counter in struct pthread.  */
 # define THREAD_GSCOPE_FLAG_UNUSED 0
diff --git a/sysdeps/hppa/nptl/tls.h b/sysdeps/hppa/nptl/tls.h
index 55559eb327..5f550227f2 100644
--- a/sysdeps/hppa/nptl/tls.h
+++ b/sysdeps/hppa/nptl/tls.h
@@ -107,15 +107,7 @@ typedef struct
 # define DB_THREAD_SELF \
   REGISTER (32, 32, 53 * 4, -sizeof (struct pthread))
 
-/* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 static inline struct pthread *__get_cr27(void)
 {
diff --git a/sysdeps/i386/nptl/tcb-access.h b/sysdeps/i386/nptl/tcb-access.h
new file mode 100644
index 0000000000..6c6d561e39
--- /dev/null
+++ b/sysdeps/i386/nptl/tcb-access.h
@@ -0,0 +1,123 @@
+/* THREAD_* accessors.  i386 version.
+   Copyright (C) 2002-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Read member of the thread descriptor directly.  */
+#define THREAD_GETMEM(descr, member) \
+  ({ __typeof (descr->member) __value;					      \
+     _Static_assert (sizeof (__value) == 1				      \
+		     || sizeof (__value) == 4				      \
+		     || sizeof (__value) == 8,				      \
+		     "size of per-thread data");			      \
+     if (sizeof (__value) == 1)						      \
+       asm volatile ("movb %%gs:%P2,%b0"				      \
+		     : "=q" (__value)					      \
+		     : "0" (0), "i" (offsetof (struct pthread, member)));     \
+     else if (sizeof (__value) == 4)					      \
+       asm volatile ("movl %%gs:%P1,%0"					      \
+		     : "=r" (__value)					      \
+		     : "i" (offsetof (struct pthread, member)));	      \
+     else /* 8 */								      \
+       {								      \
+	 asm volatile ("movl %%gs:%P1,%%eax\n\t"			      \
+		       "movl %%gs:%P2,%%edx"				      \
+		       : "=A" (__value)					      \
+		       : "i" (offsetof (struct pthread, member)),	      \
+			 "i" (offsetof (struct pthread, member) + 4));	      \
+       }								      \
+     __value; })
+
+
+/* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
+#define THREAD_GETMEM_NC(descr, member, idx) \
+  ({ __typeof (descr->member[0]) __value;				      \
+     _Static_assert (sizeof (__value) == 1				      \
+		     || sizeof (__value) == 4				      \
+		     || sizeof (__value) == 8,				      \
+		     "size of per-thread data");			      \
+     if (sizeof (__value) == 1)						      \
+       asm volatile ("movb %%gs:%P2(%3),%b0"				      \
+		     : "=q" (__value)					      \
+		     : "0" (0), "i" (offsetof (struct pthread, member[0])),   \
+		     "r" (idx));					      \
+     else if (sizeof (__value) == 4)					      \
+       asm volatile ("movl %%gs:%P1(,%2,4),%0"				      \
+		     : "=r" (__value)					      \
+		     : "i" (offsetof (struct pthread, member[0])),	      \
+		       "r" (idx));					      \
+     else /* 8 */							      \
+       {								      \
+	 asm volatile  ("movl %%gs:%P1(,%2,8),%%eax\n\t"		      \
+			"movl %%gs:4+%P1(,%2,8),%%edx"			      \
+			: "=&A" (__value)				      \
+			: "i" (offsetof (struct pthread, member[0])),	      \
+			  "r" (idx));					      \
+       }								      \
+     __value; })
+
+
+
+/* Set member of the thread descriptor directly.  */
+#define THREAD_SETMEM(descr, member, value) \
+  ({									      \
+     _Static_assert (sizeof (descr->member) == 1			      \
+		     || sizeof (descr->member) == 4			      \
+		     || sizeof (descr->member) == 8,			      \
+		     "size of per-thread data");			      \
+     if (sizeof (descr->member) == 1)					      \
+       asm volatile ("movb %b0,%%gs:%P1" :				      \
+		     : "iq" (value),					      \
+		       "i" (offsetof (struct pthread, member)));	      \
+     else if (sizeof (descr->member) == 4)				      \
+       asm volatile ("movl %0,%%gs:%P1" :				      \
+		     : "ir" (value),					      \
+		       "i" (offsetof (struct pthread, member)));	      \
+     else /* 8 */							      \
+       {								      \
+	 asm volatile ("movl %%eax,%%gs:%P1\n\t"			      \
+		       "movl %%edx,%%gs:%P2" :				      \
+		       : "A" ((uint64_t) cast_to_integer (value)),	      \
+			 "i" (offsetof (struct pthread, member)),	      \
+			 "i" (offsetof (struct pthread, member) + 4));	      \
+       }})
+
+
+/* Same as THREAD_SETMEM, but the member offset can be non-constant.  */
+#define THREAD_SETMEM_NC(descr, member, idx, value) \
+  ({									      \
+     _Static_assert (sizeof (descr->member[0]) == 1			      \
+		     || sizeof (descr->member[0]) == 4			      \
+		     || sizeof (descr->member[0]) == 8,			      \
+		     "size of per-thread data");			      \
+     if (sizeof (descr->member[0]) == 1)				      \
+       asm volatile ("movb %b0,%%gs:%P1(%2)" :				      \
+		     : "iq" (value),					      \
+		       "i" (offsetof (struct pthread, member)),		      \
+		       "r" (idx));					      \
+     else if (sizeof (descr->member[0]) == 4)				      \
+       asm volatile ("movl %0,%%gs:%P1(,%2,4)" :			      \
+		     : "ir" (value),					      \
+		       "i" (offsetof (struct pthread, member)),		      \
+		       "r" (idx));					      \
+     else /* 8 */							      \
+       {								      \
+	 asm volatile ("movl %%eax,%%gs:%P1(,%2,8)\n\t"			      \
+		       "movl %%edx,%%gs:4+%P1(,%2,8)" :			      \
+		       : "A" ((uint64_t) cast_to_integer (value)),	      \
+			 "i" (offsetof (struct pthread, member)),	      \
+			 "r" (idx));					      \
+       }})
diff --git a/sysdeps/i386/nptl/tls.h b/sysdeps/i386/nptl/tls.h
index cfb27f5ccd..d010e14920 100644
--- a/sysdeps/i386/nptl/tls.h
+++ b/sysdeps/i386/nptl/tls.h
@@ -250,113 +250,7 @@ tls_fill_user_desc (union user_desc_init *desc,
   REGISTER_THREAD_AREA (32, offsetof (struct user_regs_struct, xgs), 3) \
   REGISTER_THREAD_AREA (64, 26 * 8, 3) /* x86-64's user_regs_struct->gs */
 
-
-/* Read member of the thread descriptor directly.  */
-# define THREAD_GETMEM(descr, member) \
-  ({ __typeof (descr->member) __value;					      \
-     _Static_assert (sizeof (__value) == 1				      \
-		     || sizeof (__value) == 4				      \
-		     || sizeof (__value) == 8,				      \
-		     "size of per-thread data");			      \
-     if (sizeof (__value) == 1)						      \
-       asm volatile ("movb %%gs:%P2,%b0"				      \
-		     : "=q" (__value)					      \
-		     : "0" (0), "i" (offsetof (struct pthread, member)));     \
-     else if (sizeof (__value) == 4)					      \
-       asm volatile ("movl %%gs:%P1,%0"					      \
-		     : "=r" (__value)					      \
-		     : "i" (offsetof (struct pthread, member)));	      \
-     else /* 8 */								      \
-       {								      \
-	 asm volatile ("movl %%gs:%P1,%%eax\n\t"			      \
-		       "movl %%gs:%P2,%%edx"				      \
-		       : "=A" (__value)					      \
-		       : "i" (offsetof (struct pthread, member)),	      \
-			 "i" (offsetof (struct pthread, member) + 4));	      \
-       }								      \
-     __value; })
-
-
-/* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  ({ __typeof (descr->member[0]) __value;				      \
-     _Static_assert (sizeof (__value) == 1				      \
-		     || sizeof (__value) == 4				      \
-		     || sizeof (__value) == 8,				      \
-		     "size of per-thread data");			      \
-     if (sizeof (__value) == 1)						      \
-       asm volatile ("movb %%gs:%P2(%3),%b0"				      \
-		     : "=q" (__value)					      \
-		     : "0" (0), "i" (offsetof (struct pthread, member[0])),   \
-		     "r" (idx));					      \
-     else if (sizeof (__value) == 4)					      \
-       asm volatile ("movl %%gs:%P1(,%2,4),%0"				      \
-		     : "=r" (__value)					      \
-		     : "i" (offsetof (struct pthread, member[0])),	      \
-		       "r" (idx));					      \
-     else /* 8 */							      \
-       {								      \
-	 asm volatile  ("movl %%gs:%P1(,%2,8),%%eax\n\t"		      \
-			"movl %%gs:4+%P1(,%2,8),%%edx"			      \
-			: "=&A" (__value)				      \
-			: "i" (offsetof (struct pthread, member[0])),	      \
-			  "r" (idx));					      \
-       }								      \
-     __value; })
-
-
-
-/* Set member of the thread descriptor directly.  */
-# define THREAD_SETMEM(descr, member, value) \
-  ({									      \
-     _Static_assert (sizeof (descr->member) == 1			      \
-		     || sizeof (descr->member) == 4			      \
-		     || sizeof (descr->member) == 8,			      \
-		     "size of per-thread data");			      \
-     if (sizeof (descr->member) == 1)					      \
-       asm volatile ("movb %b0,%%gs:%P1" :				      \
-		     : "iq" (value),					      \
-		       "i" (offsetof (struct pthread, member)));	      \
-     else if (sizeof (descr->member) == 4)				      \
-       asm volatile ("movl %0,%%gs:%P1" :				      \
-		     : "ir" (value),					      \
-		       "i" (offsetof (struct pthread, member)));	      \
-     else /* 8 */							      \
-       {								      \
-	 asm volatile ("movl %%eax,%%gs:%P1\n\t"			      \
-		       "movl %%edx,%%gs:%P2" :				      \
-		       : "A" ((uint64_t) cast_to_integer (value)),	      \
-			 "i" (offsetof (struct pthread, member)),	      \
-			 "i" (offsetof (struct pthread, member) + 4));	      \
-       }})
-
-
-/* Same as THREAD_SETMEM, but the member offset can be non-constant.  */
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  ({									      \
-     _Static_assert (sizeof (descr->member[0]) == 1			      \
-		     || sizeof (descr->member[0]) == 4			      \
-		     || sizeof (descr->member[0]) == 8,			      \
-		     "size of per-thread data");			      \
-     if (sizeof (descr->member[0]) == 1)				      \
-       asm volatile ("movb %b0,%%gs:%P1(%2)" :				      \
-		     : "iq" (value),					      \
-		       "i" (offsetof (struct pthread, member)),		      \
-		       "r" (idx));					      \
-     else if (sizeof (descr->member[0]) == 4)				      \
-       asm volatile ("movl %0,%%gs:%P1(,%2,4)" :			      \
-		     : "ir" (value),					      \
-		       "i" (offsetof (struct pthread, member)),		      \
-		       "r" (idx));					      \
-     else /* 8 */							      \
-       {								      \
-	 asm volatile ("movl %%eax,%%gs:%P1(,%2,8)\n\t"			      \
-		       "movl %%edx,%%gs:4+%P1(,%2,8)" :			      \
-		       : "A" ((uint64_t) cast_to_integer (value)),	      \
-			 "i" (offsetof (struct pthread, member)),	      \
-			 "r" (idx));					      \
-       }})
-
+# include <tcb-access.h>
 
 /* Set the stack guard field in TCB head.  */
 #define THREAD_SET_STACK_GUARD(value) \
diff --git a/sysdeps/ia64/nptl/tls.h b/sysdeps/ia64/nptl/tls.h
index 8c26728859..44951da24b 100644
--- a/sysdeps/ia64/nptl/tls.h
+++ b/sysdeps/ia64/nptl/tls.h
@@ -128,15 +128,7 @@ register struct pthread *__thread_self __asm__("r13");
 /* Magic for libthread_db to know how to do THREAD_SELF.  */
 # define DB_THREAD_SELF REGISTER (64, 64, 13 * 8, -TLS_PRE_TCB_SIZE)
 
-/* Access to data in the thread descriptor is easy.  */
-#define THREAD_GETMEM(descr, member) \
-  descr->member
-#define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-#define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-#define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Set the stack guard field in TCB head.  */
 #define THREAD_SET_STACK_GUARD(value) \
diff --git a/sysdeps/m68k/nptl/tls.h b/sysdeps/m68k/nptl/tls.h
index 34906b1c13..257af6bddc 100644
--- a/sysdeps/m68k/nptl/tls.h
+++ b/sysdeps/m68k/nptl/tls.h
@@ -117,15 +117,7 @@ extern void * __m68k_read_tp (void);
 # define DB_THREAD_SELF \
   CONST_THREAD_AREA (32, TLS_TCB_OFFSET + TLS_PRE_TCB_SIZE)
 
-/* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* l_tls_offset == 0 is perfectly valid on M68K, so we have to use some
    different value to mean unset l_tls_offset.  */
diff --git a/sysdeps/microblaze/nptl/tls.h b/sysdeps/microblaze/nptl/tls.h
index 0ca67a777d..a31703b247 100644
--- a/sysdeps/microblaze/nptl/tls.h
+++ b/sysdeps/microblaze/nptl/tls.h
@@ -100,20 +100,7 @@ typedef struct
 # define DB_THREAD_SELF \
   CONST_THREAD_AREA (32, sizeof (struct pthread))
 
-/* Read member of the thread descriptor directly.  */
-# define THREAD_GETMEM(descr, member) (descr->member)
-
-/* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  (descr->member[idx])
-
-/* Set member of the thread descriptor directly.  */
-# define THREAD_SETMEM(descr, member, value) \
-  (descr->member = (value))
-
-/* Same as THREAD_SETMEM, but the member offset can be non-constant.  */
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  (descr->member[idx] = (value))
+# include <tcb-access.h>
 
 /* Get and set the global scope generation counter in struct pthread.  */
 # define THREAD_GSCOPE_FLAG_UNUSED 0
diff --git a/sysdeps/mips/nptl/tls.h b/sysdeps/mips/nptl/tls.h
index 04e823b4c7..afb8308e1b 100644
--- a/sysdeps/mips/nptl/tls.h
+++ b/sysdeps/mips/nptl/tls.h
@@ -144,14 +144,7 @@ typedef struct
   CONST_THREAD_AREA (32, TLS_TCB_OFFSET + TLS_PRE_TCB_SIZE)
 
 /* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* l_tls_offset == 0 is perfectly valid on MIPS, so we have to use some
    different value to mean unset l_tls_offset.  */
diff --git a/sysdeps/nios2/nptl/tls.h b/sysdeps/nios2/nptl/tls.h
index fd484135f4..173c395449 100644
--- a/sysdeps/nios2/nptl/tls.h
+++ b/sysdeps/nios2/nptl/tls.h
@@ -112,15 +112,7 @@ register struct pthread *__thread_self __asm__("r23");
 # define DB_THREAD_SELF \
   REGISTER (32, 32, 23 * 4, -TLS_PRE_TCB_SIZE - TLS_TCB_OFFSET)
 
-/* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 # define THREAD_GET_POINTER_GUARD()				\
   (((tcbhead_t *) (READ_THREAD_POINTER ()			\
diff --git a/sysdeps/nptl/tcb-access.h b/sysdeps/nptl/tcb-access.h
new file mode 100644
index 0000000000..b4137b8ab8
--- /dev/null
+++ b/sysdeps/nptl/tcb-access.h
@@ -0,0 +1,30 @@
+/* THREAD_* accessors.  Generic version based on struct pthread pointers.
+   Copyright (C) 2002-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Note: These are for accessing the TCB of the *current* thread.
+   descr can be disregarded on some targets as an optimization.  See
+   i386 for an example.  */
+
+#define THREAD_GETMEM(descr, member) \
+  descr->member
+#define THREAD_GETMEM_NC(descr, member, idx) \
+  descr->member[idx]
+#define THREAD_SETMEM(descr, member, value) \
+  descr->member = (value)
+#define THREAD_SETMEM_NC(descr, member, idx, value) \
+  descr->member[idx] = (value)
diff --git a/sysdeps/powerpc/nptl/tls.h b/sysdeps/powerpc/nptl/tls.h
index cc93c44964..7d2f16dcf2 100644
--- a/sysdeps/powerpc/nptl/tls.h
+++ b/sysdeps/powerpc/nptl/tls.h
@@ -176,20 +176,7 @@ typedef struct
   REGISTER (64, 64, PT_THREAD_POINTER * 8,				      \
 	    - TLS_TCB_OFFSET - TLS_PRE_TCB_SIZE)
 
-/* Read member of the thread descriptor directly.  */
-# define THREAD_GETMEM(descr, member) ((void)(descr), (THREAD_SELF)->member)
-
-/* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
-# define THREAD_GETMEM_NC(descr, member, idx) \
-    ((void)(descr), (THREAD_SELF)->member[idx])
-
-/* Set member of the thread descriptor directly.  */
-# define THREAD_SETMEM(descr, member, value) \
-    ((void)(descr), (THREAD_SELF)->member = (value))
-
-/* Same as THREAD_SETMEM, but the member offset can be non-constant.  */
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-    ((void)(descr), (THREAD_SELF)->member[idx] = (value))
+# include <tcb-access.h>
 
 /* Set the stack guard field in TCB head.  */
 # define THREAD_SET_STACK_GUARD(value) \
diff --git a/sysdeps/riscv/nptl/tls.h b/sysdeps/riscv/nptl/tls.h
index e4bd736feb..a966d440c5 100644
--- a/sysdeps/riscv/nptl/tls.h
+++ b/sysdeps/riscv/nptl/tls.h
@@ -105,14 +105,7 @@ typedef struct
   REGISTER (64, 64, 4 * 8, - TLS_TCB_OFFSET - TLS_PRE_TCB_SIZE)
 
 /* Access to data in the thread descriptor is easy.  */
-# define THREAD_GETMEM(descr, member) \
-  descr->member
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-# define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* l_tls_offset == 0 is perfectly valid, so we have to use some different
    value to mean unset l_tls_offset.  */
diff --git a/sysdeps/s390/nptl/tls.h b/sysdeps/s390/nptl/tls.h
index 804486dfdd..16c5811e06 100644
--- a/sysdeps/s390/nptl/tls.h
+++ b/sysdeps/s390/nptl/tls.h
@@ -135,15 +135,7 @@ typedef struct
 # define DB_THREAD_SELF REGISTER (32, 32, 18 * 4, 0) \
 			REGISTER (64, __WORDSIZE, 18 * 8, 0)
 
-/* Access to data in the thread descriptor is easy.  */
-#define THREAD_GETMEM(descr, member) \
-  descr->member
-#define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-#define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-#define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Set the stack guard field in TCB head.  */
 #define THREAD_SET_STACK_GUARD(value) \
diff --git a/sysdeps/sh/nptl/tls.h b/sysdeps/sh/nptl/tls.h
index 2a9ee1def1..aadd5be022 100644
--- a/sysdeps/sh/nptl/tls.h
+++ b/sysdeps/sh/nptl/tls.h
@@ -113,19 +113,7 @@ typedef struct
 # define DB_THREAD_SELF \
   REGISTER (32, 32, REG_GBR * 4, -sizeof (struct pthread))
 
-/* Read member of the thread descriptor directly.  */
-# define THREAD_GETMEM(descr, member) (descr->member)
-
-/* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
-# define THREAD_GETMEM_NC(descr, member, idx) (descr->member[idx])
-
-/* Set member of the thread descriptor directly.  */
-# define THREAD_SETMEM(descr, member, value) \
-    descr->member = (value)
-
-/* Same as THREAD_SETMEM, but the member offset can be non-constant.  */
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-    descr->member[idx] = (value)
+# include <tcb-access.h>
 
 #define THREAD_GET_POINTER_GUARD() \
   ({ tcbhead_t *__tcbp;							      \
diff --git a/sysdeps/sparc/nptl/tls.h b/sysdeps/sparc/nptl/tls.h
index 55955f376a..d4e6e525d9 100644
--- a/sysdeps/sparc/nptl/tls.h
+++ b/sysdeps/sparc/nptl/tls.h
@@ -112,15 +112,7 @@ register struct pthread *__thread_self __asm__("%g7");
   REGISTER (32, 32, 10 * 4, 0) \
   REGISTER (64, __WORDSIZE, (6 * 8) + (__WORDSIZE==64?0:4), 0)
 
-/* Access to data in the thread descriptor is easy.  */
-#define THREAD_GETMEM(descr, member) \
-  descr->member
-#define THREAD_GETMEM_NC(descr, member, idx) \
-  descr->member[idx]
-#define THREAD_SETMEM(descr, member, value) \
-  descr->member = (value)
-#define THREAD_SETMEM_NC(descr, member, idx, value) \
-  descr->member[idx] = (value)
+# include <tcb-access.h>
 
 /* Set the stack guard field in TCB head.  */
 #define THREAD_SET_STACK_GUARD(value) \
diff --git a/sysdeps/x86_64/nptl/tcb-access.h b/sysdeps/x86_64/nptl/tcb-access.h
new file mode 100644
index 0000000000..18848a729d
--- /dev/null
+++ b/sysdeps/x86_64/nptl/tcb-access.h
@@ -0,0 +1,130 @@
+/* THREAD_* accessors.  x86_64 version.
+   Copyright (C) 2002-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Read member of the thread descriptor directly.  */
+# define THREAD_GETMEM(descr, member) \
+  ({ __typeof (descr->member) __value;					      \
+     _Static_assert (sizeof (__value) == 1				      \
+		     || sizeof (__value) == 4				      \
+		     || sizeof (__value) == 8,				      \
+		     "size of per-thread data");			      \
+     if (sizeof (__value) == 1)						      \
+       asm volatile ("movb %%fs:%P2,%b0"				      \
+		     : "=q" (__value)					      \
+		     : "0" (0), "i" (offsetof (struct pthread, member)));     \
+     else if (sizeof (__value) == 4)					      \
+       asm volatile ("movl %%fs:%P1,%0"					      \
+		     : "=r" (__value)					      \
+		     : "i" (offsetof (struct pthread, member)));	      \
+     else /* 8 */								      \
+       {								      \
+	 asm volatile ("movq %%fs:%P1,%q0"				      \
+		       : "=r" (__value)					      \
+		       : "i" (offsetof (struct pthread, member)));	      \
+       }								      \
+     __value; })
+
+
+/* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
+# define THREAD_GETMEM_NC(descr, member, idx) \
+  ({ __typeof (descr->member[0]) __value;				      \
+     _Static_assert (sizeof (__value) == 1				      \
+		     || sizeof (__value) == 4				      \
+		     || sizeof (__value) == 8,				      \
+		     "size of per-thread data");			      \
+     if (sizeof (__value) == 1)						      \
+       asm volatile ("movb %%fs:%P2(%q3),%b0"				      \
+		     : "=q" (__value)					      \
+		     : "0" (0), "i" (offsetof (struct pthread, member[0])),   \
+		       "r" (idx));					      \
+     else if (sizeof (__value) == 4)					      \
+       asm volatile ("movl %%fs:%P1(,%q2,4),%0"				      \
+		     : "=r" (__value)					      \
+		     : "i" (offsetof (struct pthread, member[0])), "r" (idx));\
+     else /* 8 */							      \
+       {								      \
+	 asm volatile ("movq %%fs:%P1(,%q2,8),%q0"			      \
+		       : "=r" (__value)					      \
+		       : "i" (offsetof (struct pthread, member[0])),	      \
+			 "r" (idx));					      \
+       }								      \
+     __value; })
+
+
+/* Loading addresses of objects on x86-64 needs to be treated special
+   when generating PIC code.  */
+#ifdef __pic__
+# define IMM_MODE "nr"
+#else
+# define IMM_MODE "ir"
+#endif
+
+
+/* Set member of the thread descriptor directly.  */
+# define THREAD_SETMEM(descr, member, value) \
+  ({									      \
+     _Static_assert (sizeof (descr->member) == 1			      \
+		     || sizeof (descr->member) == 4			      \
+		     || sizeof (descr->member) == 8,			      \
+		     "size of per-thread data");			      \
+     if (sizeof (descr->member) == 1)					      \
+       asm volatile ("movb %b0,%%fs:%P1" :				      \
+		     : "iq" (value),					      \
+		       "i" (offsetof (struct pthread, member)));	      \
+     else if (sizeof (descr->member) == 4)				      \
+       asm volatile ("movl %0,%%fs:%P1" :				      \
+		     : IMM_MODE (value),				      \
+		       "i" (offsetof (struct pthread, member)));	      \
+     else /* 8 */							      \
+       {								      \
+	 /* Since movq takes a signed 32-bit immediate or a register source   \
+	    operand, use "er" constraint for 32-bit signed integer constant   \
+	    or register.  */						      \
+	 asm volatile ("movq %q0,%%fs:%P1" :				      \
+		       : "er" ((uint64_t) cast_to_integer (value)),	      \
+			 "i" (offsetof (struct pthread, member)));	      \
+       }})
+
+
+/* Same as THREAD_SETMEM, but the member offset can be non-constant.  */
+# define THREAD_SETMEM_NC(descr, member, idx, value) \
+  ({									      \
+     _Static_assert (sizeof (descr->member[0]) == 1			      \
+		     || sizeof (descr->member[0]) == 4			      \
+		     || sizeof (descr->member[0]) == 8,			      \
+		     "size of per-thread data");			      \
+     if (sizeof (descr->member[0]) == 1)				      \
+       asm volatile ("movb %b0,%%fs:%P1(%q2)" :				      \
+		     : "iq" (value),					      \
+		       "i" (offsetof (struct pthread, member[0])),	      \
+		       "r" (idx));					      \
+     else if (sizeof (descr->member[0]) == 4)				      \
+       asm volatile ("movl %0,%%fs:%P1(,%q2,4)" :			      \
+		     : IMM_MODE (value),				      \
+		       "i" (offsetof (struct pthread, member[0])),	      \
+		       "r" (idx));					      \
+     else /* 8 */							      \
+       {								      \
+	 /* Since movq takes a signed 32-bit immediate or a register source   \
+	    operand, use "er" constraint for 32-bit signed integer constant   \
+	    or register.  */						      \
+	 asm volatile ("movq %q0,%%fs:%P1(,%q2,8)" :			      \
+		       : "er" ((uint64_t) cast_to_integer (value)),	      \
+			 "i" (offsetof (struct pthread, member[0])),	      \
+			 "r" (idx));					      \
+       }})
diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h
index b0d044353b..a39579897c 100644
--- a/sysdeps/x86_64/nptl/tls.h
+++ b/sysdeps/x86_64/nptl/tls.h
@@ -195,119 +195,7 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80,
 # define DB_THREAD_SELF_INCLUDE  <sys/reg.h> /* For the FS constant.  */
 # define DB_THREAD_SELF CONST_THREAD_AREA (64, FS)
 
-/* Read member of the thread descriptor directly.  */
-# define THREAD_GETMEM(descr, member) \
-  ({ __typeof (descr->member) __value;					      \
-     _Static_assert (sizeof (__value) == 1				      \
-		     || sizeof (__value) == 4				      \
-		     || sizeof (__value) == 8,				      \
-		     "size of per-thread data");			      \
-     if (sizeof (__value) == 1)						      \
-       asm volatile ("movb %%fs:%P2,%b0"				      \
-		     : "=q" (__value)					      \
-		     : "0" (0), "i" (offsetof (struct pthread, member)));     \
-     else if (sizeof (__value) == 4)					      \
-       asm volatile ("movl %%fs:%P1,%0"					      \
-		     : "=r" (__value)					      \
-		     : "i" (offsetof (struct pthread, member)));	      \
-     else /* 8 */								      \
-       {								      \
-	 asm volatile ("movq %%fs:%P1,%q0"				      \
-		       : "=r" (__value)					      \
-		       : "i" (offsetof (struct pthread, member)));	      \
-       }								      \
-     __value; })
-
-
-/* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
-# define THREAD_GETMEM_NC(descr, member, idx) \
-  ({ __typeof (descr->member[0]) __value;				      \
-     _Static_assert (sizeof (__value) == 1				      \
-		     || sizeof (__value) == 4				      \
-		     || sizeof (__value) == 8,				      \
-		     "size of per-thread data");			      \
-     if (sizeof (__value) == 1)						      \
-       asm volatile ("movb %%fs:%P2(%q3),%b0"				      \
-		     : "=q" (__value)					      \
-		     : "0" (0), "i" (offsetof (struct pthread, member[0])),   \
-		       "r" (idx));					      \
-     else if (sizeof (__value) == 4)					      \
-       asm volatile ("movl %%fs:%P1(,%q2,4),%0"				      \
-		     : "=r" (__value)					      \
-		     : "i" (offsetof (struct pthread, member[0])), "r" (idx));\
-     else /* 8 */							      \
-       {								      \
-	 asm volatile ("movq %%fs:%P1(,%q2,8),%q0"			      \
-		       : "=r" (__value)					      \
-		       : "i" (offsetof (struct pthread, member[0])),	      \
-			 "r" (idx));					      \
-       }								      \
-     __value; })
-
-
-/* Loading addresses of objects on x86-64 needs to be treated special
-   when generating PIC code.  */
-#ifdef __pic__
-# define IMM_MODE "nr"
-#else
-# define IMM_MODE "ir"
-#endif
-
-
-/* Set member of the thread descriptor directly.  */
-# define THREAD_SETMEM(descr, member, value) \
-  ({									      \
-     _Static_assert (sizeof (descr->member) == 1			      \
-		     || sizeof (descr->member) == 4			      \
-		     || sizeof (descr->member) == 8,			      \
-		     "size of per-thread data");			      \
-     if (sizeof (descr->member) == 1)					      \
-       asm volatile ("movb %b0,%%fs:%P1" :				      \
-		     : "iq" (value),					      \
-		       "i" (offsetof (struct pthread, member)));	      \
-     else if (sizeof (descr->member) == 4)				      \
-       asm volatile ("movl %0,%%fs:%P1" :				      \
-		     : IMM_MODE (value),				      \
-		       "i" (offsetof (struct pthread, member)));	      \
-     else /* 8 */							      \
-       {								      \
-	 /* Since movq takes a signed 32-bit immediate or a register source   \
-	    operand, use "er" constraint for 32-bit signed integer constant   \
-	    or register.  */						      \
-	 asm volatile ("movq %q0,%%fs:%P1" :				      \
-		       : "er" ((uint64_t) cast_to_integer (value)),	      \
-			 "i" (offsetof (struct pthread, member)));	      \
-       }})
-
-
-/* Same as THREAD_SETMEM, but the member offset can be non-constant.  */
-# define THREAD_SETMEM_NC(descr, member, idx, value) \
-  ({									      \
-     _Static_assert (sizeof (descr->member[0]) == 1			      \
-		     || sizeof (descr->member[0]) == 4			      \
-		     || sizeof (descr->member[0]) == 8,			      \
-		     "size of per-thread data");			      \
-     if (sizeof (descr->member[0]) == 1)				      \
-       asm volatile ("movb %b0,%%fs:%P1(%q2)" :				      \
-		     : "iq" (value),					      \
-		       "i" (offsetof (struct pthread, member[0])),	      \
-		       "r" (idx));					      \
-     else if (sizeof (descr->member[0]) == 4)				      \
-       asm volatile ("movl %0,%%fs:%P1(,%q2,4)" :			      \
-		     : IMM_MODE (value),				      \
-		       "i" (offsetof (struct pthread, member[0])),	      \
-		       "r" (idx));					      \
-     else /* 8 */							      \
-       {								      \
-	 /* Since movq takes a signed 32-bit immediate or a register source   \
-	    operand, use "er" constraint for 32-bit signed integer constant   \
-	    or register.  */						      \
-	 asm volatile ("movq %q0,%%fs:%P1(,%q2,8)" :			      \
-		       : "er" ((uint64_t) cast_to_integer (value)),	      \
-			 "i" (offsetof (struct pthread, member[0])),	      \
-			 "r" (idx));					      \
-       }})
-
+# include <tcb-access.h>
 
 /* Set the stack guard field in TCB head.  */
 # define THREAD_SET_STACK_GUARD(value) \
-- 
2.33.1




* [PATCH v2 3/8] nptl: Introduce THREAD_GETMEM_VOLATILE
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
  2021-12-07 13:00 ` [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer Florian Weimer
  2021-12-07 13:00 ` [PATCH v2 2/8] nptl: Introduce <tcb-access.h> for THREAD_* accessors Florian Weimer
@ 2021-12-07 13:00 ` Florian Weimer
  2021-12-08 11:23   ` Szabolcs Nagy
  2021-12-07 13:01 ` [PATCH 4/8] nptl: Add rseq registration Florian Weimer
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:00 UTC (permalink / raw)
  To: libc-alpha

This will be needed for rseq TCB access.
---
v2: New patch.
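
For readers following along, here is a minimal sketch of what the generic
definition does, outside of glibc.  The struct demo_tcb type and the DEMO_*
names are hypothetical stand-ins and not part of the patch; only the shape of
the cast mirrors the generic <tcb-access.h> hunk below.  It assumes a compiler
that supports the GNU __typeof extension (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pthread, for illustration only.  */
struct demo_tcb
{
  int tid;
  unsigned int cpu_id;
};

/* Plain access: the compiler may cache or merge repeated reads.  */
#define DEMO_GETMEM(descr, member) \
  descr->member

/* Volatile access: the lvalue is accessed through a volatile-qualified
   pointer, so every evaluation has to perform an actual load.  */
#define DEMO_GETMEM_VOLATILE(descr, member) \
  (*(volatile __typeof (descr->member) *)&descr->member)

/* Read cpu_id with a forced load.  In the rseq case the kernel may
   update the field behind the compiler's back, which is why a plain
   read is not enough.  */
static unsigned int
demo_read_cpu_id (struct demo_tcb *self)
{
  return DEMO_GETMEM_VOLATILE (self, cpu_id);
}

/* Small self-check showing that the accessor observes updates.  */
static void
demo_self_check (void)
{
  struct demo_tcb t = { .tid = 1, .cpu_id = 7 };
  assert (demo_read_cpu_id (&t) == 7);
  t.cpu_id = 3;
  assert (demo_read_cpu_id (&t) == 3);
}
```

On i386 and x86_64 no separate definition is needed because THREAD_GETMEM is
already implemented with asm volatile, so the new macro simply forwards to it.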

 sysdeps/i386/nptl/tcb-access.h   | 2 ++
 sysdeps/nptl/tcb-access.h        | 2 ++
 sysdeps/x86_64/nptl/tcb-access.h | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/sysdeps/i386/nptl/tcb-access.h b/sysdeps/i386/nptl/tcb-access.h
index 6c6d561e39..5ddd83224b 100644
--- a/sysdeps/i386/nptl/tcb-access.h
+++ b/sysdeps/i386/nptl/tcb-access.h
@@ -41,6 +41,8 @@
        }								      \
      __value; })
 
+/* THREAD_GETMEM already forces a read.  */
+#define THREAD_GETMEM_VOLATILE(descr, member) THREAD_GETMEM (descr, member)
 
 /* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
 #define THREAD_GETMEM_NC(descr, member, idx) \
diff --git a/sysdeps/nptl/tcb-access.h b/sysdeps/nptl/tcb-access.h
index b4137b8ab8..bbe20b7225 100644
--- a/sysdeps/nptl/tcb-access.h
+++ b/sysdeps/nptl/tcb-access.h
@@ -22,6 +22,8 @@
 
 #define THREAD_GETMEM(descr, member) \
   descr->member
+#define THREAD_GETMEM_VOLATILE(descr, member) \
+  (*(volatile __typeof (descr->member) *)&descr->member)
 #define THREAD_GETMEM_NC(descr, member, idx) \
   descr->member[idx]
 #define THREAD_SETMEM(descr, member, value) \
diff --git a/sysdeps/x86_64/nptl/tcb-access.h b/sysdeps/x86_64/nptl/tcb-access.h
index 18848a729d..e4d2d07a9b 100644
--- a/sysdeps/x86_64/nptl/tcb-access.h
+++ b/sysdeps/x86_64/nptl/tcb-access.h
@@ -39,6 +39,8 @@
        }								      \
      __value; })
 
+/* THREAD_GETMEM already forces a read.  */
+#define THREAD_GETMEM_VOLATILE(descr, member) THREAD_GETMEM (descr, member)
 
 /* Same as THREAD_GETMEM, but the member offset can be non-constant.  */
 # define THREAD_GETMEM_NC(descr, member, idx) \
-- 
2.33.1




* [PATCH 4/8] nptl: Add rseq registration
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
                   ` (2 preceding siblings ...)
  2021-12-07 13:00 ` [PATCH v2 3/8] nptl: Introduce THREAD_GETMEM_VOLATILE Florian Weimer
@ 2021-12-07 13:01 ` Florian Weimer
  2021-12-08 16:51   ` Szabolcs Nagy
                     ` (2 more replies)
  2021-12-07 13:02 ` [PATCH v2 5/8] Linux: Use rseq to accelerate sched_getcpu Florian Weimer
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:01 UTC (permalink / raw)
  To: libc-alpha

The rseq area is placed directly into struct pthread.  rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.

<sys/rseq.h> is not yet installed as a public header.

Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
v2: Use volatile access to cpu_id.  Drop spurious csu/libc-tls.c change.
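
As a reading aid for the RSEQ_SIG comments in the aarch64 and arm headers
below: the big-endian data values are the byte-reversed form of the
little-endian instruction encodings.  The following sketch checks that
relationship.  The DEMO_* names and demo_bswap32 are hypothetical helpers and
not part of the patch; the four constants are copied from the headers.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the bits/rseq.h headers in this patch.  */
#define DEMO_AARCH64_SIG_CODE    0xd428bc00u  /* BRK #0x45E0 */
#define DEMO_AARCH64_SIG_DATA_BE 0x00bc28d4u  /* __AARCH64EB__ data value */
#define DEMO_ARM_SIG_LE          0xe7f5def3u  /* udf #24035 */
#define DEMO_ARM_SIG_BE8         0xf3def5e7u  /* __ARMEB__ data value */

/* Reverse the byte order of a 32-bit value.  */
static uint32_t
demo_bswap32 (uint32_t x)
{
  return ((x & 0x000000ffu) << 24)
	 | ((x & 0x0000ff00u) << 8)
	 | ((x & 0x00ff0000u) >> 8)
	 | ((x & 0xff000000u) >> 24);
}

/* Check that each big-endian data value is the byte-swapped
   little-endian instruction encoding.  */
static void
demo_check_signatures (void)
{
  assert (demo_bswap32 (DEMO_AARCH64_SIG_CODE) == DEMO_AARCH64_SIG_DATA_BE);
  assert (demo_bswap32 (DEMO_ARM_SIG_LE) == DEMO_ARM_SIG_BE8);
}
```

With mixed-endianness code and data (little-endian code, big-endian data), a
.word holding the swapped value ends up in memory with the same bytes as the
trap instruction, which is what the comments in the headers rely on.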

 nptl/descr.h                                |   4 +
 nptl/pthread_create.c                       |  13 +
 sysdeps/nptl/dl-tls_init_tp.c               |   8 +-
 sysdeps/unix/sysv/linux/Makefile            |   9 +-
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h |  43 ++++
 sysdeps/unix/sysv/linux/arm/bits/rseq.h     |  83 +++++++
 sysdeps/unix/sysv/linux/bits/rseq.h         |  29 +++
 sysdeps/unix/sysv/linux/mips/bits/rseq.h    |  62 +++++
 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h |  37 +++
 sysdeps/unix/sysv/linux/rseq-internal.h     |  45 ++++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h    |  37 +++
 sysdeps/unix/sysv/linux/sys/rseq.h          | 174 +++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq-nptl.c     | 260 ++++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq.c          |  64 +++++
 sysdeps/unix/sysv/linux/tst-rseq.h          |  57 +++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h     |  30 +++
 16 files changed, 952 insertions(+), 3 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h

diff --git a/nptl/descr.h b/nptl/descr.h
index af2a6ab87a..92db305913 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -34,6 +34,7 @@
 #include <bits/types/res_state.h>
 #include <kernel-features.h>
 #include <tls-internal-struct.h>
+#include <sys/rseq.h>
 
 #ifndef TCB_ALIGNMENT
 # define TCB_ALIGNMENT 32
@@ -406,6 +407,9 @@ struct pthread
   /* Used on strsignal.  */
   struct tls_internal_t tls_state;
 
+  /* rseq area registered with the kernel.  */
+  struct rseq rseq_area;
+
   /* This member must be last.  */
   char end_padding[];
 
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index bad9eeb52f..ea0d79341e 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -32,6 +32,7 @@
 #include <default-sched.h>
 #include <futex-internal.h>
 #include <tls-setup.h>
+#include <rseq-internal.h>
 #include "libioP.h"
 #include <sys/single_threaded.h>
 #include <version.h>
@@ -366,6 +367,9 @@ start_thread (void *arg)
   /* Initialize pointers to locale data.  */
   __ctype_init ();
 
+  /* Register rseq TLS with the kernel.  */
+  rseq_register_current_thread (pd);
+
 #ifndef __ASSUME_SET_ROBUST_LIST
   if (__nptl_set_robust_list_avail)
 #endif
@@ -571,6 +575,15 @@ out:
      process is really dead since 'clone' got passed the CLONE_CHILD_CLEARTID
      flag.  The 'tid' field in the TCB will be set to zero.
 
+     rseq TLS is still registered at this point.  Rely on implicit
+     unregistration performed by the kernel on thread teardown.  This is not a
+     problem because the rseq TLS lives on the stack, and the stack outlives
+     the thread.  If TCB allocation is ever changed, additional steps may be
+     required, such as performing explicit rseq unregistration before
+     reclaiming the rseq TLS area memory.  It is NOT sufficient to block
+     signals because the kernel may write to the rseq area even without
+     signals.
+
      The exit code is zero since in case all threads exit by calling
      'pthread_exit' the exit status must be 0 (zero).  */
   while (1)
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index ca494dd3a5..fedb876fdb 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -21,6 +21,7 @@
 #include <list.h>
 #include <pthreadP.h>
 #include <tls.h>
+#include <rseq-internal.h>
 
 #ifndef __ASSUME_SET_ROBUST_LIST
 bool __nptl_set_robust_list_avail;
@@ -57,11 +58,12 @@ __tls_pre_init_tp (void)
 void
 __tls_init_tp (void)
 {
+  struct pthread *pd = THREAD_SELF;
+
   /* Set up thread stack list management.  */
-  list_add (&THREAD_SELF->list, &GL (dl_stack_user));
+  list_add (&pd->list, &GL (dl_stack_user));
 
    /* Early initialization of the TCB.   */
-   struct pthread *pd = THREAD_SELF;
    pd->tid = INTERNAL_SYSCALL_CALL (set_tid_address, &pd->tid);
    THREAD_SETMEM (pd, specific[0], &pd->specific_1stblock[0]);
    THREAD_SETMEM (pd, user_stack, true);
@@ -90,6 +92,8 @@ __tls_init_tp (void)
       }
   }
 
+  rseq_register_current_thread (pd);
+
   /* Set initial thread's stack block from 0 up to __libc_stack_end.
      It will be bigger than it actually is, but for unwind.c/pt-longjmp.c
      purposes this is good enough.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 29c6c78f98..eb0f5fc021 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -131,7 +131,10 @@ ifeq ($(have-GLIBC_2.27)$(build-shared),yesyes)
 tests += tst-ofdlocks-compat
 endif
 
-tests-internal += tst-sigcontext-get_pc
+tests-internal += \
+  tst-rseq \
+  tst-sigcontext-get_pc \
+  # tests-internal
 
 tests-time64 += \
   tst-adjtimex-time64 \
@@ -357,4 +360,8 @@ endif
 
 ifeq ($(subdir),nptl)
 tests += tst-align-clone tst-getpid1
+
+# tst-rseq-nptl is an internal test because it requires a definition of
+# __NR_rseq from the internal system call list.
+tests-internal += tst-rseq-nptl
 endif
diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
new file mode 100644
index 0000000000..9ba92725c7
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
@@ -0,0 +1,43 @@
+/* Restartable Sequences Linux aarch64 architecture header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries.  It needs to be defined for each
+   architecture.  When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   aarch64 -mbig-endian generates mixed endianness code vs data:
+   little-endian code and big-endian data.  Ensure the RSEQ_SIG signature
+   matches code endianness.  */
+
+#define RSEQ_SIG_CODE  0xd428bc00  /* BRK #0x45E0.  */
+
+#ifdef __AARCH64EB__
+# define RSEQ_SIG_DATA 0x00bc28d4  /* BRK #0x45E0.  */
+#else
+# define RSEQ_SIG_DATA RSEQ_SIG_CODE
+#endif
+
+#define RSEQ_SIG       RSEQ_SIG_DATA
diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
new file mode 100644
index 0000000000..0542b26f6a
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
@@ -0,0 +1,83 @@
+/* Restartable Sequences Linux arm architecture header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/*
+   RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries.  It needs to be defined for each
+   architecture.  When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   - ARM little endian
+
+   RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
+   value 0x5de3.  This traps if user-space reaches this instruction by mistake,
+   and the uncommon operand ensures the kernel does not move the instruction
+   pointer to attacker-controlled code on rseq abort.
+
+   The instruction pattern in the A32 instruction set is:
+
+   e7f5def3    udf    #24035    ; 0x5de3
+
+   This translates to the following instruction pattern in the T16 instruction
+   set:
+
+   little endian:
+   def3        udf    #243      ; 0xf3
+   e7f5        b.n    <7f5>
+
+   - ARMv6+ big endian (BE8):
+
+   ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
+   code and big-endian data.  The data value of the signature needs to have its
+   byte order reversed to generate the trap instruction:
+
+   Data: 0xf3def5e7
+
+   Translates to this A32 instruction pattern:
+
+   e7f5def3    udf    #24035    ; 0x5de3
+
+   Translates to this T16 instruction pattern:
+
+   def3        udf    #243      ; 0xf3
+   e7f5        b.n    <7f5>
+
+   - Prior to ARMv6 big endian (BE32):
+
+   Prior to ARMv6, -mbig-endian generates big-endian code and data
+   (which match), so the endianness of the data representation of the
+   signature should not be reversed.  However, the choice between BE32
+   and BE8 is done by the linker, so we cannot know whether code and
+   data endianness will be mixed before the linker is invoked.  So rather
+   than try to play tricks with the linker, the rseq signature is simply
+   data (not a trap instruction) prior to ARMv6 on big endian.  This is
+   why the signature is expressed as data (.word) rather than as
+   instruction (.inst) in assembler.  */
+
+#ifdef __ARMEB__
+# define RSEQ_SIG    0xf3def5e7      /* udf    #24035    ; 0x5de3 (ARMv6+) */
+#else
+# define RSEQ_SIG    0xe7f5def3      /* udf    #24035    ; 0x5de3 */
+#endif
diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
new file mode 100644
index 0000000000..46cf5d1c74
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/bits/rseq.h
@@ -0,0 +1,29 @@
+/* Restartable Sequences architecture header.  Stub version.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries.  It needs to be defined for each
+   architecture.  When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.  */
diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
new file mode 100644
index 0000000000..a9defee568
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
@@ -0,0 +1,62 @@
+/* Restartable Sequences Linux mips architecture header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries.  It needs to be defined for each
+   architecture.  When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   RSEQ_SIG uses the break instruction.  The instruction pattern is:
+
+   On MIPS:
+        0350000d        break     0x350
+
+   On nanoMIPS:
+        00100350        break     0x350
+
+   On microMIPS:
+        0000d407        break     0x350
+
+   For nanoMIPS32 and microMIPS, the instruction stream is encoded as
+   16-bit halfwords, so the signature halfwords need to be swapped
+   accordingly for little-endian.  */
+
+#if defined (__nanomips__)
+# ifdef __MIPSEL__
+#  define RSEQ_SIG      0x03500010
+# else
+#  define RSEQ_SIG      0x00100350
+# endif
+#elif defined (__mips_micromips)
+# ifdef __MIPSEL__
+#  define RSEQ_SIG      0xd4070000
+# else
+#  define RSEQ_SIG      0x0000d407
+# endif
+#elif defined (__mips__)
+# define RSEQ_SIG       0x0350000d
+#else
+/* Unknown MIPS architecture.  */
+#endif
diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
new file mode 100644
index 0000000000..05b3cf7b8f
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
@@ -0,0 +1,37 @@
+/* Restartable Sequences Linux powerpc architecture header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries.  It needs to be defined for each
+   architecture.  When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   RSEQ_SIG uses the following trap instruction:
+
+   powerpc-be:    0f e5 00 0b           twui   r5,11
+   powerpc64-le:  0b 00 e5 0f           twui   r5,11
+   powerpc64-be:  0f e5 00 0b           twui   r5,11  */
+
+#define RSEQ_SIG        0x0fe5000b
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
new file mode 100644
index 0000000000..909f547825
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -0,0 +1,45 @@
+/* Restartable Sequences internal API.  Linux implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+#include <sysdep.h>
+#include <errno.h>
+#include <kernel-features.h>
+#include <stdio.h>
+#include <sys/rseq.h>
+
+#ifdef RSEQ_SIG
+static inline void
+rseq_register_current_thread (struct pthread *self)
+{
+  int ret = INTERNAL_SYSCALL_CALL (rseq,
+                                   &self->rseq_area, sizeof (self->rseq_area),
+                                   0, RSEQ_SIG);
+  if (INTERNAL_SYSCALL_ERROR_P (ret))
+    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+}
+#else /* RSEQ_SIG */
+static inline void
+rseq_register_current_thread (struct pthread *self)
+{
+  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+}
+#endif /* RSEQ_SIG */
+
+#endif /* rseq-internal.h */
diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
new file mode 100644
index 0000000000..3030e38f40
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
@@ -0,0 +1,37 @@
+/* Restartable Sequences Linux s390 architecture header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries.  It needs to be defined for each
+   architecture.  When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   RSEQ_SIG uses the trap4 instruction.  As Linux does not make use of the
+   access-register mode nor the linkage stack this instruction will always
+   cause a special-operation exception (the trap-enabled bit in the DUCT
+   is and will stay 0).  The instruction pattern is
+       b2 ff 0f ff        trap4   4095(%r0)  */
+
+#define RSEQ_SIG        0xB2FF0FFF
diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
new file mode 100644
index 0000000000..c8edff50d4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sys/rseq.h
@@ -0,0 +1,174 @@
+/* Restartable Sequences exported symbols.  Linux header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+#define _SYS_RSEQ_H	1
+
+/* Architecture-specific rseq signature.  */
+#include <bits/rseq.h>
+
+#include <stdint.h>
+#include <sys/cdefs.h>
+#include <bits/endian.h>
+
+#ifdef __has_include
+# if __has_include ("linux/rseq.h")
+#  define __GLIBC_HAVE_KERNEL_RSEQ
+# endif
+#else
+# include <linux/version.h>
+# if LINUX_VERSION_CODE >= KERNEL_VERSION (4, 18, 0)
+#  define __GLIBC_HAVE_KERNEL_RSEQ
+# endif
+#endif
+
+#ifdef __GLIBC_HAVE_KERNEL_RSEQ
+/* We use the structure declarations from the kernel headers.  */
+# include <linux/rseq.h>
+#else /* __GLIBC_HAVE_KERNEL_RSEQ */
+/* We use a copy of the include/uapi/linux/rseq.h kernel header.  */
+
+enum rseq_cpu_id_state
+  {
+    RSEQ_CPU_ID_UNINITIALIZED = -1,
+    RSEQ_CPU_ID_REGISTRATION_FAILED = -2,
+  };
+
+enum rseq_flags
+  {
+    RSEQ_FLAG_UNREGISTER = (1 << 0),
+  };
+
+enum rseq_cs_flags_bit
+  {
+    RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT = 0,
+    RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT = 1,
+    RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT = 2,
+  };
+
+enum rseq_cs_flags
+  {
+    RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT =
+      (1U << RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT),
+    RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL =
+      (1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT),
+    RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE =
+      (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
+  };
+
+/* struct rseq_cs is aligned on 32 bytes to ensure it is always
+   contained within a single cache-line.  It is usually declared as
+   link-time constant data.  */
+struct rseq_cs
+  {
+    /* Version of this structure.  */
+    uint32_t version;
+    /* enum rseq_cs_flags.  */
+    uint32_t flags;
+    uint64_t start_ip;
+    /* Offset from start_ip.  */
+    uint64_t post_commit_offset;
+    uint64_t abort_ip;
+  } __attribute__ ((__aligned__ (32)));
+
+/* struct rseq is aligned on 32 bytes to ensure it is always
+   contained within a single cache-line.
+
+   A single struct rseq per thread is allowed.  */
+struct rseq
+  {
+    /* Restartable sequences cpu_id_start field.  Updated by the
+       kernel.  Read by user-space with single-copy atomicity
+       semantics.  This field should only be read by the thread which
+       registered this data structure.  Aligned on 32-bit.  Always
+       contains a value in the range of possible CPUs, although the
+       value may not be the actual current CPU (e.g. if rseq is not
+       initialized).  This CPU number value should always be compared
+       against the value of the cpu_id field before performing a rseq
+       commit or returning a value read from a data structure indexed
+       using the cpu_id_start value.  */
+    uint32_t cpu_id_start;
+    /* Restartable sequences cpu_id field.  Updated by the kernel.
+       Read by user-space with single-copy atomicity semantics.  This
+       field should only be read by the thread which registered this
+       data structure.  Aligned on 32-bit.  Values
+       RSEQ_CPU_ID_UNINITIALIZED and RSEQ_CPU_ID_REGISTRATION_FAILED
+       have a special semantic: the former means "rseq uninitialized",
+       and the latter means "rseq initialization failed".  This value is
+       meant to be read within rseq critical sections and compared
+       with the cpu_id_start value previously read, before performing
+       the commit instruction, or read and compared with the
+       cpu_id_start value before returning a value loaded from a data
+       structure indexed using the cpu_id_start value.  */
+    uint32_t cpu_id;
+    /* Restartable sequences rseq_cs field.
+
+       Contains NULL when no critical section is active for the current
+       thread, or holds a pointer to the currently active struct rseq_cs.
+
+       Updated by user-space, which sets the address of the currently
+       active rseq_cs at the beginning of assembly instruction sequence
+       block, and set to NULL by the kernel when it restarts an assembly
+       instruction sequence block, as well as when the kernel detects that
+       it is preempting or delivering a signal outside of the range
+       targeted by the rseq_cs.  Also needs to be set to NULL by user-space
+       before reclaiming memory that contains the targeted struct rseq_cs.
+
+       Read and set by the kernel.  Set by user-space with single-copy
+       atomicity semantics.  This field should only be updated by the
+       thread which registered this data structure.  Aligned on 64-bit.  */
+    union
+      {
+        uint64_t ptr64;
+# ifdef __LP64__
+        uint64_t ptr;
+# else /* __LP64__ */
+        struct
+          {
+#  if __BYTE_ORDER == __BIG_ENDIAN
+            uint32_t padding; /* Initialized to zero.  */
+            uint32_t ptr32;
+#  else /* LITTLE */
+            uint32_t ptr32;
+            uint32_t padding; /* Initialized to zero.  */
+#  endif /* ENDIAN */
+          } ptr;
+# endif /* __LP64__ */
+      } rseq_cs;
+
+    /* Restartable sequences flags field.
+
+       This field should only be updated by the thread which
+       registered this data structure.  Read by the kernel.
+       Mainly used for single-stepping through rseq critical sections
+       with debuggers.
+
+       - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
+           Inhibit instruction sequence block restart on preemption
+           for this thread.
+       - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
+           Inhibit instruction sequence block restart on signal
+           delivery for this thread.
+       - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
+           Inhibit instruction sequence block restart on migration for
+           this thread.  */
+    uint32_t flags;
+  } __attribute__ ((__aligned__ (32)));
+
+#endif /* __GLIBC_HAVE_KERNEL_RSEQ */
+
+#endif /* sys/rseq.h */
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
new file mode 100644
index 0000000000..d31d94445c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
@@ -0,0 +1,260 @@
+/* Restartable Sequences NPTL test.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* These tests validate that rseq is registered from various execution
+   contexts (main thread, destructor, other threads, other threads created
+   from destructor, forked process (without exec), pthread_atfork handlers,
+   pthread key destructors, signal handlers, atexit handlers).
+
+   See the Linux kernel selftests for extensive rseq stress-tests.  */
+
+#include <stdio.h>
+#include <support/check.h>
+#include <support/xthread.h>
+#include <sys/rseq.h>
+#include <unistd.h>
+
+#ifdef RSEQ_SIG
+# include <array_length.h>
+# include <errno.h>
+# include <error.h>
+# include <pthread.h>
+# include <signal.h>
+# include <stdlib.h>
+# include <string.h>
+# include <support/namespace.h>
+# include <support/xsignal.h>
+# include <syscall.h>
+# include <sys/types.h>
+# include <sys/wait.h>
+# include "tst-rseq.h"
+
+static pthread_key_t rseq_test_key;
+
+static void
+atfork_prepare (void)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("error: rseq not registered in pthread atfork prepare\n");
+      support_record_failure ();
+    }
+}
+
+static void
+atfork_parent (void)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("error: rseq not registered in pthread atfork parent\n");
+      support_record_failure ();
+    }
+}
+
+static void
+atfork_child (void)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("error: rseq not registered in pthread atfork child\n");
+      support_record_failure ();
+    }
+}
+
+static void
+rseq_key_destructor (void *arg)
+{
+  /* Cannot use deferred failure reporting after main returns.  */
+  if (!rseq_thread_registered ())
+    FAIL_EXIT1 ("rseq not registered in pthread key destructor");
+}
+
+static void
+atexit_handler (void)
+{
+  /* Cannot use deferred failure reporting after main returns.  */
+  if (!rseq_thread_registered ())
+    FAIL_EXIT1 ("rseq not registered in atexit handler");
+}
+
+/* Used to avoid -Werror=stringop-overread warning with
+   pthread_setspecific and GCC 11.  */
+static char one = 1;
+
+static void
+do_rseq_main_test (void)
+{
+  TEST_COMPARE (atexit (atexit_handler), 0);
+  rseq_test_key = xpthread_key_create (rseq_key_destructor);
+  TEST_COMPARE (pthread_atfork (atfork_prepare, atfork_parent, atfork_child), 0);
+  xraise (SIGUSR1);
+  TEST_COMPARE (pthread_setspecific (rseq_test_key, &one), 0);
+  TEST_VERIFY_EXIT (rseq_thread_registered ());
+}
+
+static void
+cancel_routine (void *arg)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("error: rseq not registered in cancel routine\n");
+      support_record_failure ();
+    }
+}
+
+static pthread_barrier_t cancel_thread_barrier;
+static pthread_cond_t cancel_thread_cond = PTHREAD_COND_INITIALIZER;
+static pthread_mutex_t cancel_thread_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void
+test_cancel_thread (void)
+{
+  pthread_cleanup_push (cancel_routine, NULL);
+  (void) xpthread_barrier_wait (&cancel_thread_barrier);
+  /* Wait forever until cancellation.  */
+  xpthread_cond_wait (&cancel_thread_cond, &cancel_thread_mutex);
+  pthread_cleanup_pop (0);
+}
+
+static void *
+thread_function (void * arg)
+{
+  int i = (int) (intptr_t) arg;
+
+  xraise (SIGUSR1);
+  if (i == 0)
+    test_cancel_thread ();
+  TEST_COMPARE (pthread_setspecific (rseq_test_key, &one), 0);
+  return rseq_thread_registered () ? NULL : (void *) 1l;
+}
+
+static void
+sighandler (int sig)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("error: rseq not registered in signal handler\n");
+      support_record_failure ();
+    }
+}
+
+static void
+setup_signals (void)
+{
+  struct sigaction sa;
+
+  sigemptyset (&sa.sa_mask);
+  sigaddset (&sa.sa_mask, SIGUSR1);
+  sa.sa_flags = 0;
+  sa.sa_handler = sighandler;
+  xsigaction (SIGUSR1, &sa, NULL);
+}
+
+static int
+do_rseq_threads_test (int nr_threads)
+{
+  pthread_t th[nr_threads];
+  int i;
+  int result = 0;
+
+  xpthread_barrier_init (&cancel_thread_barrier, NULL, 2);
+
+  for (i = 0; i < nr_threads; ++i)
+    th[i] = xpthread_create (NULL, thread_function,
+                             (void *) (intptr_t) i);
+
+  (void) xpthread_barrier_wait (&cancel_thread_barrier);
+
+  xpthread_cancel (th[0]);
+
+  for (i = 0; i < nr_threads; ++i)
+    {
+      void *v;
+
+      v = xpthread_join (th[i]);
+      if (i != 0 && v != NULL)
+        {
+          printf ("error: join %d successful, but child failed\n", i);
+          result = 1;
+        }
+      else if (i == 0 && v == NULL)
+        {
+          printf ("error: join %d successful, child did not fail as expected\n", i);
+          result = 1;
+        }
+    }
+
+  xpthread_barrier_destroy (&cancel_thread_barrier);
+
+  return result;
+}
+
+static void
+subprocess_callback (void *closure)
+{
+  do_rseq_main_test ();
+}
+
+static void
+do_rseq_fork_test (void)
+{
+  support_isolate_in_subprocess (subprocess_callback, NULL);
+}
+
+static int
+do_rseq_test (void)
+{
+  int t[] = { 1, 2, 6, 5, 4, 3, 50 };
+  int i, result = 0;
+
+  if (!rseq_available ())
+    FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
+  setup_signals ();
+  xraise (SIGUSR1);
+  do_rseq_main_test ();
+  for (i = 0; i < array_length (t); i++)
+    if (do_rseq_threads_test (t[i]))
+      result = 1;
+  do_rseq_fork_test ();
+  return result;
+}
+
+static void __attribute__ ((destructor))
+do_rseq_destructor_test (void)
+{
+  /* Cannot use deferred failure reporting after main returns.  */
+  if (do_rseq_test ())
+    FAIL_EXIT1 ("rseq not registered within destructor");
+  xpthread_key_delete (rseq_test_key);
+}
+
+#else /* RSEQ_SIG */
+static int
+do_rseq_test (void)
+{
+  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
+  return 0;
+}
+#endif /* RSEQ_SIG */
+
+static int
+do_test (void)
+{
+  return do_rseq_test ();
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
new file mode 100644
index 0000000000..926376b6a5
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq.c
@@ -0,0 +1,64 @@
+/* Restartable Sequences single-threaded tests.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* These tests validate that rseq is registered from main in an executable
+   not linked against libpthread.  */
+
+#include <support/check.h>
+#include <stdio.h>
+#include <sys/rseq.h>
+#include <unistd.h>
+
+#ifdef RSEQ_SIG
+# include <errno.h>
+# include <error.h>
+# include <stdlib.h>
+# include <string.h>
+# include <syscall.h>
+# include "tst-rseq.h"
+
+static void
+do_rseq_main_test (void)
+{
+  TEST_VERIFY_EXIT (rseq_thread_registered ());
+}
+
+static void
+do_rseq_test (void)
+{
+  if (!rseq_available ())
+    {
+      FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
+    }
+  do_rseq_main_test ();
+}
+#else /* RSEQ_SIG */
+static void
+do_rseq_test (void)
+{
+  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
+}
+#endif /* RSEQ_SIG */
+
+static int
+do_test (void)
+{
+  do_rseq_test ();
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.h b/sysdeps/unix/sysv/linux/tst-rseq.h
new file mode 100644
index 0000000000..a476c316fc
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq.h
@@ -0,0 +1,57 @@
+/* Restartable Sequences tests header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <error.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <support/check.h>
+#include <syscall.h>
+#include <sys/rseq.h>
+#include <tls.h>
+
+static inline bool
+rseq_thread_registered (void)
+{
+  return THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id) >= 0;
+}
+
+static inline int
+sys_rseq (struct rseq *rseq_abi, uint32_t rseq_len, int flags, uint32_t sig)
+{
+  return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+static inline bool
+rseq_available (void)
+{
+  int rc;
+
+  rc = sys_rseq (NULL, 0, 0, 0);
+  if (rc != -1)
+    FAIL_EXIT1 ("Unexpected rseq return value %d", rc);
+  switch (errno)
+    {
+    case ENOSYS:
+      return false;
+    case EINVAL:
+      /* rseq is implemented, but detected an invalid rseq_len parameter.  */
+      return true;
+    default:
+      FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno));
+    }
+}
diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
new file mode 100644
index 0000000000..9fc909e7c8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
@@ -0,0 +1,30 @@
+/* Restartable Sequences Linux x86 architecture header.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   RSEQ_SIG is used with the following reserved undefined instructions, which
+   trap in user-space:
+
+   x86-32:    0f b9 3d 53 30 05 53      ud1    0x53053053,%edi
+   x86-64:    0f b9 3d 53 30 05 53      ud1    0x53053053(%rip),%edi  */
+
+#define RSEQ_SIG        0x53053053
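
As a reading aid for the x86 comment above: the kernel compares RSEQ_SIG
with the four bytes that sit immediately before the abort handler, and on
x86 those are the trailing bytes of the quoted ud1 encoding.  The sketch
below checks that relationship byte by byte.  EXAMPLE_RSEQ_SIG, ud1_insn
and insn_ends_with_sig are hypothetical names used only for illustration;
they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as RSEQ_SIG on x86; renamed so the sketch does not clash
   with <sys/rseq.h>.  */
#define EXAMPLE_RSEQ_SIG 0x53053053U

/* Bytes of the ud1 instruction quoted in the comment.  */
static const unsigned char ud1_insn[7] =
  { 0x0f, 0xb9, 0x3d, 0x53, 0x30, 0x05, 0x53 };

/* Hypothetical helper: return nonzero if the last four bytes of INSN
   (LEN bytes long) equal SIG stored in little-endian byte order.  */
static int
insn_ends_with_sig (const unsigned char *insn, size_t len, uint32_t sig)
{
  unsigned char le[4];

  assert (insn != NULL && len >= 4);
  for (int i = 0; i < 4; i++)
    le[i] = (unsigned char) (sig >> (8 * i));
  return memcmp (insn + len - 4, le, 4) == 0;
}
```

With these definitions, insn_ends_with_sig (ud1_insn, sizeof ud1_insn,
EXAMPLE_RSEQ_SIG) holds, while a signature from another architecture
(for example the powerpc value 0x0fe5000b) does not match.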
-- 
2.33.1




* [PATCH v2 5/8] Linux: Use rseq to accelerate sched_getcpu
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
                   ` (3 preceding siblings ...)
  2021-12-07 13:01 ` [PATCH 4/8] nptl: Add rseq registration Florian Weimer
@ 2021-12-07 13:02 ` Florian Weimer
  2021-12-08 16:53   ` Szabolcs Nagy
  2021-12-07 13:02 ` [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration Florian Weimer
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:02 UTC (permalink / raw)
  To: libc-alpha

Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
v2: Use volatile access.
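
For readers outside the glibc tree, the fast path in this patch can be
sketched roughly as below.  This is an illustration under assumptions,
not the patch itself: cpu_id_or_fallback and slow_path_unavailable are
hypothetical names, the field is passed by pointer instead of being
located through the thread control block, and the cast to a volatile
lvalue stands in for THREAD_GETMEM_VOLATILE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.  CPU_ID_FIELD points at
   a kernel-updated cpu_id word; SLOW_PATH is whatever the caller would
   otherwise use (the vsyscall/syscall path in the patch).  */
static int
cpu_id_or_fallback (const uint32_t *cpu_id_field, int (*slow_path) (void))
{
  assert (cpu_id_field != NULL && slow_path != NULL);

  /* Read the field exactly once through a volatile lvalue, so the
     compiler can neither cache an earlier value nor read it twice.  */
  int32_t cpu_id = (int32_t) *(const volatile uint32_t *) cpu_id_field;

  /* Negative values (RSEQ_CPU_ID_UNINITIALIZED,
     RSEQ_CPU_ID_REGISTRATION_FAILED) mean rseq is not active for this
     thread, so take the slow path instead.  */
  return cpu_id >= 0 ? cpu_id : slow_path ();
}

/* Stand-in slow path for illustration only; it reports failure the
   same way sched_getcpu does.  */
static int
slow_path_unavailable (void)
{
  return -1;
}
```

A single read matters because the kernel may rewrite the field at any
preemption or migration; testing one value and returning another could
hand back a negative sentinel as if it were a CPU number.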

 sysdeps/unix/sysv/linux/sched_getcpu.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index c41e986f2c..6f78edaea1 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -20,8 +20,8 @@
 #include <sysdep.h>
 #include <sysdep-vdso.h>
 
-int
-sched_getcpu (void)
+static int
+vsyscall_sched_getcpu (void)
 {
   unsigned int cpu;
   int r = -1;
@@ -32,3 +32,18 @@ sched_getcpu (void)
 #endif
   return r == -1 ? r : cpu;
 }
+
+#ifdef RSEQ_SIG
+int
+sched_getcpu (void)
+{
+  int cpu_id = THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id);
+  return __glibc_likely (cpu_id >= 0) ? cpu_id : vsyscall_sched_getcpu ();
+}
+#else /* RSEQ_SIG */
+int
+sched_getcpu (void)
+{
+  return vsyscall_sched_getcpu ();
+}
+#endif /* RSEQ_SIG */
-- 
2.33.1




* [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
                   ` (4 preceding siblings ...)
  2021-12-07 13:02 ` [PATCH v2 5/8] Linux: Use rseq to accelerate sched_getcpu Florian Weimer
@ 2021-12-07 13:02 ` Florian Weimer
  2021-12-08 17:22   ` Szabolcs Nagy
  2021-12-08 18:03   ` Siddhesh Poyarekar
  2021-12-07 13:03 ` [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h> Florian Weimer
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:02 UTC (permalink / raw)
  To: libc-alpha

This tunable allows applications to register the rseq area instead
of glibc.
---
v2: Unchanged.
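
The interaction between the tunable, the registration helper and the
flag inherited in pthread_create can be modelled roughly as follows.
This is a simplified sketch under stated assumptions: all demo_* names
are hypothetical, the kernel_ok and cpu parameters merely simulate the
outcome of the rseq system call, and the real state lives in struct
pthread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirroring the kernel rseq ABI and this patch.  */
#define DEMO_CPU_ID_REGISTRATION_FAILED (-2)
#define DEMO_ATTR_FLAG_DO_RSEQ 0x0080

/* Hypothetical per-thread state.  */
struct demo_thread
{
  int flags;
  int32_t cpu_id;
};

/* Models rseq_register_current_thread.  On success the kernel would
   publish the CPU number; the sketch writes it directly.  */
static void
demo_register (struct demo_thread *t, bool do_rseq, bool kernel_ok,
               int cpu)
{
  if (do_rseq && kernel_ok)
    {
      t->cpu_id = cpu;
      return;
    }
  t->cpu_id = DEMO_CPU_ID_REGISTRATION_FAILED;
}

/* Models the pthread_create hunk: the new thread attempts registration
   only if the creating thread is currently registered.  */
static void
demo_inherit (const struct demo_thread *parent,
              struct demo_thread *child)
{
  if (parent->cpu_id >= 0)
    child->flags |= DEMO_ATTR_FLAG_DO_RSEQ;
}
```

With the tunable set to 0 the main thread ends up marked as failed, so
no later thread inherits the flag and the application is free to
register its own area.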

 manual/tunables.texi                       | 10 +++
 nptl/pthread_create.c                      | 10 ++-
 sysdeps/nptl/dl-tls_init_tp.c              | 11 ++-
 sysdeps/nptl/dl-tunables.list              |  6 ++
 sysdeps/nptl/internaltypes.h               |  1 +
 sysdeps/unix/sysv/linux/Makefile           |  8 ++
 sysdeps/unix/sysv/linux/rseq-internal.h    | 19 +++--
 sysdeps/unix/sysv/linux/tst-rseq-disable.c | 89 ++++++++++++++++++++++
 8 files changed, 145 insertions(+), 9 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable.c

diff --git a/manual/tunables.texi b/manual/tunables.texi
index 10f4d75993..5d50b90f64 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -424,6 +424,16 @@ The value is measured in bytes.  The default is @samp{41943040}
 (fourty mibibytes).
 @end deftp
 
+@deftp Tunable glibc.pthread.rseq
+The @code{glibc.pthread.rseq} tunable can be set to @samp{0} to disable
+restartable sequences support in @theglibc{}.  This enables applications
+to perform direct restartable sequence registration with the kernel.
+The default is @samp{1}, which means that @theglibc{} performs
+registration on behalf of the application.
+
+Restartable sequences are a Linux-specific extension.
+@end deftp
+
 @node Hardware Capability Tunables
 @section Hardware Capability Tunables
 @cindex hardware capability tunables
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index ea0d79341e..4608fd9068 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -368,7 +368,10 @@ start_thread (void *arg)
   __ctype_init ();
 
   /* Register rseq TLS to the kernel.  */
-  rseq_register_current_thread (pd);
+  {
+    bool do_rseq = THREAD_GETMEM (pd, flags) & ATTR_FLAG_DO_RSEQ;
+    rseq_register_current_thread (pd, do_rseq);
+  }
 
 #ifndef __ASSUME_SET_ROBUST_LIST
   if (__nptl_set_robust_list_avail)
@@ -677,6 +680,11 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
   pd->flags = ((iattr->flags & ~(ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
 	       | (self->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)));
 
+  /* Inherit rseq registration state.  Without seccomp filters, rseq
+     registration will either always fail or always succeed.  */
+  if ((int) THREAD_GETMEM_VOLATILE (self, rseq_area.cpu_id) >= 0)
+    pd->flags |= ATTR_FLAG_DO_RSEQ;
+
   /* Initialize the field for the ID of the thread which is waiting
      for us.  This is a self-reference in case the thread is created
      detached.  */
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index fedb876fdb..b39dfbff2c 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -23,6 +23,9 @@
 #include <tls.h>
 #include <rseq-internal.h>
 
+#define TUNABLE_NAMESPACE pthread
+#include <dl-tunables.h>
+
 #ifndef __ASSUME_SET_ROBUST_LIST
 bool __nptl_set_robust_list_avail;
 rtld_hidden_data_def (__nptl_set_robust_list_avail)
@@ -92,7 +95,13 @@ __tls_init_tp (void)
       }
   }
 
-  rseq_register_current_thread (pd);
+  {
+    bool do_rseq = true;
+#if HAVE_TUNABLES
+    do_rseq = TUNABLE_GET (rseq, int, NULL);
+#endif
+    rseq_register_current_thread (pd, do_rseq);
+  }
 
   /* Set initial thread's stack block from 0 up to __libc_stack_end.
      It will be bigger than it actually is, but for unwind.c/pt-longjmp.c
diff --git a/sysdeps/nptl/dl-tunables.list b/sysdeps/nptl/dl-tunables.list
index ac5d053298..d24f4be0d0 100644
--- a/sysdeps/nptl/dl-tunables.list
+++ b/sysdeps/nptl/dl-tunables.list
@@ -27,5 +27,11 @@ glibc {
       type: SIZE_T
       default: 41943040
     }
+    rseq {
+      type: INT_32
+      minval: 0
+      maxval: 1
+      default: 1
+    }
   }
 }
diff --git a/sysdeps/nptl/internaltypes.h b/sysdeps/nptl/internaltypes.h
index 6032a6b785..dec8c5b5ff 100644
--- a/sysdeps/nptl/internaltypes.h
+++ b/sysdeps/nptl/internaltypes.h
@@ -48,6 +48,7 @@ struct pthread_attr
 #define ATTR_FLAG_OLDATTR		0x0010
 #define ATTR_FLAG_SCHED_SET		0x0020
 #define ATTR_FLAG_POLICY_SET		0x0040
+#define ATTR_FLAG_DO_RSEQ		0x0080
 
 /* Used to allocate a pthread_attr_t object which is also accessed
    internally.  */
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index eb0f5fc021..62a796f214 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -136,6 +136,12 @@ tests-internal += \
   tst-sigcontext-get_pc \
   # tests-internal
 
+ifneq (no,$(have-tunables))
+tests-internal += \
+  tst-rseq-disable \
+  # tests-internal $(have-tunables)
+endif
+
 tests-time64 += \
   tst-adjtimex-time64 \
   tst-clock_adjtime-time64 \
@@ -227,6 +233,8 @@ $(objpfx)tst-mman-consts.out: ../sysdeps/unix/sysv/linux/tst-mman-consts.py
 	  < /dev/null > $@ 2>&1; $(evaluate-test)
 $(objpfx)tst-mman-consts.out: $(sysdeps-linux-python-deps)
 
+tst-rseq-disable-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0
+
 endif # $(subdir) == misc
 
 ifeq ($(subdir),time)
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 909f547825..15bc7ffd6e 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -21,22 +21,27 @@
 #include <sysdep.h>
 #include <errno.h>
 #include <kernel-features.h>
+#include <stdbool.h>
 #include <stdio.h>
 #include <sys/rseq.h>
 
 #ifdef RSEQ_SIG
 static inline void
-rseq_register_current_thread (struct pthread *self)
+rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
-  int ret = INTERNAL_SYSCALL_CALL (rseq,
-                                   &self->rseq_area, sizeof (self->rseq_area),
-                                   0, RSEQ_SIG);
-  if (INTERNAL_SYSCALL_ERROR_P (ret))
-    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  if (do_rseq)
+    {
+      int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
+                                       sizeof (self->rseq_area),
+                                       0, RSEQ_SIG);
+      if (!INTERNAL_SYSCALL_ERROR_P (ret))
+        return;
+    }
+  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
 }
 #else /* RSEQ_SIG */
 static inline void
-rseq_register_current_thread (struct pthread *self)
+rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
   THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
 }
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable.c b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
new file mode 100644
index 0000000000..000e351872
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
@@ -0,0 +1,89 @@
+/* Test disabling of rseq registration via tunable.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <stdio.h>
+#include <support/check.h>
+#include <support/namespace.h>
+#include <support/xthread.h>
+#include <sysdep.h>
+#include <unistd.h>
+
+#ifdef RSEQ_SIG
+
+/* Check that rseq can be registered and has not been taken by glibc.  */
+static void
+check_rseq_disabled (void)
+{
+  struct pthread *pd = THREAD_SELF;
+  TEST_COMPARE ((int) pd->rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+
+  int ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
+                     0, RSEQ_SIG);
+  if (ret == 0)
+    {
+      ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
+                     RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+      TEST_COMPARE (ret, 0);
+      pd->rseq_area.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
+    }
+  else
+    {
+      TEST_VERIFY (errno != EINVAL);
+      TEST_VERIFY (errno != EBUSY);
+    }
+}
+
+static void *
+thread_func (void *ignored)
+{
+  check_rseq_disabled ();
+  return NULL;
+}
+
+static void
+proc_func (void *ignored)
+{
+  check_rseq_disabled ();
+}
+
+static int
+do_test (void)
+{
+  puts ("info: checking main thread");
+  check_rseq_disabled ();
+
+  puts ("info: checking main thread (2)");
+  check_rseq_disabled ();
+
+  puts ("info: checking new thread");
+  xpthread_join (xpthread_create (NULL, thread_func, NULL));
+
+  puts ("info: checking subprocess");
+  support_isolate_in_subprocess (proc_func, NULL);
+
+  return 0;
+}
+#else /* !RSEQ_SIG */
+static int
+do_test (void)
+{
+  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
+}
+#endif
+
+#include <support/test-driver.c>
-- 
2.33.1




* [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h>
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
                   ` (5 preceding siblings ...)
  2021-12-07 13:02 ` [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration Florian Weimer
@ 2021-12-07 13:03 ` Florian Weimer
  2021-12-08 17:34   ` Szabolcs Nagy
  2021-12-09 12:26   ` Szabolcs Nagy
  2021-12-07 13:04 ` [PATCH v2 8/8] nptl: rseq failure after registration on main thread is fatal Florian Weimer
  2022-02-01 15:21 ` [PATCH v2 0/8] Extensible rseq integration Rich Felker
  8 siblings, 2 replies; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:03 UTC (permalink / raw)
  To: libc-alpha

The relationship between the thread pointer and the rseq area
is made explicit.  The constant offset can be used by JIT compilers
to optimize rseq access (e.g., for really fast sched_getcpu).

Extensibility is provided through __rseq_size and __rseq_flags.
(In the future, the kernel could request a different rseq size
via the auxiliary vector.)

Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
v2: Fix manual typos.
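
To illustrate how a consumer might combine the thread pointer with
__rseq_offset and __rseq_size, here is a small sketch.  The helpers
demo_rseq_area and demo_field_available are hypothetical, and the
bounds check against the registered size is an assumption about
reasonable usage, not something this patch mandates.

```c
#include <stddef.h>

/* Hypothetical helper: apply a signed byte offset to a thread pointer,
   as a user of __rseq_offset would.  The offset is negative when the
   rseq area sits below the thread pointer.  */
static void *
demo_rseq_area (void *thread_pointer, int rseq_offset)
{
  return (char *) thread_pointer + rseq_offset;
}

/* Hypothetical helper: treat a field as usable only if it lies
   entirely within the registered size.  A size of zero (registration
   failed or disabled) makes every field unavailable.  */
static int
demo_field_available (unsigned int rseq_size, size_t field_offset,
                      size_t field_size)
{
  return field_offset + field_size <= rseq_size;
}
```

With the symbols from this patch, the first call would presumably look
like demo_rseq_area (__builtin_thread_pointer (), __rseq_offset), and
the size argument would come from __rseq_size.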

 NEWS                                          | 11 +++
 manual/threads.texi                           | 81 +++++++++++++++++++
 sysdeps/nptl/dl-tls_init_tp.c                 | 23 +++++-
 sysdeps/unix/sysv/linux/Makefile              |  3 +-
 sysdeps/unix/sysv/linux/Versions              |  5 ++
 sysdeps/unix/sysv/linux/aarch64/ld.abilist    |  3 +
 sysdeps/unix/sysv/linux/alpha/ld.abilist      |  3 +
 sysdeps/unix/sysv/linux/arc/ld.abilist        |  3 +
 sysdeps/unix/sysv/linux/arm/be/ld.abilist     |  3 +
 sysdeps/unix/sysv/linux/arm/le/ld.abilist     |  3 +
 sysdeps/unix/sysv/linux/csky/ld.abilist       |  3 +
 sysdeps/unix/sysv/linux/hppa/ld.abilist       |  3 +
 sysdeps/unix/sysv/linux/i386/ld.abilist       |  3 +
 sysdeps/unix/sysv/linux/ia64/ld.abilist       |  3 +
 .../unix/sysv/linux/m68k/coldfire/ld.abilist  |  3 +
 .../unix/sysv/linux/m68k/m680x0/ld.abilist    |  3 +
 sysdeps/unix/sysv/linux/microblaze/ld.abilist |  3 +
 .../unix/sysv/linux/mips/mips32/ld.abilist    |  3 +
 .../sysv/linux/mips/mips64/n32/ld.abilist     |  3 +
 .../sysv/linux/mips/mips64/n64/ld.abilist     |  3 +
 sysdeps/unix/sysv/linux/nios2/ld.abilist      |  3 +
 .../sysv/linux/powerpc/powerpc32/ld.abilist   |  3 +
 .../linux/powerpc/powerpc64/be/ld.abilist     |  3 +
 .../linux/powerpc/powerpc64/le/ld.abilist     |  3 +
 sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist |  3 +
 sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist |  3 +
 sysdeps/unix/sysv/linux/rseq-internal.h       |  8 +-
 .../unix/sysv/linux/s390/s390-32/ld.abilist   |  3 +
 .../unix/sysv/linux/s390/s390-64/ld.abilist   |  3 +
 sysdeps/unix/sysv/linux/sh/be/ld.abilist      |  3 +
 sysdeps/unix/sysv/linux/sh/le/ld.abilist      |  3 +
 .../unix/sysv/linux/sparc/sparc32/ld.abilist  |  3 +
 .../unix/sysv/linux/sparc/sparc64/ld.abilist  |  3 +
 sysdeps/unix/sysv/linux/sys/rseq.h            | 10 +++
 sysdeps/unix/sysv/linux/tst-rseq-disable.c    |  6 ++
 sysdeps/unix/sysv/linux/tst-rseq.c            |  8 ++
 sysdeps/unix/sysv/linux/x86_64/64/ld.abilist  |  3 +
 sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist |  3 +
 38 files changed, 237 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index 1398cf2e87..8744a92532 100644
--- a/NEWS
+++ b/NEWS
@@ -68,6 +68,17 @@ Major new features:
   to be used by compilers for optimizing usage of 'memcmp' when its
   return value is only used for its boolean status.
 
+* Support for automatically registering threads with the Linux rseq
+  system call has been added.  This system call is implemented starting
+  from Linux 4.18.  The Restartable Sequences ABI accelerates user-space
+  operations on per-cpu data.  It allows user-space to perform updates
+  on per-cpu data without requiring heavy-weight atomic operations.
+  Automatically registering threads allows all libraries, including
+  libc, to make immediate use of the rseq support by using the
+  documented ABI, via the __rseq_flags, __rseq_offset, and __rseq_size
+  variables.  The GNU C Library manual has details on integration of
+  Restartable Sequences.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The r_version update in the debugger interface makes the glibc binary
diff --git a/manual/threads.texi b/manual/threads.texi
index 06b6b277a1..c11be15c89 100644
--- a/manual/threads.texi
+++ b/manual/threads.texi
@@ -629,6 +629,8 @@ the standard.
 * Waiting with Explicit Clocks::          Functions for waiting with an
                                           explicit clock specification.
 * Single-Threaded::                       Detecting single-threaded execution.
+* Restartable Sequences::                 Linux-specific restartable sequences
+                                          integration.
 @end menu
 
 @node Default Thread Attributes
@@ -958,6 +960,85 @@ application-created thread because future versions of @theglibc{} may
 create background threads after the first thread has been created, and
 the application has no way of knowning that these threads are present.
 
+@node Restartable Sequences
+@subsubsection Restartable Sequences
+
+This section describes restartable sequences integration for
+@theglibc{}.  This functionality is only available on Linux.
+
+@deftp {Data Type} {struct rseq}
+@standards{Linux, sys/rseq.h}
+The type of the restartable sequences area.  Future versions
+of Linux may add additional fields to the end of this structure.
+
+
+Users need to obtain the address of the restartable sequences area using
+the thread pointer and the @code{__rseq_offset} variable, described
+below.
+
+One use of the restartable sequences area is to read the current CPU
+number from its @code{cpu_id} field, as an inline version of
+@code{sched_getcpu}.  @Theglibc{} sets the @code{cpu_id} field to
+@code{RSEQ_CPU_ID_REGISTRATION_FAILED} if registration failed or was
+explicitly disabled.
+
+Furthermore, users can store the address of a @code{struct rseq_cs}
+object into the @code{rseq_cs} field of @code{struct rseq}, thus
+informing the kernel that the thread enters a restartable sequence
+critical section.  This pointer and the code areas it itself points to
+must not be left pointing to memory areas which are freed or re-used.
+Several approaches can guarantee this.  If the application or library
+can guarantee that the memory used to hold the @code{struct rseq_cs} and
+the code areas it refers to are never freed or re-used, no special
+action must be taken.  Else, before that memory is re-used or freed, the
+application is responsible for setting the @code{rseq_cs} field to
+@code{NULL} in each thread's restartable sequence area to guarantee that
+it does not leak dangling references.  Because the application does not
+typically have knowledge of libraries' use of restartable sequences, it
+is recommended that libraries using restartable sequences which may end
+up freeing or re-using their memory set the @code{rseq_cs} field to
+@code{NULL} before returning from library functions which use
+restartable sequences.
+
+The manual for the @code{rseq} system call can be found
+at @uref{https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2}.
+@end deftp
+
+@deftypevar {int} __rseq_offset
+@standards{Linux, sys/rseq.h}
+This variable contains the offset between the thread pointer (as defined
+by @code{__builtin_thread_pointer} or the thread pointer register for
+the architecture) and the restartable sequences area.  This value is the
+same for all threads in the process.  If the restartable sequences area
+is located at a lower address than the location to which the thread
+pointer points, the value is negative.
+@end deftypevar
+
+@deftypevar {unsigned int} __rseq_size
+@standards{Linux, sys/rseq.h}
+This variable is either zero (if restartable sequence registration
+failed or has been disabled) or the size of the restartable sequence
+registration.  This can be different from the size of
+@code{struct rseq} if the kernel has extended the size of the
+registration.  If registration is successful, @code{__rseq_size} is at
+least 32 (the initial size of @code{struct rseq}).
+@end deftypevar
+
+@deftypevar {unsigned int} __rseq_flags
+@standards{Linux, sys/rseq.h}
+The flags used during restartable sequence registration with the kernel.
+Currently zero.
+@end deftypevar
+
+@deftypevr Macro int RSEQ_SIG
+@standards{Linux, sys/rseq.h}
+Each supported architecture provides a @code{RSEQ_SIG} macro in
+@file{sys/rseq.h} which contains a signature.  That signature is
+expected to be present in the code before each restartable sequences
+abort handler.  Failure to provide the expected signature may terminate
+the process with a segmentation fault.
+@end deftypevr
+
 @c FIXME these are undocumented:
 @c pthread_atfork
 @c pthread_attr_destroy
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index b39dfbff2c..77443cc330 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -22,6 +22,7 @@
 #include <pthreadP.h>
 #include <tls.h>
 #include <rseq-internal.h>
+#include <thread_pointer.h>
 
 #define TUNABLE_NAMESPACE pthread
 #include <dl-tunables.h>
@@ -43,6 +44,10 @@ rtld_mutex_dummy (pthread_mutex_t *lock)
 }
 #endif
 
+const unsigned int __rseq_flags;
+const unsigned int __rseq_size attribute_relro;
+const int __rseq_offset attribute_relro;
+
 void
 __tls_pre_init_tp (void)
 {
@@ -100,7 +105,23 @@ __tls_init_tp (void)
 #if HAVE_TUNABLES
     do_rseq = TUNABLE_GET (rseq, int, NULL);
 #endif
-    rseq_register_current_thread (pd, do_rseq);
+    if (rseq_register_current_thread (pd, do_rseq))
+      {
+        /* We need a writable view of the variables.  They are in
+           .data.relro and are not yet write-protected.  */
+        extern unsigned int size __asm__ ("__rseq_size");
+        size = sizeof (pd->rseq_area);
+      }
+
+#ifdef RSEQ_SIG
+    /* This should be a compile-time constant, but the current
+       infrastructure makes it difficult to determine its value.  Not
+       all targets support __thread_pointer, so set __rseq_offset
+       only if the rseq registration may have happened because
+       RSEQ_SIG is defined.  */
+    extern int offset __asm__ ("__rseq_offset");
+    offset = (char *) &pd->rseq_area - (char *) __thread_pointer ();
+#endif
   }
 
   /* Set initial thread's stack block from 0 up to __libc_stack_end.
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 62a796f214..61acc1987d 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -112,7 +112,8 @@ sysdep_headers += sys/mount.h sys/acct.h \
 		  bits/types/struct_semid64_ds_helper.h \
 		  bits/types/struct_shmid64_ds.h \
 		  bits/types/struct_shmid64_ds_helper.h \
-		  bits/pthread_stack_min.h bits/pthread_stack_min-dynamic.h
+		  bits/pthread_stack_min.h bits/pthread_stack_min-dynamic.h \
+		  sys/rseq.h bits/rseq.h
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
 	 tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index 26452f3f17..3f8809a158 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -316,6 +316,11 @@ librt {
 }
 
 ld {
+  GLIBC_2.35 {
+    __rseq_flags;
+    __rseq_offset;
+    __rseq_size;
+  }
   GLIBC_PRIVATE {
     __nptl_change_stack_perm;
   }
diff --git a/sysdeps/unix/sysv/linux/aarch64/ld.abilist b/sysdeps/unix/sysv/linux/aarch64/ld.abilist
index 80b2fe6725..717a35f242 100644
--- a/sysdeps/unix/sysv/linux/aarch64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.17 __stack_chk_guard D 0x8
 GLIBC_2.17 __tls_get_addr F
 GLIBC_2.17 _dl_mcount F
 GLIBC_2.17 _r_debug D 0x28
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/alpha/ld.abilist b/sysdeps/unix/sysv/linux/alpha/ld.abilist
index 98a03f611f..76911bd7f8 100644
--- a/sysdeps/unix/sysv/linux/alpha/ld.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.0 _r_debug D 0x28
 GLIBC_2.1 __libc_stack_end D 0x8
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x8
diff --git a/sysdeps/unix/sysv/linux/arc/ld.abilist b/sysdeps/unix/sysv/linux/arc/ld.abilist
index 048f17c848..71c67f9803 100644
--- a/sysdeps/unix/sysv/linux/arc/ld.abilist
+++ b/sysdeps/unix/sysv/linux/arc/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.32 __stack_chk_guard D 0x4
 GLIBC_2.32 __tls_get_addr F
 GLIBC_2.32 _dl_mcount F
 GLIBC_2.32 _r_debug D 0x14
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/arm/be/ld.abilist b/sysdeps/unix/sysv/linux/arm/be/ld.abilist
index cc8825c3bc..3859433b21 100644
--- a/sysdeps/unix/sysv/linux/arm/be/ld.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/ld.abilist
@@ -1,3 +1,6 @@
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __libc_stack_end D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
 GLIBC_2.4 __tls_get_addr F
diff --git a/sysdeps/unix/sysv/linux/arm/le/ld.abilist b/sysdeps/unix/sysv/linux/arm/le/ld.abilist
index cc8825c3bc..3859433b21 100644
--- a/sysdeps/unix/sysv/linux/arm/le/ld.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/ld.abilist
@@ -1,3 +1,6 @@
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __libc_stack_end D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
 GLIBC_2.4 __tls_get_addr F
diff --git a/sysdeps/unix/sysv/linux/csky/ld.abilist b/sysdeps/unix/sysv/linux/csky/ld.abilist
index 564ac09737..6bfc582b73 100644
--- a/sysdeps/unix/sysv/linux/csky/ld.abilist
+++ b/sysdeps/unix/sysv/linux/csky/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.29 __stack_chk_guard D 0x4
 GLIBC_2.29 __tls_get_addr F
 GLIBC_2.29 _dl_mcount F
 GLIBC_2.29 _r_debug D 0x14
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/hppa/ld.abilist b/sysdeps/unix/sysv/linux/hppa/ld.abilist
index d155a59843..efccd6a023 100644
--- a/sysdeps/unix/sysv/linux/hppa/ld.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.2 _r_debug D 0x14
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/i386/ld.abilist b/sysdeps/unix/sysv/linux/i386/ld.abilist
index 0478e22071..1eb94ae75f 100644
--- a/sysdeps/unix/sysv/linux/i386/ld.abilist
+++ b/sysdeps/unix/sysv/linux/i386/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 ___tls_get_addr F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/ia64/ld.abilist b/sysdeps/unix/sysv/linux/ia64/ld.abilist
index 33f91199bf..2cc68bcf7b 100644
--- a/sysdeps/unix/sysv/linux/ia64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/ld.abilist
@@ -2,3 +2,6 @@ GLIBC_2.2 __libc_stack_end D 0x8
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.2 _r_debug D 0x28
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
index cc8825c3bc..3859433b21 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
@@ -1,3 +1,6 @@
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __libc_stack_end D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
 GLIBC_2.4 __tls_get_addr F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
index 3ba474c27f..e62b2742af 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.0 _r_debug D 0x14
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/microblaze/ld.abilist b/sysdeps/unix/sysv/linux/microblaze/ld.abilist
index a4933c3541..5d63d74e8f 100644
--- a/sysdeps/unix/sysv/linux/microblaze/ld.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.18 __stack_chk_guard D 0x4
 GLIBC_2.18 __tls_get_addr F
 GLIBC_2.18 _dl_mcount F
 GLIBC_2.18 _r_debug D 0x14
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
index be09641a48..53ca22de2f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.0 _r_debug D 0x14
 GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
index be09641a48..53ca22de2f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.0 _r_debug D 0x14
 GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
index 1ea36e13f2..d1cdd68333 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.0 _r_debug D 0x28
 GLIBC_2.2 __libc_stack_end D 0x8
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x8
diff --git a/sysdeps/unix/sysv/linux/nios2/ld.abilist b/sysdeps/unix/sysv/linux/nios2/ld.abilist
index 52178802dd..bcbba1823e 100644
--- a/sysdeps/unix/sysv/linux/nios2/ld.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.21 __stack_chk_guard D 0x4
 GLIBC_2.21 __tls_get_addr F
 GLIBC_2.21 _dl_mcount F
 GLIBC_2.21 _r_debug D 0x14
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
index 4bbfba7a61..0d033cb8bd 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
@@ -4,3 +4,6 @@ GLIBC_2.1 _dl_mcount F
 GLIBC_2.22 __tls_get_addr_opt F
 GLIBC_2.23 __parse_hwcap_and_convert_at_platform F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
index 283fb4510b..9c627b1ddf 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
@@ -4,3 +4,6 @@ GLIBC_2.3 __libc_stack_end D 0x8
 GLIBC_2.3 __tls_get_addr F
 GLIBC_2.3 _dl_mcount F
 GLIBC_2.3 _r_debug D 0x28
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
index b1f313c7cd..3a748c2817 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
@@ -4,3 +4,6 @@ GLIBC_2.17 _dl_mcount F
 GLIBC_2.17 _r_debug D 0x28
 GLIBC_2.22 __tls_get_addr_opt F
 GLIBC_2.23 __parse_hwcap_and_convert_at_platform F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist
index 94ca64c43d..4c67ea18d6 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.33 __stack_chk_guard D 0x4
 GLIBC_2.33 __tls_get_addr F
 GLIBC_2.33 _dl_mcount F
 GLIBC_2.33 _r_debug D 0x14
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
index 845f356c3c..09596f09e2 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
@@ -3,3 +3,6 @@ GLIBC_2.27 __stack_chk_guard D 0x8
 GLIBC_2.27 __tls_get_addr F
 GLIBC_2.27 _dl_mcount F
 GLIBC_2.27 _r_debug D 0x28
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
index 15bc7ffd6e..9e8f99fd51 100644
--- a/sysdeps/unix/sysv/linux/rseq-internal.h
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -26,7 +26,7 @@
 #include <sys/rseq.h>
 
 #ifdef RSEQ_SIG
-static inline void
+static inline bool
 rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
   if (do_rseq)
@@ -35,15 +35,17 @@ rseq_register_current_thread (struct pthread *self, bool do_rseq)
                                        sizeof (self->rseq_area),
                                        0, RSEQ_SIG);
       if (!INTERNAL_SYSCALL_ERROR_P (ret))
-        return;
+        return true;
     }
   THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  return false;
 }
 #else /* RSEQ_SIG */
-static inline void
+static inline bool
 rseq_register_current_thread (struct pthread *self, bool do_rseq)
 {
   THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
+  return false;
 }
 #endif /* RSEQ_SIG */
 
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
index b56f005beb..2c47004bae 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
@@ -2,3 +2,6 @@ GLIBC_2.0 _r_debug D 0x14
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_offset F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
index 6f788a086d..385a73a257 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
@@ -2,3 +2,6 @@ GLIBC_2.2 __libc_stack_end D 0x8
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.2 _r_debug D 0x28
 GLIBC_2.3 __tls_get_offset F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/sh/be/ld.abilist b/sysdeps/unix/sysv/linux/sh/be/ld.abilist
index d155a59843..efccd6a023 100644
--- a/sysdeps/unix/sysv/linux/sh/be/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.2 _r_debug D 0x14
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/sh/le/ld.abilist b/sysdeps/unix/sysv/linux/sh/le/ld.abilist
index d155a59843..efccd6a023 100644
--- a/sysdeps/unix/sysv/linux/sh/le/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/ld.abilist
@@ -2,4 +2,7 @@ GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.2 _r_debug D 0x14
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
index 0c6610e3c2..8fb5ff3ef3 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
@@ -2,3 +2,6 @@ GLIBC_2.0 _r_debug D 0x14
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
index 33f91199bf..2cc68bcf7b 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
@@ -2,3 +2,6 @@ GLIBC_2.2 __libc_stack_end D 0x8
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.2 _r_debug D 0x28
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
index c8edff50d4..1215b5d086 100644
--- a/sysdeps/unix/sysv/linux/sys/rseq.h
+++ b/sysdeps/unix/sysv/linux/sys/rseq.h
@@ -171,4 +171,14 @@ struct rseq
 
 #endif /* __GLIBC_HAVE_KERNEL_RSEQ */
 
+/* Offset from the thread pointer to the rseq area.  */
+extern const int __rseq_offset;
+
+/* Size of the registered rseq area.  0 if the registration was
+   unsuccessful.  */
+extern const unsigned int __rseq_size;
+
+/* Flags used during rseq registration.  */
+extern const unsigned int __rseq_flags;
+
 #endif /* sys/rseq.h */
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable.c b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
index 000e351872..6d73f77e96 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq-disable.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
@@ -21,6 +21,7 @@
 #include <support/namespace.h>
 #include <support/xthread.h>
 #include <sysdep.h>
+#include <thread_pointer.h>
 #include <unistd.h>
 
 #ifdef RSEQ_SIG
@@ -30,6 +31,11 @@ static void
 check_rseq_disabled (void)
 {
   struct pthread *pd = THREAD_SELF;
+
+  TEST_COMPARE (__rseq_flags, 0);
+  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
+               == (char *) &pd->rseq_area);
+  TEST_COMPARE (__rseq_size, 0);
   TEST_COMPARE ((int) pd->rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
 
   int ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
index 926376b6a5..572c11166f 100644
--- a/sysdeps/unix/sysv/linux/tst-rseq.c
+++ b/sysdeps/unix/sysv/linux/tst-rseq.c
@@ -29,12 +29,20 @@
 # include <stdlib.h>
 # include <string.h>
 # include <syscall.h>
+# include <thread_pointer.h>
+# include <tls.h>
 # include "tst-rseq.h"
 
 static void
 do_rseq_main_test (void)
 {
+  struct pthread *pd = THREAD_SELF;
+
   TEST_VERIFY_EXIT (rseq_thread_registered ());
+  TEST_COMPARE (__rseq_flags, 0);
+  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
+               == (char *) &pd->rseq_area);
+  TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area));
 }
 
 static void
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist b/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
index d3cdf7611e..49a8f31c93 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
@@ -2,3 +2,6 @@ GLIBC_2.2.5 __libc_stack_end D 0x8
 GLIBC_2.2.5 _dl_mcount F
 GLIBC_2.2.5 _r_debug D 0x28
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
index c70bccf782..ce68cc6304 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
@@ -2,3 +2,6 @@ GLIBC_2.16 __libc_stack_end D 0x4
 GLIBC_2.16 __tls_get_addr F
 GLIBC_2.16 _dl_mcount F
 GLIBC_2.16 _r_debug D 0x14
+GLIBC_2.35 __rseq_flags D 0x4
+GLIBC_2.35 __rseq_offset D 0x4
+GLIBC_2.35 __rseq_size D 0x4
-- 
2.33.1



* [PATCH v2 8/8] nptl: rseq failure after registration on main thread is fatal
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
                   ` (6 preceding siblings ...)
  2021-12-07 13:03 ` [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h> Florian Weimer
@ 2021-12-07 13:04 ` Florian Weimer
  2021-12-08 17:36   ` Szabolcs Nagy
  2022-02-01 15:21 ` [PATCH v2 0/8] Extensible rseq integration Rich Felker
  8 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-07 13:04 UTC (permalink / raw)
  To: libc-alpha

This simplifies the application programming model.

Browser sandboxes have already been fixed:

  Sandbox is incompatible with rseq registration
  <https://bugzilla.mozilla.org/show_bug.cgi?id=1651701>

  Allow rseq in the Linux sandboxes. r=gcp
  <https://hg.mozilla.org/mozilla-central/rev/042425712eb1>

  Sandbox needs to support rseq system call
  <https://bugs.chromium.org/p/chromium/issues/detail?id=1104160>

  Linux sandbox: Allow rseq(2)
  <https://chromium.googlesource.com/chromium/src.git/+/230675d9ac8f1>
---
v2: New patch.  Tested with Firefox 94.0 on Fedora 35.

 nptl/pthread_create.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 4608fd9068..c097fc54e6 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -370,7 +370,8 @@ start_thread (void *arg)
   /* Register rseq TLS to the kernel.  */
   {
     bool do_rseq = THREAD_GETMEM (pd, flags) & ATTR_FLAG_DO_RSEQ;
-    rseq_register_current_thread (pd, do_rseq);
+    if (!rseq_register_current_thread (pd, do_rseq) && do_rseq)
+      __libc_fatal ("Fatal glibc error: rseq registration failed\n");
   }
 
 #ifndef __ASSUME_SET_ROBUST_LIST
-- 
2.33.1


* Re: [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer
  2021-12-07 13:00 ` [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer Florian Weimer
@ 2021-12-08 11:05   ` Szabolcs Nagy
  2021-12-08 17:55     ` Florian Weimer
  0 siblings, 1 reply; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 11:05 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:00, Florian Weimer via Libc-alpha wrote:
> <tls.h> already contains a definition that is quite similar,
> but it is not consistent across architectures.
> 
> Only architectures for which rseq support is added are covered.

This looks ok.

It's an annoying gcc bug that __builtin_thread_pointer
does not work consistently across targets.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

> ---
> v2: As posted before.
> 
>  sysdeps/nptl/thread_pointer.h         | 28 ++++++++++++++++++++
>  sysdeps/powerpc/nptl/thread_pointer.h | 33 +++++++++++++++++++++++
>  sysdeps/x86/nptl/thread_pointer.h     | 38 +++++++++++++++++++++++++++
>  3 files changed, 99 insertions(+)
>  create mode 100644 sysdeps/nptl/thread_pointer.h
>  create mode 100644 sysdeps/powerpc/nptl/thread_pointer.h
>  create mode 100644 sysdeps/x86/nptl/thread_pointer.h
> 
> diff --git a/sysdeps/nptl/thread_pointer.h b/sysdeps/nptl/thread_pointer.h
> new file mode 100644
> index 0000000000..92f2f3093e
> --- /dev/null
> +++ b/sysdeps/nptl/thread_pointer.h
> @@ -0,0 +1,28 @@
> +/* __thread_pointer definition.  Generic version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_THREAD_POINTER_H
> +#define _SYS_THREAD_POINTER_H
> +
> +static inline void *
> +__thread_pointer (void)
> +{
> +  return __builtin_thread_pointer ();
> +}
> +
> +#endif /* _SYS_THREAD_POINTER_H */
> diff --git a/sysdeps/powerpc/nptl/thread_pointer.h b/sysdeps/powerpc/nptl/thread_pointer.h
> new file mode 100644
> index 0000000000..8fd5ba671f
> --- /dev/null
> +++ b/sysdeps/powerpc/nptl/thread_pointer.h
> @@ -0,0 +1,33 @@
> +/* __thread_pointer definition.  powerpc version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_THREAD_POINTER_H
> +#define _SYS_THREAD_POINTER_H
> +
> +static inline void *
> +__thread_pointer (void)
> +{
> +#ifdef __powerpc64__
> +  register void *__result asm ("r13");
> +#else
> +  register void *__result asm ("r2");
> +#endif
> +  return __result;
> +}
> +
> +#endif /* _SYS_THREAD_POINTER_H */
> diff --git a/sysdeps/x86/nptl/thread_pointer.h b/sysdeps/x86/nptl/thread_pointer.h
> new file mode 100644
> index 0000000000..6b71b6f7e1
> --- /dev/null
> +++ b/sysdeps/x86/nptl/thread_pointer.h
> @@ -0,0 +1,38 @@
> +/* __thread_pointer definition.  x86 version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_THREAD_POINTER_H
> +#define _SYS_THREAD_POINTER_H
> +
> +static inline void *
> +__thread_pointer (void)
> +{
> +#if __GNUC_PREREQ (11, 1)
> +  return __builtin_thread_pointer ();
> +#else
> +  void *__result;
> +# ifdef __x86_64__
> +  __asm__ ("mov %%fs:0, %0" : "=r" (__result));
> +# else
> +  __asm__ ("mov %%gs:0, %0" : "=r" (__result));
> +# endif
> +  return __result;
> +#endif /* !GCC 11 */
> +}
> +
> +#endif /* _SYS_THREAD_POINTER_H */
> -- 
> 2.33.1
> 
> 

* Re: [PATCH v2 2/8] nptl: Introduce <tcb-access.h> for THREAD_* accessors
  2021-12-07 13:00 ` [PATCH v2 2/8] nptl: Introduce <tcb-access.h> for THREAD_* accessors Florian Weimer
@ 2021-12-08 11:09   ` Szabolcs Nagy
  0 siblings, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 11:09 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:00, Florian Weimer via Libc-alpha wrote:
> These are common between most architectures.  Only the x86 targets
> are outliers.

This code refactoring looks OK.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

* Re: [PATCH v2 3/8] nptl: Introduce THREAD_GETMEM_VOLATILE
  2021-12-07 13:00 ` [PATCH v2 3/8] nptl: Introduce THREAD_GETMEM_VOLATILE Florian Weimer
@ 2021-12-08 11:23   ` Szabolcs Nagy
  0 siblings, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 11:23 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:00, Florian Weimer via Libc-alpha wrote:
> This will be needed for rseq TCB access.

I think volatile access is the right thing to do.
(we could change the type of cpu_id to be volatile but
then accesses outside of rseq registration would be
unnecessarily volatile)
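
For illustration only (not code from the patch), a minimal sketch of the
per-access approach: the member keeps its plain type and only the chosen
read goes through a volatile-qualified lvalue.  The names
struct sketch_area, SKETCH_READ_VOLATILE and sketch_read_cpu_id are
hypothetical, and the macro is not claimed to match how
THREAD_GETMEM_VOLATILE is actually defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an area that the kernel may update
   asynchronously.  This is not the real struct rseq.  */
struct sketch_area
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
};

/* Read MEMBER of *PTR through a volatile-qualified lvalue, so the
   compiler performs a load at this point instead of reusing a value
   it read earlier.  The member itself stays non-volatile.
   __typeof__ is a GNU C extension.  */
#define SKETCH_READ_VOLATILE(ptr, member) \
  (*(volatile __typeof__ ((ptr)->member) *) &(ptr)->member)

/* Return the current cpu_id value using exactly one volatile load.  */
static inline uint32_t
sketch_read_cpu_id (struct sketch_area *area)
{
  assert (area != NULL);
  return SKETCH_READ_VOLATILE (area, cpu_id);
}
```

Other accesses to the same member, such as the store done at
registration time, remain ordinary accesses in this sketch.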

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

* Re: [PATCH 4/8] nptl: Add rseq registration
  2021-12-07 13:01 ` [PATCH 4/8] nptl: Add rseq registration Florian Weimer
@ 2021-12-08 16:51   ` Szabolcs Nagy
  2021-12-08 18:03   ` Siddhesh Poyarekar
  2021-12-09  1:51   ` Noah Goldstein
  2 siblings, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 16:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:01, Florian Weimer via Libc-alpha wrote:
> The rseq area is placed directly into struct pthread.  rseq
> registration failure is not treated as an error, so it is possible
> that threads run with inconsistent registration status.
> 
> <sys/rseq.h> is not yet installed as a public header.
> 
> Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

looks good.

most of the changes were reviewed when rseq was first committed.

my problem with __has_include ("linux/rseq.h") etc in sys/rseq.h
is that linux might change that later to conflict with libc headers
in some way. but i don't have a better way to avoid issues when
both libc and linux headers are included into the same TU.

despite the comments in linux/rseq.h (and sys/rseq.h) the
RSEQ_CPU_ID_UNINITIALIZED state is now not observable.
i guess it is just an unused piece of linux uapi so ok.

inconsistent rseq status in threads is ok.

not unregistering on thread exit is ok.

updated tests are ok.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
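
A sketch of how a reader of cpu_id could tell the states apart, under
stated assumptions: the SKETCH_* constants mirror the values of
RSEQ_CPU_ID_UNINITIALIZED (-1) and RSEQ_CPU_ID_REGISTRATION_FAILED (-2)
from the Linux UAPI header, and sketch_cpu_id_valid is a hypothetical
helper, not a glibc function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the Linux UAPI sentinel values, prefixed so they do
   not clash with <linux/rseq.h> or <sys/rseq.h>.  */
#define SKETCH_CPU_ID_UNINITIALIZED (-1)
#define SKETCH_CPU_ID_REGISTRATION_FAILED (-2)

/* If CPU_ID holds a kernel-maintained CPU number, store it in *CPU and
   return true.  Return false for either sentinel; the caller then has
   to take a slow path (for example a getcpu system call).  */
static inline bool
sketch_cpu_id_valid (uint32_t cpu_id, int *cpu)
{
  assert (cpu != NULL);
  int value = (int) cpu_id;
  if (value < 0)
    /* Sentinel: no CPU number is available in the area.  */
    return false;
  *cpu = value;
  return true;
}
```

Both sentinels are handled the same way here, so it does not matter to
such a reader that only the registration-failed value is stored.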

* Re: [PATCH v2 5/8] Linux: Use rseq to accelerate sched_getcpu
  2021-12-07 13:02 ` [PATCH v2 5/8] Linux: Use rseq to accelerate sched_getcpu Florian Weimer
@ 2021-12-08 16:53   ` Szabolcs Nagy
  0 siblings, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 16:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:02, Florian Weimer via Libc-alpha wrote:
> Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> v2: Use volatile access.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

* Re: [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration
  2021-12-07 13:02 ` [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration Florian Weimer
@ 2021-12-08 17:22   ` Szabolcs Nagy
  2021-12-08 18:03   ` Siddhesh Poyarekar
  1 sibling, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 17:22 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:02, Florian Weimer via Libc-alpha wrote:
> This tunable allows applications to register the rseq area instead
> of glibc.
> ---
> v2: Unchanged.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

* Re: [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h>
  2021-12-07 13:03 ` [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h> Florian Weimer
@ 2021-12-08 17:34   ` Szabolcs Nagy
  2021-12-09 12:26   ` Szabolcs Nagy
  1 sibling, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 17:34 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:03, Florian Weimer via Libc-alpha wrote:
> The relationship between the thread pointer and the rseq area
> is made explicit.  The constant offset can be used by JIT compilers
> to optimize rseq access (e.g., for really fast sched_getcpu).
> 
> Extensibility is provided through __rseq_size and __rseq_flags.
> (In the future, the kernel could request a different rseq size
> via the auxiliary vector.)
> 
> Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> v2: Fix manual typos.

Other than the typos noted below this looks ok.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
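
To illustrate the thread pointer relationship described in the commit
message, here is a simulated sketch of the offset arithmetic.  It does
not use the real thread pointer or the real symbols: struct
sketch_descr, struct sketch_area and both helpers are hypothetical, and
ptrdiff_t is used where the patch declares __rseq_offset as int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rseq area.  */
struct sketch_area
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
};

/* Hypothetical thread descriptor with the area embedded in it.  */
struct sketch_descr
{
  void *self;
  struct sketch_area area;
  char tail[64];
};

/* Compute the signed distance from the thread pointer to the area.
   This is done once; the result is the same for every thread that
   uses the same descriptor layout.  */
static inline ptrdiff_t
sketch_area_offset (const void *thread_pointer,
                    const struct sketch_area *area)
{
  assert (thread_pointer != NULL && area != NULL);
  return (const char *) area - (const char *) thread_pointer;
}

/* Recover the area address from a thread pointer and the offset.  */
static inline struct sketch_area *
sketch_area_from_tp (void *thread_pointer, ptrdiff_t offset)
{
  return (struct sketch_area *) ((char *) thread_pointer + offset);
}
```

The offset is negative when the thread pointer points past the
descriptor and positive when it points at its start, so a consumer has
to treat it as a signed quantity.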

> --- a/manual/threads.texi
> +++ b/manual/threads.texi
> @@ -629,6 +629,8 @@ the standard.
>  * Waiting with Explicit Clocks::          Functions for waiting with an
>                                            explicit clock specification.
>  * Single-Threaded::                       Detecting single-threaded execution.
> +* Restartable Sequences::                 Linux-specific restartable sequences
> +                                          integration.
>  @end menu
>  
>  @node Default Thread Attributes
> @@ -958,6 +960,85 @@ application-created thread because future versions of @theglibc{} may
>  create background threads after the first thread has been created, and
>  the application has no way of knowning that these threads are present.
>  
> +@node Restartable Sequences
> +@subsubsection Restartable Sequences
> +
> +This section describes restartable sequences integration for
> +@theglibc{}.  This functionality is only available on Linux.
> +
> +@deftp {Data Type} {struct rseq}
> +@standards{Linux, sys/rseq.h}
> +The type of the restartable sequences area.  Future versions
> +of Linux may add additional fields to the end of this structure.
> +
> +
> +Users need to obtain the address of the restartable sequences area using
> +the thread pointer and the @code{__rseq_offset} variable, described
> +below.
> +
> +One use of the restartable sequences area is to read the current CPU
> +number from its @code{cpu_id} field, as an inline version of
> +@code{sched_getcpu}.  @Theglibc{} sets the @code{cpu_id} field to
> +@code{RSEQ_CPU_ID_REGISTRATION_FAILED} if registration failed or was
> +explicitly disabled.
> +
> +Furthermore, users can store the address of a @code{struct rseq_cs}
> +object into the @code{rseq_cs} field of @code{struct rseq}, thus
> +informing the kernel that the thread enters a restartable sequence
> +critical section.  This pointer and the code areas it itself points to
> +must not be left pointing to memory areas which are freed or re-used.
> +Several approaches can guarantee this.  If the application or library
> +can guarantee that the memory used to hold the @code{struct rseq_cs} and
> +the code areas it refers to are never freed or re-used, no special
> +action must be taken.  Else, before that memory is re-used of freed, the
> +application is responsible for setting the @code{rseq_cs} field to
> +@code{NULL} in each thread's restartable sequence area to guarantee that
> +it does not leak dangling references.  Because the application does not
> +typically have knowledge of libraries' use of restartable sequences, it
> +is recommended that libraries using restartable sequences which may end
> +up freeing or re-using their memory set the @code{rseq_cs} field to
> +@code{NULL} before returning from library functions which use
> +restartable sequences.
> +
> +The manual for the @code{rseq} system call can be found
> +at @uref{https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/doc/man/rseq.2}.
> +@end deftp
> +
> +@deftypevar {int} __rseq_offset
> +@standards{Linux, sys/rseq.h}
> +This variable contains the offset between the thread pointer (as defined
> +by @code{__builtin_thread_pointer} or the thread pointer register for
> +the architecture) and the restartable sequences area.  This value is the
> +same for all threads in the process.  If the restartable sequences area
> +is located at a lower address than the location to which the thread
> +pointer points, the value is negative.
> +@end deftypevar
> +
> +@deftypevar {unsigned int} __rseq_size
> +@standards{Linux, sys/rseq.h}
> +This variable is either zero (if restartable sequence registration
> +failed or has been disabled) or the size of the restartable sequence
> +registration.  This can be less can be different from the size of

can be can be

> +@code{struct rseq} if the kernel has extended the size of the
> +registration.  If registration is successful, @code{__rseq_size} is at
> +least 32 (the initial size of @code{struct rseq}).
> +@end deftypevar
> +
> +@deftypevar {unsigned int} __rseq_flags
> +@standards{Linux, sys/rseq.h}
> +The flags used during restartable sequence registration with the kernel.
> +Currently zero.
> +@end deftypevar
> +
> +@deftypevr Macro int RSEQ_SIG
> +@standards{Linux, sys/rseq.h}
> +Each supported architecture provides a @code{RSEQ_SIG} macro in
> +@file{sys/rseq.h} which contains a signature.  That signature is
> +expected to be present in the code before each restartable sequences
> +abort handler.  Failure to provide the expected signature may terminate
> +the process with a segmentation fault.
> +@end deftypevr
> +
>  @c FIXME these are undocumented:
>  @c pthread_atfork
>  @c pthread_attr_destroy
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index b39dfbff2c..77443cc330 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -22,6 +22,7 @@
>  #include <pthreadP.h>
>  #include <tls.h>
>  #include <rseq-internal.h>
> +#include <thread_pointer.h>
>  
>  #define TUNABLE_NAMESPACE pthread
>  #include <dl-tunables.h>
> @@ -43,6 +44,10 @@ rtld_mutex_dummy (pthread_mutex_t *lock)
>  }
>  #endif
>  
> +const unsigned int __rseq_flags;
> +const unsigned int __rseq_size attribute_relro;
> +const int __rseq_offset attribute_relro;
> +
>  void
>  __tls_pre_init_tp (void)
>  {
> @@ -100,7 +105,23 @@ __tls_init_tp (void)
>  #if HAVE_TUNABLES
>      do_rseq = TUNABLE_GET (rseq, int, NULL);
>  #endif
> -    rseq_register_current_thread (pd, do_rseq);
> +    if (rseq_register_current_thread (pd, do_rseq))
> +      {
> +        /* We need a writable view of the variables.  They are in
> +           .data.relro and are not yet write-protected.  */
> +        extern unsigned int size __asm__ ("__rseq_size");
> +        size = sizeof (pd->rseq_area);
> +      }
> +
> +#ifdef RSEQ_SIG
> +    /* This should be a compile-time constant, but the current
> +       infrastructure makes it difficult to determine its value.  Not
> +       all targets support __thread_pointer, so set set __rseq_offset
> +       only if thre rseq registration may have happened because

set set
thre

> +       RSEQ_SIG is defined.  */
> +    extern int offset __asm__ ("__rseq_offset");
> +    offset = (char *) &pd->rseq_area - (char *) __thread_pointer ();
> +#endif
>    }
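
As an illustration of how a size value like __rseq_size can gate access
to later fields, a sketch with a made-up layout: struct sketch_area_v2
and later_field are hypothetical and do not describe any real kernel
extension.  The only facts taken from the patch are that the size is
zero without registration and at least 32 with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extended layout: the initial 32 bytes followed by one
   invented extension field.  */
struct sketch_area_v2
{
  unsigned char initial[32];
  uint32_t later_field;
};

/* True if an area of SIZE bytes fully contains MEMBER of TYPE.  */
#define SKETCH_FIELD_AVAILABLE(size, type, member) \
  ((size) >= offsetof (type, member) + sizeof (((type *) 0)->member))

/* A size of zero means that nothing was registered.  */
static inline bool
sketch_registered (unsigned int size)
{
  return size != 0;
}

/* Check whether the invented extension field lies inside the
   registered size.  */
static inline bool
sketch_has_later_field (unsigned int size)
{
  return SKETCH_FIELD_AVAILABLE (size, struct sketch_area_v2, later_field);
}

/* Read the extension field; callers must check availability first.  */
static inline uint32_t
sketch_read_later_field (const struct sketch_area_v2 *area,
                         unsigned int size)
{
  assert (area != NULL && sketch_has_later_field (size));
  return area->later_field;
}
```

With this pattern, code built against a larger structure still works
when the registered size is only the initial 32 bytes, because it never
touches fields beyond that size.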

* Re: [PATCH v2 8/8] nptl: rseq failure after registration on main thread is fatal
  2021-12-07 13:04 ` [PATCH v2 8/8] nptl: rseq failure after registration on main thread is fatal Florian Weimer
@ 2021-12-08 17:36   ` Szabolcs Nagy
  0 siblings, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-08 17:36 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:04, Florian Weimer via Libc-alpha wrote:
> This simplifies the application programming model.
> 
> Browser sandboxes have already been fixed:
> 
>   Sandbox is incompatible with rseq registration
>   <https://bugzilla.mozilla.org/show_bug.cgi?id=1651701>
> 
>   Allow rseq in the Linux sandboxes. r=gcp
>   <https://hg.mozilla.org/mozilla-central/rev/042425712eb1>
> 
>   Sandbox needs to support rseq system call
>   <https://bugs.chromium.org/p/chromium/issues/detail?id=1104160>
> 
>   Linux sandbox: Allow rseq(2)
>   <https://chromium.googlesource.com/chromium/src.git/+/230675d9ac8f1>
> ---
> v2: New patch.  Tested with Firefox 94.0 on Fedora 35.

looks good.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
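
For reference, the condition in the hunk can be written out as a small
decision function.  This is a sketch with hypothetical names
(enum sketch_outcome, sketch_new_thread_outcome); it models only the
expression !registered && do_rseq and does not call into glibc.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible results for a newly started thread.  */
enum sketch_outcome
{
  SKETCH_CONTINUE_REGISTERED,    /* Registration was requested and worked.  */
  SKETCH_CONTINUE_UNREGISTERED,  /* Registration was not requested.  */
  SKETCH_FATAL                   /* Requested, but the kernel refused.  */
};

/* DO_RSEQ says whether registration was requested for this thread,
   REGISTERED whether the registration attempt succeeded.  */
static inline enum sketch_outcome
sketch_new_thread_outcome (bool do_rseq, bool registered)
{
  /* Registration is only attempted when it was requested.  */
  assert (do_rseq || !registered);
  if (registered)
    return SKETCH_CONTINUE_REGISTERED;
  if (do_rseq)
    return SKETCH_FATAL;
  return SKETCH_CONTINUE_UNREGISTERED;
}
```

A thread for which registration was never requested keeps running
without it; only a requested registration that fails is fatal.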

> 
>  nptl/pthread_create.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 4608fd9068..c097fc54e6 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -370,7 +370,8 @@ start_thread (void *arg)
>    /* Register rseq TLS to the kernel.  */
>    {
>      bool do_rseq = THREAD_GETMEM (pd, flags) & ATTR_FLAG_DO_RSEQ;
> -    rseq_register_current_thread (pd, do_rseq);
> +    if (!rseq_register_current_thread (pd, do_rseq) && do_rseq)
> +      __libc_fatal ("Fatal glibc error: rseq registration failed\n");
>    }
>  
>  #ifndef __ASSUME_SET_ROBUST_LIST
> -- 
> 2.33.1
> 

* Re: [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer
  2021-12-08 11:05   ` Szabolcs Nagy
@ 2021-12-08 17:55     ` Florian Weimer
  2021-12-09 11:52       ` Szabolcs Nagy
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-08 17:55 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 12/07/2021 14:00, Florian Weimer via Libc-alpha wrote:
>> <tls.h> already contains a definition that is quite similar,
>> but it is not consistent across architectures.
>> 
>> Only architectures for which rseq support is added are covered.
>
> This looks ok.
>
> It's an annoying gcc bug that __builtin_thread_pointer
> does not work consistently across targets.
>
> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

We don't need m68k for rseq, so I haven't added it, but I saw that
__thread_pointer is actually a system call there.  Maybe that's why it's
not a universal GCC feature.  Furthermore, for many ABIs, the thread
pointer is somewhat implicit.  On x86, it took some discussion to figure
out that we actually have a canonical notion of a thread pointer.  On
some other targets, the thread pointer is stored explicitly in a
(system) register, but it actually points to nowhere, so that local-exec
TLS access can make better use of immediate instruction operands.

It's also annoying that __has_builtin (__builtin_thread_pointer)
evaluates to true even for GCC targets where actually using
__builtin_thread_pointer () results in a compiler error.

In the future, we could install this as <sys/thread_pointer.h> if people
think it's useful (not just in an rseq context).

Thanks,
Florian


* Re: [PATCH 4/8] nptl: Add rseq registration
  2021-12-07 13:01 ` [PATCH 4/8] nptl: Add rseq registration Florian Weimer
  2021-12-08 16:51   ` Szabolcs Nagy
@ 2021-12-08 18:03   ` Siddhesh Poyarekar
  2021-12-08 18:08     ` Florian Weimer
  2021-12-09  1:51   ` Noah Goldstein
  2 siblings, 1 reply; 33+ messages in thread
From: Siddhesh Poyarekar @ 2021-12-08 18:03 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 12/7/21 18:31, Florian Weimer via Libc-alpha wrote:
> The rseq area is placed directly into struct pthread.  rseq
> registration failure is not treated as an error, so it is possible
> that threads run with inconsistent registration status.
> 
> <sys/rseq.h> is not yet installed as a public header.
> 
> Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> v2: Use volatile access to cpu_id.  Drop csu/libc-tls.c spurious change.
> 
>   nptl/descr.h                                |   4 +
>   nptl/pthread_create.c                       |  13 +
>   sysdeps/nptl/dl-tls_init_tp.c               |   8 +-
>   sysdeps/unix/sysv/linux/Makefile            |   9 +-
>   sysdeps/unix/sysv/linux/aarch64/bits/rseq.h |  43 ++++
>   sysdeps/unix/sysv/linux/arm/bits/rseq.h     |  83 +++++++
>   sysdeps/unix/sysv/linux/bits/rseq.h         |  29 +++
>   sysdeps/unix/sysv/linux/mips/bits/rseq.h    |  62 +++++
>   sysdeps/unix/sysv/linux/powerpc/bits/rseq.h |  37 +++
>   sysdeps/unix/sysv/linux/rseq-internal.h     |  45 ++++
>   sysdeps/unix/sysv/linux/s390/bits/rseq.h    |  37 +++
>   sysdeps/unix/sysv/linux/sys/rseq.h          | 174 +++++++++++++
>   sysdeps/unix/sysv/linux/tst-rseq-nptl.c     | 260 ++++++++++++++++++++
>   sysdeps/unix/sysv/linux/tst-rseq.c          |  64 +++++
>   sysdeps/unix/sysv/linux/tst-rseq.h          |  57 +++++
>   sysdeps/unix/sysv/linux/x86/bits/rseq.h     |  30 +++
>   16 files changed, 952 insertions(+), 3 deletions(-)
>   create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
>   create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c
>   create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c
>   create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.h
>   create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h
> 
> diff --git a/nptl/descr.h b/nptl/descr.h
> index af2a6ab87a..92db305913 100644
> --- a/nptl/descr.h
> +++ b/nptl/descr.h
> @@ -34,6 +34,7 @@
>   #include <bits/types/res_state.h>
>   #include <kernel-features.h>
>   #include <tls-internal-struct.h>
> +#include <sys/rseq.h>
>   
>   #ifndef TCB_ALIGNMENT
>   # define TCB_ALIGNMENT 32
> @@ -406,6 +407,9 @@ struct pthread
>     /* Used on strsignal.  */
>     struct tls_internal_t tls_state;
>   
> +  /* rseq area registered with the kernel.  */
> +  struct rseq rseq_area;
> +
>     /* This member must be last.  */
>     char end_padding[];
>   
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index bad9eeb52f..ea0d79341e 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -32,6 +32,7 @@
>   #include <default-sched.h>
>   #include <futex-internal.h>
>   #include <tls-setup.h>
> +#include <rseq-internal.h>
>   #include "libioP.h"
>   #include <sys/single_threaded.h>
>   #include <version.h>
> @@ -366,6 +367,9 @@ start_thread (void *arg)
>     /* Initialize pointers to locale data.  */
>     __ctype_init ();
>   
> +  /* Register rseq TLS to the kernel.  */
> +  rseq_register_current_thread (pd);
> +
>   #ifndef __ASSUME_SET_ROBUST_LIST
>     if (__nptl_set_robust_list_avail)
>   #endif
> @@ -571,6 +575,15 @@ out:
>        process is really dead since 'clone' got passed the CLONE_CHILD_CLEARTID
>        flag.  The 'tid' field in the TCB will be set to zero.
>   
> +     rseq TLS is still registered at this point.  Rely on implicit
> +     unregistration performed by the kernel on thread teardown.  This is not a
> +     problem because the rseq TLS lives on the stack, and the stack outlives
> +     the thread.  If TCB allocation is ever changed, additional steps may be
> +     required, such as performing explicit rseq unregistration before
> +     reclaiming the rseq TLS area memory.  It is NOT sufficient to block
> +     signals because the kernel may write to the rseq area even without
> +     signals.
> +
>        The exit code is zero since in case all threads exit by calling
>        'pthread_exit' the exit status must be 0 (zero).  */
>     while (1)
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index ca494dd3a5..fedb876fdb 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -21,6 +21,7 @@
>   #include <list.h>
>   #include <pthreadP.h>
>   #include <tls.h>
> +#include <rseq-internal.h>
>   
>   #ifndef __ASSUME_SET_ROBUST_LIST
>   bool __nptl_set_robust_list_avail;
> @@ -57,11 +58,12 @@ __tls_pre_init_tp (void)
>   void
>   __tls_init_tp (void)
>   {
> +  struct pthread *pd = THREAD_SELF;
> +
>     /* Set up thread stack list management.  */
> -  list_add (&THREAD_SELF->list, &GL (dl_stack_user));
> +  list_add (&pd->list, &GL (dl_stack_user));
>   
>      /* Early initialization of the TCB.   */
> -   struct pthread *pd = THREAD_SELF;
>      pd->tid = INTERNAL_SYSCALL_CALL (set_tid_address, &pd->tid);
>      THREAD_SETMEM (pd, specific[0], &pd->specific_1stblock[0]);
>      THREAD_SETMEM (pd, user_stack, true);
> @@ -90,6 +92,8 @@ __tls_init_tp (void)
>         }
>     }
>   
> +  rseq_register_current_thread (pd);
> +
>     /* Set initial thread's stack block from 0 up to __libc_stack_end.
>        It will be bigger than it actually is, but for unwind.c/pt-longjmp.c
>        purposes this is good enough.  */
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 29c6c78f98..eb0f5fc021 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -131,7 +131,10 @@ ifeq ($(have-GLIBC_2.27)$(build-shared),yesyes)
>   tests += tst-ofdlocks-compat
>   endif
>   
> -tests-internal += tst-sigcontext-get_pc
> +tests-internal += \
> +  tst-rseq \
> +  tst-sigcontext-get_pc \
> +  # tests-internal
>   
>   tests-time64 += \
>     tst-adjtimex-time64 \
> @@ -357,4 +360,8 @@ endif
>   
>   ifeq ($(subdir),nptl)
>   tests += tst-align-clone tst-getpid1
> +
> +# tst-rseq-nptl is an internal test because it requires a definition of
> +# __NR_rseq from the internal system call list.
> +tests-internal += tst-rseq-nptl
>   endif
> diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> new file mode 100644
> index 0000000000..9ba92725c7
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> @@ -0,0 +1,43 @@
> +/* Restartable Sequences Linux aarch64 architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   aarch64 -mbig-endian generates mixed endianness code vs data:
> +   little-endian code and big-endian data.  Ensure the RSEQ_SIG signature
> +   matches code endianness.  */
> +
> +#define RSEQ_SIG_CODE  0xd428bc00  /* BRK #0x45E0.  */
> +
> +#ifdef __AARCH64EB__
> +# define RSEQ_SIG_DATA 0x00bc28d4  /* BRK #0x45E0.  */
> +#else
> +# define RSEQ_SIG_DATA RSEQ_SIG_CODE
> +#endif
> +
> +#define RSEQ_SIG       RSEQ_SIG_DATA
> diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
> new file mode 100644
> index 0000000000..0542b26f6a
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
> @@ -0,0 +1,83 @@
> +/* Restartable Sequences Linux arm architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/*
> +   RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   - ARM little endian
> +
> +   RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
> +   value 0x5de3.  This traps if user-space reaches this instruction by mistake,
> +   and the uncommon operand ensures the kernel does not move the instruction
> +   pointer to attacker-controlled code on rseq abort.
> +
> +   The instruction pattern in the A32 instruction set is:
> +
> +   e7f5def3    udf    #24035    ; 0x5de3
> +
> +   This translates to the following instruction pattern in the T16 instruction
> +   set:
> +
> +   little endian:
> +   def3        udf    #243      ; 0xf3
> +   e7f5        b.n    <7f5>
> +
> +   - ARMv6+ big endian (BE8):
> +
> +   ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
> +   code and big-endian data.  The data value of the signature needs to have its
> +   byte order reversed to generate the trap instruction:
> +
> +   Data: 0xf3def5e7
> +
> +   Translates to this A32 instruction pattern:
> +
> +   e7f5def3    udf    #24035    ; 0x5de3
> +
> +   Translates to this T16 instruction pattern:
> +
> +   def3        udf    #243      ; 0xf3
> +   e7f5        b.n    <7f5>
> +
> +   - Prior to ARMv6 big endian (BE32):
> +
> +   Prior to ARMv6, -mbig-endian generates big-endian code and data
> +   (which match), so the endianness of the data representation of the
> +   signature should not be reversed.  However, the choice between BE32
> +   and BE8 is done by the linker, so we cannot know whether code and
> +   data endianness will be mixed before the linker is invoked.  So rather
> +   than try to play tricks with the linker, the rseq signature is simply
> +   data (not a trap instruction) prior to ARMv6 on big endian.  This is
> +   why the signature is expressed as data (.word) rather than as
> +   instruction (.inst) in assembler.  */
> +
> +#ifdef __ARMEB__
> +# define RSEQ_SIG    0xf3def5e7      /* udf    #24035    ; 0x5de3 (ARMv6+) */
> +#else
> +# define RSEQ_SIG    0xe7f5def3      /* udf    #24035    ; 0x5de3 */
> +#endif
> diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
> new file mode 100644
> index 0000000000..46cf5d1c74
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/bits/rseq.h
> @@ -0,0 +1,29 @@
> +/* Restartable Sequences architecture header.  Stub version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.  */
> diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
> new file mode 100644
> index 0000000000..a9defee568
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
> @@ -0,0 +1,62 @@
> +/* Restartable Sequences Linux mips architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   RSEQ_SIG uses the break instruction.  The instruction pattern is:
> +
> +   On MIPS:
> +        0350000d        break     0x350
> +
> +   On nanoMIPS:
> +        00100350        break     0x350
> +
> +   On microMIPS:
> +        0000d407        break     0x350
> +
> +   For nanoMIPS32 and microMIPS, the instruction stream is encoded as
> +   16-bit halfwords, so the signature halfwords need to be swapped
> +   accordingly for little-endian.  */
> +
> +#if defined (__nanomips__)
> +# ifdef __MIPSEL__
> +#  define RSEQ_SIG      0x03500010
> +# else
> +#  define RSEQ_SIG      0x00100350
> +# endif
> +#elif defined (__mips_micromips)
> +# ifdef __MIPSEL__
> +#  define RSEQ_SIG      0xd4070000
> +# else
> +#  define RSEQ_SIG      0x0000d407
> +# endif
> +#elif defined (__mips__)
> +# define RSEQ_SIG       0x0350000d
> +#else
> +/* Unknown MIPS architecture.  */
> +#endif
> diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> new file mode 100644
> index 0000000000..05b3cf7b8f
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> @@ -0,0 +1,37 @@
> +/* Restartable Sequences Linux powerpc architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   RSEQ_SIG uses the following trap instruction:
> +
> +   powerpc-be:    0f e5 00 0b           twui   r5,11
> +   powerpc64-le:  0b 00 e5 0f           twui   r5,11
> +   powerpc64-be:  0f e5 00 0b           twui   r5,11  */
> +
> +#define RSEQ_SIG        0x0fe5000b
> diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
> new file mode 100644
> index 0000000000..909f547825
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/rseq-internal.h
> @@ -0,0 +1,45 @@
> +/* Restartable Sequences internal API.  Linux implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef RSEQ_INTERNAL_H
> +#define RSEQ_INTERNAL_H
> +
> +#include <sysdep.h>
> +#include <errno.h>
> +#include <kernel-features.h>
> +#include <stdio.h>
> +#include <sys/rseq.h>
> +
> +#ifdef RSEQ_SIG
> +static inline void
> +rseq_register_current_thread (struct pthread *self)
> +{
> +  int ret = INTERNAL_SYSCALL_CALL (rseq,
> +                                   &self->rseq_area, sizeof (self->rseq_area),
> +                                   0, RSEQ_SIG);
> +  if (INTERNAL_SYSCALL_ERROR_P (ret))
> +    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);

Why can't we just leave it as the kernel did when it failed the syscall?
It looks like we'll only end up shadowing UNINITIALIZED all the time,
and it may cause issues if Linux decides to use -2 for some other
purpose in the future.

Siddhesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration
  2021-12-07 13:02 ` [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration Florian Weimer
  2021-12-08 17:22   ` Szabolcs Nagy
@ 2021-12-08 18:03   ` Siddhesh Poyarekar
  2021-12-09  8:03     ` Siddhesh Poyarekar
  1 sibling, 1 reply; 33+ messages in thread
From: Siddhesh Poyarekar @ 2021-12-08 18:03 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 12/7/21 18:32, Florian Weimer via Libc-alpha wrote:
> This tunable allows applications to register the rseq area instead
> of glibc.
> ---
> v2: Unchanged.
> 
>   manual/tunables.texi                       | 10 +++
>   nptl/pthread_create.c                      | 10 ++-
>   sysdeps/nptl/dl-tls_init_tp.c              | 11 ++-
>   sysdeps/nptl/dl-tunables.list              |  6 ++
>   sysdeps/nptl/internaltypes.h               |  1 +
>   sysdeps/unix/sysv/linux/Makefile           |  8 ++
>   sysdeps/unix/sysv/linux/rseq-internal.h    | 19 +++--
>   sysdeps/unix/sysv/linux/tst-rseq-disable.c | 89 ++++++++++++++++++++++
>   8 files changed, 145 insertions(+), 9 deletions(-)
>   create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-disable.c
> 
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 10f4d75993..5d50b90f64 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -424,6 +424,16 @@ The value is measured in bytes.  The default is @samp{41943040}
>   (fourty mibibytes).
>   @end deftp
>   
> +@deftp Tunable glibc.pthread.rseq
> +The @code{glibc.pthread.rseq} tunable can be set to @samp{0}, to disable
> +restartable sequences support in @theglibc{}.  This enables applications
> +to perform direct restartable sequence registration with the kernel.
> +The default is @samp{1}, which means that @theglibc{} performs
> +registration on behalf of the application.
> +
> +Restartable sequences are a Linux-specific extension.
> +@end deftp
> +

OK.

>   @node Hardware Capability Tunables
>   @section Hardware Capability Tunables
>   @cindex hardware capability tunables
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index ea0d79341e..4608fd9068 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -368,7 +368,10 @@ start_thread (void *arg)
>     __ctype_init ();
>   
>     /* Register rseq TLS to the kernel.  */
> -  rseq_register_current_thread (pd);
> +  {
> +    bool do_rseq = THREAD_GETMEM (pd, flags) & ATTR_FLAG_DO_RSEQ;
> +    rseq_register_current_thread (pd, do_rseq);
> +  }
>   

OK, the flag is set...

>   #ifndef __ASSUME_SET_ROBUST_LIST
>     if (__nptl_set_robust_list_avail)
> @@ -677,6 +680,11 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>     pd->flags = ((iattr->flags & ~(ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
>   	       | (self->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)));
>   
> +  /* Inherit rseq registration state.  Without seccomp filters, rseq
> +     registration will either always fail or always succeed.  */
> +  if ((int) THREAD_GETMEM_VOLATILE (self, rseq_area.cpu_id) >= 0)
> +    pd->flags |= ATTR_FLAG_DO_RSEQ;
> +

... here, inherited from the calling thread, which in turn should
ultimately inherit it from the main thread and thus from the tunable.

Further to my comment in 4/8, if we leave it as UNINITIALIZED, we could 
simply modify the check to != -1.
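
For illustration, the two candidate checks can be compared side by side.
This is a minimal sketch with hypothetical helper names (not glibc code).
It relies on the Linux UAPI values RSEQ_CPU_ID_UNINITIALIZED == -1 and
RSEQ_CPU_ID_REGISTRATION_FAILED == -2, and on cpu_id being an unsigned
32-bit field that is cast to a signed type before comparison, as in the
hunk above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check used in the patch: any non-negative cpu_id means "registered".  */
static bool
sketch_inherit_if_nonnegative (uint32_t cpu_id)
{
  return (int32_t) cpu_id >= 0;
}

/* Alternative check: anything other than -1 (UNINITIALIZED) means
   "registered".  */
static bool
sketch_inherit_if_not_uninitialized (uint32_t cpu_id)
{
  return (int32_t) cpu_id != -1;
}
```

The two predicates agree for valid CPU numbers and for -1, but they
disagree for -2, so the != -1 form is only a drop-in replacement if
REGISTRATION_FAILED is never stored in cpu_id.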

>     /* Initialize the field for the ID of the thread which is waiting
>        for us.  This is a self-reference in case the thread is created
>        detached.  */
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index fedb876fdb..b39dfbff2c 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -23,6 +23,9 @@
>   #include <tls.h>
>   #include <rseq-internal.h>
>   
> +#define TUNABLE_NAMESPACE pthread
> +#include <dl-tunables.h>
> +
>   #ifndef __ASSUME_SET_ROBUST_LIST
>   bool __nptl_set_robust_list_avail;
>   rtld_hidden_data_def (__nptl_set_robust_list_avail)
> @@ -92,7 +95,13 @@ __tls_init_tp (void)
>         }
>     }
>   
> -  rseq_register_current_thread (pd);
> +  {
> +    bool do_rseq = true;
> +#if HAVE_TUNABLES
> +    do_rseq = TUNABLE_GET (rseq, int, NULL);
> +#endif
> +    rseq_register_current_thread (pd, do_rseq);
> +  }

rseq registration for the main thread.  OK.

>   
>     /* Set initial thread's stack block from 0 up to __libc_stack_end.
>        It will be bigger than it actually is, but for unwind.c/pt-longjmp.c
> diff --git a/sysdeps/nptl/dl-tunables.list b/sysdeps/nptl/dl-tunables.list
> index ac5d053298..d24f4be0d0 100644
> --- a/sysdeps/nptl/dl-tunables.list
> +++ b/sysdeps/nptl/dl-tunables.list
> @@ -27,5 +27,11 @@ glibc {
>         type: SIZE_T
>         default: 41943040
>       }
> +    rseq {
> +      type: INT_32
> +      minval: 0
> +      maxval: 1
> +      default: 1
> +    }

Tunable defaults to 1 and can only have two values.  OK.

>     }
>   }
> diff --git a/sysdeps/nptl/internaltypes.h b/sysdeps/nptl/internaltypes.h
> index 6032a6b785..dec8c5b5ff 100644
> --- a/sysdeps/nptl/internaltypes.h
> +++ b/sysdeps/nptl/internaltypes.h
> @@ -48,6 +48,7 @@ struct pthread_attr
>   #define ATTR_FLAG_OLDATTR		0x0010
>   #define ATTR_FLAG_SCHED_SET		0x0020
>   #define ATTR_FLAG_POLICY_SET		0x0040
> +#define ATTR_FLAG_DO_RSEQ		0x0080
>   
>   /* Used to allocate a pthread_attr_t object which is also accessed
>      internally.  */
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index eb0f5fc021..62a796f214 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -136,6 +136,12 @@ tests-internal += \
>     tst-sigcontext-get_pc \
>     # tests-internal
>   
> +ifneq (no,$(have-tunables))
> +tests-internal += \
> +  tst-rseq-disable \
> +  # tests-internal $(have-tunables)
> +endif
> +

Test conditional on tunables.  OK.

>   tests-time64 += \
>     tst-adjtimex-time64 \
>     tst-clock_adjtime-time64 \
> @@ -227,6 +233,8 @@ $(objpfx)tst-mman-consts.out: ../sysdeps/unix/sysv/linux/tst-mman-consts.py
>   	  < /dev/null > $@ 2>&1; $(evaluate-test)
>   $(objpfx)tst-mman-consts.out: $(sysdeps-linux-python-deps)
>   
> +tst-rseq-disable-ENV = GLIBC_TUNABLES=glibc.pthread.rseq=0
> +

OK.

>   endif # $(subdir) == misc
>   
>   ifeq ($(subdir),time)
> diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
> index 909f547825..15bc7ffd6e 100644
> --- a/sysdeps/unix/sysv/linux/rseq-internal.h
> +++ b/sysdeps/unix/sysv/linux/rseq-internal.h
> @@ -21,22 +21,27 @@
>   #include <sysdep.h>
>   #include <errno.h>
>   #include <kernel-features.h>
> +#include <stdbool.h>
>   #include <stdio.h>
>   #include <sys/rseq.h>
>   
>   #ifdef RSEQ_SIG
>   static inline void
> -rseq_register_current_thread (struct pthread *self)
> +rseq_register_current_thread (struct pthread *self, bool do_rseq)
>   {
> -  int ret = INTERNAL_SYSCALL_CALL (rseq,
> -                                   &self->rseq_area, sizeof (self->rseq_area),
> -                                   0, RSEQ_SIG);
> -  if (INTERNAL_SYSCALL_ERROR_P (ret))
> -    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
> +  if (do_rseq)
> +    {
> +      int ret = INTERNAL_SYSCALL_CALL (rseq, &self->rseq_area,
> +                                       sizeof (self->rseq_area),
> +                                       0, RSEQ_SIG);
> +      if (!INTERNAL_SYSCALL_ERROR_P (ret))
> +        return;
> +    }
> +  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);

Register only if do_rseq.  OK, but if we get rid of
RSEQ_CPU_ID_REGISTRATION_FAILED and leave cpu_id untouched, this
function could simply be an INTERNAL_SYSCALL_CALL, and the do_rseq
condition could be hoisted into the callers: the tunable check for the
main thread and the ATTR_FLAG_DO_RSEQ flag check for child threads.
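
A rough sketch of that shape, with hypothetical names throughout: the
sketch_ functions and the stand-in "system call" are illustrative only
and are not part of glibc or the patch.  It assumes the caller has
already set cpu_id to -1 (UNINITIALIZED) before deciding whether to
register:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_DO_RSEQ 0x0080   /* Mirrors ATTR_FLAG_DO_RSEQ.  */

struct sketch_rseq_area
{
  uint32_t cpu_id;
};

/* Counts how often the stand-in system call was issued.  */
static int sketch_syscall_count;

/* Stand-in for the rseq system call: always succeeds and pretends the
   thread runs on CPU 0.  */
static int
sketch_rseq_syscall (struct sketch_rseq_area *area)
{
  ++sketch_syscall_count;
  area->cpu_id = 0;
  return 0;
}

/* The helper reduced to the bare call; cpu_id is not touched on
   failure.  */
static void
sketch_register_current_thread (struct sketch_rseq_area *area)
{
  (void) sketch_rseq_syscall (area);
}

/* Main thread: the decision comes from the tunable.  */
static void
sketch_init_main_thread (struct sketch_rseq_area *area, bool tunable_enabled)
{
  if (tunable_enabled)
    sketch_register_current_thread (area);
}

/* New thread: the decision comes from the inherited flag.  */
static void
sketch_start_thread (struct sketch_rseq_area *area, unsigned int flags)
{
  if (flags & SKETCH_FLAG_DO_RSEQ)
    sketch_register_current_thread (area);
}
```

With this layout a disabled tunable or a cleared flag means the system
call is never issued, and whatever value cpu_id held beforehand stays in
place.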

>   }
>   #else /* RSEQ_SIG */
>   static inline void
> -rseq_register_current_thread (struct pthread *self)
> +rseq_register_current_thread (struct pthread *self, bool do_rseq)
>   {
>     THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>   }
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq-disable.c b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
> new file mode 100644
> index 0000000000..000e351872
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-rseq-disable.c
> @@ -0,0 +1,89 @@
> +/* Test disabling of rseq registration via tunable.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <stdio.h>
> +#include <support/check.h>
> +#include <support/namespace.h>
> +#include <support/xthread.h>
> +#include <sysdep.h>
> +#include <unistd.h>
> +
> +#ifdef RSEQ_SIG
> +
> +/* Check that rseq can be registered and has not been taken by glibc.  */
> +static void
> +check_rseq_disabled (void)
> +{
> +  struct pthread *pd = THREAD_SELF;
> +  TEST_COMPARE ((int) pd->rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);

If we use UNINITIALIZED, then that's what we would check for.

> +
> +  int ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
> +                     0, RSEQ_SIG);
> +  if (ret == 0)
> +    {
> +      ret = syscall (__NR_rseq, &pd->rseq_area, sizeof (pd->rseq_area),
> +                     RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
> +      TEST_COMPARE (ret, 0);
> +      pd->rseq_area.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;

Is this needed because the kernel sets cpu_id to UNINITIALIZED?

> +    }
> +  else
> +    {
> +      TEST_VERIFY (errno != -EINVAL);
> +      TEST_VERIFY (errno != -EBUSY);
> +    }
> +}
> +
> +static void *
> +thread_func (void *ignored)
> +{
> +  check_rseq_disabled ();
> +  return NULL;
> +}

Checking threads.  OK.

> +
> +static void
> +proc_func (void *ignored)
> +{
> +  check_rseq_disabled ();
> +}

Checking at process level.  OK.

> +
> +static int
> +do_test (void)
> +{
> +  puts ("info: checking main thread");
> +  check_rseq_disabled ();
> +
> +  puts ("info: checking main thread (2)");
> +  check_rseq_disabled ();
> +
> +  puts ("info: checking new thread");
> +  xpthread_join (xpthread_create (NULL, thread_func, NULL));
> +
> +  puts ("info: checking subprocess");
> +  support_isolate_in_subprocess (proc_func, NULL);
> +
> +  return 0;
> +}
> +#else /* !RSEQ_SIG */
> +static int
> +do_test (void)
> +{
> +  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
> +}
> +#endif
> +
> +#include <support/test-driver.c>
> 

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/8] nptl: Add rseq registration
  2021-12-08 18:03   ` Siddhesh Poyarekar
@ 2021-12-08 18:08     ` Florian Weimer
  2021-12-08 23:27       ` Siddhesh Poyarekar
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-08 18:08 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

* Siddhesh Poyarekar:

>> +#ifdef RSEQ_SIG
>> +static inline void
>> +rseq_register_current_thread (struct pthread *self)
>> +{
>> +  int ret = INTERNAL_SYSCALL_CALL (rseq,
>> +                                   &self->rseq_area, sizeof (self->rseq_area),
>> +                                   0, RSEQ_SIG);
>> +  if (INTERNAL_SYSCALL_ERROR_P (ret))
>> +    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>
> Why can't we just leave it as the kernel did when it failed the
> syscall?

The kernel definitely won't write anything if the failure is ENOSYS.  I
don't expect the kernel to write something for the other failures,
either.
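
To illustrate why an explicit store matters in that case, here is a
minimal simulation.  All sketch_ names are hypothetical, the failing
"system call" is a stand-in, and the zero-filled starting state of the
area is an assumption made for the example rather than something stated
in this thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CPU_ID_REGISTRATION_FAILED (-2)

struct sketch_rseq_area
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
};

/* Stand-in for a kernel without rseq support: fails with ENOSYS and
   does not write to the area.  */
static int
sketch_rseq_enosys (struct sketch_rseq_area *area)
{
  (void) area;
  return -ENOSYS;
}

/* Attempt registration; optionally record the failure in cpu_id.  */
static void
sketch_register (struct sketch_rseq_area *area, bool mark_failure)
{
  int ret = sketch_rseq_enosys (area);
  if (ret < 0 && mark_failure)
    area->cpu_id = (uint32_t) SKETCH_CPU_ID_REGISTRATION_FAILED;
}

/* Same test as the (int) cpu_id >= 0 check used later in the series.  */
static bool
sketch_looks_registered (const struct sketch_rseq_area *area)
{
  return (int32_t) area->cpu_id >= 0;
}
```

Without the store, a zero-filled area keeps cpu_id == 0 after the failed
call, which a reader cannot tell apart from "registered, running on
CPU 0".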

Thanks,
Florian


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/8] nptl: Add rseq registration
  2021-12-08 18:08     ` Florian Weimer
@ 2021-12-08 23:27       ` Siddhesh Poyarekar
  2021-12-09  7:42         ` Florian Weimer
  0 siblings, 1 reply; 33+ messages in thread
From: Siddhesh Poyarekar @ 2021-12-08 23:27 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On 12/8/21 23:38, Florian Weimer wrote:
> * Siddhesh Poyarekar:
> 
>>> +#ifdef RSEQ_SIG
>>> +static inline void
>>> +rseq_register_current_thread (struct pthread *self)
>>> +{
>>> +  int ret = INTERNAL_SYSCALL_CALL (rseq,
>>> +                                   &self->rseq_area, sizeof (self->rseq_area),
>>> +                                   0, RSEQ_SIG);
>>> +  if (INTERNAL_SYSCALL_ERROR_P (ret))
>>> +    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>>
>> Why can't we just leave it as the kernel did when it failed the
>> syscall?
> 
> The kernel definitely won't write anything if the failure is ENOSYS.  I
> don't expect the kernel to write something for the other failures,
> either.

OK, I inferred from the outdated manpage patch [1] that the kernel
ensures that an uninitialized cpu_id will be read as -1.  I read the rseq
implementation in the kernel and saw that there are a number of error
paths where the kernel simply returns without touching the user memory.
I suppose what they meant by "uninitialized" in the manpage is
actually "reset after unregister", which is odd.

In any case, what I meant to eventually get at (sorry I wasn't specific; 
I wrote both patch reviews together and didn't realize they'd be read as 
separate emails!) is that RSEQ_CPU_ID_UNINITIALIZED seemed enough for 
all use cases and RSEQ_CPU_ID_REGISTRATION_FAILED seemed unnecessary.

On syscall failure (or when the tunable is disabled) it also seems safe
to do
THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_UNINITIALIZED);
AFAICT, __tls_init_tp runs early enough that it cannot overwrite the
result of any earlier rseq call from user code.

Is there a use case I'm missing?

Thanks,
Siddhesh

[1] https://lkml.org/lkml/2019/2/28/183

* Re: [PATCH 4/8] nptl: Add rseq registration
  2021-12-07 13:01 ` [PATCH 4/8] nptl: Add rseq registration Florian Weimer
  2021-12-08 16:51   ` Szabolcs Nagy
  2021-12-08 18:03   ` Siddhesh Poyarekar
@ 2021-12-09  1:51   ` Noah Goldstein
  2 siblings, 0 replies; 33+ messages in thread
From: Noah Goldstein @ 2021-12-09  1:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GNU C Library

On Tue, Dec 7, 2021 at 7:02 AM Florian Weimer via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> The rseq area is placed directly into struct pthread.  rseq
> registration failure is not treated as an error, so it is possible
> that threads run with inconsistent registration status.
>
> <sys/rseq.h> is not yet installed as a public header.
>
> Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> v2: Use volatile access to cpu_id.  Drop csu/libc-tls.c spurious change.
>
>  nptl/descr.h                                |   4 +
>  nptl/pthread_create.c                       |  13 +
>  sysdeps/nptl/dl-tls_init_tp.c               |   8 +-
>  sysdeps/unix/sysv/linux/Makefile            |   9 +-
>  sysdeps/unix/sysv/linux/aarch64/bits/rseq.h |  43 ++++
>  sysdeps/unix/sysv/linux/arm/bits/rseq.h     |  83 +++++++
>  sysdeps/unix/sysv/linux/bits/rseq.h         |  29 +++
>  sysdeps/unix/sysv/linux/mips/bits/rseq.h    |  62 +++++
>  sysdeps/unix/sysv/linux/powerpc/bits/rseq.h |  37 +++
>  sysdeps/unix/sysv/linux/rseq-internal.h     |  45 ++++
>  sysdeps/unix/sysv/linux/s390/bits/rseq.h    |  37 +++
>  sysdeps/unix/sysv/linux/sys/rseq.h          | 174 +++++++++++++
>  sysdeps/unix/sysv/linux/tst-rseq-nptl.c     | 260 ++++++++++++++++++++
>  sysdeps/unix/sysv/linux/tst-rseq.c          |  64 +++++
>  sysdeps/unix/sysv/linux/tst-rseq.h          |  57 +++++
>  sysdeps/unix/sysv/linux/x86/bits/rseq.h     |  30 +++
>  16 files changed, 952 insertions(+), 3 deletions(-)
>  create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
>  create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c
>  create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.h
>  create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h
>
> diff --git a/nptl/descr.h b/nptl/descr.h
> index af2a6ab87a..92db305913 100644
> --- a/nptl/descr.h
> +++ b/nptl/descr.h
> @@ -34,6 +34,7 @@
>  #include <bits/types/res_state.h>
>  #include <kernel-features.h>
>  #include <tls-internal-struct.h>
> +#include <sys/rseq.h>
>
>  #ifndef TCB_ALIGNMENT
>  # define TCB_ALIGNMENT 32
> @@ -406,6 +407,9 @@ struct pthread
>    /* Used on strsignal.  */
>    struct tls_internal_t tls_state;
>
> +  /* rseq area registered with the kernel.  */
> +  struct rseq rseq_area;
> +
>    /* This member must be last.  */
>    char end_padding[];
>
> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index bad9eeb52f..ea0d79341e 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -32,6 +32,7 @@
>  #include <default-sched.h>
>  #include <futex-internal.h>
>  #include <tls-setup.h>
> +#include <rseq-internal.h>
>  #include "libioP.h"
>  #include <sys/single_threaded.h>
>  #include <version.h>
> @@ -366,6 +367,9 @@ start_thread (void *arg)
>    /* Initialize pointers to locale data.  */
>    __ctype_init ();
>
> +  /* Register rseq TLS to the kernel.  */
> +  rseq_register_current_thread (pd);
> +
>  #ifndef __ASSUME_SET_ROBUST_LIST
>    if (__nptl_set_robust_list_avail)
>  #endif
> @@ -571,6 +575,15 @@ out:
>       process is really dead since 'clone' got passed the CLONE_CHILD_CLEARTID
>       flag.  The 'tid' field in the TCB will be set to zero.
>
> +     rseq TLS is still registered at this point.  Rely on implicit
> +     unregistration performed by the kernel on thread teardown.  This is not a
> +     problem because the rseq TLS lives on the stack, and the stack outlives
> +     the thread.  If TCB allocation is ever changed, additional steps may be
> +     required, such as performing explicit rseq unregistration before
> +     reclaiming the rseq TLS area memory.  It is NOT sufficient to block
> +     signals because the kernel may write to the rseq area even without
> +     signals.
> +
>       The exit code is zero since in case all threads exit by calling
>       'pthread_exit' the exit status must be 0 (zero).  */
>    while (1)
> diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
> index ca494dd3a5..fedb876fdb 100644
> --- a/sysdeps/nptl/dl-tls_init_tp.c
> +++ b/sysdeps/nptl/dl-tls_init_tp.c
> @@ -21,6 +21,7 @@
>  #include <list.h>
>  #include <pthreadP.h>
>  #include <tls.h>
> +#include <rseq-internal.h>
>
>  #ifndef __ASSUME_SET_ROBUST_LIST
>  bool __nptl_set_robust_list_avail;
> @@ -57,11 +58,12 @@ __tls_pre_init_tp (void)
>  void
>  __tls_init_tp (void)
>  {
> +  struct pthread *pd = THREAD_SELF;
> +
>    /* Set up thread stack list management.  */
> -  list_add (&THREAD_SELF->list, &GL (dl_stack_user));
> +  list_add (&pd->list, &GL (dl_stack_user));
>
>     /* Early initialization of the TCB.   */
> -   struct pthread *pd = THREAD_SELF;
>     pd->tid = INTERNAL_SYSCALL_CALL (set_tid_address, &pd->tid);
>     THREAD_SETMEM (pd, specific[0], &pd->specific_1stblock[0]);
>     THREAD_SETMEM (pd, user_stack, true);
> @@ -90,6 +92,8 @@ __tls_init_tp (void)
>        }
>    }
>
> +  rseq_register_current_thread (pd);
> +
>    /* Set initial thread's stack block from 0 up to __libc_stack_end.
>       It will be bigger than it actually is, but for unwind.c/pt-longjmp.c
>       purposes this is good enough.  */
> diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
> index 29c6c78f98..eb0f5fc021 100644
> --- a/sysdeps/unix/sysv/linux/Makefile
> +++ b/sysdeps/unix/sysv/linux/Makefile
> @@ -131,7 +131,10 @@ ifeq ($(have-GLIBC_2.27)$(build-shared),yesyes)
>  tests += tst-ofdlocks-compat
>  endif
>
> -tests-internal += tst-sigcontext-get_pc
> +tests-internal += \
> +  tst-rseq \
> +  tst-sigcontext-get_pc \
> +  # tests-internal
>
>  tests-time64 += \
>    tst-adjtimex-time64 \
> @@ -357,4 +360,8 @@ endif
>
>  ifeq ($(subdir),nptl)
>  tests += tst-align-clone tst-getpid1
> +
> +# tst-rseq-nptl is an internal test because it requires a definition of
> +# __NR_rseq from the internal system call list.
> +tests-internal += tst-rseq-nptl
>  endif
> diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> new file mode 100644
> index 0000000000..9ba92725c7
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> @@ -0,0 +1,43 @@
> +/* Restartable Sequences Linux aarch64 architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   aarch64 -mbig-endian generates mixed endianness code vs data:
> +   little-endian code and big-endian data.  Ensure the RSEQ_SIG signature
> +   matches code endianness.  */
> +
> +#define RSEQ_SIG_CODE  0xd428bc00  /* BRK #0x45E0.  */
> +
> +#ifdef __AARCH64EB__
> +# define RSEQ_SIG_DATA 0x00bc28d4  /* BRK #0x45E0.  */
> +#else
> +# define RSEQ_SIG_DATA RSEQ_SIG_CODE
> +#endif
> +
> +#define RSEQ_SIG       RSEQ_SIG_DATA
> diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
> new file mode 100644
> index 0000000000..0542b26f6a
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h
> @@ -0,0 +1,83 @@
> +/* Restartable Sequences Linux arm architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/*
> +   RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   - ARM little endian
> +
> +   RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
> +   value 0x5de3.  This traps if user-space reaches this instruction by mistake,
> +   and the uncommon operand ensures the kernel does not move the instruction
> +   pointer to attacker-controlled code on rseq abort.
> +
> +   The instruction pattern in the A32 instruction set is:
> +
> +   e7f5def3    udf    #24035    ; 0x5de3
> +
> +   This translates to the following instruction pattern in the T16 instruction
> +   set:
> +
> +   little endian:
> +   def3        udf    #243      ; 0xf3
> +   e7f5        b.n    <7f5>
> +
> +   - ARMv6+ big endian (BE8):
> +
> +   ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
> +   code and big-endian data.  The data value of the signature needs to have its
> +   byte order reversed to generate the trap instruction:
> +
> +   Data: 0xf3def5e7
> +
> +   Translates to this A32 instruction pattern:
> +
> +   e7f5def3    udf    #24035    ; 0x5de3
> +
> +   Translates to this T16 instruction pattern:
> +
> +   def3        udf    #243      ; 0xf3
> +   e7f5        b.n    <7f5>
> +
> +   - Prior to ARMv6 big endian (BE32):
> +
> +   Prior to ARMv6, -mbig-endian generates big-endian code and data
> +   (which match), so the endianness of the data representation of the
> +   signature should not be reversed.  However, the choice between BE32
> +   and BE8 is done by the linker, so we cannot know whether code and
> +   data endianness will be mixed before the linker is invoked.  So rather
> +   than try to play tricks with the linker, the rseq signature is simply
> +   data (not a trap instruction) prior to ARMv6 on big endian.  This is
> +   why the signature is expressed as data (.word) rather than as
> +   instruction (.inst) in assembler.  */
> +
> +#ifdef __ARMEB__
> +# define RSEQ_SIG    0xf3def5e7      /* udf    #24035    ; 0x5de3 (ARMv6+) */
> +#else
> +# define RSEQ_SIG    0xe7f5def3      /* udf    #24035    ; 0x5de3 */
> +#endif
> diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
> new file mode 100644
> index 0000000000..46cf5d1c74
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/bits/rseq.h
> @@ -0,0 +1,29 @@
> +/* Restartable Sequences architecture header.  Stub version.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.  */
> diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
> new file mode 100644
> index 0000000000..a9defee568
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
> @@ -0,0 +1,62 @@
> +/* Restartable Sequences Linux mips architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   RSEQ_SIG uses the break instruction.  The instruction pattern is:
> +
> +   On MIPS:
> +        0350000d        break     0x350
> +
> +   On nanoMIPS:
> +        00100350        break     0x350
> +
> +   On microMIPS:
> +        0000d407        break     0x350
> +
> +   For nanoMIPS32 and microMIPS, the instruction stream is encoded as
> +   16-bit halfwords, so the signature halfwords need to be swapped
> +   accordingly for little-endian.  */
> +
> +#if defined (__nanomips__)
> +# ifdef __MIPSEL__
> +#  define RSEQ_SIG      0x03500010
> +# else
> +#  define RSEQ_SIG      0x00100350
> +# endif
> +#elif defined (__mips_micromips)
> +# ifdef __MIPSEL__
> +#  define RSEQ_SIG      0xd4070000
> +# else
> +#  define RSEQ_SIG      0x0000d407
> +# endif
> +#elif defined (__mips__)
> +# define RSEQ_SIG       0x0350000d
> +#else
> +/* Unknown MIPS architecture.  */
> +#endif
> diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> new file mode 100644
> index 0000000000..05b3cf7b8f
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
> @@ -0,0 +1,37 @@
> +/* Restartable Sequences Linux powerpc architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   RSEQ_SIG uses the following trap instruction:
> +
> +   powerpc-be:    0f e5 00 0b           twui   r5,11
> +   powerpc64-le:  0b 00 e5 0f           twui   r5,11
> +   powerpc64-be:  0f e5 00 0b           twui   r5,11  */
> +
> +#define RSEQ_SIG        0x0fe5000b
> diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
> new file mode 100644
> index 0000000000..909f547825
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/rseq-internal.h
> @@ -0,0 +1,45 @@
> +/* Restartable Sequences internal API.  Linux implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef RSEQ_INTERNAL_H
> +#define RSEQ_INTERNAL_H
> +
> +#include <sysdep.h>
> +#include <errno.h>
> +#include <kernel-features.h>
> +#include <stdio.h>
> +#include <sys/rseq.h>
> +
> +#ifdef RSEQ_SIG
> +static inline void
> +rseq_register_current_thread (struct pthread *self)
> +{
> +  int ret = INTERNAL_SYSCALL_CALL (rseq,
> +                                   &self->rseq_area, sizeof (self->rseq_area),
> +                                   0, RSEQ_SIG);
> +  if (INTERNAL_SYSCALL_ERROR_P (ret))
> +    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
> +}
> +#else /* RSEQ_SIG */
> +static inline void
> +rseq_register_current_thread (struct pthread *self)
> +{
> +  THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
> +}
> +#endif /* RSEQ_SIG */
> +
> +#endif /* rseq-internal.h */
> diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
> new file mode 100644
> index 0000000000..3030e38f40
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
> @@ -0,0 +1,37 @@
> +/* Restartable Sequences Linux s390 architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries.  It needs to be defined for each
> +   architecture.  When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.
> +
> +   RSEQ_SIG uses the trap4 instruction.  As Linux does not make use of the
> +   access-register mode nor the linkage stack this instruction will always
> +   cause a special-operation exception (the trap-enabled bit in the DUCT
> +   is and will stay 0).  The instruction pattern is
> +       b2 ff 0f ff        trap4   4095(%r0)  */
> +
> +#define RSEQ_SIG        0xB2FF0FFF
> diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
> new file mode 100644
> index 0000000000..c8edff50d4
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/sys/rseq.h
> @@ -0,0 +1,174 @@
> +/* Restartable Sequences exported symbols.  Linux header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +#define _SYS_RSEQ_H    1
> +
> +/* Architecture-specific rseq signature.  */
> +#include <bits/rseq.h>
> +
> +#include <stdint.h>
> +#include <sys/cdefs.h>
> +#include <bits/endian.h>
> +
> +#ifdef __has_include
> +# if __has_include ("linux/rseq.h")
> +#  define __GLIBC_HAVE_KERNEL_RSEQ
> +# endif
> +#else
> +# include <linux/version.h>
> +# if LINUX_VERSION_CODE >= KERNEL_VERSION (4, 18, 0)
> +#  define __GLIBC_HAVE_KERNEL_RSEQ
> +# endif
> +#endif
> +
> +#ifdef __GLIBC_HAVE_KERNEL_RSEQ
> +/* We use the structures declarations from the kernel headers.  */
> +# include <linux/rseq.h>
> +#else /* __GLIBC_HAVE_KERNEL_RSEQ */
> +/* We use a copy of the include/uapi/linux/rseq.h kernel header.  */
> +
> +enum rseq_cpu_id_state
> +  {
> +    RSEQ_CPU_ID_UNINITIALIZED = -1,
> +    RSEQ_CPU_ID_REGISTRATION_FAILED = -2,
> +  };
> +
> +enum rseq_flags
> +  {
> +    RSEQ_FLAG_UNREGISTER = (1 << 0),
> +  };
> +
> +enum rseq_cs_flags_bit
> +  {
> +    RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT = 0,
> +    RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT = 1,
> +    RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT = 2,
> +  };
> +
> +enum rseq_cs_flags
> +  {
> +    RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT =
> +      (1U << RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT),
> +    RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL =
> +      (1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT),
> +    RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE =
> +      (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
> +  };
> +
> +/* struct rseq_cs is aligned on 32 bytes to ensure it is always
> +   contained within a single cache-line.  It is usually declared as
> +   link-time constant data.  */
> +struct rseq_cs
> +  {
> +    /* Version of this structure.  */
> +    uint32_t version;
> +    /* enum rseq_cs_flags.  */
> +    uint32_t flags;
> +    uint64_t start_ip;
> +    /* Offset from start_ip.  */
> +    uint64_t post_commit_offset;
> +    uint64_t abort_ip;
> +  } __attribute__ ((__aligned__ (32)));
> +
> +/* struct rseq is aligned on 32 bytes to ensure it is always
> +   contained within a single cache-line.
> +
> +   A single struct rseq per thread is allowed.  */
> +struct rseq
> +  {
> +    /* Restartable sequences cpu_id_start field.  Updated by the
> +       kernel.  Read by user-space with single-copy atomicity
> +       semantics.  This field should only be read by the thread which
> +       registered this data structure.  Aligned on 32-bit.  Always
> +       contains a value in the range of possible CPUs, although the
> +       value may not be the actual current CPU (e.g. if rseq is not
> +       initialized).  This CPU number value should always be compared
> +       against the value of the cpu_id field before performing a rseq
> +       commit or returning a value read from a data structure indexed
> +       using the cpu_id_start value.  */
> +    uint32_t cpu_id_start;
> +    /* Restartable sequences cpu_id field.  Updated by the kernel.
> +       Read by user-space with single-copy atomicity semantics.  This
> +       field should only be read by the thread which registered this
> +       data structure.  Aligned on 32-bit.  Values
> +       RSEQ_CPU_ID_UNINITIALIZED and RSEQ_CPU_ID_REGISTRATION_FAILED
> +       have a special semantic: the former means "rseq uninitialized",
> +       and latter means "rseq initialization failed".  This value is
> +       meant to be read within rseq critical sections and compared
> +       with the cpu_id_start value previously read, before performing
> +       the commit instruction, or read and compared with the
> +       cpu_id_start value before returning a value loaded from a data
> +       structure indexed using the cpu_id_start value.  */
> +    uint32_t cpu_id;
> +    /* Restartable sequences rseq_cs field.
> +
> +       Contains NULL when no critical section is active for the current
> +       thread, or holds a pointer to the currently active struct rseq_cs.
> +
> +       Updated by user-space, which sets the address of the currently
> +       active rseq_cs at the beginning of assembly instruction sequence
> +       block, and set to NULL by the kernel when it restarts an assembly
> +       instruction sequence block, as well as when the kernel detects that
> +       it is preempting or delivering a signal outside of the range
> +       targeted by the rseq_cs.  Also needs to be set to NULL by user-space
> +       before reclaiming memory that contains the targeted struct rseq_cs.
> +
> +       Read and set by the kernel.  Set by user-space with single-copy
> +       atomicity semantics.  This field should only be updated by the
> +       thread which registered this data structure.  Aligned on 64-bit.  */
> +    union
> +      {
> +        uint64_t ptr64;
> +# ifdef __LP64__
> +        uint64_t ptr;
> +# else /* __LP64__ */
> +        struct
> +          {
> +#  if __BYTE_ORDER == __BIG_ENDIAN
> +            uint32_t padding; /* Initialized to zero.  */
> +            uint32_t ptr32;
> +#  else /* LITTLE */
> +            uint32_t ptr32;
> +            uint32_t padding; /* Initialized to zero.  */
> +#  endif /* ENDIAN */
> +          } ptr;
> +# endif /* __LP64__ */
> +      } rseq_cs;
> +
> +    /* Restartable sequences flags field.
> +
> +       This field should only be updated by the thread which
> +       registered this data structure.  Read by the kernel.
> +       Mainly used for single-stepping through rseq critical sections
> +       with debuggers.
> +
> +       - RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT
> +           Inhibit instruction sequence block restart on preemption
> +           for this thread.
> +       - RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL
> +           Inhibit instruction sequence block restart on signal
> +           delivery for this thread.
> +       - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
> +           Inhibit instruction sequence block restart on migration for
> +           this thread.  */
> +    uint32_t flags;
> +  } __attribute__ ((__aligned__ (32)));
> +
> +#endif /* __GLIBC_HAVE_KERNEL_RSEQ */
> +
> +#endif /* sys/rseq.h */
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> new file mode 100644
> index 0000000000..d31d94445c
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
> @@ -0,0 +1,260 @@
> +/* Restartable Sequences NPTL test.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* These tests validate that rseq is registered from various execution
> +   contexts (main thread, destructor, other threads, other threads created
> +   from destructor, forked process (without exec), pthread_atfork handlers,
> +   pthread setspecific destructors, signal handlers, atexit handlers).
> +
> +   See the Linux kernel selftests for extensive rseq stress-tests.  */
> +
> +#include <stdio.h>
> +#include <support/check.h>
> +#include <support/xthread.h>
> +#include <sys/rseq.h>
> +#include <unistd.h>
> +
> +#ifdef RSEQ_SIG
> +# include <array_length.h>
> +# include <errno.h>
> +# include <error.h>
> +# include <pthread.h>
> +# include <signal.h>
> +# include <stdlib.h>
> +# include <string.h>
> +# include <support/namespace.h>
> +# include <support/xsignal.h>
> +# include <syscall.h>
> +# include <sys/types.h>
> +# include <sys/wait.h>
> +# include "tst-rseq.h"
> +
> +static pthread_key_t rseq_test_key;
> +
> +static void
> +atfork_prepare (void)
> +{
> +  if (!rseq_thread_registered ())
> +    {
> +      printf ("error: rseq not registered in pthread atfork prepare\n");
> +      support_record_failure ();
> +    }
> +}
> +
> +static void
> +atfork_parent (void)
> +{
> +  if (!rseq_thread_registered ())
> +    {
> +      printf ("error: rseq not registered in pthread atfork parent\n");
> +      support_record_failure ();
> +    }
> +}
> +
> +static void
> +atfork_child (void)
> +{
> +  if (!rseq_thread_registered ())
> +    {
> +      printf ("error: rseq not registered in pthread atfork child\n");
> +      support_record_failure ();
> +    }
> +}
> +
> +static void
> +rseq_key_destructor (void *arg)
> +{
> +  /* Cannot use deferred failure reporting after main returns.  */
> +  if (!rseq_thread_registered ())
> +    FAIL_EXIT1 ("rseq not registered in pthread key destructor");
> +}
> +
> +static void
> +atexit_handler (void)
> +{
> +  /* Cannot use deferred failure reporting after main returns.  */
> +  if (!rseq_thread_registered ())
> +    FAIL_EXIT1 ("rseq not registered in atexit handler");
> +}
> +
> +/* Used to avoid -Werror=stringop-overread warning with
> +   pthread_setspecific and GCC 11.  */
> +static char one = 1;
> +
> +static void
> +do_rseq_main_test (void)
> +{
> +  TEST_COMPARE (atexit (atexit_handler), 0);
> +  rseq_test_key = xpthread_key_create (rseq_key_destructor);
> +  TEST_COMPARE (pthread_atfork (atfork_prepare, atfork_parent, atfork_child), 0);
> +  xraise (SIGUSR1);
> +  TEST_COMPARE (pthread_setspecific (rseq_test_key, &one), 0);
> +  TEST_VERIFY_EXIT (rseq_thread_registered ());
> +}
> +
> +static void
> +cancel_routine (void *arg)
> +{
> +  if (!rseq_thread_registered ())
> +    {
> +      printf ("error: rseq not registered in cancel routine\n");
> +      support_record_failure ();
> +    }
> +}
> +
> +static pthread_barrier_t cancel_thread_barrier;
> +static pthread_cond_t cancel_thread_cond = PTHREAD_COND_INITIALIZER;
> +static pthread_mutex_t cancel_thread_mutex = PTHREAD_MUTEX_INITIALIZER;
> +
> +static void
> +test_cancel_thread (void)
> +{
> +  pthread_cleanup_push (cancel_routine, NULL);
> +  (void) xpthread_barrier_wait (&cancel_thread_barrier);
> +  /* Wait forever until cancellation.  */
> +  xpthread_cond_wait (&cancel_thread_cond, &cancel_thread_mutex);
> +  pthread_cleanup_pop (0);
> +}
> +
> +static void *
> +thread_function (void * arg)
> +{
> +  int i = (int) (intptr_t) arg;
> +
> +  xraise (SIGUSR1);
> +  if (i == 0)
> +    test_cancel_thread ();
> +  TEST_COMPARE (pthread_setspecific (rseq_test_key, &one), 0);
> +  return rseq_thread_registered () ? NULL : (void *) 1l;
> +}
> +
> +static void
> +sighandler (int sig)
> +{
> +  if (!rseq_thread_registered ())
> +    {
> +      printf ("error: rseq not registered in signal handler\n");
> +      support_record_failure ();
> +    }
> +}
> +
> +static void
> +setup_signals (void)
> +{
> +  struct sigaction sa;
> +
> +  sigemptyset (&sa.sa_mask);
> +  sigaddset (&sa.sa_mask, SIGUSR1);
> +  sa.sa_flags = 0;
> +  sa.sa_handler = sighandler;
> +  xsigaction (SIGUSR1, &sa, NULL);
> +}
> +
> +static int
> +do_rseq_threads_test (int nr_threads)
> +{
> +  pthread_t th[nr_threads];
> +  int i;
> +  int result = 0;
> +
> +  xpthread_barrier_init (&cancel_thread_barrier, NULL, 2);
> +
> +  for (i = 0; i < nr_threads; ++i)
> +    th[i] = xpthread_create (NULL, thread_function,
> +                             (void *) (intptr_t) i);
> +
> +  (void) xpthread_barrier_wait (&cancel_thread_barrier);
> +
> +  xpthread_cancel (th[0]);
> +
> +  for (i = 0; i < nr_threads; ++i)
> +    {
> +      void *v;
> +
> +      v = xpthread_join (th[i]);
> +      if (i != 0 && v != NULL)
> +        {
> +          printf ("error: join %d successful, but child failed\n", i);
> +          result = 1;
> +        }
> +      else if (i == 0 && v == NULL)
> +        {
> +          printf ("error: join %d successful, child did not fail as expected\n", i);
> +          result = 1;
> +        }
> +    }
> +
> +  xpthread_barrier_destroy (&cancel_thread_barrier);
> +
> +  return result;
> +}
> +
> +static void
> +subprocess_callback (void *closure)
> +{
> +  do_rseq_main_test ();
> +}
> +
> +static void
> +do_rseq_fork_test (void)
> +{
> +  support_isolate_in_subprocess (subprocess_callback, NULL);
> +}
> +
> +static int
> +do_rseq_test (void)
> +{
> +  int t[] = { 1, 2, 6, 5, 4, 3, 50 };
> +  int i, result = 0;
> +
> +  if (!rseq_available ())
> +    FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
> +  setup_signals ();
> +  xraise (SIGUSR1);
> +  do_rseq_main_test ();
> +  for (i = 0; i < array_length (t); i++)
> +    if (do_rseq_threads_test (t[i]))
> +      result = 1;
> +  do_rseq_fork_test ();
> +  return result;
> +}
> +
> +static void __attribute__ ((destructor))
> +do_rseq_destructor_test (void)
> +{
> +  /* Cannot use deferred failure reporting after main returns.  */
> +  if (do_rseq_test ())
> +    FAIL_EXIT1 ("rseq not registered within destructor");
> +  xpthread_key_delete (rseq_test_key);
> +}
> +
> +#else /* RSEQ_SIG */
> +static int
> +do_rseq_test (void)
> +{
> +  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
> +  return 0;
> +}
> +#endif /* RSEQ_SIG */
> +
> +static int
> +do_test (void)
> +{
> +  return do_rseq_test ();
> +}
> +
> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
> new file mode 100644
> index 0000000000..926376b6a5
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-rseq.c
> @@ -0,0 +1,64 @@
> +/* Restartable Sequences single-threaded tests.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +/* These tests validate that rseq is registered from main in an executable
> +   not linked against libpthread.  */
> +
> +#include <support/check.h>
> +#include <stdio.h>
> +#include <sys/rseq.h>
> +#include <unistd.h>
> +
> +#ifdef RSEQ_SIG
> +# include <errno.h>
> +# include <error.h>
> +# include <stdlib.h>
> +# include <string.h>
> +# include <syscall.h>
> +# include "tst-rseq.h"
> +
> +static void
> +do_rseq_main_test (void)
> +{
> +  TEST_VERIFY_EXIT (rseq_thread_registered ());
> +}
> +
> +static void
> +do_rseq_test (void)
> +{
> +  if (!rseq_available ())
> +    {
> +      FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
> +    }
> +  do_rseq_main_test ();
> +}
> +#else /* RSEQ_SIG */
> +static void
> +do_rseq_test (void)
> +{
> +  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
> +}
> +#endif /* RSEQ_SIG */
> +
> +static int
> +do_test (void)
> +{
> +  do_rseq_test ();
> +  return 0;
> +}

Should the test possibly include a simple critical section? Maybe a while (1)
loop that either times out or hits the abort handler?
Timeout -> error.
abort handler -> test passed.
> +
> +#include <support/test-driver.c>
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.h b/sysdeps/unix/sysv/linux/tst-rseq.h
> new file mode 100644
> index 0000000000..a476c316fc
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/tst-rseq.h
> @@ -0,0 +1,57 @@
> +/* Restartable Sequences tests header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <error.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <support/check.h>
> +#include <syscall.h>
> +#include <sys/rseq.h>
> +#include <tls.h>
> +
> +static inline bool
> +rseq_thread_registered (void)
> +{
> +  return THREAD_GETMEM_VOLATILE (THREAD_SELF, rseq_area.cpu_id) >= 0;
> +}
> +
> +static inline int
> +sys_rseq (struct rseq *rseq_abi, uint32_t rseq_len, int flags, uint32_t sig)
> +{
> +  return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig);
> +}
> +
> +static inline bool
> +rseq_available (void)
> +{
> +  int rc;
> +
> +  rc = sys_rseq (NULL, 0, 0, 0);
> +  if (rc != -1)
> +    FAIL_EXIT1 ("Unexpected rseq return value %d", rc);
> +  switch (errno)
> +    {
> +    case ENOSYS:
> +      return false;
> +    case EINVAL:
> +      /* rseq is implemented, but detected an invalid rseq_len parameter.  */
> +      return true;
> +    default:
> +      FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno));
> +    }
> +}
> diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
> new file mode 100644
> index 0000000000..9fc909e7c8
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
> @@ -0,0 +1,30 @@
> +/* Restartable Sequences Linux x86 architecture header.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   RSEQ_SIG is used with the following reserved undefined instructions, which
> +   trap in user-space:
> +
> +   x86-32:    0f b9 3d 53 30 05 53      ud1    0x53053053,%edi
> +   x86-64:    0f b9 3d 53 30 05 53      ud1    0x53053053(%rip),%edi  */
> +
> +#define RSEQ_SIG        0x53053053
> --
> 2.33.1
>
>

* Re: [PATCH 4/8] nptl: Add rseq registration
  2021-12-08 23:27       ` Siddhesh Poyarekar
@ 2021-12-09  7:42         ` Florian Weimer
  2021-12-09  8:01           ` Siddhesh Poyarekar
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2021-12-09  7:42 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: libc-alpha

* Siddhesh Poyarekar:

> On 12/8/21 23:38, Florian Weimer wrote:
>> * Siddhesh Poyarekar:
>> 
>>>> +#ifdef RSEQ_SIG
>>>> +static inline void
>>>> +rseq_register_current_thread (struct pthread *self)
>>>> +{
>>>> +  int ret = INTERNAL_SYSCALL_CALL (rseq,
>>>> +                                   &self->rseq_area, sizeof (self->rseq_area),
>>>> +                                   0, RSEQ_SIG);
>>>> +  if (INTERNAL_SYSCALL_ERROR_P (ret))
>>>> +    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>>>
>>> Why can't we just leave it as the kernel did when it failed the
>>> syscall?
>> The kernel definitely won't write anything if the failure is ENOSYS.
>> I
>> don't expect the kernel to write something for the other failures,
>> either.
>
> OK, I interpreted from the outdated manpage patch[1] that the
> kernel ensures that uninitialized cpu_id will be read as -1.  I read
> the rseq implementation in the kernel and saw that there are a number
> of error paths where the kernel simply returns without touching the
> user memory.   I suppose what they meant by "uninitialized" in the
> manpage is actually "reset after unregister", which is odd.
>
> In any case, what I meant to eventually get at (sorry I wasn't
> specific; I wrote both patch reviews together and didn't realize
> they'd be read as separate emails!) is that RSEQ_CPU_ID_UNINITIALIZED
> seemed enough for all use cases and RSEQ_CPU_ID_REGISTRATION_FAILED
> seemed unnecessary.

Yes, but the constant is (also) defined in the UAPI headers, so its
value is fixed.  And RSEQ_CPU_ID_REGISTRATION_FAILED (that is, -2)
is closer to the behavior we want to trigger in applications (that there
is nothing to register because we already tried and failed).
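
For illustration, a minimal sketch of how an application might tell the
states apart.  The values -1 and -2 mirror RSEQ_CPU_ID_UNINITIALIZED and
RSEQ_CPU_ID_REGISTRATION_FAILED; the EXAMPLE_* names and
example_classify_cpu_id are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Local stand-ins for the UAPI constants (assumed values: -1 and -2).  */
enum
{
  EXAMPLE_CPU_ID_UNINITIALIZED = -1,
  EXAMPLE_CPU_ID_REGISTRATION_FAILED = -2,
};

/* Hypothetical classification of the cpu_id field.  */
enum example_rseq_state
{
  EXAMPLE_RSEQ_REGISTERED,     /* cpu_id >= 0: the kernel maintains it.  */
  EXAMPLE_RSEQ_NOT_ATTEMPTED,  /* -1: nobody has tried to register yet.  */
  EXAMPLE_RSEQ_FAILED,         /* -2: registration was tried and failed.  */
};

enum example_rseq_state
example_classify_cpu_id (int32_t cpu_id)
{
  if (cpu_id >= 0)
    return EXAMPLE_RSEQ_REGISTERED;
  if (cpu_id == EXAMPLE_CPU_ID_REGISTRATION_FAILED)
    return EXAMPLE_RSEQ_FAILED;
  return EXAMPLE_RSEQ_NOT_ATTEMPTED;
}
```

With -2 stored after a failed syscall, a caller that sees
EXAMPLE_RSEQ_FAILED knows there is no point in retrying on this thread.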

Thanks,
Florian


* Re: [PATCH 4/8] nptl: Add rseq registration
  2021-12-09  7:42         ` Florian Weimer
@ 2021-12-09  8:01           ` Siddhesh Poyarekar
  0 siblings, 0 replies; 33+ messages in thread
From: Siddhesh Poyarekar @ 2021-12-09  8:01 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On 12/9/21 13:12, Florian Weimer wrote:
> * Siddhesh Poyarekar:
> 
>> On 12/8/21 23:38, Florian Weimer wrote:
>>> * Siddhesh Poyarekar:
>>>
>>>>> +#ifdef RSEQ_SIG
>>>>> +static inline void
>>>>> +rseq_register_current_thread (struct pthread *self)
>>>>> +{
>>>>> +  int ret = INTERNAL_SYSCALL_CALL (rseq,
>>>>> +                                   &self->rseq_area, sizeof (self->rseq_area),
>>>>> +                                   0, RSEQ_SIG);
>>>>> +  if (INTERNAL_SYSCALL_ERROR_P (ret))
>>>>> +    THREAD_SETMEM (self, rseq_area.cpu_id, RSEQ_CPU_ID_REGISTRATION_FAILED);
>>>>
>>>> Why can't we just leave it as the kernel did when it failed the
>>>> syscall?
>>> The kernel definitely won't write anything if the failure is ENOSYS.
>>> I
>>> don't expect the kernel to write something for the other failures,
>>> either.
>>
>> OK, I interpreted from the outdated manpage patch[1] that the
>> kernel ensures that uninitialized cpu_id will be read as -1.  I read
>> the rseq implementation in the kernel and saw that there are a number
>> of error paths where the kernel simply returns without touching the
>> user memory.   I suppose what they meant by "uninitialized" in the
>> manpage is actually "reset after unregister", which is odd.
>>
>> In any case, what I meant to eventually get at (sorry I wasn't
>> specific; I wrote both patch reviews together and didn't realize
>> they'd be read as separate emails!) is that RSEQ_CPU_ID_UNINITIALIZED
>> seemed enough for all use cases and RSEQ_CPU_ID_REGISTRATION_FAILED
>> seemed unnecessary.
> 
> Yes, but the constant is (also) defined in the UAPI headers, so its
> value is fixed.  And RSEQ_CPU_ID_REGISTRATION_FAILED (that is, -2)
> is closer to the behavior we want to trigger in applications (that there
> is nothing to register because we already tried and failed).

OK, I see it in the headers, sorry.  Once again I assumed only 
RSEQ_CPU_ID_UNINITIALIZED was defined because the man page didn't 
specify it :/

It's redundant IMO, but that's a Linux API problem.  No objections from 
me then.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

* Re: [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration
  2021-12-08 18:03   ` Siddhesh Poyarekar
@ 2021-12-09  8:03     ` Siddhesh Poyarekar
  0 siblings, 0 replies; 33+ messages in thread
From: Siddhesh Poyarekar @ 2021-12-09  8:03 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 12/8/21 23:33, Siddhesh Poyarekar wrote:
> On 12/7/21 18:32, Florian Weimer via Libc-alpha wrote:
>> This tunable allows applications to register the rseq area instead
>> of glibc.

This is OK overall too, since I had misunderstood 4/8.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

Thanks,
Siddhesh

* Re: [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer
  2021-12-08 17:55     ` Florian Weimer
@ 2021-12-09 11:52       ` Szabolcs Nagy
  0 siblings, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-09 11:52 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/08/2021 18:55, Florian Weimer wrote:
> * Szabolcs Nagy:
> > The 12/07/2021 14:00, Florian Weimer via Libc-alpha wrote:
> >> <tls.h> already contains a definition that is quite similar,
> >> but it is not consistent across architectures.
> >> 
> >> Only architectures for which rseq support is added are covered.
> >
> > This looks ok.
> >
> > It's an annoying gcc bug that __builtin_thread_pointer
> > does not work consistently across targets.
> >
> > Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> 
> We don't need m68k for rseq, so I haven't added it, but I saw that
> __thread_pointer is actually a system call there.  Maybe that's why it's
> not a universal GCC feature.  Furthermore, for many ABIs, the thread
> pointer is somewhat implicit.  On x86, it took some discussion to figure
> out that we actually have a canonical notion of a thread pointer.  On
> some other targets, the thread pointer is stored explicitly in a
> (system) register, but it actually points to nowhere, so that local-exec
> TLS access can make better use of immediate instruction operands.

i think local-exec tls access has to expose
some notion of thread pointer to the compiler
(from which a tls variable is at fixed offset).

whatever that notion is, __builtin_thread_pointer
can be defined based on that and if there is
nothing exposed then presumably tls access
relies on libc apis so __builtin_thread_pointer
can also rely on a libc api (or syscall).

tp is useful as a thread identifier and for fixed
offset tcb abis. (especially within libc and
compiler runtimes)

> 
> It's also annoying that __has_builtin (__builtin_thread_pointer)
> evaluates to true even for GCC targets where actually using
> __builtin_thread_pointer () results in a compiler error.
> 
> In the future, we could install this as <sys/thread_pointer.h> if people
> think it's useful (not just in an rseq context).

yeah, i would prefer gcc to be fixed.

* Re: [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h>
  2021-12-07 13:03 ` [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h> Florian Weimer
  2021-12-08 17:34   ` Szabolcs Nagy
@ 2021-12-09 12:26   ` Szabolcs Nagy
  2021-12-09 12:34     ` Florian Weimer
  2021-12-09 12:36     ` Szabolcs Nagy
  1 sibling, 2 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-09 12:26 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 12/07/2021 14:03, Florian Weimer via Libc-alpha wrote:
> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
> index 926376b6a5..572c11166f 100644
> --- a/sysdeps/unix/sysv/linux/tst-rseq.c
> +++ b/sysdeps/unix/sysv/linux/tst-rseq.c
> @@ -29,12 +29,20 @@
>  # include <stdlib.h>
>  # include <string.h>
>  # include <syscall.h>
> +# include <thread_pointer.h>
> +# include <tls.h>
>  # include "tst-rseq.h"
>  
>  static void
>  do_rseq_main_test (void)
>  {
> +  struct pthread *pd = THREAD_SELF;
> +
>    TEST_VERIFY_EXIT (rseq_thread_registered ());
> +  TEST_COMPARE (__rseq_flags, 0);
> +  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
> +               == (char *) &pd->rseq_area);
> +  TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area));
>  }

sorry i just tested the committed patches on 32bit arm
(on 64bit kernel) and there is a tls alignment issue

FAIL: nptl/tst-tls3
FAIL: nptl/tst-tls3-malloc
FAIL: nptl/tst-tls5

outputs:

initial thread's struct pthread not aligned enough
initial thread's struct pthread not aligned enough
pthread_self () = 0xf7e2d350, size 1408, align 32, WRONG ALIGNMENT

and rseq registration fails with EINVAL causing

FAIL: misc/tst-rseq

output is

../sysdeps/unix/sysv/linux/tst-rseq.c:45: numeric comparison failure
   left: 0 (0x0); from: __rseq_size
  right: 32 (0x20); from: sizeof (pd->rseq_area)
error: 1 test failures

strace has

...
set_tls(0xf7e41e30)                     = 0
set_tid_address(0xf7e41918)             = 1181659
set_robust_list(0xf7e41920, 12)         = 0
syscall_0x18e(0xf7e41e10, 0x20, 0, 0xe7f5def3, 0xf7e418b0, 0xf7e41e30) = -1 (errno 22)
mprotect(0xf7df6000, 8192, PROT_READ)   = 0
mprotect(0xf7e15000, 4096, PROT_READ)   = 0
mprotect(0xf7e44000, 8192, PROT_READ)   = 0
ugetrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
mmap2(NULL, 8, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0xf7e40000
getrandom("\x44\x21\x97\xf7", 4, GRND_NONBLOCK) = 4
syscall_0x18e(0, 0, 0, 0, 0xffc1be68, 0x1) = -1 (errno 22)
write(1, "../sysdeps/unix/sysv/linux/tst-r"..., 69) = 69
write(1, "   left: ", 9)                = 9
...

0x18e is __NR_rseq and errno 22 is EINVAL.
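
The EINVAL looks consistent with the alignment failures above: the address
passed to rseq, 0xf7e41e10, ends in 0x10, so it is 16-byte aligned but not
32-byte aligned, and struct rseq requires 32-byte alignment.  A small sketch
of that check follows; example_is_aligned is a hypothetical helper, not
something from the patch or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if ADDR is a multiple of ALIGN.  ALIGN must be a power
   of two.  Hypothetical helper for illustration only.  */
bool
example_is_aligned (uintptr_t addr, uintptr_t align)
{
  return (addr & (align - 1)) == 0;
}
```

Applied to the values in the logs, both 0xf7e41e10 (the rseq area) and
0xf7e2d350 (pthread_self () in tst-tls5) fail the check for an alignment
of 32 and pass it for 16.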

* Re: [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h>
  2021-12-09 12:26   ` Szabolcs Nagy
@ 2021-12-09 12:34     ` Florian Weimer
  2021-12-09 12:36     ` Szabolcs Nagy
  1 sibling, 0 replies; 33+ messages in thread
From: Florian Weimer @ 2021-12-09 12:34 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 12/07/2021 14:03, Florian Weimer via Libc-alpha wrote:
>> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
>> index 926376b6a5..572c11166f 100644
>> --- a/sysdeps/unix/sysv/linux/tst-rseq.c
>> +++ b/sysdeps/unix/sysv/linux/tst-rseq.c
>> @@ -29,12 +29,20 @@
>>  # include <stdlib.h>
>>  # include <string.h>
>>  # include <syscall.h>
>> +# include <thread_pointer.h>
>> +# include <tls.h>
>>  # include "tst-rseq.h"
>>  
>>  static void
>>  do_rseq_main_test (void)
>>  {
>> +  struct pthread *pd = THREAD_SELF;
>> +
>>    TEST_VERIFY_EXIT (rseq_thread_registered ());
>> +  TEST_COMPARE (__rseq_flags, 0);
>> +  TEST_VERIFY ((char *) __thread_pointer () + __rseq_offset
>> +               == (char *) &pd->rseq_area);
>> +  TEST_COMPARE (__rseq_size, sizeof (pd->rseq_area));
>>  }
>
> sorry i just tested the committed patches on 32bit arm
> (on 64bit kernel) and there is a tls alignment issue
>
> FAIL: nptl/tst-tls3
> FAIL: nptl/tst-tls3-malloc
> FAIL: nptl/tst-tls5
>
> outputs:
>
> initial thread's struct pthread not aligned enough
> initial thread's struct pthread not aligned enough
> pthread_self () = 0xf7e2d350, size 1408, align 32, WRONG ALIGNMENT
>
> and rseq registration fails with EINVAL causing
>
> FAIL: misc/tst-rseq

I missed that we have both TCB_ALIGNMENT and TLS_TCB_ALIGN.
I think we need to remove the latter.  I will try to work on a patch
later today.

Thanks,
Florian


* Re: [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h>
  2021-12-09 12:26   ` Szabolcs Nagy
  2021-12-09 12:34     ` Florian Weimer
@ 2021-12-09 12:36     ` Szabolcs Nagy
  1 sibling, 0 replies; 33+ messages in thread
From: Szabolcs Nagy @ 2021-12-09 12:36 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

The 12/09/2021 12:26, Szabolcs Nagy via Libc-alpha wrote:
> sorry i just tested the committed patches on 32bit arm
> (on 64bit kernel) and there is a tls alignment issue
> 
> FAIL: nptl/tst-tls3
> FAIL: nptl/tst-tls3-malloc
> FAIL: nptl/tst-tls5
> 
> outputs:
> 
> initial thread's struct pthread not aligned enough
> initial thread's struct pthread not aligned enough
> pthread_self () = 0xf7e2d350, size 1408, align 32, WRONG ALIGNMENT

it seems the TCB_ALIGNMENT patch does not work.
there is a separate TLS_TCB_ALIGN that is used
in elf/dl-tls.c _dl_determine_tlsoffset
i guess max(TCB_ALIGNMENT, TLS_TCB_ALIGN) should
be used.
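
a rough sketch of the idea, not the actual elf/dl-tls.c code.  the
EXAMPLE_* values are made up for illustration (the real macros are
per-architecture) and both helper functions are hypothetical:

```c
#include <stddef.h>

/* Illustrative values only; assumed here, not taken from any tls.h.  */
#define EXAMPLE_TCB_ALIGNMENT 32
#define EXAMPLE_TLS_TCB_ALIGN 16

/* Pick the stricter of two alignment requirements.  */
size_t
example_max_align (size_t a, size_t b)
{
  return a > b ? a : b;
}

/* Round OFFSET up to a multiple of ALIGN (a power of two).  */
size_t
example_round_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}
```

with these assumed values, rounding with only the smaller alignment
leaves an offset such as 0x350 unchanged, while rounding with the
maximum moves it to 0x360, which is a multiple of 32.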


* Re: [PATCH v2 0/8] Extensible rseq integration
  2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
                   ` (7 preceding siblings ...)
  2021-12-07 13:04 ` [PATCH v2 8/8] nptl: rseq failure after registration on main thread is fatal Florian Weimer
@ 2022-02-01 15:21 ` Rich Felker
  2022-02-01 16:36   ` Florian Weimer
  8 siblings, 1 reply; 33+ messages in thread
From: Rich Felker @ 2022-02-01 15:21 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Tue, Dec 07, 2021 at 01:59:26PM +0100, Florian Weimer via Libc-alpha wrote:
> This series integrates the previous posted v2 for <thread_pointer.h>.
> 
> It incorporates Mathieu's and Paul E. McKenney suggestion to use a
> volatile read for rseq_abi.cpu_id access, using a new
> THREAD_GETMEM_VOLATILE macro.
> 
> The last patch in the series makes rseq registration consistent across
> threads.
> 
> Florian Weimer (8):
>   nptl: Add <thread_pointer.h> for defining __thread_pointer
>   nptl: Introduce <tcb-access.h> for THREAD_* accessors
>   nptl: Introduce THREAD_GETMEM_VOLATILE
>   nptl: Add rseq registration
>   Linux: Use rseq to accelerate sched_getcpu
>   nptl: Add glibc.pthread.rseq tunable to control rseq registration
>   nptl: Add public rseq symbols and <sys/rseq.h>
>   nptl: rseq failure after registration on main thread is fatal

I'm sorry for bringing this up so late; I wasn't aware that redesign
of the rseq ABI was taking place. I wish this had been discussed in a
cross-libc venue, since, in its current form, I don't think the ABI is
suitable for inclusion in, or use as a third-party library with, musl.

The most pressing issue I see is that it does not admit lazy
registration, which precludes it being implemented outside of libc
(because it has to hook into pthread_create) and imposes runtime cost
on programs which do not use it. RSEQ_CPU_ID_UNINITIALIZED exists to
inform the application about an uninitialized state, but the
application has no way to request an attempt at registration upon
seeing it. I think that would be easy to add. Basically it's just
making the syscall, which a consumer of the ABI could in theory do
itself, but it's probably best not to have it do that and instead have
registration mediated through the ABI/through libc.
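
To make the lazy model concrete, here is a minimal sketch of what such a
mediated entry point could look like.  Everything in it is hypothetical:
the EXAMPLE_* constants stand in for the UAPI values -1 and -2, the state
is passed in by pointer instead of living in the thread's rseq area, and
the registration hook is a stub that returns a CPU number instead of
making the rseq syscall.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_CPU_ID_UNINITIALIZED (-1)
#define EXAMPLE_CPU_ID_REGISTRATION_FAILED (-2)

/* Hypothetical hook: returns a CPU number (>= 0) on success and a
   negative value on failure.  A real one would issue the syscall.  */
typedef int32_t (*example_register_fn) (void);

/* Stub hooks used only to demonstrate the two outcomes.  */
int32_t example_register_ok (void) { return 3; }
int32_t example_register_fail (void) { return -1; }

/* Attempt registration at most once per thread.  *CPU_ID stands in for
   the cpu_id field of the calling thread's rseq area.  Returns true if
   rseq is usable on this thread afterwards.  */
bool
example_rseq_ensure_registered (int32_t *cpu_id,
                                example_register_fn try_register)
{
  if (*cpu_id == EXAMPLE_CPU_ID_UNINITIALIZED)
    {
      int32_t ret = try_register ();
      *cpu_id = ret >= 0 ? ret : EXAMPLE_CPU_ID_REGISTRATION_FAILED;
    }
  return *cpu_id >= 0;
}
```

A consumer would call this on first use from each thread; once the state
has left "uninitialized", the hook is never invoked again for that thread.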

Related to this, if rseq is implemented outside of libc, I'm not sure
if there's a safe way to ensure it's unregistered prior to thread
exit. It may already be possible but I haven't sufficiently convinced
myself.

On another issue, while this isn't entirely a show-stopper, I'm not a
fan of requiring constant __rseq_offset. This comes across as an
instance-specific hack to make up for GD TLS being slow, when we
already have a fully general solution to that which isn't being
deployed: TLSDESC. As it stands in the current ABI, whatever library
is providing rseq must be present at application startup; it can't be
dlopened. And due to the ABI this applies *even if* we just wanted to
make rseq always-fail in that case. The ABI simply doesn't admit not
having memory pre-reserved for every thread (note: the size is
something like a +30% increase to musl's per-thread memory usage and
will surely increase over time, which is a lot for something we don't
expect the vast majority of applications to use).

One minor and hopefully non-controversial declared-ABI issue I see is
that the __rseq_offset etc. objects are declared const, with a
pre-relro access hack used to modify them at runtime. This is
incompatible with LTO and static linking. If protecting them is
desired, they should be declared non-const but live in non-modifiable
memory, like string literals do. Otherwise a static linking LTO
compiler is free to copy the initial values directly into code.

I'm not sure what the right thing to do on the verge of release is. If
it were my choice, I would hold it back and wait until it was better
reviewed and these issues worked out before making it public API/ABI,
but I don't know what glibc's constraints here are and how to best
weigh them against the ability to revise this ABI after release. Most
of these things I think *are* of the sort that can be fixed in
non-breaking ways, except that applications written to the current
version might need to adjust before they can use a version of the
API/ABI we'd be willing to adopt in musl.

Rich

* Re: [PATCH v2 0/8] Extensible rseq integration
  2022-02-01 15:21 ` [PATCH v2 0/8] Extensible rseq integration Rich Felker
@ 2022-02-01 16:36   ` Florian Weimer
  2022-02-03  0:37     ` Carlos O'Donell
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer @ 2022-02-01 16:36 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha, Mathieu Desnoyers, Carlos O'Donell

* Rich Felker:

> On Tue, Dec 07, 2021 at 01:59:26PM +0100, Florian Weimer via Libc-alpha wrote:
>> This series integrates the previous posted v2 for <thread_pointer.h>.
>> 
>> It incorporates Mathieu's and Paul E. McKenney suggestion to use a
>> volatile read for rseq_abi.cpu_id access, using a new
>> THREAD_GETMEM_VOLATILE macro.
>> 
>> The last patch in the series makes rseq registration consistent across
>> threads.
>> 
>> Florian Weimer (8):
>>   nptl: Add <thread_pointer.h> for defining __thread_pointer
>>   nptl: Introduce <tcb-access.h> for THREAD_* accessors
>>   nptl: Introduce THREAD_GETMEM_VOLATILE
>>   nptl: Add rseq registration
>>   Linux: Use rseq to accelerate sched_getcpu
>>   nptl: Add glibc.pthread.rseq tunable to control rseq registration
>>   nptl: Add public rseq symbols and <sys/rseq.h>
>>   nptl: rseq failure after registration on main thread is fatal
>
> I'm sorry for bringing this up so late; I wasn't aware that redesign
> of the rseq ABI was taking place. I wish this had been discussed in a
> cross-libc venue, since, in its current form, I don't think the ABI is
> suitable for inclusion in, or use as a third-party library with, musl.

Well, I Cc:ed you on the original proposal in November, and cross-posted
it to linux-api as well.

> The most pressing issue I see is that it does not admit lazy
> registration, which precludes it being implemented outside of libc
> (because it has to hook into pthread_create) and imposes runtime cost
> on programs which do not use it. RSEQ_CPU_ID_UNINITIALIZED exists to
> inform the application about an uninitialized state, but the
> application has no way to request an attempt at registration upon
> seeing it. I think that would be easy to add. Basically it's just
> making the syscall, which a consumer of the ABI could in theory do
> itself, but it's probably best not to have it do that and instead have
> registration mediated through the ABI/through libc.

I rejected that because the programming model is too complex: In the
extreme, a library that observes rseq support on the main thread may be
called again from another thread where rseq is not yet enabled, and
cannot be enabled.

I think it is also necessary to enable it unconditionally to force
people to actually implement support for it in their tools (e.g., CRIU).
Otherwise we'll never get to the point where it is reliable.  I doubt
we'd have learned about the CRIU issue by now unless we took that step.

> Related to this, if rseq is implemented outside of libc, I'm not sure
> if there's a safe way to ensure it's unregistered prior to thread
> exit. It may already be possible but I haven't sufficiently convinced
> myself.

I expect that asking for rseq to be implemented outside of libc is like
asking for robust mutexes to be implemented outside libc: it's really
pushing what can be done in a library.

> On another issue, while this isn't entirely a show-stopper, I'm not a
> fan of requiring constant __rseq_offset. This comes across as an
> instance-specific hack to make up for GD TLS being slow, when we
> already have a fully general solution to that which isn't being
> deployed: TLSDESC. As it stands in the current ABI, whatever library
> is providing rseq must be present at application startup; it can't be
> dlopened. And due to the ABI this applies *even if* we just wanted to
> make rseq always-fail in that case. The ABI simply doesn't admit not
> having memory pre-reserved for every thread (note: the size is
> something like a +30% increase to musl's per-thread memory usage and
> will surely increase over time, which is a lot for something we don't
> expect the vast majority of applications to use).

If the memory is not allocated, __rseq_size can be set to 0.

> One minor and hopefully non-controversial declared-ABI issue I see is
> that the __rseq_offset etc. objects are declared const, with a
> pre-relro access hack used to modify them at runtime. This is
> incompatible with LTO and static linking. If protecting them is
> desired, they should be declared non-const but live in non-modifiable
> memory, like string literals do. Otherwise a static linking LTO
> compiler is free to copy the initial values directly into code.

Yes, you'll need a compiler barrier with LTO.  It's not different from
other types of relocations.

> I'm not sure what the right thing to do on the verge of release is. If
> it were my choice, I would hold it back and wait until it was better
> reviewed and these issues worked out before making it public API/ABI,
> but I don't know what glibc's constraints here are and how to best
> weigh them against the ability to revise this ABI after release. Most
> of these things I think *are* of the sort that can be fixed in
> non-breaking ways, except that applications written to the current
> version might need to adjust before they can use a version of the
> API/ABI we'd be willing to adopt in musl.

Quoting for Mathieu's benefit.  Also Cc:ing Carlos as the release
manager.

Thanks,
Florian
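
The reply above notes that __rseq_size can be 0 when no rseq area is
allocated, and the quoted text mentions RSEQ_CPU_ID_UNINITIALIZED.  A
minimal sketch of how a consumer might combine both checks before
trusting cpu_id follows.  The names rseq_area_sketch,
cpu_from_rseq_sketch and SKETCH_CPU_ID_UNINITIALIZED are hypothetical
stand-ins, not the <sys/rseq.h> declarations; the value -1 mirrors the
kernel UAPI constant, and the volatile read follows the
THREAD_GETMEM_VOLATILE idea from the cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first two fields of the kernel's
   struct rseq; not the real <sys/rseq.h> type.  */
struct rseq_area_sketch
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
};

static_assert (sizeof (struct rseq_area_sketch) == 8,
               "two 32-bit fields, no padding expected");

/* Mirrors RSEQ_CPU_ID_UNINITIALIZED (-1) from the kernel UAPI header.  */
#define SKETCH_CPU_ID_UNINITIALIZED (-1)

/* Return the cached CPU number, or -1 if the caller has to fall back
   to another mechanism (for example the getcpu system call).  */
static int
cpu_from_rseq_sketch (const struct rseq_area_sketch *area,
                      unsigned int area_size)
{
  /* A size of 0 means no rseq area is available for this thread.  */
  if (area == NULL || area_size == 0)
    return -1;

  /* Volatile read: the kernel may rewrite cpu_id whenever the thread
     is preempted or migrated, so the compiler must not cache it.  */
  int cpu = (int) *(const volatile uint32_t *) &area->cpu_id;

  /* Negative values encode "uninitialized" or "registration failed".  */
  return cpu >= 0 ? cpu : -1;
}
```

With eager registration as in this series, a caller would be expected
to hit the fallback path only when registration is disabled or
unsupported; under the lazy model discussed above, the same negative
result would be the point at which registration could be requested.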



* Re: [PATCH v2 0/8] Extensible rseq integration
  2022-02-01 16:36   ` Florian Weimer
@ 2022-02-03  0:37     ` Carlos O'Donell
  0 siblings, 0 replies; 33+ messages in thread
From: Carlos O'Donell @ 2022-02-03  0:37 UTC (permalink / raw)
  To: Florian Weimer, Rich Felker; +Cc: libc-alpha, Mathieu Desnoyers

On 2/1/22 11:36, Florian Weimer wrote:
> * Rich Felker:
> 
>> On Tue, Dec 07, 2021 at 01:59:26PM +0100, Florian Weimer via Libc-alpha wrote:
>>> This series integrates the previously posted v2 for <thread_pointer.h>.
>>>
>>> It incorporates Mathieu's and Paul E. McKenney's suggestion to use a
>>> volatile read for rseq_abi.cpu_id access, using a new
>>> THREAD_GETMEM_VOLATILE macro.
>>>
>>> The last patch in the series makes rseq registration consistent across
>>> threads.
>>>
>>> Florian Weimer (8):
>>>   nptl: Add <thread_pointer.h> for defining __thread_pointer
>>>   nptl: Introduce <tcb-access.h> for THREAD_* accessors
>>>   nptl: Introduce THREAD_GETMEM_VOLATILE
>>>   nptl: Add rseq registration
>>>   Linux: Use rseq to accelerate sched_getcpu
>>>   nptl: Add glibc.pthread.rseq tunable to control rseq registration
>>>   nptl: Add public rseq symbols and <sys/rseq.h>
>>>   nptl: rseq failure after registration on main thread is fatal
>>
>> I'm sorry for bringing this up so late; I wasn't aware that redesign
>> of the rseq ABI was taking place. I wish this had been discussed in a
>> cross-libc venue, since, in its current form, I don't think the ABI is
>> suitable for inclusion in, or use as a third-party library with, musl.
> 
> Well, I Cc:ed you on the original proposal in November, and cross-posted
> it to linux-api as well.
> 
>> The most pressing issue I see is that it does not admit lazy
>> registration, which precludes it being implemented outside of libc
>> (because it has to hook into pthread_create) and imposes runtime cost
>> on programs which do not use it. RSEQ_CPU_ID_UNINITIALIZED exists to
>> inform the application about an uninitialized state, but the
>> application has no way to request an attempt at registration upon
>> seeing it. I think that would be easy to add. Basically it's just
>> making the syscall, which a consumer of the ABI could in theory do
>> itself, but it's probably best not to have it do that and instead have
>> registration mediated through the ABI/through libc.
> 
> I rejected that because the programming model is too complex: In the
> extreme, a library that observes rseq support on the main thread may be
> called again from another thread where rseq is not yet enabled, and
> cannot be enabled.
> 
> I think it is also necessary to enable it unconditionally to force
> people to actually implement support for it in their tools (e.g., CRIU).
> Otherwise we'll never get to the point where it is reliable.  I doubt
> we'd have learned about the CRIU issue by now unless we took that step.

Agreed.

>> Related to this, if rseq is implemented outside of libc, I'm not sure
>> if there's a safe way to ensure it's unregistered prior to thread
>> exit. It may already be possible but I haven't sufficiently convinced
>> myself.
> 
> I expect that asking for rseq to be implemented outside of libc is like
> asking for robust mutexes to be implemented outside libc: it's really
> pushing what can be done in a library.

This is a design decision that we made in glibc.

>> On another issue, while this isn't entirely a show-stopper, I'm not a
>> fan of requiring constant __rseq_offset. This comes across as an
>> instance-specific hack to make up for GD TLS being slow, when we
>> already have a fully general solution to that which isn't being
>> deployed: TLSDESC. As it stands in the current ABI, whatever library
>> is providing rseq must be present at application startup; it can't be
>> dlopened. And due to the ABI this applies *even if* we just wanted to
>> make rseq always-fail in that case. The ABI simply doesn't admit not
>> having memory pre-reserved for every thread (note: the size is
>> something like a +30% increase to musl's per-thread memory usage and
>> will surely increase over time, which is a lot for something we don't
>> expect the vast majority of applications to use).
> 
> If the memory is not allocated, __rseq_size can be set to 0.
> 
>> One minor and hopefully non-controversial declared-ABI issue I see is
>> that the __rseq_offset etc. objects are declared const, with a
>> pre-relro access hack used to modify them at runtime. This is
>> incompatible with LTO and static linking. If protecting them is
>> desired, they should be declared non-const but live in non-modifiable
>> memory, like string literals do. Otherwise a static linking LTO
>> compiler is free to copy the initial values directly into code.
> 
> Yes, you'll need a compiler barrier with LTO.  It's not different from
> other types of relocations.

Agreed.

At the language level the offset is constant.

Of the two choices, I think that making __rseq_offset non-const is pessimistic.

LTO and static linking must be aware of details outside of the language level
and may need to handle those details in an implementation-defined manner.

>> I'm not sure what the right thing to do on the verge of release is. If
>> it were my choice, I would hold it back and wait until it was better
>> reviewed and these issues worked out before making it public API/ABI,
>> but I don't know what glibc's constraints here are and how to best
>> weigh them against the ability to revise this ABI after release. Most
>> of these things I think *are* of the sort that can be fixed in
>> non-breaking ways, except that applications written to the current
>> version might need to adjust before they can use a version of the
>> API/ABI we'd be willing to adopt in musl.
> 
> Quoting for Mathieu's benefit.  Also Cc:ing Carlos as the release
> manager.

We have spent ~1.5 years correcting rseq integration since the initial attempt
in July 2020. The inclusion of __rseq is ready for glibc 2.35.

I plan to make the release with the ABI included.

I think we can continue to work with the musl community on integration issues.

-- 
Cheers,
Carlos.
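
The exchange above about a const __rseq_offset and LTO can be
illustrated with a small sketch of a compiler barrier on the read
side.  It assumes GCC or Clang extended asm; the names offset_sketch,
read_opaque_sketch and check_offset_sketch are hypothetical, and this
is not glibc's implementation.  The empty asm statement hides where
the pointer came from, so the optimizer cannot replace the load with
the initializer it saw at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical const object.  In the scenario discussed above, the
   value seen by the compiler is only a placeholder and the real value
   is stored by startup code before the page becomes read-only.  */
static const ptrdiff_t offset_sketch = 0;

/* Load *p in a way the optimizer cannot constant-fold.  */
static inline ptrdiff_t
read_opaque_sketch (const ptrdiff_t *p)
{
  /* "+r" tells the compiler the asm may have changed p, so after this
     statement it no longer knows that p points to offset_sketch.  */
  __asm__ ("" : "+r" (p));
  return *p;
}

/* Usage: the check is evaluated against the value in memory at run
   time, not against the initializer above.  */
static inline void
check_offset_sketch (void)
{
  assert (read_opaque_sketch (&offset_sketch) >= 0);
}
```

The barrier costs nothing at run time beyond the load itself, which is
one way to keep the object const at the language level while still
reading whatever value is in memory.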



Thread overview: 33+ messages
2021-12-07 12:59 [PATCH v2 0/8] Extensible rseq integration Florian Weimer
2021-12-07 13:00 ` [PATCH 1/8] nptl: Add <thread_pointer.h> for defining __thread_pointer Florian Weimer
2021-12-08 11:05   ` Szabolcs Nagy
2021-12-08 17:55     ` Florian Weimer
2021-12-09 11:52       ` Szabolcs Nagy
2021-12-07 13:00 ` [PATCH v2 2/8] nptl: Introduce <tcb-access.h> for THREAD_* accessors Florian Weimer
2021-12-08 11:09   ` Szabolcs Nagy
2021-12-07 13:00 ` [PATCH v2 3/8] nptl: Introduce THREAD_GETMEM_VOLATILE Florian Weimer
2021-12-08 11:23   ` Szabolcs Nagy
2021-12-07 13:01 ` [PATCH 4/8] nptl: Add rseq registration Florian Weimer
2021-12-08 16:51   ` Szabolcs Nagy
2021-12-08 18:03   ` Siddhesh Poyarekar
2021-12-08 18:08     ` Florian Weimer
2021-12-08 23:27       ` Siddhesh Poyarekar
2021-12-09  7:42         ` Florian Weimer
2021-12-09  8:01           ` Siddhesh Poyarekar
2021-12-09  1:51   ` Noah Goldstein
2021-12-07 13:02 ` [PATCH v2 5/8] Linux: Use rseq to accelerate sched_getcpu Florian Weimer
2021-12-08 16:53   ` Szabolcs Nagy
2021-12-07 13:02 ` [PATCH v2 6/8] nptl: Add glibc.pthread.rseq tunable to control rseq registration Florian Weimer
2021-12-08 17:22   ` Szabolcs Nagy
2021-12-08 18:03   ` Siddhesh Poyarekar
2021-12-09  8:03     ` Siddhesh Poyarekar
2021-12-07 13:03 ` [PATCH 7/8] nptl: Add public rseq symbols and <sys/rseq.h> Florian Weimer
2021-12-08 17:34   ` Szabolcs Nagy
2021-12-09 12:26   ` Szabolcs Nagy
2021-12-09 12:34     ` Florian Weimer
2021-12-09 12:36     ` Szabolcs Nagy
2021-12-07 13:04 ` [PATCH v2 8/8] nptl: rseq failure after registration on main thread is fatal Florian Weimer
2021-12-08 17:36   ` Szabolcs Nagy
2022-02-01 15:21 ` [PATCH v2 0/8] Extensible rseq integration Rich Felker
2022-02-01 16:36   ` Florian Weimer
2022-02-03  0:37     ` Carlos O'Donell
