* [PATCH 0/4] Restartable Sequences support for glibc 2.30 @ 2019-02-12 19:43 Mathieu Desnoyers 2019-02-12 19:43 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers ` (5 more replies) 0 siblings, 6 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-02-12 19:43 UTC (permalink / raw) To: Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Mathieu Desnoyers Hi, This patchset implements basic support for the "rseq" Linux system call in glibc by registering the rseq TLS abi. One patch in this series modifies sched_getcpu() to speed up reading the current CPU number by reading __rseq_abi.cpu_id when rseq is available. The only point that still appears to not reach concensus is whether it's acceptable to define the RSEQ_SIG code signature for each architecture. If I missed other points that failed to reach concensus, please let me know! Please consider for inclusion into glibc, Thanks, Mathieu Mathieu Desnoyers (4): glibc: Perform rseq(2) registration at C startup and thread creation (v7) glibc: sched_getcpu(): use rseq cpu_id TLS on Linux support record failure: allow use from constructor rseq registration tests (v2) NEWS | 11 + csu/libc-start.c | 12 +- misc/Makefile | 3 +- misc/rseq-internal.h | 34 ++ nptl/pthread_create.c | 9 + support/check.h | 4 + support/support_record_failure.c | 18 +- sysdeps/unix/sysv/linux/Makefile | 8 +- sysdeps/unix/sysv/linux/Versions | 4 + sysdeps/unix/sysv/linux/aarch64/bits/rseq.h | 24 ++ sysdeps/unix/sysv/linux/aarch64/libc.abilist | 2 + sysdeps/unix/sysv/linux/alpha/libc.abilist | 2 + sysdeps/unix/sysv/linux/arm/bits/rseq.h | 24 ++ sysdeps/unix/sysv/linux/arm/libc.abilist | 2 + sysdeps/unix/sysv/linux/bits/rseq.h | 24 ++ sysdeps/unix/sysv/linux/hppa/libc.abilist | 2 + sysdeps/unix/sysv/linux/i386/libc.abilist | 2 + sysdeps/unix/sysv/linux/ia64/libc.abilist | 2 + .../sysv/linux/m68k/coldfire/libc.abilist | 2 + .../unix/sysv/linux/m68k/m680x0/libc.abilist | 2 + .../unix/sysv/linux/microblaze/libc.abilist | 2 + sysdeps/unix/sysv/linux/mips/bits/rseq.h | 24 ++ .../sysv/linux/mips/mips32/fpu/libc.abilist | 2 + .../sysv/linux/mips/mips32/nofpu/libc.abilist | 2 + .../sysv/linux/mips/mips64/n32/libc.abilist | 2 + .../sysv/linux/mips/mips64/n64/libc.abilist | 2 + sysdeps/unix/sysv/linux/nios2/libc.abilist | 2 + sysdeps/unix/sysv/linux/powerpc/bits/rseq.h | 24 ++ .../linux/powerpc/powerpc32/fpu/libc.abilist | 2 + .../powerpc/powerpc32/nofpu/libc.abilist | 2 + .../linux/powerpc/powerpc64/be/libc.abilist | 2 + .../linux/powerpc/powerpc64/le/libc.abilist | 2 + .../unix/sysv/linux/riscv/rv64/libc.abilist | 2 + sysdeps/unix/sysv/linux/rseq-internal.h | 91 +++++ sysdeps/unix/sysv/linux/rseq-sym.c | 54 +++ sysdeps/unix/sysv/linux/s390/bits/rseq.h | 24 ++ .../unix/sysv/linux/s390/s390-32/libc.abilist | 2 + .../unix/sysv/linux/s390/s390-64/libc.abilist | 2 + sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +- sysdeps/unix/sysv/linux/sh/libc.abilist | 2 + .../sysv/linux/sparc/sparc32/libc.abilist | 2 + .../sysv/linux/sparc/sparc64/libc.abilist | 2 + sysdeps/unix/sysv/linux/sys/rseq.h | 65 +++ sysdeps/unix/sysv/linux/tst-rseq-nptl.c | 381 ++++++++++++++++++ sysdeps/unix/sysv/linux/tst-rseq.c | 110 +++++ sysdeps/unix/sysv/linux/x86/bits/rseq.h | 24 ++ .../unix/sysv/linux/x86_64/64/libc.abilist | 2 + .../unix/sysv/linux/x86_64/x32/libc.abilist | 2 + 48 files changed, 1034 insertions(+), 15 deletions(-) create mode 100644 misc/rseq-internal.h create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h -- 2.17.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-02-12 19:43 [PATCH 0/4] Restartable Sequences support for glibc 2.30 Mathieu Desnoyers @ 2019-02-12 19:43 ` Mathieu Desnoyers 2019-03-22 20:09 ` Carlos O'Donell 2019-02-12 19:43 ` [PATCH 3/4] support record failure: allow use from constructor Mathieu Desnoyers ` (4 subsequent siblings) 5 siblings, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-02-12 19:43 UTC (permalink / raw) To: Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Mathieu Desnoyers, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api Register rseq(2) TLS for each thread (including main), and unregister for each thread (excluding main). "rseq" stands for Restartable Sequences. See the rseq(2) man page proposed here: https://lkml.org/lkml/2018/9/19/647 This patch is based on glibc-2.29. The rseq(2) system call was merged into Linux 4.18. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Carlos O'Donell <carlos@redhat.com> CC: Florian Weimer <fweimer@redhat.com> CC: Joseph Myers <joseph@codesourcery.com> CC: Szabolcs Nagy <szabolcs.nagy@arm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Maurer <bmaurer@fb.com> CC: Peter Zijlstra <peterz@infradead.org> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Will Deacon <will.deacon@arm.com> CC: Dave Watson <davejwatson@fb.com> CC: Paul Turner <pjt@google.com> CC: Rich Felker <dalias@libc.org> CC: libc-alpha@sourceware.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org --- Changes since v1: - Move __rseq_refcount to an extra field at the end of __rseq_abi to eliminate one symbol. All libraries/programs which try to register rseq (glibc, early-adopter applications, early-adopter libraries) should use the rseq refcount. It becomes part of the ABI within a user-space process, but it's not part of the ABI shared with the kernel per se. - Restructure how this code is organized so glibc keeps building on non-Linux targets. - Use non-weak symbol for __rseq_abi. - Move rseq registration/unregistration implementation into its own nptl/rseq.c compile unit. - Move __rseq_abi symbol under GLIBC_2.29. Changes since v2: - Move __rseq_refcount to its own symbol, which is less ugly than trying to play tricks with the rseq uapi. - Move __rseq_abi from nptl to csu (C start up), so it can be used across glibc, including memory allocator and sched_getcpu(). The __rseq_refcount symbol is kept in nptl, because there is no reason to use it elsewhere in glibc. Changes since v3: - Set __rseq_refcount TLS to 1 on register/set to 0 on unregister because glibc is the first/last user. - Unconditionally register/unregister rseq at thread start/exit, because glibc is the first/last user. - Add missing abilist items. - Rebase on glibc master commit a502c5294. - Add NEWS entry. Changes since v4: - Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on "System V Application Binary Interface", weak only affects the link editor, not the dynamic linker. - Install a new sys/rseq.h system header on Linux, which contains the RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount declaration. Move those definition/declarations from rseq-internal.h to the installed sys/rseq.h header. - Considering that rseq is only available on Linux, move csu/rseq.c to sysdeps/unix/sysv/linux/rseq-sym.c. - Move __rseq_refcount from nptl/rseq.c to sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux. - Move both ABI definitions for __rseq_abi and __rseq_refcount to sysdeps/unix/sysv/linux/Versions, so they only appear on Linux. - Document __rseq_abi and __rseq_refcount volatile. - Document the RSEQ_SIG signature define. - Move registration functions from rseq.c to rseq-internal.h static inline functions. Introduce empty stubs in misc/rseq-internal.h, which can be overridden by architecture code in sysdeps/unix/sysv/linux/rseq-internal.h. - Rename __rseq_register_current_thread and __rseq_unregister_current_thread to rseq_register_current_thread and rseq_unregister_current_thread, now that those are only visible as internal static inline functions. - Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN rather than nptl init, so applications not linked against libpthread.so have rseq registered for their main() thread. Note that it is invoked separately for SHARED and !SHARED builds. Changes since v5: - Replace __rseq_refcount by __rseq_lib_abi, which contains two uint32_t: register_state and refcount. The "register_state" field allows inhibiting rseq registration from signal handlers nested on top of glibc registration and occuring after rseq unregistration by glibc. - Introduce enum rseq_register_state, which contains the states allowed for the struct rseq_lib_abi register_state field. Changes since v6: - Introduce bits/rseq.h to define RSEQ_SIG for each architecture. The generic bits/rseq.h does not define RSEQ_SIG, meaning that each architecture implementing rseq needs to implement bits/rseq.h. - Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING. - Port to glibc-2.29. --- NEWS | 11 +++ csu/libc-start.c | 12 ++- misc/Makefile | 3 +- misc/rseq-internal.h | 34 +++++++ nptl/pthread_create.c | 9 ++ sysdeps/unix/sysv/linux/Makefile | 4 +- sysdeps/unix/sysv/linux/Versions | 4 + sysdeps/unix/sysv/linux/aarch64/bits/rseq.h | 24 +++++ sysdeps/unix/sysv/linux/aarch64/libc.abilist | 2 + sysdeps/unix/sysv/linux/alpha/libc.abilist | 2 + sysdeps/unix/sysv/linux/arm/bits/rseq.h | 24 +++++ sysdeps/unix/sysv/linux/arm/libc.abilist | 2 + sysdeps/unix/sysv/linux/bits/rseq.h | 24 +++++ sysdeps/unix/sysv/linux/hppa/libc.abilist | 2 + sysdeps/unix/sysv/linux/i386/libc.abilist | 2 + sysdeps/unix/sysv/linux/ia64/libc.abilist | 2 + .../sysv/linux/m68k/coldfire/libc.abilist | 2 + .../unix/sysv/linux/m68k/m680x0/libc.abilist | 2 + .../unix/sysv/linux/microblaze/libc.abilist | 2 + sysdeps/unix/sysv/linux/mips/bits/rseq.h | 24 +++++ .../sysv/linux/mips/mips32/fpu/libc.abilist | 2 + .../sysv/linux/mips/mips32/nofpu/libc.abilist | 2 + .../sysv/linux/mips/mips64/n32/libc.abilist | 2 + .../sysv/linux/mips/mips64/n64/libc.abilist | 2 + sysdeps/unix/sysv/linux/nios2/libc.abilist | 2 + sysdeps/unix/sysv/linux/powerpc/bits/rseq.h | 24 +++++ .../linux/powerpc/powerpc32/fpu/libc.abilist | 2 + .../powerpc/powerpc32/nofpu/libc.abilist | 2 + .../linux/powerpc/powerpc64/be/libc.abilist | 2 + .../linux/powerpc/powerpc64/le/libc.abilist | 2 + .../unix/sysv/linux/riscv/rv64/libc.abilist | 2 + sysdeps/unix/sysv/linux/rseq-internal.h | 91 +++++++++++++++++++ sysdeps/unix/sysv/linux/rseq-sym.c | 54 +++++++++++ sysdeps/unix/sysv/linux/s390/bits/rseq.h | 24 +++++ .../unix/sysv/linux/s390/s390-32/libc.abilist | 2 + .../unix/sysv/linux/s390/s390-64/libc.abilist | 2 + sysdeps/unix/sysv/linux/sh/libc.abilist | 2 + .../sysv/linux/sparc/sparc32/libc.abilist | 2 + .../sysv/linux/sparc/sparc64/libc.abilist | 2 + sysdeps/unix/sysv/linux/sys/rseq.h | 65 +++++++++++++ sysdeps/unix/sysv/linux/x86/bits/rseq.h | 24 +++++ .../unix/sysv/linux/x86_64/64/libc.abilist | 2 + .../unix/sysv/linux/x86_64/x32/libc.abilist | 2 + 43 files changed, 501 insertions(+), 6 deletions(-) create mode 100644 misc/rseq-internal.h create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h diff --git a/NEWS b/NEWS index 912a9bdc0f..0608c60f7d 100644 --- a/NEWS +++ b/NEWS @@ -5,6 +5,17 @@ See the end for copying conditions. Please send GNU C library bug reports via <https://sourceware.org/bugzilla/> using `glibc' in the "product" field. \f +Version 2.30 + +Major new features: + +* Support for automatically registering threads with the Linux rseq(2) + system call has been added. This system call is implemented starting + from Linux 4.18. In order to be activated, it requires that glibc is built + against kernel headers that include this system call, and that glibc + detects availability of that system call at runtime. + +\f Version 2.29 Major new features: diff --git a/csu/libc-start.c b/csu/libc-start.c index 5d9c3675fa..8680afc0ef 100644 --- a/csu/libc-start.c +++ b/csu/libc-start.c @@ -22,6 +22,7 @@ #include <ldsodefs.h> #include <exit-thread.h> #include <libc-internal.h> +#include <rseq-internal.h> #include <elf/dl-tunables.h> @@ -140,7 +141,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up; -#ifndef SHARED +#ifdef SHARED + /* Register rseq ABI to the kernel. */ + (void) rseq_register_current_thread (); +#else _dl_relocate_static_pie (); char **ev = &argv[argc + 1]; @@ -218,6 +222,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), } # endif + /* Register rseq ABI to the kernel. */ + (void) rseq_register_current_thread (); + /* Initialize libpthread if linked in. */ if (__pthread_initialize_minimal != NULL) __pthread_initialize_minimal (); @@ -230,8 +237,7 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), # else __pointer_chk_guard_local = pointer_chk_guard; # endif - -#endif /* !SHARED */ +#endif /* Register the destructor of the dynamic linker if there is any. */ if (__glibc_likely (rtld_fini != NULL)) diff --git a/misc/Makefile b/misc/Makefile index cf0daa1161..0ae1dbaf80 100644 --- a/misc/Makefile +++ b/misc/Makefile @@ -36,7 +36,8 @@ headers := sys/uio.h bits/uio-ext.h bits/uio_lim.h \ syslog.h sys/syslog.h \ bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \ bits/select2.h bits/hwcap.h sys/auxv.h \ - sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h + sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \ + rseq-internal.h routines := brk sbrk sstk ioctl \ readv writev preadv preadv64 pwritev pwritev64 \ diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h new file mode 100644 index 0000000000..915122e4bf --- /dev/null +++ b/misc/rseq-internal.h @@ -0,0 +1,34 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef RSEQ_INTERNAL_H +#define RSEQ_INTERNAL_H + +static inline int +rseq_register_current_thread (void) +{ + return -1; +} + +static inline int +rseq_unregister_current_thread (void) +{ + return -1; +} + +#endif /* rseq-internal.h */ diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c index 2bd2b10727..90b3419390 100644 --- a/nptl/pthread_create.c +++ b/nptl/pthread_create.c @@ -33,6 +33,7 @@ #include <default-sched.h> #include <futex-internal.h> #include <tls-setup.h> +#include <rseq-internal.h> #include "libioP.h" #include <shlib-compat.h> @@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd) START_THREAD_DEFN { struct pthread *pd = START_THREAD_SELF; + bool has_rseq = false; #if HP_TIMING_AVAIL /* Remember the time when the thread was started. */ @@ -396,6 +398,9 @@ START_THREAD_DEFN if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2)) futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE); + /* Register rseq TLS to the kernel. */ + has_rseq = !rseq_register_current_thread (); + #ifdef __NR_set_robust_list # ifndef __ASSUME_SET_ROBUST_LIST if (__set_robust_list_avail >= 0) @@ -573,6 +578,10 @@ START_THREAD_DEFN } #endif + /* Unregister rseq TLS from kernel. */ + if (has_rseq && rseq_unregister_current_thread ()) + abort(); + advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd, pd->guardsize); diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile index 5f8c2c7c7d..5b541469ec 100644 --- a/sysdeps/unix/sysv/linux/Makefile +++ b/sysdeps/unix/sysv/linux/Makefile @@ -1,5 +1,5 @@ ifeq ($(subdir),csu) -sysdep_routines += errno-loc +sysdep_routines += errno-loc rseq-sym endif ifeq ($(subdir),assert) @@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \ bits/termios-c_iflag.h bits/termios-c_oflag.h \ bits/termios-baud.h bits/termios-c_cflag.h \ bits/termios-c_lflag.h bits/termios-tcflow.h \ - bits/termios-misc.h + bits/termios-misc.h sys/rseq.h bits/rseq.h tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \ tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \ diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions index f1e12d9c69..ad88c2b7ff 100644 --- a/sysdeps/unix/sysv/linux/Versions +++ b/sysdeps/unix/sysv/linux/Versions @@ -174,6 +174,10 @@ libc { GLIBC_2.29 { getcpu; } + GLIBC_2.30 { + __rseq_abi; + __rseq_lib_abi; + } GLIBC_PRIVATE { # functions used in other libraries __syscall_rt_sigqueueinfo; diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h new file mode 100644 index 0000000000..543bc5388a --- /dev/null +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0. */ diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist index 9c330f325e..bc937f585d 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist @@ -2141,3 +2141,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist index f630fa4c6f..89cc8b1cfb 100644 --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist @@ -2036,6 +2036,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist index b96f45590f..e5055f2d4e 100644 --- a/sysdeps/unix/sysv/linux/arm/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/libc.abilist @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.4 _Exit F GLIBC_2.4 _IO_2_1_stderr_ D 0xa0 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0 diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h new file mode 100644 index 0000000000..d60f02cfeb --- /dev/null +++ b/sysdeps/unix/sysv/linux/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit + signature inserted before each rseq abort label in the code section. */ diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist index 088a8ee369..546d073cdb 100644 --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist @@ -1883,6 +1883,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist index f7ff2c57b9..ac1de6e4b3 100644 --- a/sysdeps/unix/sysv/linux/i386/libc.abilist +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist @@ -2048,6 +2048,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist index becd8b1033..cc3445b958 100644 --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist @@ -1917,6 +1917,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist index 74e42a5209..f7e28bd5a0 100644 --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist @@ -127,6 +127,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.4 _Exit F GLIBC_2.4 _IO_2_1_stderr_ D 0x98 GLIBC_2.4 _IO_2_1_stdin_ D 0x98 diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist index 4af5a74e8a..b8f00f6111 100644 --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist @@ -1992,6 +1992,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist index ccef673fd2..19f191434f 100644 --- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist @@ -2133,3 +2133,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist index 1054bb599e..fe43507f55 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist @@ -1970,6 +1970,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist index 4f5b5ffebf..b247c6ea9b 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist @@ -1968,6 +1968,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist index 943aee58d4..5339ca52b6 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist @@ -1976,6 +1976,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist index 17a5d17ef9..11f24eb440 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist @@ -1971,6 +1971,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist index 4d62a540fd..fd223bfc44 100644 --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist @@ -2174,3 +2174,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist index ecc2d6fa13..cc53178e81 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist @@ -1996,6 +1996,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist index f5830f9c33..2de3134bc7 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist @@ -2000,6 +2000,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist index 633d8f4792..aae3def700 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 _Exit F GLIBC_2.3 _IO_2_1_stderr_ D 0xe0 GLIBC_2.3 _IO_2_1_stdin_ D 0xe0 diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist index 2c712636ef..8d582a3a9b 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist @@ -2231,3 +2231,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist index 195bc8b2cf..155953f6cf 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist @@ -2103,3 +2103,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h new file mode 100644 index 0000000000..d676da3701 --- /dev/null +++ b/sysdeps/unix/sysv/linux/rseq-internal.h @@ -0,0 +1,91 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef RSEQ_INTERNAL_H +#define RSEQ_INTERNAL_H + +#include <sysdep.h> + +#ifdef __NR_rseq + +#include <errno.h> +#include <sys/rseq.h> + +static inline int +rseq_register_current_thread (void) +{ + int rc, ret = 0; + INTERNAL_SYSCALL_DECL (err); + + if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED) + return -1; + /* Temporarily prevent nested signal handlers from registering rseq. */ + __rseq_lib_abi.register_state = RSEQ_REGISTER_ONGOING; + if (__rseq_lib_abi.refcount == UINT_MAX) + { + ret = -1; + goto end; + } + if (__rseq_lib_abi.refcount++) + goto end; + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), + 0, RSEQ_SIG); + if (!rc) + goto end; + if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY) + __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED; + ret = -1; +end: + __rseq_lib_abi.register_state = RSEQ_REGISTER_ALLOWED; + return ret; +} + +static inline int +rseq_unregister_current_thread (void) +{ + int rc, ret = 0; + INTERNAL_SYSCALL_DECL (err); + + /* Setting __rseq_register_state = RSEQ_REGISTER_EXITING for the rest of the + thread lifetime. Ensures signal handlers nesting just before thread exit + don't try to register rseq. */ + __rseq_lib_abi.register_state = RSEQ_REGISTER_EXITING; + __rseq_lib_abi.refcount = 0; + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), + RSEQ_FLAG_UNREGISTER, RSEQ_SIG); + if (!rc) + goto end; + ret = -1; +end: + return ret; +} +#else +static inline int +rseq_register_current_thread (void) +{ + return -1; +} + +static inline int +rseq_unregister_current_thread (void) +{ + return -1; +} +#endif + +#endif /* rseq-internal.h */ diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c new file mode 100644 index 0000000000..99b277e9d6 --- /dev/null +++ b/sysdeps/unix/sysv/linux/rseq-sym.c @@ -0,0 +1,54 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <sys/syscall.h> +#include <stdint.h> + +#ifdef __NR_rseq +#include <sys/rseq.h> +#else + +enum rseq_cpu_id_state { + RSEQ_CPU_ID_UNINITIALIZED = -1, + RSEQ_CPU_ID_REGISTRATION_FAILED = -2, +}; + +/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI + size is 20 bytes. */ +struct rseq { + uint32_t cpu_id_start; + uint32_t cpu_id; + uint64_t rseq_cs; + uint32_t flags; +} __attribute__ ((aligned(4 * sizeof(uint64_t)))); + +struct rseq_lib_abi +{ + uint32_t register_state; + uint32_t refcount; +}; + +#endif + +/* volatile because fields can be read/updated by the kernel. */ +__thread volatile struct rseq __rseq_abi = { + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, +}; + +/* volatile because fields can be read/updated by signal handlers. */ +__thread volatile struct rseq_lib_abi __rseq_lib_abi; diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist index 334def033c..42316d8666 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist @@ -2005,6 +2005,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist index 536f4c4ced..c6c4e55a77 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist @@ -1911,6 +1911,8 @@ GLIBC_2.29 __fentry__ F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist index 30ae3b6ebb..8652dfea59 100644 --- a/sysdeps/unix/sysv/linux/sh/libc.abilist +++ b/sysdeps/unix/sysv/linux/sh/libc.abilist @@ -1887,6 +1887,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist index 68b107d080..95b58dfa67 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist @@ -1999,6 +1999,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist index e5b6a4da50..bfd24f9d1c 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist @@ -1940,6 +1940,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h new file mode 100644 index 0000000000..83c8976f50 --- /dev/null +++ b/sysdeps/unix/sysv/linux/sys/rseq.h @@ -0,0 +1,65 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +#define _SYS_RSEQ_H 1 + +/* We use the structures declarations from the kernel headers. */ +#include <linux/rseq.h> +/* Architecture-specific rseq signature. */ +#include <bits/rseq.h> +#include <stdint.h> + +enum rseq_register_state +{ + /* Value RSEQ_REGISTER_ALLOWED means it is allowed to update + the refcount field and to register/unregister rseq. */ + RSEQ_REGISTER_ALLOWED = 0, + /* Value RSEQ_REGISTER_ONGOING means a rseq registration is in progress, + so it is temporarily forbidden to update the refcount field or to + register/unregister rseq for this thread or signal handlers nested + on this thread. */ + RSEQ_REGISTER_ONGOING = 1, + /* Value RSEQ_REGISTER_EXITING means it is forbidden to update the + refcount field or to register/unregister rseq for the rest of the + thread's lifetime. */ + RSEQ_REGISTER_EXITING = 2, +}; + +struct rseq_lib_abi +{ + uint32_t register_state; /* enum rseq_register_state. */ + /* The refcount field keeps track of rseq users, so early adopters + of rseq can cooperate amongst each other and with glibc to + share rseq thread registration. The refcount field can only be + updated when allowed by the value of field register_state. + Registering rseq should be performed when incrementing refcount + from 0 to 1, and unregistering rseq should be performed when + decrementing refcount from 1 to 0. */ + uint32_t refcount; +}; + +/* volatile because fields can be read/updated by the kernel. */ +extern __thread volatile struct rseq __rseq_abi +__attribute__ ((tls_model ("initial-exec"))); + +/* volatile because fields can be read/updated by signal handlers. */ +extern __thread volatile struct rseq_lib_abi __rseq_lib_abi +__attribute__ ((tls_model ("initial-exec"))); + +#endif /* sys/rseq.h */ diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h new file mode 100644 index 0000000000..19d3755837 --- /dev/null +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h @@ -0,0 +1,24 @@ +/* Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _SYS_RSEQ_H +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." +#endif + +/* Signature required before each abort handler code. */ +#define RSEQ_SIG 0x53053053 diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist index 86dfb0c94d..e9f8411fb2 100644 --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist @@ -1898,6 +1898,8 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 GLIBC_2.3 __ctype_b_loc F GLIBC_2.3 __ctype_tolower_loc F GLIBC_2.3 __ctype_toupper_loc F diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist index dd688263aa..f9432d07f1 100644 --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist @@ -2149,3 +2149,5 @@ GLIBC_2.28 thrd_yield F GLIBC_2.29 getcpu F GLIBC_2.29 posix_spawn_file_actions_addchdir_np F GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F +GLIBC_2.30 __rseq_abi T 0x20 +GLIBC_2.30 __rseq_lib_abi T 0x8 -- 2.17.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-02-12 19:43 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers @ 2019-03-22 20:09 ` Carlos O'Donell 2019-03-25 15:54 ` Mathieu Desnoyers 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-03-22 20:09 UTC (permalink / raw) To: Mathieu Desnoyers, Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: > Register rseq(2) TLS for each thread (including main), and unregister > for each thread (excluding main). "rseq" stands for Restartable > Sequences. Thanks, the implementation is looking good, before this goes in it needs some procedural cleanup and the following: (a) I would like to see a final position from Florian, either accept, object (objection recorded), or sustained objection (blocks this patch), on the issue of the registration protocol. I suggested a possible way forward to remove the registration protocol down the line, leaving only the __rseq_lib_abi as the residual artifact once we deprecate coordination with other libraries. Applications would have to be written with this in mind, and coordination still remains with libc for the time being. Perhaps Mathieu has another suggestion. (b) I would like to see a final position from Rich Felker, either accept, object (objection recorded), or sustained objection (blocks this patch), on the patch set as a whole. I trust Rich's opinion as an alternative libc maintainer in this matter and as a independent implementor of similar standards. Rich will have to implement this for musl, and it would be good to have consensus from him. So while I reviewed this patch, and provided nit-picky feedback below, I think we need a whole new thread, to finalize the way forward for the registration process. Next steps: - Start a new thread to talk *only* about options for removing the refcounted registration. > See the rseq(2) man page proposed here: > https://lkml.org/lkml/2018/9/19/647 Thank you for having a man page, that makes for a very easy to follow description of the structures, syscall and use cases. > > This patch is based on glibc-2.29. The rseq(2) system call was merged > into Linux 4.18. > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > CC: Carlos O'Donell <carlos@redhat.com> > CC: Florian Weimer <fweimer@redhat.com> > CC: Joseph Myers <joseph@codesourcery.com> > CC: Szabolcs Nagy <szabolcs.nagy@arm.com> > CC: Thomas Gleixner <tglx@linutronix.de> > CC: Ben Maurer <bmaurer@fb.com> > CC: Peter Zijlstra <peterz@infradead.org> > CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > CC: Boqun Feng <boqun.feng@gmail.com> > CC: Will Deacon <will.deacon@arm.com> > CC: Dave Watson <davejwatson@fb.com> > CC: Paul Turner <pjt@google.com> > CC: Rich Felker <dalias@libc.org> > CC: libc-alpha@sourceware.org > CC: linux-kernel@vger.kernel.org > CC: linux-api@vger.kernel.org > --- > Changes since v1: > - Move __rseq_refcount to an extra field at the end of __rseq_abi to > eliminate one symbol. > > All libraries/programs which try to register rseq (glibc, > early-adopter applications, early-adopter libraries) should use the > rseq refcount. It becomes part of the ABI within a user-space > process, but it's not part of the ABI shared with the kernel per se. > > - Restructure how this code is organized so glibc keeps building on > non-Linux targets. > > - Use non-weak symbol for __rseq_abi. > > - Move rseq registration/unregistration implementation into its own > nptl/rseq.c compile unit. > > - Move __rseq_abi symbol under GLIBC_2.29. > > Changes since v2: > - Move __rseq_refcount to its own symbol, which is less ugly than > trying to play tricks with the rseq uapi. > - Move __rseq_abi from nptl to csu (C start up), so it can be used > across glibc, including memory allocator and sched_getcpu(). The > __rseq_refcount symbol is kept in nptl, because there is no reason > to use it elsewhere in glibc. > > Changes since v3: > - Set __rseq_refcount TLS to 1 on register/set to 0 on unregister > because glibc is the first/last user. > - Unconditionally register/unregister rseq at thread start/exit, because > glibc is the first/last user. > - Add missing abilist items. > - Rebase on glibc master commit a502c5294. > - Add NEWS entry. > > Changes since v4: > - Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on > "System V Application Binary Interface", weak only affects the link > editor, not the dynamic linker. > - Install a new sys/rseq.h system header on Linux, which contains the > RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount > declaration. Move those definition/declarations from rseq-internal.h > to the installed sys/rseq.h header. > - Considering that rseq is only available on Linux, move csu/rseq.c to > sysdeps/unix/sysv/linux/rseq-sym.c. > - Move __rseq_refcount from nptl/rseq.c to > sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux. > - Move both ABI definitions for __rseq_abi and __rseq_refcount to > sysdeps/unix/sysv/linux/Versions, so they only appear on Linux. > - Document __rseq_abi and __rseq_refcount volatile. > - Document the RSEQ_SIG signature define. > - Move registration functions from rseq.c to rseq-internal.h static > inline functions. Introduce empty stubs in misc/rseq-internal.h, > which can be overridden by architecture code in > sysdeps/unix/sysv/linux/rseq-internal.h. > - Rename __rseq_register_current_thread and __rseq_unregister_current_thread > to rseq_register_current_thread and rseq_unregister_current_thread, > now that those are only visible as internal static inline functions. > - Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN > rather than nptl init, so applications not linked against > libpthread.so have rseq registered for their main() thread. Note that > it is invoked separately for SHARED and !SHARED builds. > > Changes since v5: > - Replace __rseq_refcount by __rseq_lib_abi, which contains two > uint32_t: register_state and refcount. The "register_state" field > allows inhibiting rseq registration from signal handlers nested on top > of glibc registration and occuring after rseq unregistration by glibc. > - Introduce enum rseq_register_state, which contains the states allowed > for the struct rseq_lib_abi register_state field. > > Changes since v6: > - Introduce bits/rseq.h to define RSEQ_SIG for each architecture. > The generic bits/rseq.h does not define RSEQ_SIG, meaning that each > architecture implementing rseq needs to implement bits/rseq.h. > - Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING. > - Port to glibc-2.29. > --- > NEWS | 11 +++ > csu/libc-start.c | 12 ++- > misc/Makefile | 3 +- > misc/rseq-internal.h | 34 +++++++ > nptl/pthread_create.c | 9 ++ > sysdeps/unix/sysv/linux/Makefile | 4 +- > sysdeps/unix/sysv/linux/Versions | 4 + > sysdeps/unix/sysv/linux/aarch64/bits/rseq.h | 24 +++++ > sysdeps/unix/sysv/linux/aarch64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/alpha/libc.abilist | 2 + > sysdeps/unix/sysv/linux/arm/bits/rseq.h | 24 +++++ > sysdeps/unix/sysv/linux/arm/libc.abilist | 2 + > sysdeps/unix/sysv/linux/bits/rseq.h | 24 +++++ > sysdeps/unix/sysv/linux/hppa/libc.abilist | 2 + > sysdeps/unix/sysv/linux/i386/libc.abilist | 2 + > sysdeps/unix/sysv/linux/ia64/libc.abilist | 2 + > .../sysv/linux/m68k/coldfire/libc.abilist | 2 + > .../unix/sysv/linux/m68k/m680x0/libc.abilist | 2 + > .../unix/sysv/linux/microblaze/libc.abilist | 2 + > sysdeps/unix/sysv/linux/mips/bits/rseq.h | 24 +++++ > .../sysv/linux/mips/mips32/fpu/libc.abilist | 2 + > .../sysv/linux/mips/mips32/nofpu/libc.abilist | 2 + > .../sysv/linux/mips/mips64/n32/libc.abilist | 2 + > .../sysv/linux/mips/mips64/n64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/nios2/libc.abilist | 2 + > sysdeps/unix/sysv/linux/powerpc/bits/rseq.h | 24 +++++ > .../linux/powerpc/powerpc32/fpu/libc.abilist | 2 + > .../powerpc/powerpc32/nofpu/libc.abilist | 2 + > .../linux/powerpc/powerpc64/be/libc.abilist | 2 + > .../linux/powerpc/powerpc64/le/libc.abilist | 2 + > .../unix/sysv/linux/riscv/rv64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/rseq-internal.h | 91 +++++++++++++++++++ > sysdeps/unix/sysv/linux/rseq-sym.c | 54 +++++++++++ > sysdeps/unix/sysv/linux/s390/bits/rseq.h | 24 +++++ > .../unix/sysv/linux/s390/s390-32/libc.abilist | 2 + > .../unix/sysv/linux/s390/s390-64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/sh/libc.abilist | 2 + > .../sysv/linux/sparc/sparc32/libc.abilist | 2 + > .../sysv/linux/sparc/sparc64/libc.abilist | 2 + > sysdeps/unix/sysv/linux/sys/rseq.h | 65 +++++++++++++ > sysdeps/unix/sysv/linux/x86/bits/rseq.h | 24 +++++ > .../unix/sysv/linux/x86_64/64/libc.abilist | 2 + > .../unix/sysv/linux/x86_64/x32/libc.abilist | 2 + > 43 files changed, 501 insertions(+), 6 deletions(-) > create mode 100644 misc/rseq-internal.h > create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/arm/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h > create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c > create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h > create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h > > diff --git a/NEWS b/NEWS > index 912a9bdc0f..0608c60f7d 100644 > --- a/NEWS > +++ b/NEWS > @@ -5,6 +5,17 @@ See the end for copying conditions. > Please send GNU C library bug reports via <https://sourceware.org/bugzilla/> > using `glibc' in the "product" field. > \f > +Version 2.30 > + > +Major new features: > + > +* Support for automatically registering threads with the Linux rseq(2) > + system call has been added. This system call is implemented starting > + from Linux 4.18. In order to be activated, it requires that glibc is built > + against kernel headers that include this system call, and that glibc > + detects availability of that system call at runtime. What benefit does the feature have for users? Can you talk about that please? Why would a user want to use it. Please feel free to link to an external reference. > + > +\f > Version 2.29 > > Major new features: > diff --git a/csu/libc-start.c b/csu/libc-start.c > index 5d9c3675fa..8680afc0ef 100644 > --- a/csu/libc-start.c > +++ b/csu/libc-start.c > @@ -22,6 +22,7 @@ > #include <ldsodefs.h> > #include <exit-thread.h> > #include <libc-internal.h> > +#include <rseq-internal.h> OK. > > #include <elf/dl-tunables.h> > > @@ -140,7 +141,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), > > __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up; > > -#ifndef SHARED > +#ifdef SHARED > + /* Register rseq ABI to the kernel. */ > + (void) rseq_register_current_thread (); OK, early registration of main thread in shared case. > +#else > _dl_relocate_static_pie (); > > char **ev = &argv[argc + 1]; > @@ -218,6 +222,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), > } > # endif > > + /* Register rseq ABI to the kernel. */ > + (void) rseq_register_current_thread (); OK, early registration of main thread in static case. > + > /* Initialize libpthread if linked in. */ > if (__pthread_initialize_minimal != NULL) > __pthread_initialize_minimal (); > @@ -230,8 +237,7 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), > # else > __pointer_chk_guard_local = pointer_chk_guard; > # endif > - > -#endif /* !SHARED */ > +#endif OK. > > /* Register the destructor of the dynamic linker if there is any. */ > if (__glibc_likely (rtld_fini != NULL)) > diff --git a/misc/Makefile b/misc/Makefile > index cf0daa1161..0ae1dbaf80 100644 > --- a/misc/Makefile > +++ b/misc/Makefile > @@ -36,7 +36,8 @@ headers := sys/uio.h bits/uio-ext.h bits/uio_lim.h \ > syslog.h sys/syslog.h \ > bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \ > bits/select2.h bits/hwcap.h sys/auxv.h \ > - sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h > + sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \ > + rseq-internal.h OK. > > routines := brk sbrk sstk ioctl \ > readv writev preadv preadv64 pwritev pwritev64 \ > diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h > new file mode 100644 > index 0000000000..915122e4bf > --- /dev/null > +++ b/misc/rseq-internal.h > @@ -0,0 +1,34 @@ > +/* Copyright (C) 2018 Free Software Foundation, Inc. Add a one line description before copyright which describes the file. Is it a stub file. Is it the Linux implementation. etc. Please add a one sentence first line short description for all new files you create. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. We don't use "contributed by" markers anymore, we let the git commit author carry that information, along with git blame information. Please remove this contributed by line and all subsequent such lines. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef RSEQ_INTERNAL_H > +#define RSEQ_INTERNAL_H > + > +static inline int > +rseq_register_current_thread (void) > +{ > + return -1; > +} > + > +static inline int > +rseq_unregister_current_thread (void) > +{ > + return -1; > +} > + OK. > +#endif /* rseq-internal.h */ > diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c > index 2bd2b10727..90b3419390 100644 > --- a/nptl/pthread_create.c > +++ b/nptl/pthread_create.c > @@ -33,6 +33,7 @@ > #include <default-sched.h> > #include <futex-internal.h> > #include <tls-setup.h> > +#include <rseq-internal.h> Ok. > #include "libioP.h" > > #include <shlib-compat.h> > @@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd) > START_THREAD_DEFN > { > struct pthread *pd = START_THREAD_SELF; > + bool has_rseq = false; OK. > > #if HP_TIMING_AVAIL > /* Remember the time when the thread was started. */ > @@ -396,6 +398,9 @@ START_THREAD_DEFN > if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2)) > futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE); > > + /* Register rseq TLS to the kernel. */ > + has_rseq = !rseq_register_current_thread (); OK, initialize rseq after enabling setxid but before setting up roubust list, no particularly reason it should be right here that I can see, that's OK though. > + > #ifdef __NR_set_robust_list > # ifndef __ASSUME_SET_ROBUST_LIST > if (__set_robust_list_avail >= 0) > @@ -573,6 +578,10 @@ START_THREAD_DEFN > } > #endif > > + /* Unregister rseq TLS from kernel. */ > + if (has_rseq && rseq_unregister_current_thread ()) > + abort(); OK. Calling abort() seems harsh because it kills the whole process, but I'm OK with failing catastrophically if that's what we want here. It seems unrecoverable. > + > advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd, > pd->guardsize); > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile > index 5f8c2c7c7d..5b541469ec 100644 > --- a/sysdeps/unix/sysv/linux/Makefile > +++ b/sysdeps/unix/sysv/linux/Makefile > @@ -1,5 +1,5 @@ > ifeq ($(subdir),csu) > -sysdep_routines += errno-loc > +sysdep_routines += errno-loc rseq-sym OK. > endif > > ifeq ($(subdir),assert) > @@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \ > bits/termios-c_iflag.h bits/termios-c_oflag.h \ > bits/termios-baud.h bits/termios-c_cflag.h \ > bits/termios-c_lflag.h bits/termios-tcflow.h \ > - bits/termios-misc.h > + bits/termios-misc.h sys/rseq.h bits/rseq.h OK. > > tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \ > tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \ > diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions > index f1e12d9c69..ad88c2b7ff 100644 > --- a/sysdeps/unix/sysv/linux/Versions > +++ b/sysdeps/unix/sysv/linux/Versions > @@ -174,6 +174,10 @@ libc { > GLIBC_2.29 { > getcpu; > } > + GLIBC_2.30 { > + __rseq_abi; > + __rseq_lib_abi; > + } OK, good using GLIBC_2.30 for master. > GLIBC_PRIVATE { > # functions used in other libraries > __syscall_rt_sigqueueinfo; > diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h > new file mode 100644 > index 0000000000..543bc5388a > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h > @@ -0,0 +1,24 @@ Needs a one sentence description. > +/* Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0. */ OK. > diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist > index 9c330f325e..bc937f585d 100644 > --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist > @@ -2141,3 +2141,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist > index f630fa4c6f..89cc8b1cfb 100644 > --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist > +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist > @@ -2036,6 +2036,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/arm/bits/rseq.h b/sysdeps/unix/sysv/linux/arm/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h > @@ -0,0 +1,24 @@ Needs a one sentence description. > +/* Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why isn't this an arm specific op code? Does the user have to mark this up as part of a constant pool when placing it in front of the abort handler to avoid disassembling the constant as code? This was at one point required to get gdb to work properly. > diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist > index b96f45590f..e5055f2d4e 100644 > --- a/sysdeps/unix/sysv/linux/arm/libc.abilist > +++ b/sysdeps/unix/sysv/linux/arm/libc.abilist > @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.4 _Exit F > GLIBC_2.4 _IO_2_1_stderr_ D 0xa0 > GLIBC_2.4 _IO_2_1_stdin_ D 0xa0 > diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h > new file mode 100644 > index 0000000000..d60f02cfeb > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit > + signature inserted before each rseq abort label in the code section. */ Needs a huge explanation about RSEQ_SIG and the reasons why it's per-arch. Basically cut-and-paste what you wrote to libc-alpha about this. > diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist > index 088a8ee369..546d073cdb 100644 > --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist > +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist > @@ -1883,6 +1883,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist > index f7ff2c57b9..ac1de6e4b3 100644 > --- a/sysdeps/unix/sysv/linux/i386/libc.abilist > +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist > @@ -2048,6 +2048,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist > index becd8b1033..cc3445b958 100644 > --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist > @@ -1917,6 +1917,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist > index 74e42a5209..f7e28bd5a0 100644 > --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist > +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist > @@ -127,6 +127,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.4 _Exit F > GLIBC_2.4 _IO_2_1_stderr_ D 0x98 > GLIBC_2.4 _IO_2_1_stdin_ D 0x98 > diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist > index 4af5a74e8a..b8f00f6111 100644 > --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist > +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist > @@ -1992,6 +1992,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist > index ccef673fd2..19f191434f 100644 > --- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist > +++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist > @@ -2133,3 +2133,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why isn't this a mips-specific op code? > diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist > index 1054bb599e..fe43507f55 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist > @@ -1970,6 +1970,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist > index 4f5b5ffebf..b247c6ea9b 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist > @@ -1968,6 +1968,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist > index 943aee58d4..5339ca52b6 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist > @@ -1976,6 +1976,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist > index 17a5d17ef9..11f24eb440 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist > @@ -1971,6 +1971,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist > index 4d62a540fd..fd223bfc44 100644 > --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist > +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist > @@ -2174,3 +2174,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why isn't this an opcode specific to power? > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist > index ecc2d6fa13..cc53178e81 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist > @@ -1996,6 +1996,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist > index f5830f9c33..2de3134bc7 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist > @@ -2000,6 +2000,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist > index 633d8f4792..aae3def700 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist > @@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 _Exit F > GLIBC_2.3 _IO_2_1_stderr_ D 0xe0 > GLIBC_2.3 _IO_2_1_stdin_ D 0xe0 > diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist > index 2c712636ef..8d582a3a9b 100644 > --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist > +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist > @@ -2231,3 +2231,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist > index 195bc8b2cf..155953f6cf 100644 > --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist > @@ -2103,3 +2103,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h > new file mode 100644 > index 0000000000..d676da3701 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/rseq-internal.h > @@ -0,0 +1,91 @@ > +/* Copyright (C) 2018 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef RSEQ_INTERNAL_H > +#define RSEQ_INTERNAL_H > + > +#include <sysdep.h> > + > +#ifdef __NR_rseq > + > +#include <errno.h> > +#include <sys/rseq.h> > + This needs a giant comment describing the reference count algorithm and how it is intended to work. > +static inline int > +rseq_register_current_thread (void) > +{ > + int rc, ret = 0; > + INTERNAL_SYSCALL_DECL (err); > + > + if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED) > + return -1; > + /* Temporarily prevent nested signal handlers from registering rseq. */ > + __rseq_lib_abi.register_state = RSEQ_REGISTER_ONGOING; OK. > + if (__rseq_lib_abi.refcount == UINT_MAX) > + { > + ret = -1; > + goto end; > + } > + if (__rseq_lib_abi.refcount++) > + goto end; OK. > + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), > + 0, RSEQ_SIG); > + if (!rc) > + goto end; > + if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY) > + __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED; > + ret = -1; > +end: > + __rseq_lib_abi.register_state = RSEQ_REGISTER_ALLOWED; OK. > + return ret; > +} > + > +static inline int > +rseq_unregister_current_thread (void) > +{ > + int rc, ret = 0; > + INTERNAL_SYSCALL_DECL (err); > + > + /* Setting __rseq_register_state = RSEQ_REGISTER_EXITING for the rest of the > + thread lifetime. Ensures signal handlers nesting just before thread exit > + don't try to register rseq. */ > + __rseq_lib_abi.register_state = RSEQ_REGISTER_EXITING; > + __rseq_lib_abi.refcount = 0; > + rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq), > + RSEQ_FLAG_UNREGISTER, RSEQ_SIG); OK. > + if (!rc) > + goto end; > + ret = -1; > +end: > + return ret; > +} > +#else > +static inline int > +rseq_register_current_thread (void) > +{ > + return -1; > +} > + > +static inline int > +rseq_unregister_current_thread (void) > +{ > + return -1; > +} OK. > +#endif > + > +#endif /* rseq-internal.h */ > diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c > new file mode 100644 > index 0000000000..99b277e9d6 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/rseq-sym.c > @@ -0,0 +1,54 @@ > +/* Copyright (C) 2018 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#include <sys/syscall.h> > +#include <stdint.h> > + > +#ifdef __NR_rseq > +#include <sys/rseq.h> > +#else > + > +enum rseq_cpu_id_state { > + RSEQ_CPU_ID_UNINITIALIZED = -1, > + RSEQ_CPU_ID_REGISTRATION_FAILED = -2, OK. > +}; > + > +/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI > + size is 20 bytes. */ > +struct rseq { > + uint32_t cpu_id_start; > + uint32_t cpu_id; > + uint64_t rseq_cs; > + uint32_t flags; > +} __attribute__ ((aligned(4 * sizeof(uint64_t)))); OK, this is the kernel abi. > + > +struct rseq_lib_abi > +{ > + uint32_t register_state; > + uint32_t refcount; OK. This is the refcount coordination structure for registration. > +}; > + > +#endif > + > +/* volatile because fields can be read/updated by the kernel. */ > +__thread volatile struct rseq __rseq_abi = { > + .cpu_id = RSEQ_CPU_ID_UNINITIALIZED, OK. > +}; > + > +/* volatile because fields can be read/updated by signal handlers. */ > +__thread volatile struct rseq_lib_abi __rseq_lib_abi; OK. > diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Needs a one sentence description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why not a s390 specific value here? > diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist > index 334def033c..42316d8666 100644 > --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist > @@ -2005,6 +2005,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist > index 536f4c4ced..c6c4e55a77 100644 > --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist > @@ -1911,6 +1911,8 @@ GLIBC_2.29 __fentry__ F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist > index 30ae3b6ebb..8652dfea59 100644 > --- a/sysdeps/unix/sysv/linux/sh/libc.abilist > +++ b/sysdeps/unix/sysv/linux/sh/libc.abilist > @@ -1887,6 +1887,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist > index 68b107d080..95b58dfa67 100644 > --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist > @@ -1999,6 +1999,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist > index e5b6a4da50..bfd24f9d1c 100644 > --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist > @@ -1940,6 +1940,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h > new file mode 100644 > index 0000000000..83c8976f50 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/sys/rseq.h > @@ -0,0 +1,65 @@ Needs a one sentence description. > +/* Copyright (C) 2019 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +#define _SYS_RSEQ_H 1 > + > +/* We use the structures declarations from the kernel headers. */ > +#include <linux/rseq.h> > +/* Architecture-specific rseq signature. */ > +#include <bits/rseq.h> > +#include <stdint.h> OK. > + > +enum rseq_register_state > +{ > + /* Value RSEQ_REGISTER_ALLOWED means it is allowed to update > + the refcount field and to register/unregister rseq. */ > + RSEQ_REGISTER_ALLOWED = 0, > + /* Value RSEQ_REGISTER_ONGOING means a rseq registration is in progress, > + so it is temporarily forbidden to update the refcount field or to > + register/unregister rseq for this thread or signal handlers nested > + on this thread. */ > + RSEQ_REGISTER_ONGOING = 1, > + /* Value RSEQ_REGISTER_EXITING means it is forbidden to update the > + refcount field or to register/unregister rseq for the rest of the > + thread's lifetime. */ > + RSEQ_REGISTER_EXITING = 2, OK. > +}; > + > +struct rseq_lib_abi > +{ > + uint32_t register_state; /* enum rseq_register_state. */ > + /* The refcount field keeps track of rseq users, so early adopters > + of rseq can cooperate amongst each other and with glibc to > + share rseq thread registration. The refcount field can only be > + updated when allowed by the value of field register_state. > + Registering rseq should be performed when incrementing refcount > + from 0 to 1, and unregistering rseq should be performed when > + decrementing refcount from 1 to 0. */ OK. > + uint32_t refcount; > +}; > + > +/* volatile because fields can be read/updated by the kernel. */ > +extern __thread volatile struct rseq __rseq_abi > +__attribute__ ((tls_model ("initial-exec"))); OK. > + > +/* volatile because fields can be read/updated by signal handlers. */ > +extern __thread volatile struct rseq_lib_abi __rseq_lib_abi > +__attribute__ ((tls_model ("initial-exec"))); OK. > + > +#endif /* sys/rseq.h */ > diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h > new file mode 100644 > index 0000000000..19d3755837 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h > @@ -0,0 +1,24 @@ > +/* Copyright (C) 2019 Free Software Foundation, Inc. Add one line short description. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2019. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +#ifndef _SYS_RSEQ_H > +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead." > +#endif > + > +/* Signature required before each abort handler code. */ > +#define RSEQ_SIG 0x53053053 Why not an x86-specific op code? > diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist > index 86dfb0c94d..e9f8411fb2 100644 > --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist > +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist > @@ -1898,6 +1898,8 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > GLIBC_2.3 __ctype_b_loc F > GLIBC_2.3 __ctype_tolower_loc F > GLIBC_2.3 __ctype_toupper_loc F > diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist > index dd688263aa..f9432d07f1 100644 > --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist > +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist > @@ -2149,3 +2149,5 @@ GLIBC_2.28 thrd_yield F > GLIBC_2.29 getcpu F > GLIBC_2.29 posix_spawn_file_actions_addchdir_np F > GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F > +GLIBC_2.30 __rseq_abi T 0x20 > +GLIBC_2.30 __rseq_lib_abi T 0x8 OK. > -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-22 20:09 ` Carlos O'Donell @ 2019-03-25 15:54 ` Mathieu Desnoyers 2019-03-27 9:16 ` Martin Schwidefsky ` (2 more replies) 0 siblings, 3 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-03-25 15:54 UTC (permalink / raw) To: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api Hi Carlos, ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: [...] I took care of all your comments for an upcoming round of patches, except the following that remain open (see answer inline). I'm adding Linux maintainers for ARM, aarch64, MIPS, powerpc, s390, x86 in CC to discuss the choice of code signature prior to the abort handler for each of those architectures. [...] >> diff --git a/NEWS b/NEWS >> index 912a9bdc0f..0608c60f7d 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -5,6 +5,17 @@ See the end for copying conditions. >> Please send GNU C library bug reports via <https://sourceware.org/bugzilla/> >> using `glibc' in the "product" field. >> >> +Version 2.30 >> + >> +Major new features: >> + >> +* Support for automatically registering threads with the Linux rseq(2) >> + system call has been added. This system call is implemented starting >> + from Linux 4.18. In order to be activated, it requires that glibc is built >> + against kernel headers that include this system call, and that glibc >> + detects availability of that system call at runtime. > > What benefit does the feature have for users? Can you talk about that please? > Why would a user want to use it. Please feel free to link to an external > reference. Would the following text work ? Version 2.30 Major new features: * Support for automatically registering threads with the Linux rseq(2) system call has been added. This system call is implemented starting from Linux 4.18. The Restartable Sequences ABI accelerates user-space operations on per-cpu data. It allows user-space to perform updates on per-cpu data without requiring heavy-weight atomic operations. See https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/ for further explanation. In order to be activated, it requires that glibc is built against kernel headers that include this system call, and that glibc detects availability of that system call at runtime. For reference the assembly code I'm pointing at below can be found in the Linux selftests under: tools/testing/selftests/rseq/rseq-*.h >> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h [...] >> + >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why isn't this an arm specific op code? Does the user have to mark this > up as part of a constant pool when placing it in front of the abort handler > to avoid disassembling the constant as code? This was at one point required > to get gdb to work properly. > For arm, the abort is defined as: #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ abort_label, version, flags, \ start_ip, post_commit_offset, abort_ip) \ ".balign 32\n\t" \ __rseq_str(table_label) ":\n\t" \ ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ ".word " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "b %l[" __rseq_str(abort_label) "]\n\t" Which contains a copy of the struct rseq_cs for that critical section close to the actual critical section, within the code, followed by the signature. The reason why we have a copy of the struct rseq_cs there is to speed up entry into the critical section by using the "adr" instruction to compute the address to store into __rseq_cs->rseq_cs. AFAIU, a literal pool on ARM is defined as something which is always jumped over (never executed), which is the case here. We always have an unconditional branch instruction ("b") skipping over each RSEQ_ASM_DEFINE_ABORT(). Therefore, given that the signature is part of a literal pool on ARM, it can be any value we choose and should not need to be an actual valid instruction. aarch64 defines the abort as: #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ " b 222f\n" \ " .inst " __rseq_str(RSEQ_SIG) "\n" \ __rseq_str(label) ":\n" \ " b %l[" __rseq_str(abort_label) "]\n" \ "222:\n" Where the signature actually maps to a valid instruction. Considering that aarch64 also have literal pools, and we branch over the signature, I wonder why it's so important to ensure the signature is a valid trap instruction. Perhaps Will Deacon can enlighten us ? [...] >> +++ b/sysdeps/unix/sysv/linux/bits/rseq.h >> @@ -0,0 +1,24 @@ [...] >> + >> +/* Each architecture supporting rseq should define RSEQ_SIG as a 32-bit >> + signature inserted before each rseq abort label in the code section. */ > > Needs a huge explanation about RSEQ_SIG and the reasons why it's per-arch. > > Basically cut-and-paste what you wrote to libc-alpha about this. Updated to: /* RSEQ_SIG is a signature required before each abort handler code. It is a 32-bit value that maps to actual architecture code compiled into applications and libraries. It needs to be defined for each architecture. When choosing this value, it needs to be taken into account that generating invalid instructions may have ill effects on tools like objdump, and may also have impact on the CPU speculative execution efficiency in some cases. */ [...] >> +++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h [...] >> + >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why isn't this a mips-specific op code? MIPS also has a literal pool just before the abort handler, and it jumps over it. My understanding is that we can use any signature value we want, and it does not need to be a valid instruction, similarly to ARM: #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ abort_label, version, flags, \ start_ip, post_commit_offset, abort_ip) \ ".balign 32\n\t" \ __rseq_str(table_label) ":\n\t" \ ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \ LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \ LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \ ".word " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "b %l[" __rseq_str(abort_label) "]\n\t" Perhaps Paul Burton can confirm this ? [...] >> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h [...] >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why isn't this an opcode specific to power? On powerpc 32/64, the abort is placed in a __rseq_failure executable section: #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ "b %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" That section only contains snippets of those trampolines. Arguably, it would be good if disassemblers could find valid instructions there. Boqun Feng could perhaps shed some light on this signature choice ? Now would be a good time to decide once and for all whether a valid instruction would be a better choice. [...] >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h [...] >> + >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why not a s390 specific value here? s390 also has the abort handler in a __rseq_failure section: #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "j %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" Same question applies as powerpc: since disassemblers will try to decode that instruction, would it be better to define it as a valid one ? [...] >> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h [...] >> +/* Signature required before each abort handler code. */ >> +#define RSEQ_SIG 0x53053053 > > Why not an x86-specific op code? On x86, we use this 4-byte signature as operand to a "no-op" instruction taking 4-byte immediate operand: x86-32: #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ /* Disassembler-friendly signature: nopl <sig>. */ \ ".byte 0x0f, 0x1f, 0x05\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "jmp %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" x86-64: #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ /* Disassembler-friendly signature: nopl <sig>(%rip). */\ ".byte 0x0f, 0x1f, 0x05\n\t" \ ".long " __rseq_str(RSEQ_SIG) "\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "jmp %l[" __rseq_str(abort_label) "]\n\t" \ ".popsection\n\t" Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-25 15:54 ` Mathieu Desnoyers @ 2019-03-27 9:16 ` Martin Schwidefsky 2019-03-27 20:01 ` Mathieu Desnoyers 2019-03-27 20:38 ` Carlos O'Donell 2019-04-02 6:02 ` Michael Ellerman 2019-04-04 20:50 ` Carlos O'Donell 2 siblings, 2 replies; 52+ messages in thread From: Martin Schwidefsky @ 2019-03-27 9:16 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Russell King, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > [...] > >> + > >> +/* Signature required before each abort handler code. */ > >> +#define RSEQ_SIG 0x53053053 > > > > Why not a s390 specific value here? > > s390 also has the abort handler in a __rseq_failure section: > > #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "j %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > Same question applies as powerpc: since disassemblers will try to decode > that instruction, would it be better to define it as a valid one ? > > [...] A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. And please replace that "j %l[...]" with a "jg %l[...]", the branch target range of the "j" instruction is 64K, not enough for the general case. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-27 9:16 ` Martin Schwidefsky @ 2019-03-27 20:01 ` Mathieu Desnoyers 2019-03-27 20:38 ` Carlos O'Donell 1 sibling, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-03-27 20:01 UTC (permalink / raw) To: schwidefsky Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng, heiko carstens, gor, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Mar 27, 2019, at 5:16 AM, schwidefsky schwidefsky@de.ibm.com wrote: > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >> >> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h >> [...] >> >> + >> >> +/* Signature required before each abort handler code. */ >> >> +#define RSEQ_SIG 0x53053053 >> > >> > Why not a s390 specific value here? >> >> s390 also has the abort handler in a __rseq_failure section: >> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> __rseq_str(label) ":\n\t" \ >> teardown \ >> "j %l[" __rseq_str(abort_label) "]\n\t" \ >> ".popsection\n\t" >> >> Same question applies as powerpc: since disassemblers will try to decode >> that instruction, would it be better to define it as a valid one ? >> >> [...] > > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. Based on the z/Architecture reference summary, it appears that the DIEBR instruction's opcode is B353. So I suspect that just starting with 0x53 is not sufficient to make it a valid opcode in the instruction set. Is it something we should care about ? > And please replace that "j %l[...]" with a "jg %l[...]", the branch target > range of the "j" instruction is 64K, not enough for the general case. I'll also need to use "jg" for RSEQ_ASM_DEFINE_CMPFAIL: #define RSEQ_ASM_DEFINE_CMPFAIL(label, teardown, cmpfail_label) \ ".pushsection __rseq_failure, \"ax\"\n\t" \ __rseq_str(label) ":\n\t" \ teardown \ "jg %l[" __rseq_str(cmpfail_label) "]\n\t" \ ".popsection\n\t" I'll prepare a fix for Linux selftests, and I already pushed a fix within my librseq repository. Thanks for pointing it out! Mathieu > > -- > blue skies, > Martin. > > "Reality continues to ruin my life." - Calvin. -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-27 9:16 ` Martin Schwidefsky 2019-03-27 20:01 ` Mathieu Desnoyers @ 2019-03-27 20:38 ` Carlos O'Donell 2019-03-28 7:50 ` Martin Schwidefsky 1 sibling, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-03-27 20:38 UTC (permalink / raw) To: Martin Schwidefsky, Mathieu Desnoyers Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Russell King, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 3/27/19 5:16 AM, Martin Schwidefsky wrote: > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h >> [...] >>>> + >>>> +/* Signature required before each abort handler code. */ >>>> +#define RSEQ_SIG 0x53053053 >>> >>> Why not a s390 specific value here? >> >> s390 also has the abort handler in a __rseq_failure section: >> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> __rseq_str(label) ":\n\t" \ >> teardown \ >> "j %l[" __rseq_str(abort_label) "]\n\t" \ >> ".popsection\n\t" >> >> Same question applies as powerpc: since disassemblers will try to decode >> that instruction, would it be better to define it as a valid one ? >> >> [...] > > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. > And please replace that "j %l[...]" with a "jg %l[...]", the branch target > range of the "j" instruction is 64K, not enough for the general case. Why was this particular operated selected? So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer" instruction that can't be reached by any control flow? Can we use a NOP with a unique value in an immediate operand? The goal being to have something that won't confuse during a debug session, or that the debugger can ignore (like constant pools on Arm) -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-27 20:38 ` Carlos O'Donell @ 2019-03-28 7:50 ` Martin Schwidefsky 2019-03-28 15:43 ` Mathieu Desnoyers 0 siblings, 1 reply; 52+ messages in thread From: Martin Schwidefsky @ 2019-03-28 7:50 UTC (permalink / raw) To: Carlos O'Donell Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Russell King, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On Wed, 27 Mar 2019 16:38:32 -0400 "Carlos O'Donell" <codonell@redhat.com> wrote: > On 3/27/19 5:16 AM, Martin Schwidefsky wrote: > > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) > > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: > > > >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > >> [...] > >>>> + > >>>> +/* Signature required before each abort handler code. */ > >>>> +#define RSEQ_SIG 0x53053053 > >>> > >>> Why not a s390 specific value here? > >> > >> s390 also has the abort handler in a __rseq_failure section: > >> > >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ > >> ".pushsection __rseq_failure, \"ax\"\n\t" \ > >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > >> __rseq_str(label) ":\n\t" \ > >> teardown \ > >> "j %l[" __rseq_str(abort_label) "]\n\t" \ > >> ".popsection\n\t" > >> > >> Same question applies as powerpc: since disassemblers will try to decode > >> that instruction, would it be better to define it as a valid one ? > >> > >> [...] > > > > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. > > And please replace that "j %l[...]" with a "jg %l[...]", the branch target > > range of the "j" instruction is 64K, not enough for the general case. > > Why was this particular operated selected? > > So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer" > instruction that can't be reached by any control flow? > > Can we use a NOP with a unique value in an immediate operand? > > The goal being to have something that won't confuse during a debug > session, or that the debugger can ignore (like constant pools on Arm) I was looking at the wrong table in regard to opcode 0x53. The pattern 0x53...... is not a known instruction as far as the disassembler is concerned. As Mathieu pointed out "diebr" is actually 0xb353.... Sorry about the confusion. But why do we need this value in the first place if it can not be reached? -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-28 7:50 ` Martin Schwidefsky @ 2019-03-28 15:43 ` Mathieu Desnoyers 0 siblings, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-03-28 15:43 UTC (permalink / raw) To: schwidefsky Cc: Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng, heiko carstens, gor, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Mar 28, 2019, at 3:49 AM, schwidefsky schwidefsky@de.ibm.com wrote: > On Wed, 27 Mar 2019 16:38:32 -0400 > "Carlos O'Donell" <codonell@redhat.com> wrote: > >> On 3/27/19 5:16 AM, Martin Schwidefsky wrote: >> > On Mon, 25 Mar 2019 11:54:32 -0400 (EDT) >> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote: >> > >> >>>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h >> >> [...] >> >>>> + >> >>>> +/* Signature required before each abort handler code. */ >> >>>> +#define RSEQ_SIG 0x53053053 >> >>> >> >>> Why not a s390 specific value here? >> >> >> >> s390 also has the abort handler in a __rseq_failure section: >> >> >> >> #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ >> >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> >> __rseq_str(label) ":\n\t" \ >> >> teardown \ >> >> "j %l[" __rseq_str(abort_label) "]\n\t" \ >> >> ".popsection\n\t" >> >> >> >> Same question applies as powerpc: since disassemblers will try to decode >> >> that instruction, would it be better to define it as a valid one ? >> >> >> >> [...] >> > >> > A 4-byte sequence starting with 0x53 is decoded as a "diebr" instruction. >> > And please replace that "j %l[...]" with a "jg %l[...]", the branch target >> > range of the "j" instruction is 64K, not enough for the general case. >> >> Why was this particular operated selected? The 0x53053053 signature was selected by myself on x86, where it is a 4-byte operand to a no-op instruction. That value looks like "SEQSEQSE" in hexadecimal. The goal was to have an uncommon code signature value. Then it has been used as-is on arm and mips within literal pools (which seems fine), and also on s390 and powerpc where those seem to generate invalid instruction within a separate code section. I'm mainly concerned about the choice of this value on s390 and powerpc. >> >> So on s390 the RSEQ_SIG will show up as an unexpected "divide to integer" >> instruction that can't be reached by any control flow? >> >> Can we use a NOP with a unique value in an immediate operand? >> >> The goal being to have something that won't confuse during a debug >> session, or that the debugger can ignore (like constant pools on Arm) > > I was looking at the wrong table in regard to opcode 0x53. The pattern > 0x53...... is not a known instruction as far as the disassembler is > concerned. As Mathieu pointed out "diebr" is actually 0xb353.... > Sorry about the confusion. > > But why do we need this value in the first place if it can not be reached? One reason is to help disassemblers in tools like objdump, gdb, and so on. Another reason for making this a valid instruction is if the CPU speculative execution can be helped by making this instruction a valid one, even though it's not reachable through normal execution. (this appears to be important at least on aarch64) However, we want that instruction to be an uncommon one, to reduce the chances that an attacker can use the rseq abort mechanism to redirect execution to an unrelated code block that would happen to follow that same 4-byte signature. For instance, I would not use a no-op which is typically generated by compilers there. In summary: * x86: Uses a no-op instruction ending with a 4-byte operand. * aarch64: Uses a trap instruction which also has a seldom-used operand value. * arm: The signature is in a literal pool (there is a jump over the signature). * mips: The signature is in a literal pool (there is a jump over the signature). * powerpc: the signature is an unreachable invalid(?) instruction within an executable section. * s390: the signature is an unreachable invalid instruction within an executable section. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-25 15:54 ` Mathieu Desnoyers 2019-03-27 9:16 ` Martin Schwidefsky @ 2019-04-02 6:02 ` Michael Ellerman 2019-04-02 7:08 ` Florian Weimer 2019-04-04 20:16 ` Carlos O'Donell 2019-04-04 20:50 ` Carlos O'Donell 2 siblings, 2 replies; 52+ messages in thread From: Michael Ellerman @ 2019-04-02 6:02 UTC (permalink / raw) To: Mathieu Desnoyers, Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes: > Hi Carlos, > > ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: ... > > [...] >>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an opcode specific to power? > > On powerpc 32/64, the abort is placed in a __rseq_failure executable section: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > "b %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > That section only contains snippets of those trampolines. Arguably, it would be > good if disassemblers could find valid instructions there. Boqun Feng could perhaps > shed some light on this signature choice ? Now would be a good time to decide > once and for all whether a valid instruction would be a better choice. I'm a bit vague on what we're trying to do here. But it seems like you want some sort of "eye catcher" prior to the branch? That value is a valid instruction on current CPUs (rlwimi. r5,r24,6,1,9), and even if it wasn't it could become one in future. If you change it to 0x8053530 that is both a valid instruction and is a nop (conditional trap immediate but with no conditions set). cheers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-02 6:02 ` Michael Ellerman @ 2019-04-02 7:08 ` Florian Weimer 2019-04-04 20:33 ` Carlos O'Donell 2019-04-04 20:16 ` Carlos O'Donell 1 sibling, 1 reply; 52+ messages in thread From: Florian Weimer @ 2019-04-02 7:08 UTC (permalink / raw) To: Michael Ellerman Cc: Mathieu Desnoyers, Carlos O'Donell, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api * Michael Ellerman: > I'm a bit vague on what we're trying to do here. > > But it seems like you want some sort of "eye catcher" prior to the branch? > > That value is a valid instruction on current CPUs (rlwimi. > r5,r24,6,1,9), and even if it wasn't it could become one in future. > > If you change it to 0x8053530 that is both a valid instruction and is a > nop (conditional trap immediate but with no conditions set). I think we need something that is very unlikely to appear in the instruction stream. It's just a marker. The instruction will never be executed, and it does not have to be a trap, either (I believe that a standard trap instruction would be a bad choice). Thanks, Florian ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-02 7:08 ` Florian Weimer @ 2019-04-04 20:33 ` Carlos O'Donell 2019-04-05 9:16 ` Florian Weimer 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-04-04 20:33 UTC (permalink / raw) To: Florian Weimer, Michael Ellerman Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 4/2/19 3:08 AM, Florian Weimer wrote: > * Michael Ellerman: > >> I'm a bit vague on what we're trying to do here. >> >> But it seems like you want some sort of "eye catcher" prior to the branch? >> >> That value is a valid instruction on current CPUs (rlwimi. >> r5,r24,6,1,9), and even if it wasn't it could become one in future. >> >> If you change it to 0x8053530 that is both a valid instruction and is a >> nop (conditional trap immediate but with no conditions set). > > I think we need something that is very unlikely to appear in the > instruction stream. It's just a marker. The instruction will never be > executed, and it does not have to be a trap, either (I believe that a > standard trap instruction would be a bad choice). I assume you want to avoid a standard trap instruction because it would be common, and so not meet the intent of the RSEQ_SIG choice as being something that is *uncommon* right? It is valuable that it be a trap, particularly for constant pools because it means that a jump into the constant pool will trap. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-04 20:33 ` Carlos O'Donell @ 2019-04-05 9:16 ` Florian Weimer 2019-04-05 15:40 ` Carlos O'Donell 0 siblings, 1 reply; 52+ messages in thread From: Florian Weimer @ 2019-04-05 9:16 UTC (permalink / raw) To: Carlos O'Donell Cc: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api * Carlos O'Donell: > On 4/2/19 3:08 AM, Florian Weimer wrote: >> * Michael Ellerman: >> >>> I'm a bit vague on what we're trying to do here. >>> >>> But it seems like you want some sort of "eye catcher" prior to the branch? >>> >>> That value is a valid instruction on current CPUs (rlwimi. >>> r5,r24,6,1,9), and even if it wasn't it could become one in future. >>> >>> If you change it to 0x8053530 that is both a valid instruction and is a >>> nop (conditional trap immediate but with no conditions set). >> >> I think we need something that is very unlikely to appear in the >> instruction stream. It's just a marker. The instruction will never be >> executed, and it does not have to be a trap, either (I believe that a >> standard trap instruction would be a bad choice). > > I assume you want to avoid a standard trap instruction because it would > be common, and so not meet the intent of the RSEQ_SIG choice as being something > that is *uncommon* right? Ideally, RSEQ_SIG would be something that does not show up in the instruction stream at all, so that it is a reliable marker for the start of an rseq handler. I assume the intent here is that the kernel provides some validation on the program counter it reads from the rseq area, so that we do not end up with some easily-abused gadget in every process image. > It is valuable that it be a trap, particularly for constant pools because > it means that a jump into the constant pool will trap. Sorry, I don't understand why this matters in this context. Would you please elaborate? Thanks, Florian ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-05 9:16 ` Florian Weimer @ 2019-04-05 15:40 ` Carlos O'Donell 2019-04-08 21:45 ` Tulio Magno Quites Machado Filho 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-04-05 15:40 UTC (permalink / raw) To: Florian Weimer Cc: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 4/5/19 5:16 AM, Florian Weimer wrote: > * Carlos O'Donell: >> It is valuable that it be a trap, particularly for constant pools because >> it means that a jump into the constant pool will trap. > > Sorry, I don't understand why this matters in this context. Would you > please elaborate? Sorry, I wasn't very clear. My point is only that any accidental jumps, either with off-by-one (like you fixed in gcc/glibc's signal unwinding most recently), result in a process fault rather than executing RSEQ_SIG as a valid instruction *and then* continuing onwards to the handler. A process fault is achieved either by a trap, or an invalid instruction, or a privileged insn (like suggested for MIPS in this thread). -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-05 15:40 ` Carlos O'Donell @ 2019-04-08 21:45 ` Tulio Magno Quites Machado Filho 2019-04-08 22:06 ` Carlos O'Donell 0 siblings, 1 reply; 52+ messages in thread From: Tulio Magno Quites Machado Filho @ 2019-04-08 21:45 UTC (permalink / raw) To: Carlos O'Donell, Florian Weimer, Michael Meissner, Alan Modra, Peter Bergner, Michael Ellerman Cc: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api Carlos O'Donell <codonell@redhat.com> writes: > On 4/5/19 5:16 AM, Florian Weimer wrote: >> * Carlos O'Donell: >>> It is valuable that it be a trap, particularly for constant pools because >>> it means that a jump into the constant pool will trap. >> >> Sorry, I don't understand why this matters in this context. Would you >> please elaborate? > > Sorry, I wasn't very clear. > > My point is only that any accidental jumps, either with off-by-one (like you > fixed in gcc/glibc's signal unwinding most recently), result in a process fault > rather than executing RSEQ_SIG as a valid instruction *and then* continuing > onwards to the handler. > > A process fault is achieved either by a trap, or an invalid instruction, or > a privileged insn (like suggested for MIPS in this thread). In that case, mtmsr (Move to Machine State Register) seems a good candidate. mtmsr is available both on 32 and 64 bits since their first implementations. It's a privileged instruction and should never appear in userspace code (causes SIGILL). Any comments? -- Tulio Magno ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-08 21:45 ` Tulio Magno Quites Machado Filho @ 2019-04-08 22:06 ` Carlos O'Donell 2019-04-09 4:28 ` Michael Ellerman 2019-04-09 16:34 ` Mathieu Desnoyers 0 siblings, 2 replies; 52+ messages in thread From: Carlos O'Donell @ 2019-04-08 22:06 UTC (permalink / raw) To: Tulio Magno Quites Machado Filho, Florian Weimer, Michael Meissner, Alan Modra, Peter Bergner, Michael Ellerman, Mathieu Desnoyers Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote: > Carlos O'Donell <codonell@redhat.com> writes: > >> On 4/5/19 5:16 AM, Florian Weimer wrote: >>> * Carlos O'Donell: >>>> It is valuable that it be a trap, particularly for constant pools because >>>> it means that a jump into the constant pool will trap. >>> >>> Sorry, I don't understand why this matters in this context. Would you >>> please elaborate? >> >> Sorry, I wasn't very clear. >> >> My point is only that any accidental jumps, either with off-by-one (like you >> fixed in gcc/glibc's signal unwinding most recently), result in a process fault >> rather than executing RSEQ_SIG as a valid instruction *and then* continuing >> onwards to the handler. >> >> A process fault is achieved either by a trap, or an invalid instruction, or >> a privileged insn (like suggested for MIPS in this thread). > > In that case, mtmsr (Move to Machine State Register) seems a good candidate. > > mtmsr is available both on 32 and 64 bits since their first implementations. > > It's a privileged instruction and should never appear in userspace > code (causes SIGILL). > > Any comments? That seems good to me. Mathieu, What's required to move this forward for POWER? -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-08 22:06 ` Carlos O'Donell @ 2019-04-09 4:28 ` Michael Ellerman 2019-04-09 9:52 ` Alan Modra 2019-04-09 16:34 ` Mathieu Desnoyers 1 sibling, 1 reply; 52+ messages in thread From: Michael Ellerman @ 2019-04-09 4:28 UTC (permalink / raw) To: Carlos O'Donell, Tulio Magno Quites Machado Filho, Florian Weimer, Michael Meissner, Alan Modra, Peter Bergner, Mathieu Desnoyers Cc: Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api Carlos O'Donell <codonell@redhat.com> writes: > On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote: >> Carlos O'Donell <codonell@redhat.com> writes: >> >>> On 4/5/19 5:16 AM, Florian Weimer wrote: >>>> * Carlos O'Donell: >>>>> It is valuable that it be a trap, particularly for constant pools because >>>>> it means that a jump into the constant pool will trap. >>>> >>>> Sorry, I don't understand why this matters in this context. Would you >>>> please elaborate? >>> >>> Sorry, I wasn't very clear. >>> >>> My point is only that any accidental jumps, either with off-by-one (like you >>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault >>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing >>> onwards to the handler. >>> >>> A process fault is achieved either by a trap, or an invalid instruction, or >>> a privileged insn (like suggested for MIPS in this thread). >> >> In that case, mtmsr (Move to Machine State Register) seems a good candidate. >> >> mtmsr is available both on 32 and 64 bits since their first implementations. >> >> It's a privileged instruction and should never appear in userspace >> code (causes SIGILL). I'd much rather we use a trap with a specific immediate value. Otherwise someone's going to waste time one day puzzling over why userspace is doing mtmsr. It would also complicate things if we ever wanted to emulate mtmsr. If we want something that is a trap rather than a nop then use 0x0fe50553. That's "compare the value in r5 with 0x553 and then trap unconditionally". It shows up in objdump as: 10000000: 53 05 e5 0f twui r5,1363 The immediate can be anything, I chose that value to mimic the x86 value Mathieu mentioned. There's no reason that instruction would ever be generated because the immediate value serves no purpose. So it satisfies the "very unlikely to appear" criteria AFAICS. cheers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-09 4:28 ` Michael Ellerman @ 2019-04-09 9:52 ` Alan Modra 2019-04-09 14:13 ` Tulio Magno Quites Machado Filho 2019-04-18 15:33 ` Mathieu Desnoyers 0 siblings, 2 replies; 52+ messages in thread From: Alan Modra @ 2019-04-09 9:52 UTC (permalink / raw) To: Michael Ellerman Cc: Carlos O'Donell, Tulio Magno Quites Machado Filho, Florian Weimer, Michael Meissner, Peter Bergner, Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote: > I'd much rather we use a trap with a specific immediate value. Otherwise > someone's going to waste time one day puzzling over why userspace is > doing mtmsr. It's data. We have other data in executable sections. Anyone who wonders about odd disassembly just hasn't realized they are disassembling data. > It would also complicate things if we ever wanted to emulate mtmsr. No, because it won't be executed. If I understand correctly, the only reason to choose an illegal, trap or privileged insn is to halt execution earlier rather than later when a program goes off in the weeds. > If we want something that is a trap rather than a nop then use 0x0fe50553. > > That's "compare the value in r5 with 0x553 and then trap unconditionally". > > It shows up in objdump as: > > 10000000: 53 05 e5 0f twui r5,1363 > > > The immediate can be anything, I chose that value to mimic the x86 value > Mathieu mentioned. > > There's no reason that instruction would ever be generated because the > immediate value serves no purpose. So it satisfies the "very unlikely > to appear" criteria AFAICS. Yes, looks fine to me, except that in VLE mode (do we care?) ".long 0x0fe50553" disassembles as 0: 0f e5 se_cmphl r5,r30 2: 05 53 se_mullw r3,r5 No illegal/trap/privileged insn there. ".long 0x0fe5000b" might be better to cover VLE. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-09 9:52 ` Alan Modra @ 2019-04-09 14:13 ` Tulio Magno Quites Machado Filho 2019-04-09 14:58 ` Carlos O'Donell 2019-04-18 15:33 ` Mathieu Desnoyers 1 sibling, 1 reply; 52+ messages in thread From: Tulio Magno Quites Machado Filho @ 2019-04-09 14:13 UTC (permalink / raw) To: Alan Modra, Michael Ellerman Cc: Carlos O'Donell, Florian Weimer, Michael Meissner, Peter Bergner, Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel Alan Modra <amodra@gmail.com> writes: > On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote: >> I'd much rather we use a trap with a specific immediate value. Otherwise >> someone's going to waste time one day puzzling over why userspace is >> doing mtmsr. > > It's data. We have other data in executable sections. Anyone who > wonders about odd disassembly just hasn't realized they are > disassembling data. > >> It would also complicate things if we ever wanted to emulate mtmsr. > > No, because it won't be executed. If I understand correctly, the only > reason to choose an illegal, trap or privileged insn is to halt > execution earlier rather than later when a program goes off in the > weeds. > >> If we want something that is a trap rather than a nop then use 0x0fe50553. >> >> That's "compare the value in r5 with 0x553 and then trap unconditionally". >> >> It shows up in objdump as: >> >> 10000000: 53 05 e5 0f twui r5,1363 >> >> >> The immediate can be anything, I chose that value to mimic the x86 value >> Mathieu mentioned. >> >> There's no reason that instruction would ever be generated because the >> immediate value serves no purpose. So it satisfies the "very unlikely >> to appear" criteria AFAICS. > > Yes, looks fine to me, except that in VLE mode (do we care?) > ".long 0x0fe50553" disassembles as > 0: 0f e5 se_cmphl r5,r30 > 2: 05 53 se_mullw r3,r5 > No illegal/trap/privileged insn there. > > ".long 0x0fe5000b" might be better to cover VLE. Looks good for me too. Actually, it better fits what Carlos O'Donnell had requested: >>> I think the order of preference is: >>> >>> 1. An uncommon insn (with random immediate values), in a literal pool, that is >>> not a useful ROP/JOP sequence (very uncommon) >>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >>> >>> With 2a/2b being roughly equivalent depending on speculative execution policy. -- Tulio Magno ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-09 14:13 ` Tulio Magno Quites Machado Filho @ 2019-04-09 14:58 ` Carlos O'Donell 2019-04-09 15:58 ` Mathieu Desnoyers 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-04-09 14:58 UTC (permalink / raw) To: Tulio Magno Quites Machado Filho, Alan Modra, Michael Ellerman Cc: Florian Weimer, Michael Meissner, Peter Bergner, Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote: > Alan Modra <amodra@gmail.com> writes: >> Yes, looks fine to me, except that in VLE mode (do we care?) >> ".long 0x0fe50553" disassembles as >> 0: 0f e5 se_cmphl r5,r30 >> 2: 05 53 se_mullw r3,r5 >> No illegal/trap/privileged insn there. >> >> ".long 0x0fe5000b" might be better to cover VLE. > > Looks good for me too. The requirement that it be a valid instruction is simply to aid in the disassembly of rseq regions which may be hand written assembly with a thin veneer of CFI/DWARF information. It has already been pointed out that POWER uses data in the instruction stream for jump tables to implement switch statements, but that specific use has compiler support and one presumes good debug information. So as Alan says, there is already data in the insn stream, though such things can't be good for performance (pollutes D-cache, problematic for speculative execution). > Actually, it better fits what Carlos O'Donnell had requested: > >>>> I think the order of preference is: >>>> >>>> 1. An uncommon insn (with random immediate values), in a literal pool, that is >>>> not a useful ROP/JOP sequence (very uncommon) >>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >>>> >>>> With 2a/2b being roughly equivalent depending on speculative execution policy. Yes, though "in a literal pool" is something that is not required, since users might not want literal pools and so we shouldn't require that feature (it also pollutes D-cache). Keep in mind the insn will never execute. If a trap insn calls out the nature of the signature more clearly then use that instead. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-09 14:58 ` Carlos O'Donell @ 2019-04-09 15:58 ` Mathieu Desnoyers 0 siblings, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-09 15:58 UTC (permalink / raw) To: Carlos O'Donell Cc: Tulio Magno Quites Machado Filho, Alan Modra, Michael Ellerman, Florian Weimer, Michael Meissner, Peter Bergner, Paul Burton, Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Apr 9, 2019, at 10:13 AM, Carlos O'Donell codonell@redhat.com wrote: > On 4/9/19 9:58 AM, Tulio Magno Quites Machado Filho wrote: >> Alan Modra <amodra@gmail.com> writes: >>> Yes, looks fine to me, except that in VLE mode (do we care?) >>> ".long 0x0fe50553" disassembles as >>> 0: 0f e5 se_cmphl r5,r30 >>> 2: 05 53 se_mullw r3,r5 >>> No illegal/trap/privileged insn there. >>> >>> ".long 0x0fe5000b" might be better to cover VLE. >> >> Looks good for me too. > > The requirement that it be a valid instruction is simply to aid in the > disassembly of rseq regions which may be hand written assembly with a > thin veneer of CFI/DWARF information. > > It has already been pointed out that POWER uses data in the instruction > stream for jump tables to implement switch statements, but that specific > use has compiler support and one presumes good debug information. So as > Alan says, there is already data in the insn stream, though such things > can't be good for performance (pollutes D-cache, problematic for > speculative execution). > >> Actually, it better fits what Carlos O'Donnell had requested: >> >>>>> I think the order of preference is: >>>>> >>>>> 1. An uncommon insn (with random immediate values), in a literal pool, that is >>>>> not a useful ROP/JOP sequence (very uncommon) >>>>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >>>>> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >>>>> >>>>> With 2a/2b being roughly equivalent depending on speculative execution policy. > > Yes, though "in a literal pool" is something that is not required, since > users might not want literal pools and so we shouldn't require that > feature (it also pollutes D-cache). > > Keep in mind the insn will never execute. > > If a trap insn calls out the nature of the signature more clearly then > use that instead. So based on the recent discussions, there are a few things we can conclude: - Choosing a random value and relying on literal pools is a bad idea, because some compilation environments disable them entirely, - We ideally want the signature to be a valid instruction in the instruction set so disassembler/emulator tools don't get confused and we don't hurt speculative execution. - Best option is a trap with an unlikely immediate opcode, because it traps on the instruction in case the program try to execute it by mistake, - Second best would be a no-op with an unlikely immediate opcode, - We may want to stay away from privileged instructions because they can confuse emulators with may try to emulate them, - Some architectures have big endian/little endian variants. We may need to carefully #ifdef each case so the numeric value matches actual instructions, - Some architectures have extensions to their instruction set (e.g. ARM thumb, power VLE) which can be combined with the basic instruction set within the same program. We need to decide whether we care what those signatures look like in those instruction set extensions or not. Is it a best effort to match real instructions or a hard requirement ? If it's a hard requirement, we may need to extend the rseq system call with new flags to accept more than one signature. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-09 9:52 ` Alan Modra 2019-04-09 14:13 ` Tulio Magno Quites Machado Filho @ 2019-04-18 15:33 ` Mathieu Desnoyers 1 sibling, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-18 15:33 UTC (permalink / raw) To: Alan Modra Cc: Michael Ellerman, Carlos O'Donell, Tulio Magno Quites Machado Filho, Florian Weimer, Michael Meissner, Peter Bergner, Paul Burton, Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Apr 9, 2019, at 5:29 AM, Alan Modra amodra@gmail.com wrote: > On Tue, Apr 09, 2019 at 02:23:53PM +1000, Michael Ellerman wrote: >> I'd much rather we use a trap with a specific immediate value. Otherwise >> someone's going to waste time one day puzzling over why userspace is >> doing mtmsr. > > It's data. We have other data in executable sections. Anyone who > wonders about odd disassembly just hasn't realized they are > disassembling data. > >> It would also complicate things if we ever wanted to emulate mtmsr. > > No, because it won't be executed. If I understand correctly, the only > reason to choose an illegal, trap or privileged insn is to halt > execution earlier rather than later when a program goes off in the > weeds. > >> If we want something that is a trap rather than a nop then use 0x0fe50553. >> >> That's "compare the value in r5 with 0x553 and then trap unconditionally". >> >> It shows up in objdump as: >> >> 10000000: 53 05 e5 0f twui r5,1363 >> >> >> The immediate can be anything, I chose that value to mimic the x86 value >> Mathieu mentioned. >> >> There's no reason that instruction would ever be generated because the >> immediate value serves no purpose. So it satisfies the "very unlikely >> to appear" criteria AFAICS. > > Yes, looks fine to me, except that in VLE mode (do we care?) > ".long 0x0fe50553" disassembles as > 0: 0f e5 se_cmphl r5,r30 > 2: 05 53 se_mullw r3,r5 > No illegal/trap/privileged insn there. > > ".long 0x0fe5000b" might be better to cover VLE. Can you share with us the objdump output of ".long 0x0fe5000b" in VLE mode ? VLE mode support does not appear to be available in typical toolchains. Also, is VLE mode only for powerpc 32 be, or also for powerpc 64 be/le ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-08 22:06 ` Carlos O'Donell 2019-04-09 4:28 ` Michael Ellerman @ 2019-04-09 16:34 ` Mathieu Desnoyers 1 sibling, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-09 16:34 UTC (permalink / raw) To: Carlos O'Donell Cc: Tulio Magno Quites Machado Filho, Florian Weimer, Michael Meissner, Alan Modra, Peter Bergner, Michael Ellerman, Paul Burton, Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Apr 8, 2019, at 5:45 PM, Carlos O'Donell codonell@redhat.com wrote: > On 4/8/19 3:20 PM, Tulio Magno Quites Machado Filho wrote: >> Carlos O'Donell <codonell@redhat.com> writes: >> >>> On 4/5/19 5:16 AM, Florian Weimer wrote: >>>> * Carlos O'Donell: >>>>> It is valuable that it be a trap, particularly for constant pools because >>>>> it means that a jump into the constant pool will trap. >>>> >>>> Sorry, I don't understand why this matters in this context. Would you >>>> please elaborate? >>> >>> Sorry, I wasn't very clear. >>> >>> My point is only that any accidental jumps, either with off-by-one (like you >>> fixed in gcc/glibc's signal unwinding most recently), result in a process fault >>> rather than executing RSEQ_SIG as a valid instruction *and then* continuing >>> onwards to the handler. >>> >>> A process fault is achieved either by a trap, or an invalid instruction, or >>> a privileged insn (like suggested for MIPS in this thread). >> >> In that case, mtmsr (Move to Machine State Register) seems a good candidate. >> >> mtmsr is available both on 32 and 64 bits since their first implementations. >> >> It's a privileged instruction and should never appear in userspace >> code (causes SIGILL). >> >> Any comments? > > That seems good to me. > > Mathieu, > > What's required to move this forward for POWER? First, we need to decide whether we need rseq to support more than one signature per process to cover the VLE extension. If so, we'd need to extend rseq with a new flag for future kernels. Then, ideally, we'd need a patch on top of the Linux kernel tools/testing/selftests/rseq/rseq-ppc.h file that updates the signature value. I think the current discussion leads us towards a trap: /* * TODO: document instruction objdump output on each architecture and VLE * instruction sets. */ #define RSEQ_SIG 0x0fe5000b Should we do anything specific for big/little endian ? Is the byte order of the instruction encoding the same as data ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-02 6:02 ` Michael Ellerman 2019-04-02 7:08 ` Florian Weimer @ 2019-04-04 20:16 ` Carlos O'Donell 1 sibling, 0 replies; 52+ messages in thread From: Carlos O'Donell @ 2019-04-04 20:16 UTC (permalink / raw) To: Michael Ellerman, Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 4/2/19 2:02 AM, Michael Ellerman wrote: > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes: >> Hi Carlos, >> >> ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: > ... >> >> [...] >>>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h >> [...] >>>> +/* Signature required before each abort handler code. */ >>>> +#define RSEQ_SIG 0x53053053 >>> >>> Why isn't this an opcode specific to power? >> >> On powerpc 32/64, the abort is placed in a __rseq_failure executable section: >> >> #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ >> ".pushsection __rseq_failure, \"ax\"\n\t" \ >> ".long " __rseq_str(RSEQ_SIG) "\n\t" \ >> __rseq_str(label) ":\n\t" \ >> "b %l[" __rseq_str(abort_label) "]\n\t" \ >> ".popsection\n\t" >> >> That section only contains snippets of those trampolines. Arguably, it would be >> good if disassemblers could find valid instructions there. Boqun Feng could perhaps >> shed some light on this signature choice ? Now would be a good time to decide >> once and for all whether a valid instruction would be a better choice. > > I'm a bit vague on what we're trying to do here. > > But it seems like you want some sort of "eye catcher" prior to the branch? > > That value is a valid instruction on current CPUs (rlwimi. > r5,r24,6,1,9), and even if it wasn't it could become one in future. > > If you change it to 0x8053530 that is both a valid instruction and is a > nop (conditional trap immediate but with no conditions set). The s390 IBM team needs to respond to this and I want to make sure they ACK the NOP being used here because it impacts them directly. I'd like to see Martin's opinion on this. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-03-25 15:54 ` Mathieu Desnoyers 2019-03-27 9:16 ` Martin Schwidefsky 2019-04-02 6:02 ` Michael Ellerman @ 2019-04-04 20:50 ` Carlos O'Donell [not found] ` <20190404214151.6ogrm34dok52az4h@pburton-laptop> 2 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-04-04 20:50 UTC (permalink / raw) To: Mathieu Desnoyers, Paul Burton, Will Deacon, Boqun Feng, Heiko Carstens, Vasily Gorbik, Martin Schwidefsky, Russell King, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On 3/25/19 11:54 AM, Mathieu Desnoyers wrote: > Hi Carlos, > > ----- On Mar 22, 2019, at 4:09 PM, Carlos O'Donell codonell@redhat.com wrote: > > [...] > > I took care of all your comments for an upcoming round of patches, except the > following that remain open (see answer inline). I'm adding Linux maintainers > for ARM, aarch64, MIPS, powerpc, s390, x86 in CC to discuss the choice of > code signature prior to the abort handler for each of those architectures. Thank you for kicking off this conversation. Every architecture should have a reasonable RSEQ_SIG that applies to their ISA with a comment about why that instruction was chosen. It should be a conscious choice, without a default. > * Support for automatically registering threads with the Linux rseq(2) > system call has been added. This system call is implemented starting > from Linux 4.18. The Restartable Sequences ABI accelerates user-space > operations on per-cpu data. It allows user-space to perform updates > on per-cpu data without requiring heavy-weight atomic operations. See > https://www.efficios.com/blog/2019/02/08/linux-restartable-sequences/ > for further explanation. > > In order to be activated, it requires that glibc is built against > kernel headers that include this system call, and that glibc detects > availability of that system call at runtime. Suggest: * Support for automatically registering threads with the Linux rseq(2) system call has been added. This system call is implemented starting from Linux 4.18. The Restartable Sequences ABI accelerates user-space operations on per-cpu data. It allows user-space to perform updates on per-cpu data without requiring heavy-weight atomic operations. Automatically registering threads allows all libraries, including libc, to make immediate use of the rseq(2) support by using the documented ABI. See 'man 2 rseq' for the details of the ABI shared between libc and the kernel. > > For reference the assembly code I'm pointing at below can be found > in the Linux selftests under: > > tools/testing/selftests/rseq/rseq-*.h OK. >>> +++ b/sysdeps/unix/sysv/linux/arm/bits/rseq.h > [...] >>> + >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an arm specific op code? Does the user have to mark this >> up as part of a constant pool when placing it in front of the abort handler >> to avoid disassembling the constant as code? This was at one point required >> to get gdb to work properly. >> > > For arm, the abort is defined as: > > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ > abort_label, version, flags, \ > start_ip, post_commit_offset, abort_ip) \ > ".balign 32\n\t" \ > __rseq_str(table_label) ":\n\t" \ > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ > ".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \ > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "b %l[" __rseq_str(abort_label) "]\n\t" > > Which contains a copy of the struct rseq_cs for that critical section > close to the actual critical section, within the code, followed by the > signature. The reason why we have a copy of the struct rseq_cs there is > to speed up entry into the critical section by using the "adr" instruction > to compute the address to store into __rseq_cs->rseq_cs. > > AFAIU, a literal pool on ARM is defined as something which is always > jumped over (never executed), which is the case here. We always have > an unconditional branch instruction ("b") skipping over each > RSEQ_ASM_DEFINE_ABORT(). > > Therefore, given that the signature is part of a literal pool on ARM, > it can be any value we choose and should not need to be an actual valid > instruction. > > aarch64 defines the abort as: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > " b 222f\n" \ > " .inst " __rseq_str(RSEQ_SIG) "\n" \ > __rseq_str(label) ":\n" \ > " b %l[" __rseq_str(abort_label) "]\n" \ > "222:\n" > > Where the signature actually maps to a valid instruction. Considering that > aarch64 also have literal pools, and we branch over the signature, I wonder > why it's so important to ensure the signature is a valid trap instruction. > Perhaps Will Deacon can enlighten us ? In the event that you accidentally jump to it then you trap? However, you want an *uncommon* trap insn. I think the order of preference is: 1. An uncommon insn (with random immediate values), in a literal pool, that is not a useful ROP/JOP sequence (very uncommon) 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) 2b. A NOP to avoid affecting speculative execution (maybe uncommon) With 2a/2b being roughly equivalent depending on speculative execution policy. >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this a mips-specific op code? > > MIPS also has a literal pool just before the abort handler, and it > jumps over it. My understanding is that we can use any signature value > we want, and it does not need to be a valid instruction, similarly to ARM: > > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ > abort_label, version, flags, \ > start_ip, post_commit_offset, abort_ip) \ > ".balign 32\n\t" \ > __rseq_str(table_label) ":\n\t" \ > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \ > LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \ > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "b %l[" __rseq_str(abort_label) "]\n\t" > > Perhaps Paul Burton can confirm this ? Yes please. You also want to avoid the value being a valid MIPS insn that's common. Did you check that? > [...] >>> +++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why isn't this an opcode specific to power? > > On powerpc 32/64, the abort is placed in a __rseq_failure executable section: > > #define RSEQ_ASM_DEFINE_ABORT(label, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > "b %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > That section only contains snippets of those trampolines. Arguably, it would be > good if disassemblers could find valid instructions there. Boqun Feng could perhaps > shed some light on this signature choice ? Now would be a good time to decide > once and for all whether a valid instruction would be a better choice. This seems questionable too. > [...] >>> +++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h > [...] >>> + >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why not a s390 specific value here? > > s390 also has the abort handler in a __rseq_failure section: > > #define RSEQ_ASM_DEFINE_ABORT(label, teardown, abort_label) \ > ".pushsection __rseq_failure, \"ax\"\n\t" \ > ".long " __rseq_str(RSEQ_SIG) "\n\t" \ > __rseq_str(label) ":\n\t" \ > teardown \ > "j %l[" __rseq_str(abort_label) "]\n\t" \ > ".popsection\n\t" > > Same question applies as powerpc: since disassemblers will try to decode > that instruction, would it be better to define it as a valid one ? Yes, I think it needs to be a valid uncommon insn or nop. > [...] >>> +++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h > [...] >>> +/* Signature required before each abort handler code. */ >>> +#define RSEQ_SIG 0x53053053 >> >> Why not an x86-specific op code? > > On x86, we use this 4-byte signature as operand to a "no-op" instruction > taking 4-byte immediate operand: That makes perfect sense. Thanks. So what is left to audit? In summary: - Why does AArch64 choose a trap? - Is the current choice of 0x53053053 OK for MIPS? Does it map to a valid insn? - What better choice is there for power? Pick a real uncommon insn or nop? - What better choice is there for s390? Pick a real uncommon insn or nop? - Todays choice could become something special in the future since it's unassigned. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
[parent not found: <20190404214151.6ogrm34dok52az4h@pburton-laptop>]
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) [not found] ` <20190404214151.6ogrm34dok52az4h@pburton-laptop> @ 2019-04-09 17:24 ` Mathieu Desnoyers 2019-04-18 22:42 ` Mathieu Desnoyers 1 sibling, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-09 17:24 UTC (permalink / raw) To: Paul Burton Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote: > Hi Carlos / all, > > On Thu, Apr 04, 2019 at 04:50:08PM -0400, Carlos O'Donell wrote: >> > > > +/* Signature required before each abort handler code. */ >> > > > +#define RSEQ_SIG 0x53053053 >> > > >> > > Why isn't this a mips-specific op code? >> > >> > MIPS also has a literal pool just before the abort handler, and it >> > jumps over it. My understanding is that we can use any signature value >> > we want, and it does not need to be a valid instruction, similarly to ARM: >> > >> > #define __RSEQ_ASM_DEFINE_ABORT(table_label, label, teardown, \ >> > abort_label, version, flags, \ >> > start_ip, post_commit_offset, abort_ip) \ >> > ".balign 32\n\t" \ >> > __rseq_str(table_label) ":\n\t" \ >> > ".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \ >> > LONG " " U32_U64_PAD(__rseq_str(start_ip)) "\n\t" \ >> > LONG " " U32_U64_PAD(__rseq_str(post_commit_offset)) "\n\t" \ >> > LONG " " U32_U64_PAD(__rseq_str(abort_ip)) "\n\t" \ >> > ".word " __rseq_str(RSEQ_SIG) "\n\t" \ >> > __rseq_str(label) ":\n\t" \ >> > teardown \ >> > "b %l[" __rseq_str(abort_label) "]\n\t" >> > >> > Perhaps Paul Burton can confirm this ? >> >> Yes please. >> >> You also want to avoid the value being a valid MIPS insn that's common. >> >> Did you check that? > > This does not decode as a standard MIPS instruction, though it does > decode for both the microMIPS (ori) & nanoMIPS (lwxs; sll) ISAs. > > I imagine I copied the value from another architecture when porting, and > since it doesn't get executed it seemed fine. > > One maybe nicer option along the same lines would be 0x72736571 or > 0x71657372 (ASCII 'rseq') neither of which decode as a MIPS instruction. > >> I think the order of preference is: >> >> 1. An uncommon insn (with random immediate values), in a literal pool, that is >> not a useful ROP/JOP sequence (very uncommon) > > For that option on MIPS we could do something like: > > sll $0, $0, 31 # effectively a nop, but looks weird > >> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) > > Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS > & classic MIPS) so that could be something like: > > break 0x7273 # ASCII 'rs' > > That's pretty unlikely to be seen in normal code, or the teq instruction > has a rarely used code field (4b in microMIPS, 5b in nanoMIPS, 10b in > classic MIPS) that's meaningless to hardware so something like this > would be possible: > > teq $0, $0, 0x8 # ASCII backspace > >> 2b. A NOP to avoid affecting speculative execution (maybe uncommon) >> >> With 2a/2b being roughly equivalent depending on speculative execution policy. > > There are a bunch of potential odd looking nops possible, one of which > would be the sll I mentioned above. > > Another option would be to use a priveleged instruction which userland > code can't execute & should normally never contain. That would decode as > a valid instruction & effectively behave like a trap instruction but > look very odd to anyone reading disassembled code. eg: > > mfc0 $0, 13 # Try to read the cause register; take SIGILL > > In order to handle MIPS vs microMIPS vs nanoMIPS differences I'm > thinking it may be best to switch to one of these real instructions that > looks strange. The ugly part would be the nest of #ifdef's to deal with > endianness & ISA when defining it as a number... Note that we can have different signatures for each sub-architecture, as long as they don't have to co-exist within the same process. Ideally we'd need a patch on top of the Linux kernel tools/testing/selftests/rseq/rseq-mips.h file that updates the signature value. I think the current discussion leads us towards a trap with unlikely immediate operand. Note that we can special-case with #ifdef for each sub-architecture and endianness if need be. /* * TODO: document trap instruction objdump output on each sub-architecture * instruction sets. */ #define RSEQ_SIG 0x######## Should we do anything specific for big/little endian ? Is the byte order of the instruction encoding the same as data ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) [not found] ` <20190404214151.6ogrm34dok52az4h@pburton-laptop> 2019-04-09 17:24 ` Mathieu Desnoyers @ 2019-04-18 22:42 ` Mathieu Desnoyers 2019-04-24 15:28 ` Mathieu Desnoyers 1 sibling, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-18 22:42 UTC (permalink / raw) To: Paul Burton Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote: [...] >> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) > > Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS > & classic MIPS) so that could be something like: > > break 0x7273 # ASCII 'rs' > Hi Paul, I like this uncommon break instruction as signature choice. However, if I try to compile assembler with a break 0x7273 instruction with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04)) I get: /tmp/ccVh9F7T.s: Assembler messages: /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273' It works up to the value 0x3FF, which seems to use the top 10 code bits: a: 03ff 0007 break 0x3ff Would a "break 0x350" be a good choice as well ? Any idea why 0x7273 is not accepted by my assembler ? I also tried crafting the assembler with values between 0x3FF and 0x7273 in the 20 code bits. It seems fine from an objdump perspective: ".long 0x03FFFC7\n\t" generates: 10: 003f ffc7 break 0x3f,0x3ff What I don't understand is why the instruction generated by my toolchain ends with the last 6 bits "000111", whereas the mips32 instruction set specifies break as ending with "001101" [1]. What am I missing ? Also, the nanomips break code [2] has a completely different instruction layout. Should we use a different signature when compiling for nanomips ? What #ifdef should we use ? Do I need a special toolchain to generate nanomips binaries ? Thanks, Mathieu [1] http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf [2] https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) 2019-04-18 22:42 ` Mathieu Desnoyers @ 2019-04-24 15:28 ` Mathieu Desnoyers [not found] ` <20190424231303.zu2irxd5g3v7yqey@pburton-laptop> 0 siblings, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-24 15:28 UTC (permalink / raw) To: Paul Burton Cc: Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api ----- On Apr 18, 2019, at 2:58 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > ----- On Apr 4, 2019, at 5:41 PM, Paul Burton paul.burton@mips.com wrote: > [...] > >>> 2a. A uncommon TRAP hopefully with some immediate data encoded (maybe uncommon) >> >> Our break instruction has a 19b immediate in nanoMIPS (20b for microMIPS >> & classic MIPS) so that could be something like: >> >> break 0x7273 # ASCII 'rs' >> > > Hi Paul, > > I like this uncommon break instruction as signature choice. > > However, if I try to compile assembler with a break 0x7273 instruction > with mips64 and mips32 toolchains (gcc version 8.2.0 (Ubuntu > 8.2.0-1ubuntu2~18.04)) > I get: > > /tmp/ccVh9F7T.s: Assembler messages: > /tmp/ccVh9F7T.s:24: Error: operand 1 out of range `break 0x7273' > > It works up to the value 0x3FF, which seems to use the top 10 > code bits: > > a: 03ff 0007 break 0x3ff > > Would a "break 0x350" be a good choice as well ? > > Any idea why 0x7273 is not accepted by my assembler ? > > I also tried crafting the assembler with values between 0x3FF and 0x7273 > in the 20 code bits. It seems fine from an objdump perspective: > > ".long 0x03FFFC7\n\t" > > generates: > > 10: 003f ffc7 break 0x3f,0x3ff > > What I don't understand is why the instruction generated by my > toolchain ends with the last 6 bits "000111", whereas the mips32 > instruction set specifies break as ending with "001101" [1]. > What am I missing ? > > Also, the nanomips break code [2] has a completely different > instruction layout. Should we use a different signature when > compiling for nanomips ? What #ifdef should we use ? Do I > need a special toolchain to generate nanomips binaries ? Hi Paul, I'm still waiting for feedback on the MIPS front. Meanwhile, I plan to use #define RSEQ_SIG 0x0350000d which maps to: 0350000d break 0x350 and use RSEQ_SIG in assembly with: ".word " __rseq_str(RSEQ_SIG) "\n\t" on big and little endian MIPS, for MIPS32 and MIPS64, based on code generated with gcc version 8.2.0 (Ubuntu 8.2.0-1ubuntu2~18.04). Let me know if it needs to be tweaked. Thanks, Mathieu > > Thanks, > > Mathieu > > [1] > http://hades.mech.northwestern.edu/images/1/16/MIPS32_Architecture_Volume_II-A_Instruction_Set.pdf > [2] > https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I7200+product+launch/MIPS_nanomips32_ISA_TRM_01_01_MD01247.pdf > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
[parent not found: <20190424231303.zu2irxd5g3v7yqey@pburton-laptop>]
* Re: [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) [not found] ` <20190424231303.zu2irxd5g3v7yqey@pburton-laptop> @ 2019-04-25 1:57 ` Maciej W. Rozycki 0 siblings, 0 replies; 52+ messages in thread From: Maciej W. Rozycki @ 2019-04-25 1:57 UTC (permalink / raw) To: Paul Burton Cc: Mathieu Desnoyers, Carlos O'Donell, Will Deacon, Boqun Feng, heiko carstens, gor, schwidefsky, Russell King, ARM Linux, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Dave Watson, Paul Turner, Rich Felker, linux-kernel, linux-api On Wed, 24 Apr 2019, Paul Burton wrote: > > > Any idea why 0x7273 is not accepted by my assembler ? > > I don't know why the assembler wants a smaller immediate than the > instruction encoding allows... There's a comment in the binutils file > include/opcode/mips.h that reads: > > > A breakpoint instruction uses OP, CODE and SPEC (10 bits of the > > breakpoint instruction are not defined; Kane says the breakpoint code > > field in BREAK is 20 bits; yet MIPS assemblers and debuggers only use > > ten bits). An optional two-operand form of break/sdbbp allows the > > lower ten bits to be set too, and MIPS32 and later architectures allow > > 20 bits to be set with a signal operand (using CODE20). > > I suspect there's some history here that predates my involvement (or > possibly just predates me). A useful explanation is in the Linux kernel (always good to look there), in arch/mips/kernel/traps.c: /* * There is the ancient bug in the MIPS assemblers that the break * code starts left to bit 16 instead to bit 6 in the opcode. * Gas is bug-compatible, but not always, grrr... * We handle both cases with a simple heuristics. --macro */ Unfortunately the bug has been carried over to the microMIPS instruction encoding in libopcodes for no reason (i.e. likely by copying the table mechanically without analysing it) and I didn't catch it when upstreaming. We should have permitted setting all bits in the 20-bit code field in the microMIPS encoding with a single operand, but you need two, like with the regular MIPS instruction set. The note on the MIPS32 assembly ISA permitting to set all the 20 bits with a single operand is a stale comment referring to the situation before binutils commit 1586d91e32ea ("/ 0 should send SIGFPE not SIGTRAP..."), <https://sourceware.org/ml/binutils/2004-07/msg00260.html>, which addressed a user ABI compatibility issue as discussed upthread here: <https://sourceware.org/ml/binutils/2004-06/msg00188.html> and previously: <https://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=40C9F5A4.2050606%40avtrex.com>. As this is my mistake with the stale note, I have applied a fix to binutils now, commit cd0923370be1 ("MIPS/include: opcode/mips.h: Update stale comment for CODE20 operand"), so that it is clear that it is only SDBBP that accepts a single 20-bit operand for the code field (for the MIPS32 and later ISAs). FWIW, Maciej ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 3/4] support record failure: allow use from constructor 2019-02-12 19:43 [PATCH 0/4] Restartable Sequences support for glibc 2.30 Mathieu Desnoyers 2019-02-12 19:43 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers @ 2019-02-12 19:43 ` Mathieu Desnoyers 2019-03-22 20:30 ` Carlos O'Donell 2019-02-12 19:43 ` [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux Mathieu Desnoyers ` (3 subsequent siblings) 5 siblings, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-02-12 19:43 UTC (permalink / raw) To: Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Mathieu Desnoyers Expose support_record_failure_init () so constructors can explicitly initialize the record failure API. This is preferred to lazy initialization at first use, because lazy initialization does not cover use in constructors within forked children processes (forked from parent constructor). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Carlos O'Donell <carlos@redhat.com> CC: Florian Weimer <fweimer@redhat.com> CC: Joseph Myers <joseph@codesourcery.com> CC: Szabolcs Nagy <szabolcs.nagy@arm.com> CC: libc-alpha@sourceware.org --- support/check.h | 4 ++++ support/support_record_failure.c | 18 +++++++++++++----- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/support/check.h b/support/check.h index eb3d2487a7..f8684a477a 100644 --- a/support/check.h +++ b/support/check.h @@ -88,6 +88,10 @@ void support_test_verify_exit_impl (int status, const char *file, int line, does not support reporting failures from a DSO. */ void support_record_failure (void); +/* Initialize record failure. Calling this is only needed when + recording failures from constructors. */ +void support_record_failure_init (void); + /* Static assertion, under a common name for both C++ and C11. */ #ifdef __cplusplus # define support_static_assert static_assert diff --git a/support/support_record_failure.c b/support/support_record_failure.c index a8ffd1fb7d..86d9c71409 100644 --- a/support/support_record_failure.c +++ b/support/support_record_failure.c @@ -32,8 +32,12 @@ zero, the failure of a test can be detected. The init constructor function below puts *state on a shared - annonymous mapping, so that failure reports from subprocesses - propagate to the parent process. */ + anonymous mapping, so that failure reports from subprocesses + propagate to the parent process. + + support_record_failure_init is exposed so it can be called explicitly + in case this API needs to be used from a constructor. */ + struct test_failures { unsigned int counter; @@ -41,10 +45,14 @@ struct test_failures }; static struct test_failures *state; -static __attribute__ ((constructor)) void -init (void) +__attribute__ ((constructor)) void +support_record_failure_init (void) { - void *ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE, + void *ptr; + + if (state != NULL) + return; + ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0); if (ptr == MAP_FAILED) { -- 2.17.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 3/4] support record failure: allow use from constructor 2019-02-12 19:43 ` [PATCH 3/4] support record failure: allow use from constructor Mathieu Desnoyers @ 2019-03-22 20:30 ` Carlos O'Donell 0 siblings, 0 replies; 52+ messages in thread From: Carlos O'Donell @ 2019-03-22 20:30 UTC (permalink / raw) To: Mathieu Desnoyers, Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: > Expose support_record_failure_init () so constructors can explicitly > initialize the record failure API. > > This is preferred to lazy initialization at first use, because > lazy initialization does not cover use in constructors within > forked children processes (forked from parent constructor). LGTM. Blocked by patch 1. Reviewed-by: Carlos O'Donell <carlos@redhat.com> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > CC: Carlos O'Donell <carlos@redhat.com> > CC: Florian Weimer <fweimer@redhat.com> > CC: Joseph Myers <joseph@codesourcery.com> > CC: Szabolcs Nagy <szabolcs.nagy@arm.com> > CC: libc-alpha@sourceware.org > --- > support/check.h | 4 ++++ > support/support_record_failure.c | 18 +++++++++++++----- > 2 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/support/check.h b/support/check.h > index eb3d2487a7..f8684a477a 100644 > --- a/support/check.h > +++ b/support/check.h > @@ -88,6 +88,10 @@ void support_test_verify_exit_impl (int status, const char *file, int line, > does not support reporting failures from a DSO. */ > void support_record_failure (void); > > +/* Initialize record failure. Calling this is only needed when > + recording failures from constructors. */ > +void support_record_failure_init (void); OK. > + > /* Static assertion, under a common name for both C++ and C11. */ > #ifdef __cplusplus > # define support_static_assert static_assert > diff --git a/support/support_record_failure.c b/support/support_record_failure.c > index a8ffd1fb7d..86d9c71409 100644 > --- a/support/support_record_failure.c > +++ b/support/support_record_failure.c > @@ -32,8 +32,12 @@ > zero, the failure of a test can be detected. > > The init constructor function below puts *state on a shared > - annonymous mapping, so that failure reports from subprocesses > - propagate to the parent process. */ > + anonymous mapping, so that failure reports from subprocesses > + propagate to the parent process. > + > + support_record_failure_init is exposed so it can be called explicitly > + in case this API needs to be used from a constructor. */ > + > struct test_failures > { > unsigned int counter; > @@ -41,10 +45,14 @@ struct test_failures > }; > static struct test_failures *state; > > -static __attribute__ ((constructor)) void > -init (void) > +__attribute__ ((constructor)) void > +support_record_failure_init (void) > { > - void *ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE, > + void *ptr; > + > + if (state != NULL) > + return; > + ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE, OK. > MAP_ANONYMOUS | MAP_SHARED, -1, 0); > if (ptr == MAP_FAILED) > { > -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux 2019-02-12 19:43 [PATCH 0/4] Restartable Sequences support for glibc 2.30 Mathieu Desnoyers 2019-02-12 19:43 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers 2019-02-12 19:43 ` [PATCH 3/4] support record failure: allow use from constructor Mathieu Desnoyers @ 2019-02-12 19:43 ` Mathieu Desnoyers 2019-03-22 20:14 ` Carlos O'Donell 2019-02-12 19:51 ` [PATCH 4/4] rseq registration tests (v2) Mathieu Desnoyers ` (2 subsequent siblings) 5 siblings, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-02-12 19:43 UTC (permalink / raw) To: Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Mathieu Desnoyers, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner, linux-kernel, linux-api When available, use the cpu_id field from __rseq_abi on Linux to implement sched_getcpu(). Fall-back on the vgetcpu vDSO if unavailable. Benchmarks: x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading glibc sched_getcpu(): 13.7 ns (baseline) glibc sched_getcpu() using rseq: 2.5 ns (speedup: 5.5x) inline load cpuid from __rseq_abi TLS: 0.8 ns (speedup: 17.1x) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Carlos O'Donell <carlos@redhat.com> CC: Florian Weimer <fweimer@redhat.com> CC: Joseph Myers <joseph@codesourcery.com> CC: Szabolcs Nagy <szabolcs.nagy@arm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Maurer <bmaurer@fb.com> CC: Peter Zijlstra <peterz@infradead.org> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Will Deacon <will.deacon@arm.com> CC: Dave Watson <davejwatson@fb.com> CC: Paul Turner <pjt@google.com> CC: libc-alpha@sourceware.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org --- sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c index fb0d317f83..8bfb03778b 100644 --- a/sysdeps/unix/sysv/linux/sched_getcpu.c +++ b/sysdeps/unix/sysv/linux/sched_getcpu.c @@ -24,8 +24,8 @@ #endif #include <sysdep-vdso.h> -int -sched_getcpu (void) +static int +vsyscall_sched_getcpu (void) { #ifdef __NR_getcpu unsigned int cpu; @@ -37,3 +37,24 @@ sched_getcpu (void) return -1; #endif } + +#ifdef __NR_rseq +#include <linux/rseq.h> + +extern __attribute__ ((tls_model ("initial-exec"))) +__thread volatile struct rseq __rseq_abi; + +int +sched_getcpu (void) +{ + int cpu_id = __rseq_abi.cpu_id; + + return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu (); +} +#else +int +sched_getcpu (void) +{ + return vsyscall_sched_getcpu (); +} +#endif -- 2.17.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux 2019-02-12 19:43 ` [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux Mathieu Desnoyers @ 2019-03-22 20:14 ` Carlos O'Donell 0 siblings, 0 replies; 52+ messages in thread From: Carlos O'Donell @ 2019-03-22 20:14 UTC (permalink / raw) To: Mathieu Desnoyers, Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner, linux-kernel, linux-api On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: > When available, use the cpu_id field from __rseq_abi on Linux to > implement sched_getcpu(). Fall-back on the vgetcpu vDSO if unavailable. > > Benchmarks: > > x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading This patch looks good to me for master, but is blocked on patch 1/4 being reworked. Reviewed-by: Carlos O'Donell <carlos@redhat.com> > glibc sched_getcpu(): 13.7 ns (baseline) > glibc sched_getcpu() using rseq: 2.5 ns (speedup: 5.5x) > inline load cpuid from __rseq_abi TLS: 0.8 ns (speedup: 17.1x) > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > CC: Carlos O'Donell <carlos@redhat.com> > CC: Florian Weimer <fweimer@redhat.com> > CC: Joseph Myers <joseph@codesourcery.com> > CC: Szabolcs Nagy <szabolcs.nagy@arm.com> > CC: Thomas Gleixner <tglx@linutronix.de> > CC: Ben Maurer <bmaurer@fb.com> > CC: Peter Zijlstra <peterz@infradead.org> > CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > CC: Boqun Feng <boqun.feng@gmail.com> > CC: Will Deacon <will.deacon@arm.com> > CC: Dave Watson <davejwatson@fb.com> > CC: Paul Turner <pjt@google.com> > CC: libc-alpha@sourceware.org > CC: linux-kernel@vger.kernel.org > CC: linux-api@vger.kernel.org > --- > sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +++++++++++++++++++++++-- > 1 file changed, 23 insertions(+), 2 deletions(-) > > diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c > index fb0d317f83..8bfb03778b 100644 > --- a/sysdeps/unix/sysv/linux/sched_getcpu.c > +++ b/sysdeps/unix/sysv/linux/sched_getcpu.c > @@ -24,8 +24,8 @@ > #endif > #include <sysdep-vdso.h> > > -int > -sched_getcpu (void) > +static int > +vsyscall_sched_getcpu (void) OK. > { > #ifdef __NR_getcpu > unsigned int cpu; > @@ -37,3 +37,24 @@ sched_getcpu (void) > return -1; > #endif > } > + > +#ifdef __NR_rseq > +#include <linux/rseq.h> > + > +extern __attribute__ ((tls_model ("initial-exec"))) > +__thread volatile struct rseq __rseq_abi; OK. > + > +int > +sched_getcpu (void) > +{ > + int cpu_id = __rseq_abi.cpu_id; > + > + return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu (); OK. Impressive :-) > +} > +#else > +int > +sched_getcpu (void) > +{ > + return vsyscall_sched_getcpu (); OK. > +} > +#endif > -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH 4/4] rseq registration tests (v2) 2019-02-12 19:43 [PATCH 0/4] Restartable Sequences support for glibc 2.30 Mathieu Desnoyers ` (2 preceding siblings ...) 2019-02-12 19:43 ` [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux Mathieu Desnoyers @ 2019-02-12 19:51 ` Mathieu Desnoyers 2019-03-27 21:21 ` Carlos O'Donell 2019-03-22 17:34 ` [PATCH 0/4] Restartable Sequences support for glibc 2.30 Carlos O'Donell 2019-03-22 17:39 ` Florian Weimer 5 siblings, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-02-12 19:51 UTC (permalink / raw) To: Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Mathieu Desnoyers, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner These tests validate that rseq is registered from various execution contexts (main thread, constructor, destructor, other threads, other threads created from constructor and destructor, forked process (without exec), pthread_atfork handlers, pthread setspecific destructors, C++ thread and process destructors, signal handlers, atexit handlers). tst-rseq.c only links against libc.so, testing registration of rseq in a non-multithreaded environment. tst-rseq-nptl.c also links against libpthread.so, testing registration of rseq in a multithreaded environment. See the Linux kernel selftests for extensive rseq stress-tests. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Carlos O'Donell <carlos@redhat.com> CC: Florian Weimer <fweimer@redhat.com> CC: Joseph Myers <joseph@codesourcery.com> CC: Szabolcs Nagy <szabolcs.nagy@arm.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Maurer <bmaurer@fb.com> CC: Peter Zijlstra <peterz@infradead.org> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Boqun Feng <boqun.feng@gmail.com> CC: Will Deacon <will.deacon@arm.com> CC: Dave Watson <davejwatson@fb.com> CC: Paul Turner <pjt@google.com> CC: libc-alpha@sourceware.org --- Changes since v1: - Rename tst-rseq.c to tst-rseq-nptl.c. - Introduce tst-rseq.c testing rseq registration in a non-multithreaded environment. --- sysdeps/unix/sysv/linux/Makefile | 4 +- sysdeps/unix/sysv/linux/tst-rseq-nptl.c | 381 ++++++++++++++++++++++++ sysdeps/unix/sysv/linux/tst-rseq.c | 110 +++++++ 3 files changed, 493 insertions(+), 2 deletions(-) create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile index 5b541469ec..5f69f644a8 100644 --- a/sysdeps/unix/sysv/linux/Makefile +++ b/sysdeps/unix/sysv/linux/Makefile @@ -53,7 +53,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \ tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \ tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \ test-errno-linux tst-memfd_create tst-mlock2 tst-pkey \ - tst-rlimit-infinity tst-ofdlocks + tst-rlimit-infinity tst-ofdlocks tst-rseq tests-internal += tst-ofdlocks-compat @@ -230,5 +230,5 @@ ifeq ($(subdir),nptl) tests += tst-align-clone tst-getpid1 \ tst-thread-affinity-pthread tst-thread-affinity-pthread2 \ tst-thread-affinity-sched -tests-internal += tst-setgetname +tests-internal += tst-setgetname tst-rseq-nptl endif diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c new file mode 100644 index 0000000000..f4be4d2ae1 --- /dev/null +++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c @@ -0,0 +1,381 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +/* These tests validate that rseq is registered from various execution + contexts (main thread, constructor, destructor, other threads, other + threads created from constructor and destructor, forked process + (without exec), pthread_atfork handlers, pthread setspecific + destructors, C++ thread and process destructors, signal handlers, + atexit handlers). + + See the Linux kernel selftests for extensive rseq stress-tests. */ + +#include <sys/syscall.h> +#include <unistd.h> +#include <stdio.h> +#include <support/check.h> + +#ifdef __NR_rseq +#define HAS_RSEQ +#endif + +#ifdef HAS_RSEQ +#include <sys/rseq.h> +#include <pthread.h> +#include <syscall.h> +#include <stdlib.h> +#include <error.h> +#include <errno.h> +#include <string.h> +#include <stdint.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <signal.h> +#include <atomic.h> + +static pthread_key_t rseq_test_key; + +static int +rseq_thread_registered (void) +{ + return (int32_t) __rseq_abi.cpu_id >= 0; +} + +static int +do_rseq_main_test (void) +{ + if (raise (SIGUSR1)) + FAIL_EXIT1 ("error raising signal"); + if (pthread_setspecific (rseq_test_key, (void *) 1l)) + FAIL_EXIT1 ("error in pthread_setspecific"); + if (!rseq_thread_registered ()) + { + FAIL_RET ("rseq not registered in main thread"); + } + return 0; +} + +static void +cancel_routine (void *arg) +{ + if (!rseq_thread_registered ()) + { + printf ("rseq not registered in cancel routine\n"); + support_record_failure (); + } +} + +static int cancel_thread_ready; + +static void +test_cancel_thread (void) +{ + pthread_cleanup_push (cancel_routine, NULL); + atomic_store_release (&cancel_thread_ready, 1); + for (;;) + usleep (100); + pthread_cleanup_pop (0); +} + +static void * +thread_function (void * arg) +{ + int i = (int) (intptr_t) arg; + + if (raise (SIGUSR1)) + FAIL_EXIT1 ("error raising signal"); + if (i == 0) + test_cancel_thread (); + if (pthread_setspecific (rseq_test_key, (void *) 1l)) + FAIL_EXIT1 ("error in pthread_setspecific"); + return rseq_thread_registered () ? NULL : (void *) 1l; +} + +static void +sighandler (int sig) +{ + if (!rseq_thread_registered ()) + { + printf ("rseq not registered in signal handler\n"); + support_record_failure (); + } +} + +static void +setup_signals (void) +{ + struct sigaction sa; + + sigemptyset (&sa.sa_mask); + sigaddset (&sa.sa_mask, SIGUSR1); + sa.sa_flags = 0; + sa.sa_handler = sighandler; + if (sigaction (SIGUSR1, &sa, NULL) != 0) + { + FAIL_EXIT1 ("sigaction failure: %s", strerror (errno)); + } +} + +#define N 7 +static const int t[N] = { 1, 2, 6, 5, 4, 3, 50 }; + +static int +do_rseq_threads_test (int nr_threads) +{ + pthread_t th[nr_threads]; + int i; + int result = 0; + pthread_attr_t at; + + if (pthread_attr_init (&at) != 0) + { + FAIL_EXIT1 ("attr_init failed"); + } + + if (pthread_attr_setstacksize (&at, 1 * 1024 * 1024) != 0) + { + FAIL_EXIT1 ("attr_setstacksize failed"); + } + + cancel_thread_ready = 0; + for (i = 0; i < nr_threads; ++i) + if (pthread_create (&th[i], NULL, thread_function, + (void *) (intptr_t) i) != 0) + { + FAIL_EXIT1 ("creation of thread %d failed", i); + } + + if (pthread_attr_destroy (&at) != 0) + { + FAIL_EXIT1 ("attr_destroy failed"); + } + + while (!atomic_load_acquire (&cancel_thread_ready)) + usleep (100); + + if (pthread_cancel (th[0])) + FAIL_EXIT1 ("error in pthread_cancel"); + + for (i = 0; i < nr_threads; ++i) + { + void *v; + if (pthread_join (th[i], &v) != 0) + { + printf ("join of thread %d failed\n", i); + result = 1; + } + else if (i != 0 && v != NULL) + { + printf ("join %d successful, but child failed\n", i); + result = 1; + } + else if (i == 0 && v == NULL) + { + printf ("join %d successful, child did not fail as expected\n", i); + result = 1; + } + } + return result; +} + +static int +sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len, + int flags, uint32_t sig) +{ + return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig); +} + +static int +rseq_available (void) +{ + int rc; + + rc = sys_rseq (NULL, 0, 0, 0); + if (rc != -1) + FAIL_EXIT1 ("Unexpected rseq return value %d", rc); + switch (errno) + { + case ENOSYS: + return 0; + case EINVAL: + return 1; + default: + FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno)); + } +} + +static int +do_rseq_fork_test (void) +{ + int status; + pid_t pid, retpid; + + pid = fork (); + switch (pid) + { + case 0: + exit (do_rseq_main_test ()); + case -1: + FAIL_EXIT1 ("Unexpected fork error %s", strerror (errno)); + } + retpid = TEMP_FAILURE_RETRY (waitpid (pid, &status, 0)); + if (retpid != pid) + { + FAIL_EXIT1 ("waitpid returned %ld, expected %ld", + (long int) retpid, (long int) pid); + } + if (WEXITSTATUS (status)) + { + printf ("rseq not registered in child\n"); + return 1; + } + return 0; +} + +static int +do_rseq_test (void) +{ + int i, result = 0; + + if (!rseq_available ()) + { + FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test"); + } + setup_signals (); + if (raise (SIGUSR1)) + FAIL_EXIT1 ("error raising signal"); + if (do_rseq_main_test ()) + result = 1; + for (i = 0; i < N; i++) + { + if (do_rseq_threads_test (t[i])) + result = 1; + } + if (do_rseq_fork_test ()) + result = 1; + return result; +} + +static void +atfork_prepare (void) +{ + if (!rseq_thread_registered ()) + { + printf ("rseq not registered in pthread atfork prepare\n"); + support_record_failure (); + } +} + +static void +atfork_parent (void) +{ + if (!rseq_thread_registered ()) + { + printf ("rseq not registered in pthread atfork parent\n"); + support_record_failure (); + } +} + +static void +atfork_child (void) +{ + if (!rseq_thread_registered ()) + { + printf ("rseq not registered in pthread atfork child\n"); + support_record_failure (); + } +} + +static void +rseq_key_destructor (void *arg) +{ + /* Cannot use deferred failure reporting after main () returns. */ + if (!rseq_thread_registered ()) + FAIL_EXIT1 ("rseq not registered in pthread key destructor"); +} + +static void +do_rseq_create_key (void) +{ + if (pthread_key_create (&rseq_test_key, rseq_key_destructor)) + FAIL_EXIT1 ("error in pthread_key_create"); +} + +static void +do_rseq_delete_key (void) +{ + if (pthread_key_delete (rseq_test_key)) + FAIL_EXIT1 ("error in pthread_key_delete"); +} + +static void +atexit_handler (void) +{ + /* Cannot use deferred failure reporting after main () returns. */ + if (!rseq_thread_registered ()) + FAIL_EXIT1 ("rseq not registered in atexit handler"); +} + +static void __attribute__ ((constructor)) +do_rseq_constructor_test (void) +{ + support_record_failure_init (); + if (atexit (atexit_handler)) + { + FAIL_EXIT1 ("error calling atexit"); + } + do_rseq_create_key (); + if (pthread_atfork (atfork_prepare, atfork_parent, atfork_child)) + FAIL_EXIT1 ("error calling pthread_atfork"); + if (do_rseq_test ()) + FAIL_EXIT1 ("rseq not registered within constructor"); +} + +static void __attribute__ ((destructor)) +do_rseq_destructor_test (void) +{ + /* Cannot use deferred failure reporting after main () returns. */ + if (do_rseq_test ()) + FAIL_EXIT1 ("rseq not registered within destructor"); + do_rseq_delete_key (); +} + +/* Test C++ destructor called at thread and process exit. */ +void +__call_tls_dtors (void) +{ + /* Cannot use deferred failure reporting after main () returns. */ + if (!rseq_thread_registered ()) + FAIL_EXIT1 ("rseq not registered in C++ thread/process exit destructor"); +} +#else +static int +do_rseq_test (void) +{ + FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test"); + return 0; +} +#endif + +static int +do_test (void) +{ + return do_rseq_test (); +} + +#include <support/test-driver.c> diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c new file mode 100644 index 0000000000..aaa1ad79fe --- /dev/null +++ b/sysdeps/unix/sysv/linux/tst-rseq.c @@ -0,0 +1,110 @@ +/* Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +/* These tests validate that rseq is registered from main in an executable + not linked against libpthread. */ + +#include <sys/syscall.h> +#include <unistd.h> +#include <stdio.h> +#include <support/check.h> + +#ifdef __NR_rseq +#define HAS_RSEQ +#endif + +#ifdef HAS_RSEQ +#include <sys/rseq.h> +#include <syscall.h> +#include <stdlib.h> +#include <error.h> +#include <errno.h> +#include <stdint.h> +#include <string.h> + +static int +rseq_thread_registered (void) +{ + return (int32_t) __rseq_abi.cpu_id >= 0; +} + +static int +do_rseq_main_test (void) +{ + if (!rseq_thread_registered ()) + { + FAIL_RET ("rseq not registered in main thread"); + } + return 0; +} + +static int +sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len, + int flags, uint32_t sig) +{ + return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig); +} + +static int +rseq_available (void) +{ + int rc; + + rc = sys_rseq (NULL, 0, 0, 0); + if (rc != -1) + FAIL_EXIT1 ("Unexpected rseq return value %d", rc); + switch (errno) + { + case ENOSYS: + return 0; + case EINVAL: + return 1; + default: + FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno)); + } +} + +static int +do_rseq_test (void) +{ + int result = 0; + + if (!rseq_available ()) + { + FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test"); + } + if (do_rseq_main_test ()) + result = 1; + return result; +} +#else +static int +do_rseq_test (void) +{ + FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test"); + return 0; +} +#endif + +static int +do_test (void) +{ + return do_rseq_test (); +} + +#include <support/test-driver.c> -- 2.17.1 ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-02-12 19:51 ` [PATCH 4/4] rseq registration tests (v2) Mathieu Desnoyers @ 2019-03-27 21:21 ` Carlos O'Donell 2019-04-04 20:04 ` Mathieu Desnoyers 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-03-27 21:21 UTC (permalink / raw) To: Mathieu Desnoyers, Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: > These tests validate that rseq is registered from various execution > contexts (main thread, constructor, destructor, other threads, other > threads created from constructor and destructor, forked process > (without exec), pthread_atfork handlers, pthread setspecific > destructors, C++ thread and process destructors, signal handlers, > atexit handlers). Thanks, this set of tests looks good, and covers the main uses. > tst-rseq.c only links against libc.so, testing registration of rseq in > a non-multithreaded environment. OK. > tst-rseq-nptl.c also links against libpthread.so, testing registration > of rseq in a multithreaded environment. OK. > See the Linux kernel selftests for extensive rseq stress-tests. Yeah, that's where they should be, here we just want to exercise the basic set of tests to ensure that integration with glibc is working as expected. This patch looks good to me, but is blocked on patch 1 acceptance, Cleanup the following: - One line short dedscriptions for files. - Removal of contributed by lines. - Answer small thread stack question. and I'd say we're ready with this patch too. Reviewed-by: Carlos O'Donell <carlos@redhat.com> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > CC: Carlos O'Donell <carlos@redhat.com> > CC: Florian Weimer <fweimer@redhat.com> > CC: Joseph Myers <joseph@codesourcery.com> > CC: Szabolcs Nagy <szabolcs.nagy@arm.com> > CC: Thomas Gleixner <tglx@linutronix.de> > CC: Ben Maurer <bmaurer@fb.com> > CC: Peter Zijlstra <peterz@infradead.org> > CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > CC: Boqun Feng <boqun.feng@gmail.com> > CC: Will Deacon <will.deacon@arm.com> > CC: Dave Watson <davejwatson@fb.com> > CC: Paul Turner <pjt@google.com> > CC: libc-alpha@sourceware.org > --- > Changes since v1: > - Rename tst-rseq.c to tst-rseq-nptl.c. > - Introduce tst-rseq.c testing rseq registration in a non-multithreaded > environment. > --- > sysdeps/unix/sysv/linux/Makefile | 4 +- > sysdeps/unix/sysv/linux/tst-rseq-nptl.c | 381 ++++++++++++++++++++++++ > sysdeps/unix/sysv/linux/tst-rseq.c | 110 +++++++ > 3 files changed, 493 insertions(+), 2 deletions(-) > create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c > create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c > > diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile > index 5b541469ec..5f69f644a8 100644 > --- a/sysdeps/unix/sysv/linux/Makefile > +++ b/sysdeps/unix/sysv/linux/Makefile > @@ -53,7 +53,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \ > tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \ > tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \ > test-errno-linux tst-memfd_create tst-mlock2 tst-pkey \ > - tst-rlimit-infinity tst-ofdlocks > + tst-rlimit-infinity tst-ofdlocks tst-rseq OK. > tests-internal += tst-ofdlocks-compat > > > @@ -230,5 +230,5 @@ ifeq ($(subdir),nptl) > tests += tst-align-clone tst-getpid1 \ > tst-thread-affinity-pthread tst-thread-affinity-pthread2 \ > tst-thread-affinity-sched > -tests-internal += tst-setgetname > +tests-internal += tst-setgetname tst-rseq-nptl OK. > endif > diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c > new file mode 100644 > index 0000000000..f4be4d2ae1 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c > @@ -0,0 +1,381 @@ Need a one line short description. > +/* Copyright (C) 2018 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. As noted in the other reviews, please remove the contributed by lines. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +/* These tests validate that rseq is registered from various execution > + contexts (main thread, constructor, destructor, other threads, other > + threads created from constructor and destructor, forked process > + (without exec), pthread_atfork handlers, pthread setspecific > + destructors, C++ thread and process destructors, signal handlers, > + atexit handlers). > + > + See the Linux kernel selftests for extensive rseq stress-tests. */ OK. > + > +#include <sys/syscall.h> > +#include <unistd.h> > +#include <stdio.h> > +#include <support/check.h> > + > +#ifdef __NR_rseq > +#define HAS_RSEQ > +#endif > + > +#ifdef HAS_RSEQ > +#include <sys/rseq.h> > +#include <pthread.h> > +#include <syscall.h> > +#include <stdlib.h> > +#include <error.h> > +#include <errno.h> > +#include <string.h> > +#include <stdint.h> > +#include <sys/types.h> > +#include <sys/wait.h> > +#include <signal.h> > +#include <atomic.h> > + > +static pthread_key_t rseq_test_key; OK. > + > +static int > +rseq_thread_registered (void) > +{ > + return (int32_t) __rseq_abi.cpu_id >= 0; OK. > +} > + > +static int > +do_rseq_main_test (void) > +{ > + if (raise (SIGUSR1)) > + FAIL_EXIT1 ("error raising signal"); > + if (pthread_setspecific (rseq_test_key, (void *) 1l)) > + FAIL_EXIT1 ("error in pthread_setspecific"); > + if (!rseq_thread_registered ()) > + { > + FAIL_RET ("rseq not registered in main thread"); OK. > + } > + return 0; > +} > + > +static void > +cancel_routine (void *arg) > +{ > + if (!rseq_thread_registered ()) > + { > + printf ("rseq not registered in cancel routine\n"); > + support_record_failure (); > + } > +} OK. > + > +static int cancel_thread_ready; > + > +static void > +test_cancel_thread (void) > +{ > + pthread_cleanup_push (cancel_routine, NULL); > + atomic_store_release (&cancel_thread_ready, 1); > + for (;;) > + usleep (100); > + pthread_cleanup_pop (0); OK. > +} > + > +static void * > +thread_function (void * arg) > +{ > + int i = (int) (intptr_t) arg; > + > + if (raise (SIGUSR1)) > + FAIL_EXIT1 ("error raising signal"); > + if (i == 0) > + test_cancel_thread (); > + if (pthread_setspecific (rseq_test_key, (void *) 1l)) > + FAIL_EXIT1 ("error in pthread_setspecific"); > + return rseq_thread_registered () ? NULL : (void *) 1l; > +} OK. > + > +static void > +sighandler (int sig) > +{ > + if (!rseq_thread_registered ()) > + { > + printf ("rseq not registered in signal handler\n"); > + support_record_failure (); > + } > +} OK. > + > +static void > +setup_signals (void) > +{ > + struct sigaction sa; > + > + sigemptyset (&sa.sa_mask); > + sigaddset (&sa.sa_mask, SIGUSR1); > + sa.sa_flags = 0; > + sa.sa_handler = sighandler; > + if (sigaction (SIGUSR1, &sa, NULL) != 0) > + { > + FAIL_EXIT1 ("sigaction failure: %s", strerror (errno)); OK. > + } > +} > + > +#define N 7 > +static const int t[N] = { 1, 2, 6, 5, 4, 3, 50 }; OK. > + > +static int > +do_rseq_threads_test (int nr_threads) > +{ > + pthread_t th[nr_threads]; > + int i; > + int result = 0; > + pthread_attr_t at; > + > + if (pthread_attr_init (&at) != 0) > + { > + FAIL_EXIT1 ("attr_init failed"); > + } > + > + if (pthread_attr_setstacksize (&at, 1 * 1024 * 1024) != 0) Why set it that small? Why set it at all? > + { > + FAIL_EXIT1 ("attr_setstacksize failed"); > + } > + > + cancel_thread_ready = 0; > + for (i = 0; i < nr_threads; ++i) > + if (pthread_create (&th[i], NULL, thread_function, > + (void *) (intptr_t) i) != 0) > + { > + FAIL_EXIT1 ("creation of thread %d failed", i); > + } > + > + if (pthread_attr_destroy (&at) != 0) > + { > + FAIL_EXIT1 ("attr_destroy failed"); > + } > + > + while (!atomic_load_acquire (&cancel_thread_ready)) > + usleep (100); > + > + if (pthread_cancel (th[0])) > + FAIL_EXIT1 ("error in pthread_cancel"); > + > + for (i = 0; i < nr_threads; ++i) > + { > + void *v; > + if (pthread_join (th[i], &v) != 0) > + { > + printf ("join of thread %d failed\n", i); > + result = 1; > + } > + else if (i != 0 && v != NULL) > + { > + printf ("join %d successful, but child failed\n", i); > + result = 1; > + } > + else if (i == 0 && v == NULL) > + { > + printf ("join %d successful, child did not fail as expected\n", i); > + result = 1; > + } > + } > + return result; > +} > + > +static int > +sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len, > + int flags, uint32_t sig) > +{ > + return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig); OK. Decided in other mail that rseq should not be public. > +} > + > +static int > +rseq_available (void) > +{ > + int rc; > + > + rc = sys_rseq (NULL, 0, 0, 0); > + if (rc != -1) > + FAIL_EXIT1 ("Unexpected rseq return value %d", rc); > + switch (errno) > + { > + case ENOSYS: > + return 0; > + case EINVAL: > + return 1; > + default: > + FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno)); > + } OK > +} > + > +static int > +do_rseq_fork_test (void) > +{ > + int status; > + pid_t pid, retpid; > + > + pid = fork (); > + switch (pid) > + { > + case 0: > + exit (do_rseq_main_test ()); > + case -1: > + FAIL_EXIT1 ("Unexpected fork error %s", strerror (errno)); > + } > + retpid = TEMP_FAILURE_RETRY (waitpid (pid, &status, 0)); > + if (retpid != pid) > + { > + FAIL_EXIT1 ("waitpid returned %ld, expected %ld", > + (long int) retpid, (long int) pid); > + } > + if (WEXITSTATUS (status)) > + { > + printf ("rseq not registered in child\n"); > + return 1; > + } > + return 0; OK. > +} > + > +static int > +do_rseq_test (void) > +{ > + int i, result = 0; > + > + if (!rseq_available ()) > + { > + FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test"); OK. > + } > + setup_signals (); > + if (raise (SIGUSR1)) > + FAIL_EXIT1 ("error raising signal"); > + if (do_rseq_main_test ()) > + result = 1; > + for (i = 0; i < N; i++) > + { > + if (do_rseq_threads_test (t[i])) > + result = 1; > + } > + if (do_rseq_fork_test ()) > + result = 1; > + return result; OK. > +} > + > +static void > +atfork_prepare (void) > +{ > + if (!rseq_thread_registered ()) > + { > + printf ("rseq not registered in pthread atfork prepare\n"); > + support_record_failure (); > + } > +} > + > +static void > +atfork_parent (void) > +{ > + if (!rseq_thread_registered ()) > + { > + printf ("rseq not registered in pthread atfork parent\n"); > + support_record_failure (); > + } > +} > + > +static void > +atfork_child (void) > +{ > + if (!rseq_thread_registered ()) > + { > + printf ("rseq not registered in pthread atfork child\n"); > + support_record_failure (); > + } > +} > + > +static void > +rseq_key_destructor (void *arg) > +{ > + /* Cannot use deferred failure reporting after main () returns. */ > + if (!rseq_thread_registered ()) > + FAIL_EXIT1 ("rseq not registered in pthread key destructor"); > +} > + > +static void > +do_rseq_create_key (void) > +{ > + if (pthread_key_create (&rseq_test_key, rseq_key_destructor)) > + FAIL_EXIT1 ("error in pthread_key_create"); > +} > + > +static void > +do_rseq_delete_key (void) > +{ > + if (pthread_key_delete (rseq_test_key)) > + FAIL_EXIT1 ("error in pthread_key_delete"); > +} OK. These two could do with becoming xthread_key_create/xthread_key_delete wrappers in support/*, but it's not critical. > + > +static void > +atexit_handler (void) > +{ > + /* Cannot use deferred failure reporting after main () returns. */ > + if (!rseq_thread_registered ()) > + FAIL_EXIT1 ("rseq not registered in atexit handler"); > +} > + > +static void __attribute__ ((constructor)) > +do_rseq_constructor_test (void) > +{ > + support_record_failure_init (); > + if (atexit (atexit_handler)) > + { > + FAIL_EXIT1 ("error calling atexit"); > + } > + do_rseq_create_key (); > + if (pthread_atfork (atfork_prepare, atfork_parent, atfork_child)) > + FAIL_EXIT1 ("error calling pthread_atfork"); > + if (do_rseq_test ()) > + FAIL_EXIT1 ("rseq not registered within constructor"); > +} OK. > + > +static void __attribute__ ((destructor)) > +do_rseq_destructor_test (void) > +{ > + /* Cannot use deferred failure reporting after main () returns. */ > + if (do_rseq_test ()) > + FAIL_EXIT1 ("rseq not registered within destructor"); > + do_rseq_delete_key (); > +} > + > +/* Test C++ destructor called at thread and process exit. */ > +void > +__call_tls_dtors (void) > +{ > + /* Cannot use deferred failure reporting after main () returns. */ > + if (!rseq_thread_registered ()) > + FAIL_EXIT1 ("rseq not registered in C++ thread/process exit destructor"); > +} > +#else > +static int > +do_rseq_test (void) > +{ > + FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test"); OK. > + return 0; > +} > +#endif > + > +static int > +do_test (void) > +{ > + return do_rseq_test (); OK. > +} > + > +#include <support/test-driver.c> > diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c > new file mode 100644 > index 0000000000..aaa1ad79fe > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/tst-rseq.c > @@ -0,0 +1,110 @@ Need a one-line short description. > +/* Copyright (C) 2018 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + Contributed by Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, 2018. Remove please. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <http://www.gnu.org/licenses/>. */ > + > +/* These tests validate that rseq is registered from main in an executable > + not linked against libpthread. */ > + > +#include <sys/syscall.h> > +#include <unistd.h> > +#include <stdio.h> > +#include <support/check.h> > + > +#ifdef __NR_rseq > +#define HAS_RSEQ > +#endif > + > +#ifdef HAS_RSEQ > +#include <sys/rseq.h> > +#include <syscall.h> > +#include <stdlib.h> > +#include <error.h> > +#include <errno.h> > +#include <stdint.h> > +#include <string.h> > + > +static int > +rseq_thread_registered (void) > +{ > + return (int32_t) __rseq_abi.cpu_id >= 0; > +} > + > +static int > +do_rseq_main_test (void) > +{ > + if (!rseq_thread_registered ()) > + { > + FAIL_RET ("rseq not registered in main thread"); > + } > + return 0; > +} > + > +static int > +sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len, > + int flags, uint32_t sig) > +{ > + return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig); > +} > + > +static int > +rseq_available (void) > +{ > + int rc; > + > + rc = sys_rseq (NULL, 0, 0, 0); > + if (rc != -1) > + FAIL_EXIT1 ("Unexpected rseq return value %d", rc); > + switch (errno) > + { > + case ENOSYS: > + return 0; > + case EINVAL: > + return 1; > + default: > + FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno)); > + } > +} > + > +static int > +do_rseq_test (void) > +{ > + int result = 0; > + > + if (!rseq_available ()) > + { > + FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test"); > + } > + if (do_rseq_main_test ()) > + result = 1; > + return result; > +} > +#else > +static int > +do_rseq_test (void) > +{ > + FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test"); > + return 0; > +} > +#endif > + > +static int > +do_test (void) > +{ > + return do_rseq_test (); OK. > +} > + > +#include <support/test-driver.c> > -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-03-27 21:21 ` Carlos O'Donell @ 2019-04-04 20:04 ` Mathieu Desnoyers 2019-04-04 20:11 ` Carlos O'Donell 0 siblings, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-04 20:04 UTC (permalink / raw) To: Carlos O'Donell Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner ----- On Mar 27, 2019, at 5:21 PM, Carlos O'Donell codonell@redhat.com wrote: > >> + >> +static int >> +do_rseq_threads_test (int nr_threads) >> +{ >> + pthread_t th[nr_threads]; >> + int i; >> + int result = 0; >> + pthread_attr_t at; >> + >> + if (pthread_attr_init (&at) != 0) >> + { >> + FAIL_EXIT1 ("attr_init failed"); >> + } >> + >> + if (pthread_attr_setstacksize (&at, 1 * 1024 * 1024) != 0) > > Why set it that small? Why set it at all? I copied that boilerplate code from pre-existing glibc nptl tests, in a true "monkey see, monkey do" fashion. If we look at e.g. nptl/tst-fork1.c, those stack size limitations were introduced by this commit: commit 4d1a02efc1117763c67fe012642381e3106500cf Author: Ulrich Drepper <drepper@redhat.com> Date: Sun Mar 7 10:40:53 2004 +0000 Update. 2004-03-07 Ulrich Drepper <drepper@redhat.com> * tst-once4.c: Remove unnecessary macro definition. * tst-mutex7.c (do_test): Limit thread stack size. * tst-once2.c (do_test): Likewise. * tst-tls3.c (do_test): Likewise. * tst-tls1.c (do_test): Likewise. * tst-signal3.c (do_test): Likewise. * tst-kill6.c (do_test): Likewise. * tst-key4.c (do_test): Likewise. * tst-join4.c (do_test): Likewise. * tst-fork1.c (do_test): Likewise. * tst-context1.c (do_test): Likewise. * tst-cond2.c (do_test): Likewise. * tst-cond10.c (do_test): Likewise. * tst-clock2.c (do_test): Likewise. * tst-cancel10.c (do_test): Likewise. * tst-basic2.c (do_test): Likewise. * tst-barrier4.c (do_test): Likewise. And that commit does not state _why_ the thread stack size needs to be limited. Nor on which architectures it matters, and what are the parameters (e.g. number of threads) for which it matters. In the case of rseq nptl tests, we spawn up to 50 threads in addition to main. Should we limit the thread stack size to 1MB ? That's indeed a good question. Anyone would like to voice an opinion on the matter ? >> +static void >> +do_rseq_create_key (void) >> +{ >> + if (pthread_key_create (&rseq_test_key, rseq_key_destructor)) >> + FAIL_EXIT1 ("error in pthread_key_create"); >> +} >> + >> +static void >> +do_rseq_delete_key (void) >> +{ >> + if (pthread_key_delete (rseq_test_key)) >> + FAIL_EXIT1 ("error in pthread_key_delete"); >> +} > > OK. These two could do with becoming xthread_key_create/xthread_key_delete > wrappers in support/*, but it's not critical. I'll add them in a separate commit and modify this commit to use them instead. Thanks! Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-04 20:04 ` Mathieu Desnoyers @ 2019-04-04 20:11 ` Carlos O'Donell 2019-04-04 20:23 ` Mathieu Desnoyers 2019-04-05 10:01 ` Florian Weimer 0 siblings, 2 replies; 52+ messages in thread From: Carlos O'Donell @ 2019-04-04 20:11 UTC (permalink / raw) To: Mathieu Desnoyers Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner On 4/4/19 4:04 PM, Mathieu Desnoyers wrote: > ----- On Mar 27, 2019, at 5:21 PM, Carlos O'Donell codonell@redhat.com wrote: > >> >>> + >>> +static int >>> +do_rseq_threads_test (int nr_threads) >>> +{ >>> + pthread_t th[nr_threads]; >>> + int i; >>> + int result = 0; >>> + pthread_attr_t at; >>> + >>> + if (pthread_attr_init (&at) != 0) >>> + { >>> + FAIL_EXIT1 ("attr_init failed"); >>> + } >>> + >>> + if (pthread_attr_setstacksize (&at, 1 * 1024 * 1024) != 0) >> >> Why set it that small? Why set it at all? > > I copied that boilerplate code from pre-existing glibc nptl tests, > in a true "monkey see, monkey do" fashion. > > If we look at e.g. nptl/tst-fork1.c, those stack size limitations > were introduced by this commit: > > commit 4d1a02efc1117763c67fe012642381e3106500cf > Author: Ulrich Drepper <drepper@redhat.com> > Date: Sun Mar 7 10:40:53 2004 +0000 > > Update. > > 2004-03-07 Ulrich Drepper <drepper@redhat.com> > > * tst-once4.c: Remove unnecessary macro definition. > > * tst-mutex7.c (do_test): Limit thread stack size. > * tst-once2.c (do_test): Likewise. > * tst-tls3.c (do_test): Likewise. > * tst-tls1.c (do_test): Likewise. > * tst-signal3.c (do_test): Likewise. > * tst-kill6.c (do_test): Likewise. > * tst-key4.c (do_test): Likewise. > * tst-join4.c (do_test): Likewise. > * tst-fork1.c (do_test): Likewise. > * tst-context1.c (do_test): Likewise. > * tst-cond2.c (do_test): Likewise. > * tst-cond10.c (do_test): Likewise. > * tst-clock2.c (do_test): Likewise. > * tst-cancel10.c (do_test): Likewise. > * tst-basic2.c (do_test): Likewise. > * tst-barrier4.c (do_test): Likewise. > > And that commit does not state _why_ the thread stack size needs to > be limited. Nor on which architectures it matters, and what are the > parameters (e.g. number of threads) for which it matters. > > In the case of rseq nptl tests, we spawn up to 50 threads in addition > to main. > > Should we limit the thread stack size to 1MB ? That's indeed a good > question. Anyone would like to voice an opinion on the matter ? Unless you have a strong reason for selecting 1MiB you should just remove the boiler plate code and let the architecture default stack sizes apply. The above commit is a good example of a failure to provide a comment that gives intent for the implementation and therefore you have no idea why 1MiB was selected. Magic numbers should have comments, and a patch like the one you reference would not be accepted today. The only real worry we have with testing is thread reap rate which seems to be slow in the kernel and sometimes we've seen the kernel be unable to clone new threads because of this reason. Even then on the worst architecture, hppa, I can create ~300 threads in a test without any problems. In summary: - Remove the thread stack size setting. - Optional: Send a distinct patch to cleanup the tests which have such boiler plate ;-) -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-04 20:11 ` Carlos O'Donell @ 2019-04-04 20:23 ` Mathieu Desnoyers 2019-04-05 10:01 ` Florian Weimer 1 sibling, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-04-04 20:23 UTC (permalink / raw) To: Carlos O'Donell Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner ----- On Apr 4, 2019, at 4:11 PM, Carlos O'Donell codonell@redhat.com wrote: > On 4/4/19 4:04 PM, Mathieu Desnoyers wrote: >> ----- On Mar 27, 2019, at 5:21 PM, Carlos O'Donell codonell@redhat.com wrote: >> >>> >>>> + >>>> +static int >>>> +do_rseq_threads_test (int nr_threads) >>>> +{ >>>> + pthread_t th[nr_threads]; >>>> + int i; >>>> + int result = 0; >>>> + pthread_attr_t at; >>>> + >>>> + if (pthread_attr_init (&at) != 0) >>>> + { >>>> + FAIL_EXIT1 ("attr_init failed"); >>>> + } >>>> + >>>> + if (pthread_attr_setstacksize (&at, 1 * 1024 * 1024) != 0) >>> >>> Why set it that small? Why set it at all? >> >> I copied that boilerplate code from pre-existing glibc nptl tests, >> in a true "monkey see, monkey do" fashion. >> >> If we look at e.g. nptl/tst-fork1.c, those stack size limitations >> were introduced by this commit: >> >> commit 4d1a02efc1117763c67fe012642381e3106500cf >> Author: Ulrich Drepper <drepper@redhat.com> >> Date: Sun Mar 7 10:40:53 2004 +0000 >> >> Update. >> >> 2004-03-07 Ulrich Drepper <drepper@redhat.com> >> >> * tst-once4.c: Remove unnecessary macro definition. >> >> * tst-mutex7.c (do_test): Limit thread stack size. >> * tst-once2.c (do_test): Likewise. >> * tst-tls3.c (do_test): Likewise. >> * tst-tls1.c (do_test): Likewise. >> * tst-signal3.c (do_test): Likewise. >> * tst-kill6.c (do_test): Likewise. >> * tst-key4.c (do_test): Likewise. >> * tst-join4.c (do_test): Likewise. >> * tst-fork1.c (do_test): Likewise. >> * tst-context1.c (do_test): Likewise. >> * tst-cond2.c (do_test): Likewise. >> * tst-cond10.c (do_test): Likewise. >> * tst-clock2.c (do_test): Likewise. >> * tst-cancel10.c (do_test): Likewise. >> * tst-basic2.c (do_test): Likewise. >> * tst-barrier4.c (do_test): Likewise. >> >> And that commit does not state _why_ the thread stack size needs to >> be limited. Nor on which architectures it matters, and what are the >> parameters (e.g. number of threads) for which it matters. >> >> In the case of rseq nptl tests, we spawn up to 50 threads in addition >> to main. >> >> Should we limit the thread stack size to 1MB ? That's indeed a good >> question. Anyone would like to voice an opinion on the matter ? > > Unless you have a strong reason for selecting 1MiB you should just > remove the boiler plate code and let the architecture default stack > sizes apply. > > The above commit is a good example of a failure to provide a comment > that gives intent for the implementation and therefore you have no > idea why 1MiB was selected. Magic numbers should have comments, and > a patch like the one you reference would not be accepted today. > > The only real worry we have with testing is thread reap rate which > seems to be slow in the kernel and sometimes we've seen the kernel > be unable to clone new threads because of this reason. Even then on > the worst architecture, hppa, I can create ~300 threads in a test > without any problems. > > In summary: > - Remove the thread stack size setting. > - Optional: Send a distinct patch to cleanup the tests which have > such boiler plate ;-) I'll do the 1st item, and let the 2nd one to another brave soul. ;) Thanks, Mathieu > > -- > Cheers, > Carlos. -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-04 20:11 ` Carlos O'Donell 2019-04-04 20:23 ` Mathieu Desnoyers @ 2019-04-05 10:01 ` Florian Weimer 2019-04-05 13:50 ` Carlos O'Donell 1 sibling, 1 reply; 52+ messages in thread From: Florian Weimer @ 2019-04-05 10:01 UTC (permalink / raw) To: Carlos O'Donell Cc: Mathieu Desnoyers, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner * Carlos O'Donell: > The above commit is a good example of a failure to provide a comment > that gives intent for the implementation and therefore you have no > idea why 1MiB was selected. Magic numbers should have comments, and > a patch like the one you reference would not be accepted today. > > The only real worry we have with testing is thread reap rate which > seems to be slow in the kernel and sometimes we've seen the kernel > be unable to clone new threads because of this reason. Even then on > the worst architecture, hppa, I can create ~300 threads in a test > without any problems. Delayed reaping in the kernel (after signaling thread exit) does *not* affect the stack allocation. With a valid test, the stack is queued for reuse. Only kernel-side data structures stick around. My guess is that in 2004, 64-bit systems were still around that didn't have enough physical backing store for 50 * 8 MiB thread stacks, so the overcommit limiter in the kernel would kick in. I don't think this is a problem anymore, and in the off chance that it is, you can still use ulimit -s to reduce the default if necessary. Thanks, Florian ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-05 10:01 ` Florian Weimer @ 2019-04-05 13:50 ` Carlos O'Donell 2019-04-05 15:27 ` Florian Weimer 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-04-05 13:50 UTC (permalink / raw) To: Florian Weimer Cc: Mathieu Desnoyers, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner On 4/5/19 6:01 AM, Florian Weimer wrote: > * Carlos O'Donell: > >> The above commit is a good example of a failure to provide a comment >> that gives intent for the implementation and therefore you have no >> idea why 1MiB was selected. Magic numbers should have comments, and >> a patch like the one you reference would not be accepted today. >> >> The only real worry we have with testing is thread reap rate which >> seems to be slow in the kernel and sometimes we've seen the kernel >> be unable to clone new threads because of this reason. Even then on >> the worst architecture, hppa, I can create ~300 threads in a test >> without any problems. > > Delayed reaping in the kernel (after signaling thread exit) does *not* > affect the stack allocation. With a valid test, the stack is queued for > reuse. Only kernel-side data structures stick around. Unless you run out of mappings? The kernel must handle CLONE_CHILD_CLEARTID in a timely fashion or glibc will be unable to free the stacks and the cache could grow beyond the maximum limit (note that free_stacks() is only a one-shot attempt to lower the limit and does not need to succeed). > My guess is that in 2004, 64-bit systems were still around that didn't > have enough physical backing store for 50 * 8 MiB thread stacks, so the > overcommit limiter in the kernel would kick in. I don't think this is a > problem anymore, and in the off chance that it is, you can still use > ulimit -s to reduce the default if necessary. Agreed. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-05 13:50 ` Carlos O'Donell @ 2019-04-05 15:27 ` Florian Weimer 2019-04-05 17:38 ` Carlos O'Donell 0 siblings, 1 reply; 52+ messages in thread From: Florian Weimer @ 2019-04-05 15:27 UTC (permalink / raw) To: Carlos O'Donell Cc: Mathieu Desnoyers, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner * Carlos O'Donell: > On 4/5/19 6:01 AM, Florian Weimer wrote: >> * Carlos O'Donell: >> >>> The above commit is a good example of a failure to provide a comment >>> that gives intent for the implementation and therefore you have no >>> idea why 1MiB was selected. Magic numbers should have comments, and >>> a patch like the one you reference would not be accepted today. >>> >>> The only real worry we have with testing is thread reap rate which >>> seems to be slow in the kernel and sometimes we've seen the kernel >>> be unable to clone new threads because of this reason. Even then on >>> the worst architecture, hppa, I can create ~300 threads in a test >>> without any problems. >> >> Delayed reaping in the kernel (after signaling thread exit) does *not* >> affect the stack allocation. With a valid test, the stack is queued for >> reuse. Only kernel-side data structures stick around. > > Unless you run out of mappings? The kernel must handle CLONE_CHILD_CLEARTID > in a timely fashion or glibc will be unable to free the stacks and the cache > could grow beyond the maximum limit (note that free_stacks() is only a > one-shot attempt to lower the limit and does not need to succeed). The reaping problem is that we get CLONE_CHILD_CLEARTID notification (and the application spawns a new thread) before the old thread is completely gone. It's no longer running, so we can safely remove the stack, but not all kernel data structures have been deallocated at that point. Or do we have tests that spawn detached threads in a loop, expecting not to exceed the thread/task limit of the user? That's a different problem and probably an invalid test. Thanks, Florian ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-05 15:27 ` Florian Weimer @ 2019-04-05 17:38 ` Carlos O'Donell 2019-04-05 19:43 ` Florian Weimer 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-04-05 17:38 UTC (permalink / raw) To: Florian Weimer Cc: Mathieu Desnoyers, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner On 4/5/19 11:27 AM, Florian Weimer wrote: > * Carlos O'Donell: > >> On 4/5/19 6:01 AM, Florian Weimer wrote: >>> * Carlos O'Donell: >>> >>>> The above commit is a good example of a failure to provide a comment >>>> that gives intent for the implementation and therefore you have no >>>> idea why 1MiB was selected. Magic numbers should have comments, and >>>> a patch like the one you reference would not be accepted today. >>>> >>>> The only real worry we have with testing is thread reap rate which >>>> seems to be slow in the kernel and sometimes we've seen the kernel >>>> be unable to clone new threads because of this reason. Even then on >>>> the worst architecture, hppa, I can create ~300 threads in a test >>>> without any problems. >>> >>> Delayed reaping in the kernel (after signaling thread exit) does *not* >>> affect the stack allocation. With a valid test, the stack is queued for >>> reuse. Only kernel-side data structures stick around. >> >> Unless you run out of mappings? The kernel must handle CLONE_CHILD_CLEARTID >> in a timely fashion or glibc will be unable to free the stacks and the cache >> could grow beyond the maximum limit (note that free_stacks() is only a >> one-shot attempt to lower the limit and does not need to succeed). > > The reaping problem is that we get CLONE_CHILD_CLEARTID notification > (and the application spawns a new thread) before the old thread is > completely gone. It's no longer running, so we can safely remove the > stack, but not all kernel data structures have been deallocated at that > point. A call to pthread_join does not re-evaluate the stack cache limits and does not free anything from the cache. Therefore you can have hundreds of threads exit, go through free_stacks(), fail to free their stacks, and *then* hit pthread_join(), still fail to free any stacks (because we don't re-reap the stacks there, is that our fault in our __deallocate_stack() impl?), and then try to do some other operation that requires memory and run out. > Or do we have tests that spawn detached threads in a loop, expecting not > to exceed the thread/task limit of the user? That's a different problem > and probably an invalid test. No, that only happens in out 4 test cases and doesn't involve many threads. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-05 17:38 ` Carlos O'Donell @ 2019-04-05 19:43 ` Florian Weimer 2019-04-08 22:07 ` Carlos O'Donell 0 siblings, 1 reply; 52+ messages in thread From: Florian Weimer @ 2019-04-05 19:43 UTC (permalink / raw) To: Carlos O'Donell Cc: Mathieu Desnoyers, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha, Thomas Gleixner, Ben Maurer, Peter Zijlstra, Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson, Paul Turner * Carlos O'Donell: > A call to pthread_join does not re-evaluate the stack cache limits and does > not free anything from the cache. Are you sure? I assumed that we have this call stack: pthread_join __pthread_timedjoin_ex __free_tcb __deallocate_stack queue_stack free_stacks And since we call __free_tcb only after the futex wait on the TID completes, free_stacks observe the stack of the just-joined thread as unused. (We should probably trim the Cc: list at this point, sorry.) Thanks, Florian ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 4/4] rseq registration tests (v2) 2019-04-05 19:43 ` Florian Weimer @ 2019-04-08 22:07 ` Carlos O'Donell 0 siblings, 0 replies; 52+ messages in thread From: Carlos O'Donell @ 2019-04-08 22:07 UTC (permalink / raw) To: Florian Weimer Cc: Mathieu Desnoyers, carlos, Joseph Myers, Szabolcs Nagy, libc-alpha On 4/5/19 3:43 PM, Florian Weimer wrote: > * Carlos O'Donell: > >> A call to pthread_join does not re-evaluate the stack cache limits and does >> not free anything from the cache. > > Are you sure? I assumed that we have this call stack: > > pthread_join > __pthread_timedjoin_ex > __free_tcb > __deallocate_stack > queue_stack > free_stacks > > And since we call __free_tcb only after the futex wait on the TID > completes, free_stacks observe the stack of the just-joined thread as > unused. (CC list trimmed) You are correct. Thanks for noting the call to queue_stacks in __deallocate_stack, I missed that, I thought we added directly to the cache. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-02-12 19:43 [PATCH 0/4] Restartable Sequences support for glibc 2.30 Mathieu Desnoyers ` (3 preceding siblings ...) 2019-02-12 19:51 ` [PATCH 4/4] rseq registration tests (v2) Mathieu Desnoyers @ 2019-03-22 17:34 ` Carlos O'Donell 2019-03-22 17:55 ` Mathieu Desnoyers 2019-03-22 17:39 ` Florian Weimer 5 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-03-22 17:34 UTC (permalink / raw) To: Mathieu Desnoyers, Carlos O'Donell Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: > The only point that still appears to not reach concensus is whether it's > acceptable to define the RSEQ_SIG code signature for each architecture. > If I missed other points that failed to reach concensus, please let me > know! I have no objection to RSEQ_SIG being a signature that each architecture uses. Can you summarize the previous discussions here? -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-03-22 17:34 ` [PATCH 0/4] Restartable Sequences support for glibc 2.30 Carlos O'Donell @ 2019-03-22 17:55 ` Mathieu Desnoyers 2019-03-22 18:04 ` Carlos O'Donell 0 siblings, 1 reply; 52+ messages in thread From: Mathieu Desnoyers @ 2019-03-22 17:55 UTC (permalink / raw) To: Carlos O'Donell Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha ----- On Mar 22, 2019, at 1:34 PM, Carlos O'Donell codonell@redhat.com wrote: > On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: >> The only point that still appears to not reach concensus is whether it's >> acceptable to define the RSEQ_SIG code signature for each architecture. >> If I missed other points that failed to reach concensus, please let me >> know! > > I have no objection to RSEQ_SIG being a signature that each architecture uses. > > Can you summarize the previous discussions here? The culprit is whether we provide an architecture-agnostic "generic" value for RSEQ_SIG for architectures that do not define a specific RSEQ_SIG, or leave RSEQ_SIG undefined for architectures that do not define it yet. Considering that RSEQ_SIG ends up being a 32-bit value that maps to actual architecture code compiled into applications and libraries, it's understandable that it needs to be defined for each architecture specifically. For instance, generating invalid instructions may have ill effects on tools like objdump, and may also have impact on the CPU speculative execution efficiency in some cases. There has been ideas floating around about having a "generic" value for RSEQ_SIG that would apply to all architectures that do not override it. However, if we have this, it makes it impossible for those architectures to ever define a different value if they ever choose so, because that value ends up being an ABI that is built into applications and libraries including the glibc rseq header. Changing that value would break applications. So I favor _not_ defining any generic RSEQ_SIG, and adding per-architecture RSEQ_SIG define gradually. Therefore, user-space can #ifdef on whether RSEQ_SIG is defined or not after including the glibc rseq header. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-03-22 17:55 ` Mathieu Desnoyers @ 2019-03-22 18:04 ` Carlos O'Donell 0 siblings, 0 replies; 52+ messages in thread From: Carlos O'Donell @ 2019-03-22 18:04 UTC (permalink / raw) To: Mathieu Desnoyers Cc: carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha On 3/22/19 1:55 PM, Mathieu Desnoyers wrote: > ----- On Mar 22, 2019, at 1:34 PM, Carlos O'Donell codonell@redhat.com wrote: > >> On 2/12/19 2:42 PM, Mathieu Desnoyers wrote: >>> The only point that still appears to not reach concensus is whether it's >>> acceptable to define the RSEQ_SIG code signature for each architecture. >>> If I missed other points that failed to reach concensus, please let me >>> know! >> >> I have no objection to RSEQ_SIG being a signature that each architecture uses. >> >> Can you summarize the previous discussions here? > > The culprit is whether we provide an architecture-agnostic "generic" value > for RSEQ_SIG for architectures that do not define a specific RSEQ_SIG, or > leave RSEQ_SIG undefined for architectures that do not define it yet. > > Considering that RSEQ_SIG ends up being a 32-bit value that maps to actual > architecture code compiled into applications and libraries, it's > understandable that it needs to be defined for each architecture specifically. > For instance, generating invalid instructions may have ill effects on tools like > objdump, and may also have impact on the CPU speculative execution efficiency > in some cases. > > There has been ideas floating around about having a "generic" value for > RSEQ_SIG that would apply to all architectures that do not override it. > However, if we have this, it makes it impossible for those architectures > to ever define a different value if they ever choose so, because that value > ends up being an ABI that is built into applications and libraries including > the glibc rseq header. Changing that value would break applications. > > So I favor _not_ defining any generic RSEQ_SIG, and adding per-architecture > RSEQ_SIG define gradually. Therefore, user-space can #ifdef on whether > RSEQ_SIG is defined or not after including the glibc rseq header. That makes perfect sense. I was particularly worried about hardware that wasn't designed for having constant pools and injecting invalid instructions into that stream can worse speculative execution and break developer tools that need to know about new instructions (like valgrind). -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-02-12 19:43 [PATCH 0/4] Restartable Sequences support for glibc 2.30 Mathieu Desnoyers ` (4 preceding siblings ...) 2019-03-22 17:34 ` [PATCH 0/4] Restartable Sequences support for glibc 2.30 Carlos O'Donell @ 2019-03-22 17:39 ` Florian Weimer 2019-03-22 17:47 ` Carlos O'Donell 5 siblings, 1 reply; 52+ messages in thread From: Florian Weimer @ 2019-03-22 17:39 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Carlos O'Donell, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha * Mathieu Desnoyers: > The only point that still appears to not reach concensus is whether it's > acceptable to define the RSEQ_SIG code signature for each architecture. > If I missed other points that failed to reach concensus, please let me > know! I still think the registration mechanism is very problematic and should be avoided. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-03-22 17:39 ` Florian Weimer @ 2019-03-22 17:47 ` Carlos O'Donell 2019-03-22 17:51 ` Florian Weimer 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-03-22 17:47 UTC (permalink / raw) To: Florian Weimer, Mathieu Desnoyers Cc: Carlos O'Donell, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha On 3/22/19 1:39 PM, Florian Weimer wrote: > * Mathieu Desnoyers: > >> The only point that still appears to not reach concensus is whether it's >> acceptable to define the RSEQ_SIG code signature for each architecture. >> If I missed other points that failed to reach concensus, please let me >> know! > > I still think the registration mechanism is very problematic and > should be avoided. The *entire* registration mechanism? -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-03-22 17:47 ` Carlos O'Donell @ 2019-03-22 17:51 ` Florian Weimer 2019-03-22 19:46 ` Carlos O'Donell 0 siblings, 1 reply; 52+ messages in thread From: Florian Weimer @ 2019-03-22 17:51 UTC (permalink / raw) To: Carlos O'Donell Cc: Mathieu Desnoyers, Carlos O'Donell, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha * Carlos O'Donell: > On 3/22/19 1:39 PM, Florian Weimer wrote: >> * Mathieu Desnoyers: >> >>> The only point that still appears to not reach concensus is whether it's >>> acceptable to define the RSEQ_SIG code signature for each architecture. >>> If I missed other points that failed to reach concensus, please let me >>> know! >> >> I still think the registration mechanism is very problematic and >> should be avoided. > > The *entire* registration mechanism? The reference-counting part. It's going to be of limited use, for a few years at most, and we'll have to carry it forward indefinitely. I don't think it's worth the complexity. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-03-22 17:51 ` Florian Weimer @ 2019-03-22 19:46 ` Carlos O'Donell 2019-03-22 20:14 ` Mathieu Desnoyers 0 siblings, 1 reply; 52+ messages in thread From: Carlos O'Donell @ 2019-03-22 19:46 UTC (permalink / raw) To: Florian Weimer Cc: Mathieu Desnoyers, Carlos O'Donell, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha On 3/22/19 1:51 PM, Florian Weimer wrote: > * Carlos O'Donell: > >> On 3/22/19 1:39 PM, Florian Weimer wrote: >>> * Mathieu Desnoyers: >>> >>>> The only point that still appears to not reach concensus is whether it's >>>> acceptable to define the RSEQ_SIG code signature for each architecture. >>>> If I missed other points that failed to reach concensus, please let me >>>> know! >>> >>> I still think the registration mechanism is very problematic and >>> should be avoided. >> >> The *entire* registration mechanism? > > The reference-counting part. It's going to be of limited use, for a > few years at most, and we'll have to carry it forward indefinitely. > I don't think it's worth the complexity. I can understand Mathieu's position here, he wants to enable all kinds of users, and wants to write libraries that use rseq today but which work with future glibc. This is a perfectly reasonable thing to want. The question we have to ask is the cost. My suggestion is as follows, tell me what you think: (a) Add a RSEQ_REGISTER_ALWAYS. The meaning of which is that the core C library does unconditional registration/unregistration for all threads, and that your application must not call rseq with flags 0 (register)/RSEQ_FLAG_UNREGISTER. (b) In a few years we remove all the ref count code and define a __rseq_lib_abi with a register_state that is set to a constant value of RSEQ_REGISTER_ALWAYS, and do nothing else. This way we have a way to backout the ref count process and just leave a public data symbol as the only part of the ABI. My idea is that we just need one more RSEQ_REGISTER_* value to indicate that libc has taken over unconditional registration. Thoughts? -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH 0/4] Restartable Sequences support for glibc 2.30 2019-03-22 19:46 ` Carlos O'Donell @ 2019-03-22 20:14 ` Mathieu Desnoyers 0 siblings, 0 replies; 52+ messages in thread From: Mathieu Desnoyers @ 2019-03-22 20:14 UTC (permalink / raw) To: Carlos O'Donell Cc: Florian Weimer, carlos, Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha ----- On Mar 22, 2019, at 3:45 PM, Carlos O'Donell codonell@redhat.com wrote: > On 3/22/19 1:51 PM, Florian Weimer wrote: >> * Carlos O'Donell: >> >>> On 3/22/19 1:39 PM, Florian Weimer wrote: >>>> * Mathieu Desnoyers: >>>> >>>>> The only point that still appears to not reach concensus is whether it's >>>>> acceptable to define the RSEQ_SIG code signature for each architecture. >>>>> If I missed other points that failed to reach concensus, please let me >>>>> know! >>>> >>>> I still think the registration mechanism is very problematic and >>>> should be avoided. >>> >>> The *entire* registration mechanism? >> >> The reference-counting part. It's going to be of limited use, for a >> few years at most, and we'll have to carry it forward indefinitely. >> I don't think it's worth the complexity. > > I can understand Mathieu's position here, he wants to enable all > kinds of users, and wants to write libraries that use rseq today > but which work with future glibc. This is a perfectly reasonable > thing to want. The question we have to ask is the cost. > > My suggestion is as follows, tell me what you think: > > (a) Add a RSEQ_REGISTER_ALWAYS. The meaning of which is that the > core C library does unconditional registration/unregistration > for all threads, and that your application must not call > rseq with flags 0 (register)/RSEQ_FLAG_UNREGISTER. > > (b) In a few years we remove all the ref count code and define > a __rseq_lib_abi with a register_state that is set to > a constant value of RSEQ_REGISTER_ALWAYS, and do nothing else. > > This way we have a way to backout the ref count process and just > leave a public data symbol as the only part of the ABI. > > My idea is that we just need one more RSEQ_REGISTER_* value to > indicate that libc has taken over unconditional registration. > > Thoughts? I think we can do even simpler. We can move the TLS refcount and state to an external library (librseq). glibc would expose a new global "int" variable symbol __rseq_handled acting as a boolean. It would be initially 0. glibc would set it to 1 in its C startup code when it effectively handles rseq registration. That symbol would _not_ be a TLS (it's global). librseq would be a new library used by early rseq adopters. It would expose a rseq register/unregister API, which internally would: - Check whether __rseq_handled is true. If so, it would do nothing, leaving rseq registration to the libc. - If __rseq_handled is false, deal with many early adopters with TLS refcount and state variables internal to librseq.so. That should take care of minimizing those metrics: - glibc ABI complexity and maintenance burden in the long term, - pain for rseq early adopters when upgrading to newer glibc, Does that make sense ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread, other threads:[~2019-04-25 0:41 UTC | newest] Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-02-12 19:43 [PATCH 0/4] Restartable Sequences support for glibc 2.30 Mathieu Desnoyers 2019-02-12 19:43 ` [PATCH 1/4] glibc: Perform rseq(2) registration at C startup and thread creation (v7) Mathieu Desnoyers 2019-03-22 20:09 ` Carlos O'Donell 2019-03-25 15:54 ` Mathieu Desnoyers 2019-03-27 9:16 ` Martin Schwidefsky 2019-03-27 20:01 ` Mathieu Desnoyers 2019-03-27 20:38 ` Carlos O'Donell 2019-03-28 7:50 ` Martin Schwidefsky 2019-03-28 15:43 ` Mathieu Desnoyers 2019-04-02 6:02 ` Michael Ellerman 2019-04-02 7:08 ` Florian Weimer 2019-04-04 20:33 ` Carlos O'Donell 2019-04-05 9:16 ` Florian Weimer 2019-04-05 15:40 ` Carlos O'Donell 2019-04-08 21:45 ` Tulio Magno Quites Machado Filho 2019-04-08 22:06 ` Carlos O'Donell 2019-04-09 4:28 ` Michael Ellerman 2019-04-09 9:52 ` Alan Modra 2019-04-09 14:13 ` Tulio Magno Quites Machado Filho 2019-04-09 14:58 ` Carlos O'Donell 2019-04-09 15:58 ` Mathieu Desnoyers 2019-04-18 15:33 ` Mathieu Desnoyers 2019-04-09 16:34 ` Mathieu Desnoyers 2019-04-04 20:16 ` Carlos O'Donell 2019-04-04 20:50 ` Carlos O'Donell [not found] ` <20190404214151.6ogrm34dok52az4h@pburton-laptop> 2019-04-09 17:24 ` Mathieu Desnoyers 2019-04-18 22:42 ` Mathieu Desnoyers 2019-04-24 15:28 ` Mathieu Desnoyers [not found] ` <20190424231303.zu2irxd5g3v7yqey@pburton-laptop> 2019-04-25 1:57 ` Maciej W. Rozycki 2019-02-12 19:43 ` [PATCH 3/4] support record failure: allow use from constructor Mathieu Desnoyers 2019-03-22 20:30 ` Carlos O'Donell 2019-02-12 19:43 ` [PATCH 2/4] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux Mathieu Desnoyers 2019-03-22 20:14 ` Carlos O'Donell 2019-02-12 19:51 ` [PATCH 4/4] rseq registration tests (v2) Mathieu Desnoyers 2019-03-27 21:21 ` Carlos O'Donell 2019-04-04 20:04 ` Mathieu Desnoyers 2019-04-04 20:11 ` Carlos O'Donell 2019-04-04 20:23 ` Mathieu Desnoyers 2019-04-05 10:01 ` Florian Weimer 2019-04-05 13:50 ` Carlos O'Donell 2019-04-05 15:27 ` Florian Weimer 2019-04-05 17:38 ` Carlos O'Donell 2019-04-05 19:43 ` Florian Weimer 2019-04-08 22:07 ` Carlos O'Donell 2019-03-22 17:34 ` [PATCH 0/4] Restartable Sequences support for glibc 2.30 Carlos O'Donell 2019-03-22 17:55 ` Mathieu Desnoyers 2019-03-22 18:04 ` Carlos O'Donell 2019-03-22 17:39 ` Florian Weimer 2019-03-22 17:47 ` Carlos O'Donell 2019-03-22 17:51 ` Florian Weimer 2019-03-22 19:46 ` Carlos O'Donell 2019-03-22 20:14 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).