* [RFC PATCH 0/3] RISC-V: Use WRS.STO for atomic_spin_nop @ 2024-04-18 9:46 Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 1/3] RISC-V: Sync hwprobe: Sync extension bits with Linux 6.8 Christoph Müllner ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Christoph Müllner @ 2024-04-18 9:46 UTC (permalink / raw) To: libc-alpha, Adhemerval Zanella, Palmer Dabbelt, Darius Rad, Andrew Waterman, Philipp Tomsich, Evan Green, Kito Cheng, Jeff Law, Vineet Gupta Cc: Christoph Müllner This series utilizes the recently introduced hwprobe() and ifunc support in RISC-V to implement atomic_spin_nop. The generic code uses the PAUSE instruction (specified in Zihintpause as HINT instruction). If hwprobe() report availability of Zawrs, then WRS.STO is used instead. All specification are ratified. However, The second patch is not ready to land in glibc yet, because the RISCV_HWPROBE_EXT_ZAWRS macro is not defined in upstream Linux yet (there is just a patch from Andrew Jones on LKML). Therefore, this patch is marked as RFC. See also: https://lore.kernel.org/all/20240315134009.580167-10-ajones@ventanamicro.com/ The first patch of this series imports all HWPROBE macros from Linux 6.8. As this patch has not further dependencies, it could be merged any time. This patch was tested with a simple test code that calls pthread_spin_lock() twice (triggering the spinning). This program was compiled for rv64gc and executed using QEMU. A small modification in QEMU was used to report if WRS.STO was executed (-cpu "rv64,zawrs=false" vs -cpu "rv64,zawrs=true"). Christoph Müllner (3): RISC-V: Sync hwprobe: Sync extension bits with Linux 6.8 RISC-V: hwprobe: Add Zawrs test bit RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs sysdeps/riscv/multiarch/cpu-relax_generic.S | 31 +++++++++++++++ sysdeps/riscv/multiarch/cpu-relax_zawrs.S | 28 +++++++++++++ .../unix/sysv/linux/riscv/atomic-machine.h | 3 ++ .../unix/sysv/linux/riscv/multiarch/Makefile | 8 ++++ .../sysv/linux/riscv/multiarch/cpu-relax.c | 39 +++++++++++++++++++ .../linux/riscv/multiarch/ifunc-impl-list.c | 32 +++++++++++++-- sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 30 ++++++++++++++ 7 files changed, 167 insertions(+), 4 deletions(-) create mode 100644 sysdeps/riscv/multiarch/cpu-relax_generic.S create mode 100644 sysdeps/riscv/multiarch/cpu-relax_zawrs.S create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c -- 2.44.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 1/3] RISC-V: Sync hwprobe: Sync extension bits with Linux 6.8 2024-04-18 9:46 [RFC PATCH 0/3] RISC-V: Use WRS.STO for atomic_spin_nop Christoph Müllner @ 2024-04-18 9:46 ` Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 2/3] RISC-V: hwprobe: Add Zawrs test bit Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs Christoph Müllner 2 siblings, 0 replies; 11+ messages in thread From: Christoph Müllner @ 2024-04-18 9:46 UTC (permalink / raw) To: libc-alpha, Adhemerval Zanella, Palmer Dabbelt, Darius Rad, Andrew Waterman, Philipp Tomsich, Evan Green, Kito Cheng, Jeff Law, Vineet Gupta Cc: Christoph Müllner This patch imports additional extension bits for hwprobe from Linux 6.8. This patch does not change existing behaviour as non of the newly defined bits are used anywhere. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> --- sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 29 +++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h index 8ecb43bb69..4856189f3c 100644 --- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h @@ -50,6 +50,35 @@ struct riscv_hwprobe { #define RISCV_HWPROBE_EXT_ZBB (1 << 4) #define RISCV_HWPROBE_EXT_ZBS (1 << 5) #define RISCV_HWPROBE_EXT_ZICBOZ (1 << 6) +#define RISCV_HWPROBE_EXT_ZBC (1 << 7) +#define RISCV_HWPROBE_EXT_ZBKB (1 << 8) +#define RISCV_HWPROBE_EXT_ZBKC (1 << 9) +#define RISCV_HWPROBE_EXT_ZBKX (1 << 10) +#define RISCV_HWPROBE_EXT_ZKND (1 << 11) +#define RISCV_HWPROBE_EXT_ZKNE (1 << 12) +#define RISCV_HWPROBE_EXT_ZKNH (1 << 13) +#define RISCV_HWPROBE_EXT_ZKSED (1 << 14) +#define RISCV_HWPROBE_EXT_ZKSH (1 << 15) +#define RISCV_HWPROBE_EXT_ZKT (1 << 16) +#define RISCV_HWPROBE_EXT_ZVBB (1 << 17) +#define RISCV_HWPROBE_EXT_ZVBC (1 << 18) +#define RISCV_HWPROBE_EXT_ZVKB (1 << 19) +#define RISCV_HWPROBE_EXT_ZVKG (1 << 20) +#define RISCV_HWPROBE_EXT_ZVKNED (1 << 21) +#define RISCV_HWPROBE_EXT_ZVKNHA (1 << 22) +#define RISCV_HWPROBE_EXT_ZVKNHB (1 << 23) +#define RISCV_HWPROBE_EXT_ZVKSED (1 << 24) +#define RISCV_HWPROBE_EXT_ZVKSH (1 << 25) +#define RISCV_HWPROBE_EXT_ZVKT (1 << 26) +#define RISCV_HWPROBE_EXT_ZFH (1 << 27) +#define RISCV_HWPROBE_EXT_ZFHMIN (1 << 28) +#define RISCV_HWPROBE_EXT_ZIHINTNTL (1 << 29) +#define RISCV_HWPROBE_EXT_ZVFH (1 << 30) +#define RISCV_HWPROBE_EXT_ZVFHMIN (1 << 31) +#define RISCV_HWPROBE_EXT_ZFA (1ULL << 32) +#define RISCV_HWPROBE_EXT_ZTSO (1ULL << 33) +#define RISCV_HWPROBE_EXT_ZACAS (1ULL << 34) +#define RISCV_HWPROBE_EXT_ZICOND (1ULL << 35) #define RISCV_HWPROBE_KEY_CPUPERF_0 5 #define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0) #define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0) -- 2.44.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 2/3] RISC-V: hwprobe: Add Zawrs test bit 2024-04-18 9:46 [RFC PATCH 0/3] RISC-V: Use WRS.STO for atomic_spin_nop Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 1/3] RISC-V: Sync hwprobe: Sync extension bits with Linux 6.8 Christoph Müllner @ 2024-04-18 9:46 ` Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs Christoph Müllner 2 siblings, 0 replies; 11+ messages in thread From: Christoph Müllner @ 2024-04-18 9:46 UTC (permalink / raw) To: libc-alpha, Adhemerval Zanella, Palmer Dabbelt, Darius Rad, Andrew Waterman, Philipp Tomsich, Evan Green, Kito Cheng, Jeff Law, Vineet Gupta Cc: Christoph Müllner This patch adds a hwprobe test bit for Zawrs. The bit position is not settled as the corresponding kernel patch did not land upstream so far: https://lore.kernel.org/all/20240315134009.580167-10-ajones@ventanamicro.com/ Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> --- sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 1 + 1 file changed, 1 insertion(+) diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h index 4856189f3c..16526c3aec 100644 --- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h @@ -79,6 +79,7 @@ struct riscv_hwprobe { #define RISCV_HWPROBE_EXT_ZTSO (1ULL << 33) #define RISCV_HWPROBE_EXT_ZACAS (1ULL << 34) #define RISCV_HWPROBE_EXT_ZICOND (1ULL << 35) +#define RISCV_HWPROBE_EXT_ZAWRS (1ULL << 36) #define RISCV_HWPROBE_KEY_CPUPERF_0 5 #define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0) #define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0) -- 2.44.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 9:46 [RFC PATCH 0/3] RISC-V: Use WRS.STO for atomic_spin_nop Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 1/3] RISC-V: Sync hwprobe: Sync extension bits with Linux 6.8 Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 2/3] RISC-V: hwprobe: Add Zawrs test bit Christoph Müllner @ 2024-04-18 9:46 ` Christoph Müllner 2024-04-18 17:17 ` Palmer Dabbelt 2 siblings, 1 reply; 11+ messages in thread From: Christoph Müllner @ 2024-04-18 9:46 UTC (permalink / raw) To: libc-alpha, Adhemerval Zanella, Palmer Dabbelt, Darius Rad, Andrew Waterman, Philipp Tomsich, Evan Green, Kito Cheng, Jeff Law, Vineet Gupta Cc: Christoph Müllner The macro atomic_spin_nop can be used to implement arch-specific CPU yielding that is used in busy loops (e.g. in pthread_spin_lock). This patch introduces an ifunc-based implementation for RISC-V, that uses Zihintpause's PAUSE instruction for that matter (as PAUSE is a HINT instruction there is not dependency to Zihintpause at runtime). Further, we test for Zawrs via hwprobe() and if found we use WRS.STO instead of PAUSE. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> --- sysdeps/riscv/multiarch/cpu-relax_generic.S | 31 +++++++++++++++ sysdeps/riscv/multiarch/cpu-relax_zawrs.S | 28 +++++++++++++ .../unix/sysv/linux/riscv/atomic-machine.h | 3 ++ .../unix/sysv/linux/riscv/multiarch/Makefile | 8 ++++ .../sysv/linux/riscv/multiarch/cpu-relax.c | 39 +++++++++++++++++++ .../linux/riscv/multiarch/ifunc-impl-list.c | 32 +++++++++++++-- 6 files changed, 137 insertions(+), 4 deletions(-) create mode 100644 sysdeps/riscv/multiarch/cpu-relax_generic.S create mode 100644 sysdeps/riscv/multiarch/cpu-relax_zawrs.S create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c diff --git a/sysdeps/riscv/multiarch/cpu-relax_generic.S b/sysdeps/riscv/multiarch/cpu-relax_generic.S new file mode 100644 index 0000000000..d3ccfdce84 --- /dev/null +++ b/sysdeps/riscv/multiarch/cpu-relax_generic.S @@ -0,0 +1,31 @@ +/* CPU strand yielding for busy loops. RISC-V version. + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include <sys/asm.h> + +.option push +.option arch, +zihintpause +ENTRY (__cpu_relax_generic) + /* While we can use the `pause` instruction without + the need of Zihintpause (because it is a HINT instruction), + we still have to enable Zihintpause for the assembler. */ + pause + ret +END (__cpu_relax_generic) +.option pop diff --git a/sysdeps/riscv/multiarch/cpu-relax_zawrs.S b/sysdeps/riscv/multiarch/cpu-relax_zawrs.S new file mode 100644 index 0000000000..6d27b354df --- /dev/null +++ b/sysdeps/riscv/multiarch/cpu-relax_zawrs.S @@ -0,0 +1,28 @@ +/* CPU strand yielding for busy loops. RISC-V version with Zawrs. + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include <sys/asm.h> + +.option push +.option arch, +zawrs +ENTRY (__cpu_relax_zawrs) + wrs.sto + ret +END (__cpu_relax_zawrs) +.option pop diff --git a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h index c1c9d949a0..02b9b7a421 100644 --- a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h +++ b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h @@ -178,4 +178,7 @@ # error "ISAs that do not subsume the A extension are not supported" #endif /* !__riscv_atomic */ +extern void __cpu_relax (void); +#define atomic_spin_nop() __cpu_relax() + #endif /* bits/atomic.h */ diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile index fcef5659d4..0cdf37a38b 100644 --- a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile @@ -1,3 +1,11 @@ +# nscd uses atomic_spin_nop which in turn requires cpu_relax +ifeq ($(subdir),nscd) +sysdep_routines += \ + cpu-relax \ + cpu-relax_generic \ + cpu-relax_zawrs +endif + ifeq ($(subdir),string) sysdep_routines += \ memcpy \ diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c b/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c new file mode 100644 index 0000000000..5aeb120e21 --- /dev/null +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c @@ -0,0 +1,39 @@ +/* Multiple versions of cpu-relax. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +# include <ifunc-init.h> +# include <riscv-ifunc.h> +# include <sys/hwprobe.h> + +void __cpu_relax (void); +extern __typeof (__cpu_relax) __cpu_relax_generic attribute_hidden; +extern __typeof (__cpu_relax) __cpu_relax_zawrs attribute_hidden; + +static inline __typeof (__cpu_relax) * +select_cpu_relax_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func) +{ + unsigned long long int v; + if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_IMA_EXT_0, &v) == 0 + && (v & RISCV_HWPROBE_EXT_ZAWRS)) + return __cpu_relax_zawrs; + + return __cpu_relax_generic; +} + +riscv_libc_ifunc (__cpu_relax, select_cpu_relax_ifunc); diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c index 9f806d7a9e..9c7a8c2e1f 100644 --- a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c @@ -20,24 +20,48 @@ #include <string.h> #include <sys/hwprobe.h> +#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0])) + +void __cpu_relax (void); + size_t __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, size_t max) { size_t i = max; + struct riscv_hwprobe pairs[] = { + { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 }, + { .key = RISCV_HWPROBE_KEY_CPUPERF_0 }, + }; bool fast_unaligned = false; + bool has_zawrs = false; + + if (__riscv_hwprobe (pairs, ARRAY_SIZE (pairs), 0, NULL, 0) == 0) + { + struct riscv_hwprobe *pair; - struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 }; - if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0 - && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) + /* RISCV_HWPROBE_KEY_IMA_EXT_0 */ + pair = &pairs[0]; + if (pair->value & RISCV_HWPROBE_EXT_ZAWRS) + has_zawrs = true; + + /* RISCV_HWPROBE_KEY_CPUPERF_0 */ + pair = &pairs[1]; + if ((pair->value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST) - fast_unaligned = true; + fast_unaligned = true; + } IFUNC_IMPL (i, name, memcpy, IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned, __memcpy_noalignment) IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic)) + IFUNC_IMPL (i, name, __cpu_relax, + IFUNC_IMPL_ADD (array, i, __cpu_relax, has_zawrs, + __cpu_relax_zawrs) + IFUNC_IMPL_ADD (array, i, __cpu_relax, 1, __cpu_relax_generic)) + return 0; } -- 2.44.0 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 9:46 ` [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs Christoph Müllner @ 2024-04-18 17:17 ` Palmer Dabbelt 2024-04-18 20:03 ` Vineet Gupta 2024-04-18 20:19 ` Christoph Müllner 0 siblings, 2 replies; 11+ messages in thread From: Palmer Dabbelt @ 2024-04-18 17:17 UTC (permalink / raw) To: christoph.muellner Cc: libc-alpha, adhemerval.zanella, Darius Rad, Andrew Waterman, philipp.tomsich, Evan Green, kito.cheng, jeffreyalaw, Vineet Gupta, christoph.muellner On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu wrote: > The macro atomic_spin_nop can be used to implement arch-specific > CPU yielding that is used in busy loops (e.g. in pthread_spin_lock). > This patch introduces an ifunc-based implementation for RISC-V, > that uses Zihintpause's PAUSE instruction for that matter (as PAUSE > is a HINT instruction there is not dependency to Zihintpause at > runtime). Further, we test for Zawrs via hwprobe() and if found > we use WRS.STO instead of PAUSE. > > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> > --- > sysdeps/riscv/multiarch/cpu-relax_generic.S | 31 +++++++++++++++ > sysdeps/riscv/multiarch/cpu-relax_zawrs.S | 28 +++++++++++++ > .../unix/sysv/linux/riscv/atomic-machine.h | 3 ++ > .../unix/sysv/linux/riscv/multiarch/Makefile | 8 ++++ > .../sysv/linux/riscv/multiarch/cpu-relax.c | 39 +++++++++++++++++++ > .../linux/riscv/multiarch/ifunc-impl-list.c | 32 +++++++++++++-- > 6 files changed, 137 insertions(+), 4 deletions(-) > create mode 100644 sysdeps/riscv/multiarch/cpu-relax_generic.S > create mode 100644 sysdeps/riscv/multiarch/cpu-relax_zawrs.S > create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > > diff --git a/sysdeps/riscv/multiarch/cpu-relax_generic.S b/sysdeps/riscv/multiarch/cpu-relax_generic.S > new file mode 100644 > index 0000000000..d3ccfdce84 > --- /dev/null > +++ b/sysdeps/riscv/multiarch/cpu-relax_generic.S > @@ -0,0 +1,31 @@ > +/* CPU strand yielding for busy loops. RISC-V version. > + Copyright (C) 2024 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library. If not, see > + <https://www.gnu.org/licenses/>. */ > + > +#include <sysdep.h> > +#include <sys/asm.h> > + > +.option push > +.option arch, +zihintpause > +ENTRY (__cpu_relax_generic) > + /* While we can use the `pause` instruction without > + the need of Zihintpause (because it is a HINT instruction), > + we still have to enable Zihintpause for the assembler. */ > + pause > + ret > +END (__cpu_relax_generic) > +.option pop > diff --git a/sysdeps/riscv/multiarch/cpu-relax_zawrs.S b/sysdeps/riscv/multiarch/cpu-relax_zawrs.S > new file mode 100644 > index 0000000000..6d27b354df > --- /dev/null > +++ b/sysdeps/riscv/multiarch/cpu-relax_zawrs.S > @@ -0,0 +1,28 @@ > +/* CPU strand yielding for busy loops. RISC-V version with Zawrs. > + Copyright (C) 2024 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library. If not, see > + <https://www.gnu.org/licenses/>. */ > + > +#include <sysdep.h> > +#include <sys/asm.h> > + > +.option push > +.option arch, +zawrs > +ENTRY (__cpu_relax_zawrs) > + wrs.sto This has the same forward progress/eventual success violation as the code you sent for GCC and Linux does. It doesn't really matter if the user of the reservation is in a builtin, an asm block, or a function. The compiler just doesn't know about those reservation rules and isn't going to generate code that follows them. That said, as far as I can tell the same grey area in the WRS spec that I pointed out during one of the Linux reviews still exists. So if there's some hardware that has a more generous set of LR->WRS rules than the ISA-required LR->SC rules that's fine, we just need to know what the hardware actually is so we can write down the rules and then stick to them -- it's just a performance thing for WRS, so having weaker rules seems entirely plausible. I don't know of any hardware with WRS, but sorry if I missed one. Do you have something publicly availiable that behaves this way? > + ret > +END (__cpu_relax_zawrs) > +.option pop > diff --git a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > index c1c9d949a0..02b9b7a421 100644 > --- a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > +++ b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > @@ -178,4 +178,7 @@ > # error "ISAs that do not subsume the A extension are not supported" > #endif /* !__riscv_atomic */ > > +extern void __cpu_relax (void); > +#define atomic_spin_nop() __cpu_relax() > + > #endif /* bits/atomic.h */ > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > index fcef5659d4..0cdf37a38b 100644 > --- a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > @@ -1,3 +1,11 @@ > +# nscd uses atomic_spin_nop which in turn requires cpu_relax > +ifeq ($(subdir),nscd) > +sysdep_routines += \ > + cpu-relax \ > + cpu-relax_generic \ > + cpu-relax_zawrs > +endif > + > ifeq ($(subdir),string) > sysdep_routines += \ > memcpy \ > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c b/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > new file mode 100644 > index 0000000000..5aeb120e21 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > @@ -0,0 +1,39 @@ > +/* Multiple versions of cpu-relax. > + All versions must be listed in ifunc-impl-list.c. > + Copyright (C) 2024 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +# include <ifunc-init.h> > +# include <riscv-ifunc.h> > +# include <sys/hwprobe.h> > + > +void __cpu_relax (void); > +extern __typeof (__cpu_relax) __cpu_relax_generic attribute_hidden; > +extern __typeof (__cpu_relax) __cpu_relax_zawrs attribute_hidden; > + > +static inline __typeof (__cpu_relax) * > +select_cpu_relax_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func) > +{ > + unsigned long long int v; > + if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_IMA_EXT_0, &v) == 0 > + && (v & RISCV_HWPROBE_EXT_ZAWRS)) > + return __cpu_relax_zawrs; > + > + return __cpu_relax_generic; > +} > + > +riscv_libc_ifunc (__cpu_relax, select_cpu_relax_ifunc); > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > index 9f806d7a9e..9c7a8c2e1f 100644 > --- a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > @@ -20,24 +20,48 @@ > #include <string.h> > #include <sys/hwprobe.h> > > +#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0])) > + > +void __cpu_relax (void); > + > size_t > __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > size_t max) > { > size_t i = max; > + struct riscv_hwprobe pairs[] = { > + { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 }, > + { .key = RISCV_HWPROBE_KEY_CPUPERF_0 }, > + }; > > bool fast_unaligned = false; > + bool has_zawrs = false; > + > + if (__riscv_hwprobe (pairs, ARRAY_SIZE (pairs), 0, NULL, 0) == 0) > + { > + struct riscv_hwprobe *pair; > > - struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 }; > - if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0 > - && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) > + /* RISCV_HWPROBE_KEY_IMA_EXT_0 */ > + pair = &pairs[0]; > + if (pair->value & RISCV_HWPROBE_EXT_ZAWRS) > + has_zawrs = true; > + > + /* RISCV_HWPROBE_KEY_CPUPERF_0 */ > + pair = &pairs[1]; > + if ((pair->value & RISCV_HWPROBE_MISALIGNED_MASK) > == RISCV_HWPROBE_MISALIGNED_FAST) > - fast_unaligned = true; > + fast_unaligned = true; > + } > > IFUNC_IMPL (i, name, memcpy, > IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned, > __memcpy_noalignment) > IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic)) > > + IFUNC_IMPL (i, name, __cpu_relax, > + IFUNC_IMPL_ADD (array, i, __cpu_relax, has_zawrs, > + __cpu_relax_zawrs) > + IFUNC_IMPL_ADD (array, i, __cpu_relax, 1, __cpu_relax_generic)) > + > return 0; > } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 17:17 ` Palmer Dabbelt @ 2024-04-18 20:03 ` Vineet Gupta 2024-04-18 20:25 ` Christoph Müllner 2024-04-18 20:19 ` Christoph Müllner 1 sibling, 1 reply; 11+ messages in thread From: Vineet Gupta @ 2024-04-18 20:03 UTC (permalink / raw) To: Palmer Dabbelt, christoph.muellner Cc: libc-alpha, adhemerval.zanella, Darius Rad, Andrew Waterman, philipp.tomsich, Evan Green, kito.cheng, jeffreyalaw, gnu-toolchain Hi Christoph, My 2 cents. On 4/18/24 10:17, Palmer Dabbelt wrote: > On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu wrote: >> The macro atomic_spin_nop can be used to implement arch-specific >> CPU yielding that is used in busy loops (e.g. in pthread_spin_lock). >> This patch introduces an ifunc-based implementation for RISC-V, >> that uses Zihintpause's PAUSE instruction for that matter (as PAUSE >> is a HINT instruction there is not dependency to Zihintpause at >> runtime). Further, we test for Zawrs via hwprobe() and if found >> we use WRS.STO instead of PAUSE. [snip] >> +ENTRY (__cpu_relax_generic) >> + /* While we can use the `pause` instruction without >> + the need of Zihintpause (because it is a HINT instruction), >> + we still have to enable Zihintpause for the assembler. */ >> + pause >> + ret >> +END (__cpu_relax_generic) [snip] >> +.option push >> +.option arch, +zawrs >> +ENTRY (__cpu_relax_zawrs) >> + wrs.sto > This has the same forward progress/eventual success violation as the > code you sent for GCC and Linux does. It doesn't really matter if the > user of the reservation is in a builtin, an asm block, or a function. > The compiler just doesn't know about those reservation rules and isn't > going to generate code that follows them. So, this routine maps to atomic_spin_nop () called in generic code in a bunch of places. The easiest case is nptl/pthread_spin_lock.c which looks something like this __pthread_spin_lock (pthread_spinlock_t *lock) ... do { atomic_spin_nop (); val = atomic_load_relaxed (lock); } while (val != 0); This works fine for a PAUSE based implementation which doesn't need a prior reservation, but WRS does and even if both the load and spin are inlined, reservation happens after WRS which is wrong. So I think before we can implement this optimization for riscv, we need a generic glibc change to replace of atomic_spin_nop () with a variant which also takes the lock/memory under consideration. The fallback implementation of atomic_load_and_spin_if_cond() will be what the above loop does. For arches that do implement the API, based on ISA semantics, they can choose to ignore the mem arg completely (RISC-V PAUSE based implementation) or implement a little loop which explicitly takes reservation on the mem/lock and calls WRS. Does that make sense ? We just need to sort out the 2nd use of atomic_spin_nop in nptl/pthread_mutex_lock.c where this conversion might not be straight fwd. The recent back off back spin looping is getting in the way of obvious solution, however for discussion sake if we ignore it, the code looks like this: PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex) ... int cnt=0, max_cnt = something; do { if (cnt ++>= max_cnt) { LLL_MUTEX_LOCK (mutex); break; } atomic_spin_nop (); } while (LLL_MUTEX_READ_LOCK (mutex) != 0 || LLL_MUTEX_TRYLOCK (mutex) != 0); And this seems like it can be converted to a atomic_load_and_spin_if_cond(mutex, !=0 ) kind of API. Let me know if you want me to spin this - pun intended ;-) -Vineet > That said, as far as I can tell the same grey area in the WRS spec that > I pointed out during one of the Linux reviews still exists. So if > there's some hardware that has a more generous set of LR->WRS rules than > the ISA-required LR->SC rules that's fine, we just need to know what the > hardware actually is so we can write down the rules and then stick to > them -- it's just a performance thing for WRS, so having weaker rules > seems entirely plausible. > > I don't know of any hardware with WRS, but sorry if I missed one. Do > you have something publicly availiable that behaves this way? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 20:03 ` Vineet Gupta @ 2024-04-18 20:25 ` Christoph Müllner 0 siblings, 0 replies; 11+ messages in thread From: Christoph Müllner @ 2024-04-18 20:25 UTC (permalink / raw) To: Vineet Gupta Cc: Palmer Dabbelt, libc-alpha, adhemerval.zanella, Darius Rad, Andrew Waterman, philipp.tomsich, Evan Green, kito.cheng, jeffreyalaw, gnu-toolchain On Thu, Apr 18, 2024 at 10:03 PM Vineet Gupta <vineetg@rivosinc.com> wrote: > > Hi Christoph, > > My 2 cents. > > On 4/18/24 10:17, Palmer Dabbelt wrote: > > On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu wrote: > >> The macro atomic_spin_nop can be used to implement arch-specific > >> CPU yielding that is used in busy loops (e.g. in pthread_spin_lock). > >> This patch introduces an ifunc-based implementation for RISC-V, > >> that uses Zihintpause's PAUSE instruction for that matter (as PAUSE > >> is a HINT instruction there is not dependency to Zihintpause at > >> runtime). Further, we test for Zawrs via hwprobe() and if found > >> we use WRS.STO instead of PAUSE. > > [snip] > > >> +ENTRY (__cpu_relax_generic) > >> + /* While we can use the `pause` instruction without > >> + the need of Zihintpause (because it is a HINT instruction), > >> + we still have to enable Zihintpause for the assembler. */ > >> + pause > >> + ret > >> +END (__cpu_relax_generic) > > [snip] > > >> +.option push > >> +.option arch, +zawrs > >> +ENTRY (__cpu_relax_zawrs) > >> + wrs.sto > > This has the same forward progress/eventual success violation as the > > code you sent for GCC and Linux does. It doesn't really matter if the > > user of the reservation is in a builtin, an asm block, or a function. > > The compiler just doesn't know about those reservation rules and isn't > > going to generate code that follows them. > > So, this routine maps to atomic_spin_nop () called in generic code in a > bunch of places. > > The easiest case is nptl/pthread_spin_lock.c which looks something like this > > __pthread_spin_lock (pthread_spinlock_t *lock) > ... > do > { > atomic_spin_nop (); > val = atomic_load_relaxed (lock); > } > while (val != 0); > > This works fine for a PAUSE based implementation which doesn't need a > prior reservation, but WRS does and even if both the load and spin are > inlined, reservation happens after WRS which is wrong. > > So I think before we can implement this optimization for riscv, we need > a generic glibc change to replace of atomic_spin_nop () with a variant > which also takes the lock/memory under consideration. > The fallback implementation of atomic_load_and_spin_if_cond() will be > what the above loop does. For arches that do implement the API, based on > ISA semantics, they can choose to ignore the mem arg completely (RISC-V > PAUSE based implementation) or implement a little loop which explicitly > takes reservation on the mem/lock and calls WRS. > > Does that make sense ? I did something similar in the Linux patch a while ago. Here I wanted to avoid changes in arch-independent code. But you are right, wrs.sto does not make sense without a valid reservation. > > We just need to sort out the 2nd use of atomic_spin_nop in > nptl/pthread_mutex_lock.c where this conversion might not be straight fwd. > The recent back off back spin looping is getting in the way of obvious > solution, however for discussion sake if we ignore it, the code looks > like this: > > PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex) > ... > int cnt=0, max_cnt = something; > do > { > if (cnt ++>= max_cnt) > { > LLL_MUTEX_LOCK (mutex); > break; > } > atomic_spin_nop (); > } > while (LLL_MUTEX_READ_LOCK (mutex) != 0 || LLL_MUTEX_TRYLOCK > (mutex) != 0); > > And this seems like it can be converted to a > atomic_load_and_spin_if_cond(mutex, !=0 ) kind of API. > > Let me know if you want me to spin this - pun intended ;-) > > -Vineet > > > > That said, as far as I can tell the same grey area in the WRS spec that > > I pointed out during one of the Linux reviews still exists. So if > > there's some hardware that has a more generous set of LR->WRS rules than > > the ISA-required LR->SC rules that's fine, we just need to know what the > > hardware actually is so we can write down the rules and then stick to > > them -- it's just a performance thing for WRS, so having weaker rules > > seems entirely plausible. > > > > I don't know of any hardware with WRS, but sorry if I missed one. Do > > you have something publicly availiable that behaves this way? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 17:17 ` Palmer Dabbelt 2024-04-18 20:03 ` Vineet Gupta @ 2024-04-18 20:19 ` Christoph Müllner 2024-04-18 20:36 ` Vineet Gupta 1 sibling, 1 reply; 11+ messages in thread From: Christoph Müllner @ 2024-04-18 20:19 UTC (permalink / raw) To: Palmer Dabbelt Cc: libc-alpha, adhemerval.zanella, Darius Rad, Andrew Waterman, philipp.tomsich, Evan Green, kito.cheng, jeffreyalaw, Vineet Gupta On Thu, Apr 18, 2024 at 7:17 PM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu wrote: > > The macro atomic_spin_nop can be used to implement arch-specific > > CPU yielding that is used in busy loops (e.g. in pthread_spin_lock). > > This patch introduces an ifunc-based implementation for RISC-V, > > that uses Zihintpause's PAUSE instruction for that matter (as PAUSE > > is a HINT instruction there is not dependency to Zihintpause at > > runtime). Further, we test for Zawrs via hwprobe() and if found > > we use WRS.STO instead of PAUSE. > > > > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> > > --- > > sysdeps/riscv/multiarch/cpu-relax_generic.S | 31 +++++++++++++++ > > sysdeps/riscv/multiarch/cpu-relax_zawrs.S | 28 +++++++++++++ > > .../unix/sysv/linux/riscv/atomic-machine.h | 3 ++ > > .../unix/sysv/linux/riscv/multiarch/Makefile | 8 ++++ > > .../sysv/linux/riscv/multiarch/cpu-relax.c | 39 +++++++++++++++++++ > > .../linux/riscv/multiarch/ifunc-impl-list.c | 32 +++++++++++++-- > > 6 files changed, 137 insertions(+), 4 deletions(-) > > create mode 100644 sysdeps/riscv/multiarch/cpu-relax_generic.S > > create mode 100644 sysdeps/riscv/multiarch/cpu-relax_zawrs.S > > create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > > > > diff --git a/sysdeps/riscv/multiarch/cpu-relax_generic.S b/sysdeps/riscv/multiarch/cpu-relax_generic.S > > new file mode 100644 > > index 0000000000..d3ccfdce84 > > --- /dev/null > > +++ b/sysdeps/riscv/multiarch/cpu-relax_generic.S > > @@ -0,0 +1,31 @@ > > +/* CPU strand yielding for busy loops. RISC-V version. > > + Copyright (C) 2024 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful, > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library. If not, see > > + <https://www.gnu.org/licenses/>. */ > > + > > +#include <sysdep.h> > > +#include <sys/asm.h> > > + > > +.option push > > +.option arch, +zihintpause > > +ENTRY (__cpu_relax_generic) > > + /* While we can use the `pause` instruction without > > + the need of Zihintpause (because it is a HINT instruction), > > + we still have to enable Zihintpause for the assembler. */ > > + pause > > + ret > > +END (__cpu_relax_generic) > > +.option pop > > diff --git a/sysdeps/riscv/multiarch/cpu-relax_zawrs.S b/sysdeps/riscv/multiarch/cpu-relax_zawrs.S > > new file mode 100644 > > index 0000000000..6d27b354df > > --- /dev/null > > +++ b/sysdeps/riscv/multiarch/cpu-relax_zawrs.S > > @@ -0,0 +1,28 @@ > > +/* CPU strand yielding for busy loops. RISC-V version with Zawrs. > > + Copyright (C) 2024 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful, > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library. If not, see > > + <https://www.gnu.org/licenses/>. */ > > + > > +#include <sysdep.h> > > +#include <sys/asm.h> > > + > > +.option push > > +.option arch, +zawrs > > +ENTRY (__cpu_relax_zawrs) > > + wrs.sto > > This has the same forward progress/eventual success violation as the > code you sent for GCC and Linux does. It doesn't really matter if the > user of the reservation is in a builtin, an asm block, or a function. > The compiler just doesn't know about those reservation rules and isn't > going to generate code that follows them. I see. The main issue is that we don't have a valid reservation when calling WRS, so the whole use of Zawrs instructions is pointless. So the only way to move Zawrs forward would be to adjust the locking routines (introducing new primitives that have to be implemented for all architectures). > > That said, as far as I can tell the same grey area in the WRS spec that > I pointed out during one of the Linux reviews still exists. So if > there's some hardware that has a more generous set of LR->WRS rules than > the ISA-required LR->SC rules that's fine, we just need to know what the > hardware actually is so we can write down the rules and then stick to > them -- it's just a performance thing for WRS, so having weaker rules > seems entirely plausible. > > I don't know of any hardware with WRS, but sorry if I missed one. Do > you have something publicly availiable that behaves this way? > > > + ret > > +END (__cpu_relax_zawrs) > > +.option pop > > diff --git a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > > index c1c9d949a0..02b9b7a421 100644 > > --- a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > > +++ b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > > @@ -178,4 +178,7 @@ > > # error "ISAs that do not subsume the A extension are not supported" > > #endif /* !__riscv_atomic */ > > > > +extern void __cpu_relax (void); > > +#define atomic_spin_nop() __cpu_relax() > > + > > #endif /* bits/atomic.h */ > > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > > index fcef5659d4..0cdf37a38b 100644 > > --- a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > > @@ -1,3 +1,11 @@ > > +# nscd uses atomic_spin_nop which in turn requires cpu_relax > > +ifeq ($(subdir),nscd) > > +sysdep_routines += \ > > + cpu-relax \ > > + cpu-relax_generic \ > > + cpu-relax_zawrs > > +endif > > + > > ifeq ($(subdir),string) > > sysdep_routines += \ > > memcpy \ > > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c b/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > > new file mode 100644 > > index 0000000000..5aeb120e21 > > --- /dev/null > > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > > @@ -0,0 +1,39 @@ > > +/* Multiple versions of cpu-relax. > > + All versions must be listed in ifunc-impl-list.c. > > + Copyright (C) 2024 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful, > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library; if not, see > > + <https://www.gnu.org/licenses/>. */ > > + > > +# include <ifunc-init.h> > > +# include <riscv-ifunc.h> > > +# include <sys/hwprobe.h> > > + > > +void __cpu_relax (void); > > +extern __typeof (__cpu_relax) __cpu_relax_generic attribute_hidden; > > +extern __typeof (__cpu_relax) __cpu_relax_zawrs attribute_hidden; > > + > > +static inline __typeof (__cpu_relax) * > > +select_cpu_relax_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func) > > +{ > > + unsigned long long int v; > > + if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_IMA_EXT_0, &v) == 0 > > + && (v & RISCV_HWPROBE_EXT_ZAWRS)) > > + return __cpu_relax_zawrs; > > + > > + return __cpu_relax_generic; > > +} > > + > > +riscv_libc_ifunc (__cpu_relax, select_cpu_relax_ifunc); > > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > > index 9f806d7a9e..9c7a8c2e1f 100644 > > --- a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > > @@ -20,24 +20,48 @@ > > #include <string.h> > > #include <sys/hwprobe.h> > > > > +#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0])) > > + > > +void __cpu_relax (void); > > + > > size_t > > __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > > size_t max) > > { > > size_t i = max; > > + struct riscv_hwprobe pairs[] = { > > + { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 }, > > + { .key = RISCV_HWPROBE_KEY_CPUPERF_0 }, > > + }; > > > > bool fast_unaligned = false; > > + bool has_zawrs = false; > > + > > + if (__riscv_hwprobe (pairs, ARRAY_SIZE (pairs), 0, NULL, 0) == 0) > > + { > > + struct riscv_hwprobe *pair; > > > > - struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 }; > > - if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0 > > - && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) > > + /* RISCV_HWPROBE_KEY_IMA_EXT_0 */ > > + pair = &pairs[0]; > > + if (pair->value & RISCV_HWPROBE_EXT_ZAWRS) > > + has_zawrs = true; > > + > > + /* RISCV_HWPROBE_KEY_CPUPERF_0 */ > > + pair = &pairs[1]; > > + if ((pair->value & RISCV_HWPROBE_MISALIGNED_MASK) > > == RISCV_HWPROBE_MISALIGNED_FAST) > > - fast_unaligned = true; > > + fast_unaligned = true; > > + } > > > > IFUNC_IMPL (i, name, memcpy, > > IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned, > > __memcpy_noalignment) > > IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic)) > > > > + IFUNC_IMPL (i, name, __cpu_relax, > > + IFUNC_IMPL_ADD (array, i, __cpu_relax, has_zawrs, > > + __cpu_relax_zawrs) > > + IFUNC_IMPL_ADD (array, i, __cpu_relax, 1, __cpu_relax_generic)) > > + > > return 0; > > } ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 20:19 ` Christoph Müllner @ 2024-04-18 20:36 ` Vineet Gupta 2024-04-18 21:10 ` Palmer Dabbelt 0 siblings, 1 reply; 11+ messages in thread From: Vineet Gupta @ 2024-04-18 20:36 UTC (permalink / raw) To: Christoph Müllner, Palmer Dabbelt Cc: libc-alpha, adhemerval.zanella, Darius Rad, Andrew Waterman, philipp.tomsich, Evan Green, kito.cheng, jeffreyalaw On 4/18/24 13:19, Christoph Müllner wrote: >> This has the same forward progress/eventual success violation as the >> code you sent for GCC and Linux does. It doesn't really matter if the >> user of the reservation is in a builtin, an asm block, or a function. >> The compiler just doesn't know about those reservation rules and isn't >> going to generate code that follows them. > I see. The main issue is that we don't have a valid reservation when > calling WRS, > so the whole use of Zawrs instructions is pointless. > So the only way to move Zawrs forward would be to adjust the locking routines > (introducing new primitives that have to be implemented for all architectures). Not explicitly anyways - the generic fallback will take care of every arch, except SPARC/x86 which implement atomic_spin_nop wth pause like semantics, but even they don't need to change at all if we implement new API atomic_load_and_spin_if_cond_whatever () in terms of existing atomic_spin_nop () -Vineet ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 20:36 ` Vineet Gupta @ 2024-04-18 21:10 ` Palmer Dabbelt 2024-04-19 14:09 ` Andrew Jones 0 siblings, 1 reply; 11+ messages in thread From: Palmer Dabbelt @ 2024-04-18 21:10 UTC (permalink / raw) To: Vineet Gupta, ajones, Charlie Jenkins Cc: christoph.muellner, libc-alpha, adhemerval.zanella, Darius Rad, Andrew Waterman, philipp.tomsich, Evan Green, kito.cheng, jeffreyalaw On Thu, 18 Apr 2024 13:36:32 PDT (-0700), Vineet Gupta wrote: > On 4/18/24 13:19, Christoph Müllner wrote: >>> This has the same forward progress/eventual success violation as the >>> code you sent for GCC and Linux does. It doesn't really matter if the >>> user of the reservation is in a builtin, an asm block, or a function. >>> The compiler just doesn't know about those reservation rules and isn't >>> going to generate code that follows them. >> I see. The main issue is that we don't have a valid reservation when >> calling WRS, >> so the whole use of Zawrs instructions is pointless. >> So the only way to move Zawrs forward would be to adjust the locking routines >> (introducing new primitives that have to be implemented for all architectures). > > Not explicitly anyways - the generic fallback will take care of every > arch, except SPARC/x86 which implement atomic_spin_nop wth pause like > semantics, but even they don't need to change at all if we implement new > API atomic_load_and_spin_if_cond_whatever () in terms of existing > atomic_spin_nop () Ya, sounds about right. IIRC I just hooked some of the LLL macros when doing the POC/estimates, but it was at least a year ago so I forget exactly how it all fit together. Whatever it is, they end up with basically the same load->cond->lr->beq->{wrs,sc} patterns as a bunch of the Linux routines should have (sort of a test-and-test-and-set type pattern, or how arm64 does the load-and-cmpxchg routines). Adding Charlie and Drew, as we were talking about the Linux side of things recently. > -Vineet ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs 2024-04-18 21:10 ` Palmer Dabbelt @ 2024-04-19 14:09 ` Andrew Jones 0 siblings, 0 replies; 11+ messages in thread From: Andrew Jones @ 2024-04-19 14:09 UTC (permalink / raw) To: Palmer Dabbelt Cc: Vineet Gupta, Charlie Jenkins, christoph.muellner, libc-alpha, adhemerval.zanella, Darius Rad, Andrew Waterman, philipp.tomsich, Evan Green, kito.cheng, jeffreyalaw On Thu, Apr 18, 2024 at 02:10:42PM -0700, Palmer Dabbelt wrote: > On Thu, 18 Apr 2024 13:36:32 PDT (-0700), Vineet Gupta wrote: > > On 4/18/24 13:19, Christoph Müllner wrote: > > > > This has the same forward progress/eventual success violation as the > > > > code you sent for GCC and Linux does. It doesn't really matter if the > > > > user of the reservation is in a builtin, an asm block, or a function. > > > > The compiler just doesn't know about those reservation rules and isn't > > > > going to generate code that follows them. > > > I see. The main issue is that we don't have a valid reservation when > > > calling WRS, > > > so the whole use of Zawrs instructions is pointless. > > > So the only way to move Zawrs forward would be to adjust the locking routines > > > (introducing new primitives that have to be implemented for all architectures). > > > > Not explicitly anyways - the generic fallback will take care of every > > arch, except SPARC/x86 which implement atomic_spin_nop wth pause like > > semantics, but even they don't need to change at all if we implement new > > API atomic_load_and_spin_if_cond_whatever () in terms of existing > > atomic_spin_nop () > > Ya, sounds about right. > > IIRC I just hooked some of the LLL macros when doing the POC/estimates, but > it was at least a year ago so I forget exactly how it all fit together. > Whatever it is, they end up with basically the same > load->cond->lr->beq->{wrs,sc} patterns as a bunch of the Linux routines > should have (sort of a test-and-test-and-set type pattern, or how arm64 does > the load-and-cmpxchg routines). > > Adding Charlie and Drew, as we were talking about the Linux side of things > recently. I've just now posted another version [1]. [1] https://lore.kernel.org/all/20240419135321.70781-8-ajones@ventanamicro.com/ Thanks, drew ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-04-19 14:09 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-04-18 9:46 [RFC PATCH 0/3] RISC-V: Use WRS.STO for atomic_spin_nop Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 1/3] RISC-V: Sync hwprobe: Sync extension bits with Linux 6.8 Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 2/3] RISC-V: hwprobe: Add Zawrs test bit Christoph Müllner 2024-04-18 9:46 ` [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs Christoph Müllner 2024-04-18 17:17 ` Palmer Dabbelt 2024-04-18 20:03 ` Vineet Gupta 2024-04-18 20:25 ` Christoph Müllner 2024-04-18 20:19 ` Christoph Müllner 2024-04-18 20:36 ` Vineet Gupta 2024-04-18 21:10 ` Palmer Dabbelt 2024-04-19 14:09 ` Andrew Jones
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).