* [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
@ 2023-07-06 19:29 Evan Green
  2023-07-06 19:29 ` [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support Evan Green
  ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Evan Green @ 2023-07-06 19:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Florian Weimer, Evan Green

This series illustrates the use of a recently accepted Linux syscall that
enumerates architectural information about the RISC-V cores the system is
running on. In this series we expose a small wrapper function around the
syscall. An ifunc selector for memcpy queries it to see if unaligned access
is "fast" on this hardware. If it is, it selects a newly provided
implementation of memcpy that doesn't work hard at aligning the src and
destination buffers.

I opted to spin the whole series, though it's perfectly safe to take just
the first two patches for the hwprobe interface and treat the third patch
as a separate issue.

Performance numbers were compared using a small test program [1], run on a
D1 Nezha board, which supports fast unaligned access. "Fast" here means
copying unaligned words is faster than copying byte-wise, but still slower
than copying aligned words.
Here's the speed of various memcpy()s with the generic implementation:

memcpy size 1 count 1000000 offset 0 took 109564 us
memcpy size 3 count 1000000 offset 0 took 138425 us
memcpy size 4 count 1000000 offset 0 took 148374 us
memcpy size 7 count 1000000 offset 0 took 178433 us
memcpy size 8 count 1000000 offset 0 took 188430 us
memcpy size f count 1000000 offset 0 took 266118 us
memcpy size f count 1000000 offset 1 took 265940 us
memcpy size f count 1000000 offset 3 took 265934 us
memcpy size f count 1000000 offset 7 took 266215 us
memcpy size f count 1000000 offset 8 took 265954 us
memcpy size f count 1000000 offset 9 took 265886 us
memcpy size 10 count 1000000 offset 0 took 195308 us
memcpy size 11 count 1000000 offset 0 took 205161 us
memcpy size 17 count 1000000 offset 0 took 274376 us
memcpy size 18 count 1000000 offset 0 took 199188 us
memcpy size 19 count 1000000 offset 0 took 209258 us
memcpy size 1f count 1000000 offset 0 took 278263 us
memcpy size 20 count 1000000 offset 0 took 207364 us
memcpy size 21 count 1000000 offset 0 took 217143 us
memcpy size 3f count 1000000 offset 0 took 300023 us
memcpy size 40 count 1000000 offset 0 took 231063 us
memcpy size 41 count 1000000 offset 0 took 241259 us
memcpy size 7c count 100000 offset 0 took 32807 us
memcpy size 7f count 100000 offset 0 took 36274 us
memcpy size ff count 100000 offset 0 took 47818 us
memcpy size ff count 100000 offset 0 took 47932 us
memcpy size 100 count 100000 offset 0 took 40468 us
memcpy size 200 count 100000 offset 0 took 64245 us
memcpy size 27f count 100000 offset 0 took 82549 us
memcpy size 400 count 100000 offset 0 took 111254 us
memcpy size 407 count 100000 offset 0 took 119364 us
memcpy size 800 count 100000 offset 0 took 203899 us
memcpy size 87f count 100000 offset 0 took 222465 us
memcpy size 87f count 100000 offset 3 took 222289 us
memcpy size 1000 count 100000 offset 0 took 388846 us
memcpy size 1000 count 100000 offset 1 took 468827 us
memcpy size 1000 count 100000 offset 3 took 397098 us
memcpy size 1000 count 100000 offset 4 took 397379 us
memcpy size 1000 count 100000 offset 5 took 397368 us
memcpy size 1000 count 100000 offset 7 took 396867 us
memcpy size 1000 count 100000 offset 8 took 389227 us
memcpy size 1000 count 100000 offset 9 took 395949 us
memcpy size 3000 count 50000 offset 0 took 674837 us
memcpy size 3000 count 50000 offset 1 took 676944 us
memcpy size 3000 count 50000 offset 3 took 679709 us
memcpy size 3000 count 50000 offset 4 took 680829 us
memcpy size 3000 count 50000 offset 5 took 678024 us
memcpy size 3000 count 50000 offset 7 took 681097 us
memcpy size 3000 count 50000 offset 8 took 670004 us
memcpy size 3000 count 50000 offset 9 took 674553 us

Here is that same test run with the assembly memcpy() in this series:

memcpy size 1 count 1000000 offset 0 took 92703 us
memcpy size 3 count 1000000 offset 0 took 112527 us
memcpy size 4 count 1000000 offset 0 took 120481 us
memcpy size 7 count 1000000 offset 0 took 149558 us
memcpy size 8 count 1000000 offset 0 took 90617 us
memcpy size f count 1000000 offset 0 took 174373 us
memcpy size f count 1000000 offset 1 took 178615 us
memcpy size f count 1000000 offset 3 took 178845 us
memcpy size f count 1000000 offset 7 took 178636 us
memcpy size f count 1000000 offset 8 took 174442 us
memcpy size f count 1000000 offset 9 took 178660 us
memcpy size 10 count 1000000 offset 0 took 99845 us
memcpy size 11 count 1000000 offset 0 took 112522 us
memcpy size 17 count 1000000 offset 0 took 179735 us
memcpy size 18 count 1000000 offset 0 took 110870 us
memcpy size 19 count 1000000 offset 0 took 121472 us
memcpy size 1f count 1000000 offset 0 took 188231 us
memcpy size 20 count 1000000 offset 0 took 119571 us
memcpy size 21 count 1000000 offset 0 took 132429 us
memcpy size 3f count 1000000 offset 0 took 227021 us
memcpy size 40 count 1000000 offset 0 took 166416 us
memcpy size 41 count 1000000 offset 0 took 180206 us
memcpy size 7c count 100000 offset 0 took 28602 us
memcpy size 7f count 100000 offset 0 took 31676 us
memcpy size ff count 100000 offset 0 took 39257 us
memcpy size ff count 100000 offset 0 took 39176 us
memcpy size 100 count 100000 offset 0 took 21928 us
memcpy size 200 count 100000 offset 0 took 35814 us
memcpy size 27f count 100000 offset 0 took 60315 us
memcpy size 400 count 100000 offset 0 took 63652 us
memcpy size 407 count 100000 offset 0 took 73160 us
memcpy size 800 count 100000 offset 0 took 121532 us
memcpy size 87f count 100000 offset 0 took 147269 us
memcpy size 87f count 100000 offset 3 took 144744 us
memcpy size 1000 count 100000 offset 0 took 232057 us
memcpy size 1000 count 100000 offset 1 took 254319 us
memcpy size 1000 count 100000 offset 3 took 256973 us
memcpy size 1000 count 100000 offset 4 took 257655 us
memcpy size 1000 count 100000 offset 5 took 259456 us
memcpy size 1000 count 100000 offset 7 took 260849 us
memcpy size 1000 count 100000 offset 8 took 232347 us
memcpy size 1000 count 100000 offset 9 took 254330 us
memcpy size 3000 count 50000 offset 0 took 382376 us
memcpy size 3000 count 50000 offset 1 took 389872 us
memcpy size 3000 count 50000 offset 3 took 385310 us
memcpy size 3000 count 50000 offset 4 took 389748 us
memcpy size 3000 count 50000 offset 5 took 391707 us
memcpy size 3000 count 50000 offset 7 took 386778 us
memcpy size 3000 count 50000 offset 8 took 385691 us
memcpy size 3000 count 50000 offset 9 took 392030 us

The assembly routine is measurably better.

[1] https://pastebin.com/DRyECNQW

Changes in v4:
- Remove __USE_GNU (Florian)
- __nonnull, __wur, __THROW, and __fortified_attr_access decorations (Florian)
- change long to long int (Florian)
- Fix comment formatting (Florian)
- Update backup kernel header content copy.
- Fix function declaration formatting (Florian)
- Changed export versions to 2.38
- Fixed comment style (Florian)

Changes in v3:
- Update argument types to match v4 kernel interface
- Add the "return" to the vsyscall
- Fix up vdso arg types to match kernel v4 version
- Remove ifdef around INLINE_VSYSCALL (Adhemerval)
- Word align dest for large memcpy()s.
- Add tags
- Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
- hwprobe.h: Use __has_include and duplicate Linux content to make
  compilation work when Linux headers are absent (Adhemerval)
- hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
- Use INLINE_SYSCALL_CALL (Adhemerval)
- Update versions
- Update UNALIGNED_MASK to match kernel v3 series.
- Add vDSO interface
- Used _MASK instead of _FAST value itself.

Evan Green (3):
  riscv: Add Linux hwprobe syscall support
  riscv: Add hwprobe vdso call support
  riscv: Add and use alignment-ignorant memcpy

 sysdeps/riscv/memcopy.h                       |  26 ++++
 sysdeps/riscv/memcpy.c                        |  64 +++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 121 ++++++++++++++++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.c       |  10 ++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h       |   3 +
 sysdeps/unix/sysv/linux/riscv/Makefile        |   8 +-
 sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  31 +++++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   |  72 +++++++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
 13 files changed, 363 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

-- 
2.34.1

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support
  2023-07-06 19:29 [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
@ 2023-07-06 19:29 ` Evan Green
  2023-07-07  8:15   ` Florian Weimer
  2023-07-06 19:29 ` [PATCH v4 2/3] riscv: Add hwprobe vdso call support Evan Green
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Evan Green @ 2023-07-06 19:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Florian Weimer, Evan Green

Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to do
things like dynamically choose a memcpy implementation.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

---
Changes in v4:
- Remove __USE_GNU (Florian)
- __nonnull, __wur, __THROW, and __fortified_attr_access decorations (Florian)
- change long to long int (Florian)
- Fix comment formatting (Florian)
- Update backup kernel header content copy.
- Fix function declaration formatting (Florian)
- Changed export versions to 2.38

Changes in v3:
- Update argument types to match v4 kernel interface

Changes in v2:
- hwprobe.h: Use __has_include and duplicate Linux content to make
  compilation work when Linux headers are absent (Adhemerval)
- hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
- Use INLINE_SYSCALL_CALL (Adhemerval)
- Update versions
- Update UNALIGNED_MASK to match kernel v3 series.
 sysdeps/unix/sysv/linux/riscv/Makefile      |  4 +-
 sysdeps/unix/sysv/linux/riscv/Versions      |  3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c     | 30 ++++++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist |  1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist |  1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 72 +++++++++++++++++++
 6 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 4b6eacb32f..45cc29e40d 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -1,6 +1,6 @@
 ifeq ($(subdir),misc)
-sysdep_headers += sys/cachectl.h
-sysdep_routines += flush-icache
+sysdep_headers += sys/cachectl.h sys/hwprobe.h
+sysdep_routines += flush-icache hwprobe
 endif
 
 ifeq ($(subdir),stdlib)
diff --git a/sysdeps/unix/sysv/linux/riscv/Versions b/sysdeps/unix/sysv/linux/riscv/Versions
index 5625d2a0b8..0c4016382d 100644
--- a/sysdeps/unix/sysv/linux/riscv/Versions
+++ b/sysdeps/unix/sysv/linux/riscv/Versions
@@ -8,4 +8,7 @@ libc {
   GLIBC_2.27 {
     __riscv_flush_icache;
   }
+  GLIBC_2.38 {
+    __riscv_hwprobe;
+  }
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
new file mode 100644
index 0000000000..a8a14d29a5
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -0,0 +1,30 @@
+/* RISC-V hardware feature probing support on Linux
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/syscall.h>
+#include <sys/hwprobe.h>
+#include <sysdep.h>
+
+int __riscv_hwprobe (struct riscv_hwprobe *pairs, size_t pair_count,
+                     size_t cpu_count, unsigned long int *cpus,
+                     unsigned int flags)
+{
+  return INLINE_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
+                              cpu_count, cpus, flags);
+}
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index b9740a1afc..8fab4a606f 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,3 +2436,4 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.38 __riscv_hwprobe F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index e3b4656aa2..1ebb91deed 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,3 +2636,4 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.38 __riscv_hwprobe F
diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
new file mode 100644
index 0000000000..b27af5cb07
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -0,0 +1,72 @@
+/* RISC-V architecture probe interface
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_HWPROBE_H
+#define _SYS_HWPROBE_H 1
+
+#include <features.h>
+#include <stddef.h>
+#ifdef __has_include
+# if __has_include (<asm/hwprobe.h>)
+#  include <asm/hwprobe.h>
+# endif
+#endif
+
+/* Define a (probably stale) version of the interface if the Linux headers
+   aren't present.  */
+#ifndef RISCV_HWPROBE_KEY_MVENDORID
+struct riscv_hwprobe {
+  signed long long int key;
+  unsigned long long int value;
+};
+
+#define RISCV_HWPROBE_KEY_MVENDORID 0
+#define RISCV_HWPROBE_KEY_MARCHID 1
+#define RISCV_HWPROBE_KEY_MIMPID 2
+#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR 3
+#define RISCV_HWPROBE_BASE_BEHAVIOR_IMA (1 << 0)
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define RISCV_HWPROBE_IMA_FD (1 << 0)
+#define RISCV_HWPROBE_IMA_C (1 << 1)
+#define RISCV_HWPROBE_IMA_V (1 << 2)
+#define RISCV_HWPROBE_EXT_ZBA (1 << 3)
+#define RISCV_HWPROBE_EXT_ZBB (1 << 4)
+#define RISCV_HWPROBE_EXT_ZBS (1 << 5)
+#define RISCV_HWPROBE_KEY_CPUPERF_0 5
+#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
+#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
+#define RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
+#define RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
+#define RISCV_HWPROBE_MISALIGNED_UNSUPPORTED (4 << 0)
+#define RISCV_HWPROBE_MISALIGNED_MASK (7 << 0)
+
+#endif /* RISCV_HWPROBE_KEY_MVENDORID */
+
+__BEGIN_DECLS
+
+extern int __riscv_hwprobe (struct riscv_hwprobe *pairs, size_t pair_count,
+                            size_t cpu_count, unsigned long int *cpus,
+                            unsigned int flags)
+     __THROW __nonnull ((1)) __wur
+     __fortified_attr_access (__read_write__, 1, 2)
+     __fortified_attr_access (__read_only__, 4, 3);
+
+__END_DECLS
+
+#endif /* sys/hwprobe.h */
-- 
2.34.1

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support
  2023-07-06 19:29 ` [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support Evan Green
@ 2023-07-07  8:15   ` Florian Weimer
  2023-07-07 22:10     ` Evan Green
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2023-07-07  8:15 UTC (permalink / raw)
  To: Evan Green; +Cc: libc-alpha, palmer, slewis, vineetg

* Evan Green:

> Add awareness and a thin wrapper function around a new Linux system call
> that allows callers to get architecture and microarchitecture
> information about the CPUs from the kernel. This can be used to
> do things like dynamically choose a memcpy implementation.

I missed before that you intend this for use in IFUNC resolvers, or at
the very least I think I forgot to raise this caveat.

RISC-V is not a HIDDEN_VAR_NEEDS_DYNAMIC_RELOC target, so this is not
completely impossible, but in general, extern function calls in IFUNC
resolvers tend to not work well.  The issue is that the GOT pointer for
a function like __riscv_hwprobe may not have been set up when the
dynamic linker invokes the IFUNC resolver.  There are several ways to
solve this.  You could pass the function pointer to the IFUNC resolver
(which may require a marker symbol and GCC changes).  Or you could add a
hidden wrapper function to libc_nonshared.a that checks if the function
pointer to the libc implementation has been set up by a relocation and
uses that, and falls back to a direct system call otherwise.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support
  2023-07-07  8:15 ` Florian Weimer
@ 2023-07-07 22:10   ` Evan Green
  2023-07-10  9:17     ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Evan Green @ 2023-07-07 22:10 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, palmer, slewis, vineetg

On Fri, Jul 7, 2023 at 1:16 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Evan Green:
>
> > Add awareness and a thin wrapper function around a new Linux system call
> > that allows callers to get architecture and microarchitecture
> > information about the CPUs from the kernel. This can be used to
> > do things like dynamically choose a memcpy implementation.
>
> I missed before that you intend this for use in IFUNC resolvers, or at
> the very least I think I forgot to raise this caveat.
>
> RISC-V is not a HIDDEN_VAR_NEEDS_DYNAMIC_RELOC target, so this is not
> completely impossible, but in general, extern function calls in IFUNC
> resolvers tend to not work well.  The issue is that the GOT pointer for
> a function like __riscv_hwprobe may not have been set up when the
> dynamic linker invokes the IFUNC resolver.  There are several ways to
> solve this.  You could pass the function pointer to the IFUNC resolver
> (which may require a marker symbol and GCC changes).  Or you could add a
> hidden wrapper function to libc_nonshared.a that checks if the function
> pointer to the libc implementation has been set up by a relocation and
> uses that, and falls back to a direct system call otherwise.

That makes sense, but then I'm confused about how it's been working in
my testing. What experiment should I try to see this problem in
action? An early constructor maybe?

Could I alternatively convert the implementation of the external
function into one that calls an internal helper, and then just call
the internal helper from the ifunc resolver (oh, I've reinvented a
protected symbol)? Or is that a violation of some rule?

Are there any examples of that last suggestion I can refer to?
Specifically the part about checking if the symbol has been relocated
yet or not.

-Evan

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support
  2023-07-07 22:10 ` Evan Green
@ 2023-07-10  9:17   ` Florian Weimer
  2023-07-11 17:08     ` Evan Green
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2023-07-10  9:17 UTC (permalink / raw)
  To: Evan Green; +Cc: libc-alpha, palmer, slewis, vineetg

* Evan Green:

> On Fri, Jul 7, 2023 at 1:16 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Evan Green:
>>
>> > Add awareness and a thin wrapper function around a new Linux system call
>> > that allows callers to get architecture and microarchitecture
>> > information about the CPUs from the kernel. This can be used to
>> > do things like dynamically choose a memcpy implementation.
>>
>> I missed before that you intend this for use in IFUNC resolvers, or at
>> the very least I think I forgot to raise this caveat.
>>
>> RISC-V is not a HIDDEN_VAR_NEEDS_DYNAMIC_RELOC target, so this is not
>> completely impossible, but in general, extern function calls in IFUNC
>> resolvers tend to not work well.  The issue is that the GOT pointer for
>> a function like __riscv_hwprobe may not have been set up when the
>> dynamic linker invokes the IFUNC resolver.  There are several ways to
>> solve this.  You could pass the function pointer to the IFUNC resolver
>> (which may require a marker symbol and GCC changes).  Or you could add a
>> hidden wrapper function to libc_nonshared.a that checks if the function
>> pointer to the libc implementation has been set up by a relocation and
>> uses that, and falls back to a direct system call otherwise.
>
> That makes sense, but then I'm confused about how it's been working in
> my testing. What experiment should I try to see this problem in
> action? An early constructor maybe?

This script should reproduce it:

cat >libifunc.c <<'EOF'
#include <dlfcn.h>

typedef void *(*malloc_fptr) (size_t);

static malloc_fptr
malloc_address (void)
{
  malloc_fptr result = dlsym (RTLD_NEXT, "malloc");
  asm ("");  /* Prevent tail call.  */
  return result;
}

static void *
malloc_impl_indirect (size_t size)
{
  return malloc_address () (size);
}

static void *
malloc_resolve (void)
{
#ifdef DIRECT
  return malloc_address ();
#else
  return &malloc_impl_indirect;
#endif
}

void *malloc (size_t) __attribute__ ((ifunc ("malloc_resolve")));
EOF
gcc -DDIRECT -O2 -g -Wl,-z,now -shared -fPIC -o libifunc.so libifunc.c
LD_PRELOAD=./libifunc.so /bin/true --help

The problem here is that dlsym is called during relocation, and that
function pointer has not been set up yet.

It shouldn't be too hard to come up with a variant that uses
__riscv_hwprobe instead.

> Could I alternatively convert the implementation of the external
> function into one that calls an internal helper, and then just call
> the internal helper from the ifunc resolver (oh, I've reinvented a
> protected symbol)? Or is that a violation of some rule?

It depends on where the internal helper is located.  The issue manifests
in user-compiled code, not directly in glibc, so there is no straight
fix for this in glibc.

> Are there any examples of that last suggestion I can refer to?
> Specifically the part about checking if the symbol has been relocated
> yet or not.

You can load the symbol value using assembler code, so that you can
check if it is null, or perhaps you can make it weak (so that GCC treats
NULL as a valid value).

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support
  2023-07-10  9:17 ` Florian Weimer
@ 2023-07-11 17:08   ` Evan Green
  0 siblings, 0 replies; 17+ messages in thread
From: Evan Green @ 2023-07-11 17:08 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, palmer, slewis, vineetg

On Mon, Jul 10, 2023 at 2:17 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Evan Green:
>
> > On Fri, Jul 7, 2023 at 1:16 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Evan Green:
> >>
> >> > Add awareness and a thin wrapper function around a new Linux system call
> >> > that allows callers to get architecture and microarchitecture
> >> > information about the CPUs from the kernel. This can be used to
> >> > do things like dynamically choose a memcpy implementation.
> >>
> >> I missed before that you intend this for use in IFUNC resolvers, or at
> >> the very least I think I forgot to raise this caveat.
> >>
> >> RISC-V is not a HIDDEN_VAR_NEEDS_DYNAMIC_RELOC target, so this is not
> >> completely impossible, but in general, extern function calls in IFUNC
> >> resolvers tend to not work well.  The issue is that the GOT pointer for
> >> a function like __riscv_hwprobe may not have been set up when the
> >> dynamic linker invokes the IFUNC resolver.  There are several ways to
> >> solve this.  You could pass the function pointer to the IFUNC resolver
> >> (which may require a marker symbol and GCC changes).  Or you could add a
> >> hidden wrapper function to libc_nonshared.a that checks if the function
> >> pointer to the libc implementation has been set up by a relocation and
> >> uses that, and falls back to a direct system call otherwise.
> >
> > That makes sense, but then I'm confused about how it's been working in
> > my testing. What experiment should I try to see this problem in
> > action? An early constructor maybe?
>
> This script should reproduce it:
>
> cat >libifunc.c <<'EOF'
> #include <dlfcn.h>
>
> typedef void *(*malloc_fptr) (size_t);
>
> static malloc_fptr
> malloc_address (void)
> {
>   malloc_fptr result = dlsym (RTLD_NEXT, "malloc");
>   asm ("");  /* Prevent tail call.  */
>   return result;
> }
>
> static void *
> malloc_impl_indirect (size_t size)
> {
>   return malloc_address () (size);
> }
>
> static void *
> malloc_resolve (void)
> {
> #ifdef DIRECT
>   return malloc_address ();
> #else
>   return &malloc_impl_indirect;
> #endif
> }
>
> void *malloc (size_t) __attribute__ ((ifunc ("malloc_resolve")));
> EOF
> gcc -DDIRECT -O2 -g -Wl,-z,now -shared -fPIC -o libifunc.so libifunc.c
> LD_PRELOAD=./libifunc.so /bin/true --help
>
> The problem here is that dlsym is called during relocation, and that
> function pointer has not been set up yet.
>
> It shouldn't be too hard to come up with a variant that uses
> __riscv_hwprobe instead.
>
> > Could I alternatively convert the implementation of the external
> > function into one that calls an internal helper, and then just call
> > the internal helper from the ifunc resolver (oh, I've reinvented a
> > protected symbol)? Or is that a violation of some rule?
>
> It depends on where the internal helper is located.  The issue manifests
> in user-compiled code, not directly in glibc, so there is no straight
> fix for this in glibc.
>
> > Are there any examples of that last suggestion I can refer to?
> > Specifically the part about checking if the symbol has been relocated
> > yet or not.
>
> You can load the symbol value using assembler code, so that you can
> check if it is null, or perhaps you can make it weak (so that GCC treats
> NULL as a valid value).

Thank you for the example, that helped a lot. I had mistakenly thought
you were pointing to a case where my memcpy ifunc selector broke down,
but now I realize you're talking about future ifunc selectors that
exist in applications and other libraries.

I've got something now that creates a weak alias, __riscv_hwprobe_weak,
as well as a statically compiled __riscv_hwprobe_early() that either
routes through the weak function or makes the syscall directly as you
suggested. I was able to adapt your example to verify it fixes the
crash. I'll send that out shortly, included in a respin of this series.

-Evan

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH v4 2/3] riscv: Add hwprobe vdso call support
  2023-07-06 19:29 [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2023-07-06 19:29 ` [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support Evan Green
@ 2023-07-06 19:29 ` Evan Green
  2023-07-06 19:29 ` [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green
  2023-07-06 20:11 ` [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
  3 siblings, 0 replies; 17+ messages in thread
From: Evan Green @ 2023-07-06 19:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Florian Weimer, Evan Green

The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

---
(no changes since v3)

Changes in v3:
- Add the "return" to the vsyscall
- Fix up vdso arg types to match kernel v4 version
- Remove ifdef around INLINE_VSYSCALL (Adhemerval)

Changes in v2:
- Add vDSO interface

 sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++++++++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h |  3 +++
 sysdeps/unix/sysv/linux/riscv/hwprobe.c |  5 +++--
 sysdeps/unix/sysv/linux/riscv/sysdep.h  |  1 +
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
index 97eaaeac37..ed8b1ef426 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
@@ -71,6 +71,16 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
 # ifdef HAVE_GET_TBFREQ
 PROCINFO_CLASS uint64_t (*_dl_vdso_get_tbfreq)(void) RELRO;
 # endif
+
+/* RISC-V specific ones.  */
+# ifdef HAVE_RISCV_HWPROBE
+PROCINFO_CLASS int (*_dl_vdso_riscv_hwprobe)(void *,
+                                             size_t,
+                                             size_t,
+                                             unsigned long *,
+                                             unsigned int) RELRO;
+# endif
+
 #endif
 
 #undef RELRO
diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
index 867072b897..39eafd5316 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
@@ -47,6 +47,9 @@ setup_vdso_pointers (void)
 #ifdef HAVE_GET_TBFREQ
   GLRO(dl_vdso_get_tbfreq) = dl_vdso_vsym (HAVE_GET_TBFREQ);
 #endif
+#ifdef HAVE_RISCV_HWPROBE
+  GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
+#endif
 }
 
 #endif
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
index a8a14d29a5..14f7136998 100644
--- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -20,11 +20,12 @@
 #include <sys/syscall.h>
 #include <sys/hwprobe.h>
 #include <sysdep.h>
+#include <sysdep-vdso.h>
 
 int __riscv_hwprobe (struct riscv_hwprobe *pairs, size_t pair_count,
                      size_t cpu_count, unsigned long int *cpus,
                      unsigned int flags)
 {
-  return INLINE_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
-                              cpu_count, cpus, flags);
+  /* The vDSO may be able to provide the answer without a syscall.  */
+  return INLINE_VSYSCALL(riscv_hwprobe, 5, pairs, pair_count, cpu_count, cpus, flags);
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/sysdep.h b/sysdeps/unix/sysv/linux/riscv/sysdep.h
index 5583b96d23..ee015dfeb6 100644
--- a/sysdeps/unix/sysv/linux/riscv/sysdep.h
+++ b/sysdeps/unix/sysv/linux/riscv/sysdep.h
@@ -156,6 +156,7 @@
 /* List of system calls which are supported as vsyscalls (for RV32 and
    RV64).  */
 # define HAVE_GETCPU_VSYSCALL "__vdso_getcpu"
+# define HAVE_RISCV_HWPROBE "__vdso_riscv_hwprobe"
 
 # undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
 # define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1
-- 
2.34.1

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy
  2023-07-06 19:29 [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2023-07-06 19:29 ` [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support Evan Green
  2023-07-06 19:29 ` [PATCH v4 2/3] riscv: Add hwprobe vdso call support Evan Green
@ 2023-07-06 19:29 ` Evan Green
  2023-07-07  9:22   ` Richard Henderson
  2023-07-08  2:16   ` Stefan O'Rear
  2023-07-06 20:11 ` [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
  3 siblings, 2 replies; 17+ messages in thread
From: Evan Green @ 2023-07-06 19:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Florian Weimer, Evan Green

For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It copies in blocks of integer registers, then
a single integer register at a time, and falls back to a bytewise copy
for the remainder.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

---
Changes in v4:
- Fixed comment style (Florian)

Changes in v3:
- Word align dest for large memcpy()s.
- Add tags
- Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
- Used _MASK instead of _FAST value itself.
--- sysdeps/riscv/memcopy.h | 26 ++++ sysdeps/riscv/memcpy.c | 64 +++++++++ sysdeps/riscv/memcpy_noalignment.S | 121 ++++++++++++++++++ sysdeps/unix/sysv/linux/riscv/Makefile | 4 + .../unix/sysv/linux/riscv/memcpy-generic.c | 24 ++++ 5 files changed, 239 insertions(+) create mode 100644 sysdeps/riscv/memcopy.h create mode 100644 sysdeps/riscv/memcpy.c create mode 100644 sysdeps/riscv/memcpy_noalignment.S create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h new file mode 100644 index 0000000000..2b685c8aa0 --- /dev/null +++ b/sysdeps/riscv/memcopy.h @@ -0,0 +1,26 @@ +/* memcopy.h -- definitions for memory copy functions. RISC-V version. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdeps/generic/memcopy.h> + +/* Redefine the generic memcpy implementation to __memcpy_generic, so + the memcpy ifunc can select between generic and special versions. + In rtld, don't bother with all the ifunciness. */ +#if IS_IN (libc) +#define MEMCPY __memcpy_generic +#endif diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c new file mode 100644 index 0000000000..fdb8dc3208 --- /dev/null +++ b/sysdeps/riscv/memcpy.c @@ -0,0 +1,64 @@ +/* Multiple versions of memcpy. 
+ All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2017-2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#if IS_IN (libc) +/* Redefine memcpy so that the compiler won't complain about the type + mismatch with the IFUNC selector in strong_alias, below. */ +# undef memcpy +# define memcpy __redirect_memcpy +# include <string.h> +#include <ifunc-init.h> +#include <sys/hwprobe.h> + +#define INIT_ARCH() + +extern __typeof (__redirect_memcpy) __libc_memcpy; + +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden; +extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden; + +static inline __typeof (__redirect_memcpy) * +select_memcpy_ifunc (void) +{ + INIT_ARCH (); + + struct riscv_hwprobe pair; + + pair.key = RISCV_HWPROBE_KEY_CPUPERF_0; + if (__riscv_hwprobe(&pair, 1, 0, NULL, 0) != 0) + return __memcpy_generic; + + if ((pair.key > 0) && + (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) == + RISCV_HWPROBE_MISALIGNED_FAST) + return __memcpy_noalignment; + + return __memcpy_generic; +} + +libc_ifunc (__libc_memcpy, select_memcpy_ifunc ()); + +# undef memcpy +strong_alias (__libc_memcpy, memcpy); +# ifdef SHARED +__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy) + __attribute__ ((visibility ("hidden"))) __attribute_copy__ 
(memcpy); +# endif + +#endif diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S new file mode 100644 index 0000000000..80f5e09ebb --- /dev/null +++ b/sysdeps/riscv/memcpy_noalignment.S @@ -0,0 +1,121 @@ +/* memcpy for RISC-V, ignoring buffer alignment + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + <https://www.gnu.org/licenses/>. */ + +#include <sysdep.h> +#include <sys/asm.h> + +/* void *memcpy(void *, const void *, size_t) */ +ENTRY (__memcpy_noalignment) + move t6, a0 /* Preserve return value */ + + /* Round down to the nearest "page" size */ + andi a4, a2, ~((16*SZREG)-1) + beqz a4, 2f + add a3, a1, a4 + + /* Copy the first word to get dest word aligned */ + andi a5, t6, SZREG-1 + beqz a5, 1f + REG_L a6, (a1) + REG_S a6, (t6) + + /* Align dst up to a word, move src and size as well. 
*/ + addi t6, t6, SZREG-1 + andi t6, t6, ~(SZREG-1) + sub a5, t6, a0 + add a1, a1, a5 + sub a2, a2, a5 + + /* Recompute page count */ + andi a4, a2, ~((16*SZREG)-1) + beqz a4, 2f + +1: + /* Copy "pages" (chunks of 16 registers) */ + REG_L a4, 0(a1) + REG_L a5, SZREG(a1) + REG_L a6, 2*SZREG(a1) + REG_L a7, 3*SZREG(a1) + REG_L t0, 4*SZREG(a1) + REG_L t1, 5*SZREG(a1) + REG_L t2, 6*SZREG(a1) + REG_L t3, 7*SZREG(a1) + REG_L t4, 8*SZREG(a1) + REG_L t5, 9*SZREG(a1) + REG_S a4, 0(t6) + REG_S a5, SZREG(t6) + REG_S a6, 2*SZREG(t6) + REG_S a7, 3*SZREG(t6) + REG_S t0, 4*SZREG(t6) + REG_S t1, 5*SZREG(t6) + REG_S t2, 6*SZREG(t6) + REG_S t3, 7*SZREG(t6) + REG_S t4, 8*SZREG(t6) + REG_S t5, 9*SZREG(t6) + REG_L a4, 10*SZREG(a1) + REG_L a5, 11*SZREG(a1) + REG_L a6, 12*SZREG(a1) + REG_L a7, 13*SZREG(a1) + REG_L t0, 14*SZREG(a1) + REG_L t1, 15*SZREG(a1) + addi a1, a1, 16*SZREG + REG_S a4, 10*SZREG(t6) + REG_S a5, 11*SZREG(t6) + REG_S a6, 12*SZREG(t6) + REG_S a7, 13*SZREG(t6) + REG_S t0, 14*SZREG(t6) + REG_S t1, 15*SZREG(t6) + addi t6, t6, 16*SZREG + bltu a1, a3, 1b + andi a2, a2, (16*SZREG)-1 /* Update count */ + +2: + /* Remainder is smaller than a page, compute native word count */ + beqz a2, 6f + andi a5, a2, ~(SZREG-1) + andi a2, a2, (SZREG-1) + add a3, a1, a5 + /* Jump directly to byte copy if no words. 
*/ + beqz a5, 4f + +3: + /* Use single native register copy */ + REG_L a4, 0(a1) + addi a1, a1, SZREG + REG_S a4, 0(t6) + addi t6, t6, SZREG + bltu a1, a3, 3b + + /* Jump directly out if no more bytes */ + beqz a2, 6f + +4: + /* Copy the last few individual bytes */ + add a3, a1, a2 +5: + lb a4, 0(a1) + addi a1, a1, 1 + sb a4, 0(t6) + addi t6, t6, 1 + bltu a1, a3, 5b +6: + ret + +END (__memcpy_noalignment) + +hidden_def (__memcpy_noalignment) diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile index 45cc29e40d..aa9ea443d6 100644 --- a/sysdeps/unix/sysv/linux/riscv/Makefile +++ b/sysdeps/unix/sysv/linux/riscv/Makefile @@ -7,6 +7,10 @@ ifeq ($(subdir),stdlib) gen-as-const-headers += ucontext_i.sym endif +ifeq ($(subdir),string) +sysdep_routines += memcpy memcpy-generic memcpy_noalignment +endif + abi-variants := ilp32 ilp32d lp64 lp64d ifeq (,$(filter $(default-abi),$(abi-variants))) diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c new file mode 100644 index 0000000000..0abe03f7f5 --- /dev/null +++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c @@ -0,0 +1,24 @@ +/* Re-include the default memcpy implementation. + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. 
*/ + +#include <string.h> + +extern __typeof (memcpy) __memcpy_generic; +hidden_proto(__memcpy_generic) + +#include <string/memcpy.c> -- 2.34.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
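For readers following the assembly above, the control flow of __memcpy_noalignment reduces to three loops: 16-register "pages", single native words, then bytes. Here is a portable C model of that structure — a sketch only: the destination word-alignment step for large copies is omitted, and memcpy stands in for the unaligned REG_L/REG_S pairs so the model runs on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SZREG sizeof (uintptr_t)

/* C model of __memcpy_noalignment's loop structure.  The real assembly
   additionally word-aligns the destination before the page loop; that
   refinement is omitted here.  */
void *
memcpy_noalignment_model (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Copy "pages" (chunks of 16 registers).  */
  while (n >= 16 * SZREG)
    {
      memcpy (d, s, 16 * SZREG);   /* 16 REG_L/REG_S pairs in the asm.  */
      d += 16 * SZREG;
      s += 16 * SZREG;
      n -= 16 * SZREG;
    }

  /* Copy single native words.  */
  while (n >= SZREG)
    {
      memcpy (d, s, SZREG);        /* one REG_L/REG_S pair in the asm.  */
      d += SZREG;
      s += SZREG;
      n -= SZREG;
    }

  /* Copy the last few individual bytes.  */
  while (n-- > 0)
    *d++ = *s++;

  return dst;
}
```

The byte-tail loop is the part the review discussion below proposes replacing with a single overlapping word access.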
* Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy 2023-07-06 19:29 ` [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green @ 2023-07-07 9:22 ` Richard Henderson 2023-07-07 15:25 ` Jeff Law 2023-07-08 2:16 ` Stefan O'Rear 1 sibling, 1 reply; 17+ messages in thread From: Richard Henderson @ 2023-07-07 9:22 UTC (permalink / raw) To: Evan Green, libc-alpha; +Cc: palmer, slewis, vineetg, Florian Weimer On 7/6/23 20:29, Evan Green wrote: > + /* Copy the last few individual bytes */ > + add a3, a1, a2 > +5: > + lb a4, 0(a1) > + addi a1, a1, 1 > + sb a4, 0(t6) > + addi t6, t6, 1 > + bltu a1, a3, 5b > +6: > + ret The only time you should be copying individual bytes is when the copy is smaller than SZREG. Otherwise the tail can be handled like add srcend, a1, a2 add dstend, a0, a2 REG_L tmp, -SZREG(srcend) REG_S tmp, -SZREG(dstend) There are other tricks that can be used to reduce the number of branches -- please examine the x86 code. See e.g. the copy_0_15 block in sysdeps/x86_64/multiarch/memmove-ssse3.S. r~ ^ permalink raw reply [flat|nested] 17+ messages in thread
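The tail technique suggested above can be sketched in C (hypothetical helper name; memcpy stands in for the unaligned REG_L/REG_S pair): for any copy of at least SZREG bytes, the final partial word is finished with a single load/store ending exactly at the buffers' ends, harmlessly re-copying up to SZREG-1 bytes the word loop already wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SZREG sizeof (uintptr_t)

/* Copy n >= SZREG bytes with whole-word accesses only: a word loop
   followed by one overlapping word that covers the remainder.  No
   byte loop is needed.  Correct for memcpy (non-overlapping buffer)
   semantics, since re-storing bytes already written is harmless.  */
void *
copy_words_overlap_tail (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  assert (n >= SZREG);   /* Below SZREG a different small-copy path is needed.  */

  /* Whole words.  */
  for (size_t i = 0; i + SZREG <= n; i += SZREG)
    memcpy (d + i, s + i, SZREG);

  /* One overlapping word ending exactly at the end of both buffers
     finishes the job; it overlaps the loop's last store by up to
     SZREG-1 bytes.  */
  memcpy (d + n - SZREG, s + n - SZREG, SZREG);
  return dst;
}
```

In the assembly this is the `REG_L tmp, -SZREG(srcend); REG_S tmp, -SZREG(dstend)` pair quoted above, and it removes the data-dependent byte loop entirely for copies of a word or more.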
* Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy 2023-07-07 9:22 ` Richard Henderson @ 2023-07-07 15:25 ` Jeff Law 2023-07-07 21:37 ` Evan Green 0 siblings, 1 reply; 17+ messages in thread From: Jeff Law @ 2023-07-07 15:25 UTC (permalink / raw) To: Richard Henderson, Evan Green, libc-alpha Cc: palmer, slewis, vineetg, Florian Weimer On 7/7/23 03:22, Richard Henderson via Libc-alpha wrote: > On 7/6/23 20:29, Evan Green wrote: >> + /* Copy the last few individual bytes */ >> + add a3, a1, a2 >> +5: >> + lb a4, 0(a1) >> + addi a1, a1, 1 >> + sb a4, 0(t6) >> + addi t6, t6, 1 >> + bltu a1, a3, 5b >> +6: >> + ret > > The only time you should be copying individual bytes is when the copy is > smaller than SZREG. Otherwise the tail can be handled like > > add srcend, a1, a2 > add dstend, a0, a2 > REG_L tmp, -SZREG(srcend) > REG_S tmp, -SZREG(dstend) > > There are other tricks that can be used to reduce the number of branches > -- please examine the x86 code. See e.g. the copy_0_15 block in > sysdeps/x86_64/multiarch/memmove-ssse3.S. The bits we've got here from VRULL use this trick. Evan, I'm happy to pass those bits along if you want to take a look. I have no strong opinions if this should be fixed before integration or as a follow-up. jeff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy 2023-07-07 15:25 ` Jeff Law @ 2023-07-07 21:37 ` Evan Green 2023-07-07 22:15 ` Jeff Law 0 siblings, 1 reply; 17+ messages in thread From: Evan Green @ 2023-07-07 21:37 UTC (permalink / raw) To: Jeff Law Cc: Richard Henderson, libc-alpha, palmer, slewis, vineetg, Florian Weimer On Fri, Jul 7, 2023 at 8:25 AM Jeff Law <jeffreyalaw@gmail.com> wrote: > > > > On 7/7/23 03:22, Richard Henderson via Libc-alpha wrote: > > On 7/6/23 20:29, Evan Green wrote: > >> + /* Copy the last few individual bytes */ > >> + add a3, a1, a2 > >> +5: > >> + lb a4, 0(a1) > >> + addi a1, a1, 1 > >> + sb a4, 0(t6) > >> + addi t6, t6, 1 > >> + bltu a1, a3, 5b > >> +6: > >> + ret > > > > The only time you should be copying individual bytes is when the copy is > > smaller than SZREG. Otherwise the tail can be handled like > > > > add srcend, a1, a2 > > add dstend, a0, a2 > > REG_L tmp, -SZREG(srcend) > > REG_S tmp, -SZREG(dstend) > > > > There are other tricks that can be used to reduce the number of branches > > -- please examine the x86 code. See e.g. the copy_0_15 block in > > sysdeps/x86_64/multiarch/memmove-ssse3.S. > The bits we've got here from VRULL use this trick. > > Evan, I'm happy to pass those bits along if you want to take a look. > > I have no strong opinions if this should be fixed before integration or > as a follow-up. This is the vrull patch, right? https://patchwork.sourceware.org/project/glibc/patch/20230207001618.458947-13-christoph.muellner@vrull.eu/ Sure, I can add the overlapping word access as suggested by Richard, it's a good idea. My preference is a followup patch, but I am ok either way. I should be able to get it sent next week. -Evan ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy 2023-07-07 21:37 ` Evan Green @ 2023-07-07 22:15 ` Jeff Law 0 siblings, 0 replies; 17+ messages in thread From: Jeff Law @ 2023-07-07 22:15 UTC (permalink / raw) To: Evan Green Cc: Richard Henderson, libc-alpha, palmer, slewis, vineetg, Florian Weimer On 7/7/23 15:37, Evan Green wrote: >>> There are other tricks that can be used to reduce the number of branches >>> -- please examine the x86 code. See e.g. the copy_0_15 block in >>> sysdeps/x86_64/multiarch/memmove-ssse3.S. >> The bits we've got here from VRULL use this trick. >> >> Evan, I'm happy to pass those bits along if you want to take a look. >> >> I have no strong opinions if this should be fixed before integration or >> as a follow-up. > > This is the vrull patch, right? > https://patchwork.sourceware.org/project/glibc/patch/20230207001618.458947-13-christoph.muellner@vrull.eu/ Yea. I didn't diff it, just quickly scanned and it looks like the same bits we're running here. > > Sure, I can add the overlapping word access as suggested by Richard, > it's a good idea. My preference is a followup patch, but I am ok > either way. I should be able to get it sent next week. So it looks like we're in the "slushy ABI freeze" state for glibc and it's unclear to me if this patch (either variant) can reasonably land in glibc-2.38. It's not ideal, but such is life. Jeff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy 2023-07-06 19:29 ` [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green 2023-07-07 9:22 ` Richard Henderson @ 2023-07-08 2:16 ` Stefan O'Rear 2023-07-10 16:19 ` Evan Green 1 sibling, 1 reply; 17+ messages in thread From: Stefan O'Rear @ 2023-07-08 2:16 UTC (permalink / raw) To: Evan Green, Stefan O'Rear via Libc-alpha Cc: Palmer Dabbelt, slewis, Vineet Gupta, Florian Weimer On Thu, Jul 6, 2023, at 3:29 PM, Evan Green wrote: > For CPU implementations that can perform unaligned accesses with little > or no performance penalty, create a memcpy implementation that does not > bother aligning buffers. It will use a block of integer registers, a > single integer register, and fall back to bytewise copy for the > remainder. > > Signed-off-by: Evan Green <evan@rivosinc.com> > Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> > > --- > > Changes in v4: > - Fixed comment style (Florian) > > Changes in v3: > - Word align dest for large memcpy()s. > - Add tags > - Remove spurious blank line from sysdeps/riscv/memcpy.c > > Changes in v2: > - Used _MASK instead of _FAST value itself. > > > --- > sysdeps/riscv/memcopy.h | 26 ++++ > sysdeps/riscv/memcpy.c | 64 +++++++++ > sysdeps/riscv/memcpy_noalignment.S | 121 ++++++++++++++++++ > sysdeps/unix/sysv/linux/riscv/Makefile | 4 + > .../unix/sysv/linux/riscv/memcpy-generic.c | 24 ++++ > 5 files changed, 239 insertions(+) > create mode 100644 sysdeps/riscv/memcopy.h > create mode 100644 sysdeps/riscv/memcpy.c > create mode 100644 sysdeps/riscv/memcpy_noalignment.S > create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c > > diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h > new file mode 100644 > index 0000000000..2b685c8aa0 > --- /dev/null > +++ b/sysdeps/riscv/memcopy.h > @@ -0,0 +1,26 @@ > +/* memcopy.h -- definitions for memory copy functions. RISC-V version. > + Copyright (C) 2023 Free Software Foundation, Inc. 
> + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +#include <sysdeps/generic/memcopy.h> > + > +/* Redefine the generic memcpy implementation to __memcpy_generic, so > + the memcpy ifunc can select between generic and special versions. > + In rtld, don't bother with all the ifunciness. */ > +#if IS_IN (libc) > +#define MEMCPY __memcpy_generic > +#endif > diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c > new file mode 100644 > index 0000000000..fdb8dc3208 > --- /dev/null > +++ b/sysdeps/riscv/memcpy.c > @@ -0,0 +1,64 @@ > +/* Multiple versions of memcpy. > + All versions must be listed in ifunc-impl-list.c. > + Copyright (C) 2017-2023 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. 
> + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +#if IS_IN (libc) > +/* Redefine memcpy so that the compiler won't complain about the type > + mismatch with the IFUNC selector in strong_alias, below. */ > +# undef memcpy > +# define memcpy __redirect_memcpy > +# include <string.h> > +#include <ifunc-init.h> > +#include <sys/hwprobe.h> > + > +#define INIT_ARCH() > + > +extern __typeof (__redirect_memcpy) __libc_memcpy; > + > +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden; > +extern __typeof (__redirect_memcpy) __memcpy_noalignment > attribute_hidden; > + > +static inline __typeof (__redirect_memcpy) * > +select_memcpy_ifunc (void) > +{ > + INIT_ARCH (); > + > + struct riscv_hwprobe pair; > + > + pair.key = RISCV_HWPROBE_KEY_CPUPERF_0; > + if (__riscv_hwprobe(&pair, 1, 0, NULL, 0) != 0) > + return __memcpy_generic; > + > + if ((pair.key > 0) && > + (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) == > + RISCV_HWPROBE_MISALIGNED_FAST) > + return __memcpy_noalignment; It's unclear whether this is semantically correct as a use of __riscv_hwprobe. [1] describes the result of hwprobe as "what's possible to enable", leaving open the possibility that additional system calls are needed to determine whether unaligned accesses are supported right now in the current process, and [2] adds an (inherited, IIUC) prctl for unaligned access which doesn't affect the return value of hwprobe and would break this code as written. (There is nothing in either the privileged spec or the SBI spec to prohibit an implementation which provides FAST unaligned access from supporting an optional strict alignment checking mode and making it available through fw_feature.) 
-s [1]: https://lore.kernel.org/linux-riscv/mhng-97928779-5d76-4390-a84c-398fdc6a0a4f@palmer-ri-x1c9/ [2]: https://lore.kernel.org/linux-riscv/20230624122049.7886-6-cleger@rivosinc.com/ > + > + return __memcpy_generic; > +} > + > +libc_ifunc (__libc_memcpy, select_memcpy_ifunc ()); > + > +# undef memcpy > +strong_alias (__libc_memcpy, memcpy); > +# ifdef SHARED > +__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy) > + __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy); > +# endif > + > +#endif > diff --git a/sysdeps/riscv/memcpy_noalignment.S > b/sysdeps/riscv/memcpy_noalignment.S > new file mode 100644 > index 0000000000..80f5e09ebb > --- /dev/null > +++ b/sysdeps/riscv/memcpy_noalignment.S > @@ -0,0 +1,121 @@ > +/* memcpy for RISC-V, ignoring buffer alignment > + Copyright (C) 2023 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library. If not, see > + <https://www.gnu.org/licenses/>. 
*/ > + > +#include <sysdep.h> > +#include <sys/asm.h> > + > +/* void *memcpy(void *, const void *, size_t) */ > +ENTRY (__memcpy_noalignment) > + move t6, a0 /* Preserve return value */ > + > + /* Round down to the nearest "page" size */ > + andi a4, a2, ~((16*SZREG)-1) > + beqz a4, 2f > + add a3, a1, a4 > + > + /* Copy the first word to get dest word aligned */ > + andi a5, t6, SZREG-1 > + beqz a5, 1f > + REG_L a6, (a1) > + REG_S a6, (t6) > + > + /* Align dst up to a word, move src and size as well. */ > + addi t6, t6, SZREG-1 > + andi t6, t6, ~(SZREG-1) > + sub a5, t6, a0 > + add a1, a1, a5 > + sub a2, a2, a5 > + > + /* Recompute page count */ > + andi a4, a2, ~((16*SZREG)-1) > + beqz a4, 2f > + > +1: > + /* Copy "pages" (chunks of 16 registers) */ > + REG_L a4, 0(a1) > + REG_L a5, SZREG(a1) > + REG_L a6, 2*SZREG(a1) > + REG_L a7, 3*SZREG(a1) > + REG_L t0, 4*SZREG(a1) > + REG_L t1, 5*SZREG(a1) > + REG_L t2, 6*SZREG(a1) > + REG_L t3, 7*SZREG(a1) > + REG_L t4, 8*SZREG(a1) > + REG_L t5, 9*SZREG(a1) > + REG_S a4, 0(t6) > + REG_S a5, SZREG(t6) > + REG_S a6, 2*SZREG(t6) > + REG_S a7, 3*SZREG(t6) > + REG_S t0, 4*SZREG(t6) > + REG_S t1, 5*SZREG(t6) > + REG_S t2, 6*SZREG(t6) > + REG_S t3, 7*SZREG(t6) > + REG_S t4, 8*SZREG(t6) > + REG_S t5, 9*SZREG(t6) > + REG_L a4, 10*SZREG(a1) > + REG_L a5, 11*SZREG(a1) > + REG_L a6, 12*SZREG(a1) > + REG_L a7, 13*SZREG(a1) > + REG_L t0, 14*SZREG(a1) > + REG_L t1, 15*SZREG(a1) > + addi a1, a1, 16*SZREG > + REG_S a4, 10*SZREG(t6) > + REG_S a5, 11*SZREG(t6) > + REG_S a6, 12*SZREG(t6) > + REG_S a7, 13*SZREG(t6) > + REG_S t0, 14*SZREG(t6) > + REG_S t1, 15*SZREG(t6) > + addi t6, t6, 16*SZREG > + bltu a1, a3, 1b > + andi a2, a2, (16*SZREG)-1 /* Update count */ > + > +2: > + /* Remainder is smaller than a page, compute native word count */ > + beqz a2, 6f > + andi a5, a2, ~(SZREG-1) > + andi a2, a2, (SZREG-1) > + add a3, a1, a5 > + /* Jump directly to byte copy if no words. 
*/ > + beqz a5, 4f > + > +3: > + /* Use single native register copy */ > + REG_L a4, 0(a1) > + addi a1, a1, SZREG > + REG_S a4, 0(t6) > + addi t6, t6, SZREG > + bltu a1, a3, 3b > + > + /* Jump directly out if no more bytes */ > + beqz a2, 6f > + > +4: > + /* Copy the last few individual bytes */ > + add a3, a1, a2 > +5: > + lb a4, 0(a1) > + addi a1, a1, 1 > + sb a4, 0(t6) > + addi t6, t6, 1 > + bltu a1, a3, 5b > +6: > + ret > + > +END (__memcpy_noalignment) > + > +hidden_def (__memcpy_noalignment) > diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile > b/sysdeps/unix/sysv/linux/riscv/Makefile > index 45cc29e40d..aa9ea443d6 100644 > --- a/sysdeps/unix/sysv/linux/riscv/Makefile > +++ b/sysdeps/unix/sysv/linux/riscv/Makefile > @@ -7,6 +7,10 @@ ifeq ($(subdir),stdlib) > gen-as-const-headers += ucontext_i.sym > endif > > +ifeq ($(subdir),string) > +sysdep_routines += memcpy memcpy-generic memcpy_noalignment > +endif > + > abi-variants := ilp32 ilp32d lp64 lp64d > > ifeq (,$(filter $(default-abi),$(abi-variants))) > diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c > b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c > new file mode 100644 > index 0000000000..0abe03f7f5 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c > @@ -0,0 +1,24 @@ > +/* Re-include the default memcpy implementation. > + Copyright (C) 2023 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. 
> + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +#include <string.h> > + > +extern __typeof (memcpy) __memcpy_generic; > +hidden_proto(__memcpy_generic) > + > +#include <string/memcpy.c> > -- > 2.34.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy 2023-07-08 2:16 ` Stefan O'Rear @ 2023-07-10 16:19 ` Evan Green 2023-07-12 5:22 ` Stefan O'Rear 0 siblings, 1 reply; 17+ messages in thread From: Evan Green @ 2023-07-10 16:19 UTC (permalink / raw) To: Stefan O'Rear Cc: Stefan O'Rear via Libc-alpha, Palmer Dabbelt, slewis, Vineet Gupta, Florian Weimer On Fri, Jul 7, 2023 at 7:17 PM Stefan O'Rear <sorear@fastmail.com> wrote: > > > > On Thu, Jul 6, 2023, at 3:29 PM, Evan Green wrote: > > For CPU implementations that can perform unaligned accesses with little > > or no performance penalty, create a memcpy implementation that does not > > bother aligning buffers. It will use a block of integer registers, a > > single integer register, and fall back to bytewise copy for the > > remainder. > > > > Signed-off-by: Evan Green <evan@rivosinc.com> > > Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> > > > > --- > > > > Changes in v4: > > - Fixed comment style (Florian) > > > > Changes in v3: > > - Word align dest for large memcpy()s. > > - Add tags > > - Remove spurious blank line from sysdeps/riscv/memcpy.c > > > > Changes in v2: > > - Used _MASK instead of _FAST value itself. > > > > > > --- > > sysdeps/riscv/memcopy.h | 26 ++++ > > sysdeps/riscv/memcpy.c | 64 +++++++++ > > sysdeps/riscv/memcpy_noalignment.S | 121 ++++++++++++++++++ > > sysdeps/unix/sysv/linux/riscv/Makefile | 4 + > > .../unix/sysv/linux/riscv/memcpy-generic.c | 24 ++++ > > 5 files changed, 239 insertions(+) > > create mode 100644 sysdeps/riscv/memcopy.h > > create mode 100644 sysdeps/riscv/memcpy.c > > create mode 100644 sysdeps/riscv/memcpy_noalignment.S > > create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c > > > > diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h > > new file mode 100644 > > index 0000000000..2b685c8aa0 > > --- /dev/null > > +++ b/sysdeps/riscv/memcopy.h > > @@ -0,0 +1,26 @@ > > +/* memcopy.h -- definitions for memory copy functions. 
RISC-V version. > > + Copyright (C) 2023 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful, > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library; if not, see > > + <https://www.gnu.org/licenses/>. */ > > + > > +#include <sysdeps/generic/memcopy.h> > > + > > +/* Redefine the generic memcpy implementation to __memcpy_generic, so > > + the memcpy ifunc can select between generic and special versions. > > + In rtld, don't bother with all the ifunciness. */ > > +#if IS_IN (libc) > > +#define MEMCPY __memcpy_generic > > +#endif > > diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c > > new file mode 100644 > > index 0000000000..fdb8dc3208 > > --- /dev/null > > +++ b/sysdeps/riscv/memcpy.c > > @@ -0,0 +1,64 @@ > > +/* Multiple versions of memcpy. > > + All versions must be listed in ifunc-impl-list.c. > > + Copyright (C) 2017-2023 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. 
> > + > > + The GNU C Library is distributed in the hope that it will be useful, > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library; if not, see > > + <https://www.gnu.org/licenses/>. */ > > + > > +#if IS_IN (libc) > > +/* Redefine memcpy so that the compiler won't complain about the type > > + mismatch with the IFUNC selector in strong_alias, below. */ > > +# undef memcpy > > +# define memcpy __redirect_memcpy > > +# include <string.h> > > +#include <ifunc-init.h> > > +#include <sys/hwprobe.h> > > + > > +#define INIT_ARCH() > > + > > +extern __typeof (__redirect_memcpy) __libc_memcpy; > > + > > +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden; > > +extern __typeof (__redirect_memcpy) __memcpy_noalignment > > attribute_hidden; > > + > > +static inline __typeof (__redirect_memcpy) * > > +select_memcpy_ifunc (void) > > +{ > > + INIT_ARCH (); > > + > > + struct riscv_hwprobe pair; > > + > > + pair.key = RISCV_HWPROBE_KEY_CPUPERF_0; > > + if (__riscv_hwprobe(&pair, 1, 0, NULL, 0) != 0) > > + return __memcpy_generic; > > + > > + if ((pair.key > 0) && > > + (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) == > > + RISCV_HWPROBE_MISALIGNED_FAST) > > + return __memcpy_noalignment; > > It's unclear whether this is semantically correct as a use of > __riscv_hwprobe. [1] describes the result of hwprobe as "what's possible > to enable", leaving open the possibility that additional system calls are Right, think of it like "cpuid for risc-v". > needed to determine whether unaligned accesses are supported right now in > the current process, and [2] adds an (inherited, IIUC) prctl for > unaligned access which doesn't affect the return value of hwprobe and > would break this code as written. 
> > (There is nothing in either the privileged spec or the SBI spec to
> prohibit an implementation which provides FAST unaligned access from
> supporting an optional strict alignment checking mode and making it
> available through fw_feature.)

I think your point about prctls() muddying the waters of what hwprobe
is trying to report is generally valid. However in this case, the
prctl() proposed in [2] only disables unaligned accesses for systems
where Linux is handling the trapping/emulation of unaligned accesses.
These systems would only ever report SLOW/EMULATED out of hwprobe.
Since we're only making changes on FAST systems, the prctl()'s setting
doesn't come into play.

-Evan

>
> -s
>
> [1]: https://lore.kernel.org/linux-riscv/mhng-97928779-5d76-4390-a84c-398fdc6a0a4f@palmer-ri-x1c9/
> [2]: https://lore.kernel.org/linux-riscv/20230624122049.7886-6-cleger@rivosinc.com/

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy 2023-07-10 16:19 ` Evan Green @ 2023-07-12 5:22 ` Stefan O'Rear 0 siblings, 0 replies; 17+ messages in thread From: Stefan O'Rear @ 2023-07-12 5:22 UTC (permalink / raw) To: Evan Green Cc: Stefan O'Rear via Libc-alpha, Palmer Dabbelt, slewis, Vineet Gupta, Florian Weimer On Mon, Jul 10, 2023, at 12:19 PM, Evan Green wrote: > On Fri, Jul 7, 2023 at 7:17 PM Stefan O'Rear <sorear@fastmail.com> wrote: >> >> >> >> On Thu, Jul 6, 2023, at 3:29 PM, Evan Green wrote: >> > For CPU implementations that can perform unaligned accesses with little >> > or no performance penalty, create a memcpy implementation that does not >> > bother aligning buffers. It will use a block of integer registers, a >> > single integer register, and fall back to bytewise copy for the >> > remainder. >> > >> > Signed-off-by: Evan Green <evan@rivosinc.com> >> > Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> >> > >> > --- >> > >> > Changes in v4: >> > - Fixed comment style (Florian) >> > >> > Changes in v3: >> > - Word align dest for large memcpy()s. >> > - Add tags >> > - Remove spurious blank line from sysdeps/riscv/memcpy.c >> > >> > Changes in v2: >> > - Used _MASK instead of _FAST value itself. 
>> > >> > >> > --- >> > sysdeps/riscv/memcopy.h | 26 ++++ >> > sysdeps/riscv/memcpy.c | 64 +++++++++ >> > sysdeps/riscv/memcpy_noalignment.S | 121 ++++++++++++++++++ >> > sysdeps/unix/sysv/linux/riscv/Makefile | 4 + >> > .../unix/sysv/linux/riscv/memcpy-generic.c | 24 ++++ >> > 5 files changed, 239 insertions(+) >> > create mode 100644 sysdeps/riscv/memcopy.h >> > create mode 100644 sysdeps/riscv/memcpy.c >> > create mode 100644 sysdeps/riscv/memcpy_noalignment.S >> > create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c >> > >> > diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h >> > new file mode 100644 >> > index 0000000000..2b685c8aa0 >> > --- /dev/null >> > +++ b/sysdeps/riscv/memcopy.h >> > @@ -0,0 +1,26 @@ >> > +/* memcopy.h -- definitions for memory copy functions. RISC-V version. >> > + Copyright (C) 2023 Free Software Foundation, Inc. >> > + This file is part of the GNU C Library. >> > + >> > + The GNU C Library is free software; you can redistribute it and/or >> > + modify it under the terms of the GNU Lesser General Public >> > + License as published by the Free Software Foundation; either >> > + version 2.1 of the License, or (at your option) any later version. >> > + >> > + The GNU C Library is distributed in the hope that it will be useful, >> > + but WITHOUT ANY WARRANTY; without even the implied warranty of >> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> > + Lesser General Public License for more details. >> > + >> > + You should have received a copy of the GNU Lesser General Public >> > + License along with the GNU C Library; if not, see >> > + <https://www.gnu.org/licenses/>. */ >> > + >> > +#include <sysdeps/generic/memcopy.h> >> > + >> > +/* Redefine the generic memcpy implementation to __memcpy_generic, so >> > + the memcpy ifunc can select between generic and special versions. >> > + In rtld, don't bother with all the ifunciness. 
*/ >> > +#if IS_IN (libc) >> > +#define MEMCPY __memcpy_generic >> > +#endif >> > diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c >> > new file mode 100644 >> > index 0000000000..fdb8dc3208 >> > --- /dev/null >> > +++ b/sysdeps/riscv/memcpy.c >> > @@ -0,0 +1,64 @@ >> > +/* Multiple versions of memcpy. >> > + All versions must be listed in ifunc-impl-list.c. >> > + Copyright (C) 2017-2023 Free Software Foundation, Inc. >> > + This file is part of the GNU C Library. >> > + >> > + The GNU C Library is free software; you can redistribute it and/or >> > + modify it under the terms of the GNU Lesser General Public >> > + License as published by the Free Software Foundation; either >> > + version 2.1 of the License, or (at your option) any later version. >> > + >> > + The GNU C Library is distributed in the hope that it will be useful, >> > + but WITHOUT ANY WARRANTY; without even the implied warranty of >> > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> > + Lesser General Public License for more details. >> > + >> > + You should have received a copy of the GNU Lesser General Public >> > + License along with the GNU C Library; if not, see >> > + <https://www.gnu.org/licenses/>. */ >> > + >> > +#if IS_IN (libc) >> > +/* Redefine memcpy so that the compiler won't complain about the type >> > + mismatch with the IFUNC selector in strong_alias, below. 
*/ >> > +# undef memcpy >> > +# define memcpy __redirect_memcpy >> > +# include <string.h> >> > +#include <ifunc-init.h> >> > +#include <sys/hwprobe.h> >> > + >> > +#define INIT_ARCH() >> > + >> > +extern __typeof (__redirect_memcpy) __libc_memcpy; >> > + >> > +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden; >> > +extern __typeof (__redirect_memcpy) __memcpy_noalignment >> > attribute_hidden; >> > + >> > +static inline __typeof (__redirect_memcpy) * >> > +select_memcpy_ifunc (void) >> > +{ >> > + INIT_ARCH (); >> > + >> > + struct riscv_hwprobe pair; >> > + >> > + pair.key = RISCV_HWPROBE_KEY_CPUPERF_0; >> > + if (__riscv_hwprobe(&pair, 1, 0, NULL, 0) != 0) >> > + return __memcpy_generic; >> > + >> > + if ((pair.key > 0) && >> > + (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) == >> > + RISCV_HWPROBE_MISALIGNED_FAST) >> > + return __memcpy_noalignment; >> >> It's unclear whether this is semantically correct as a use of >> __riscv_hwprobe. [1] describes the result of hwprobe as "what's possible >> to enable", leaving open the possibility that additional system calls are > > Right, think of it like "cpuid for risc-v". We're missing a good analog of cpuid's OSXSAVE result, or perhaps more accurately xgetbv. I was pushing for including the xgetbv information in hwprobe, under the mistaken impression that vvar was per-process; given that hwprobe cannot usefully return per-process data, it's probably doing the right thing right now, but this puts us back in the place of having no usable userspace ABI for features that can be disabled per-process, or might be disabled per-process in the future. >> needed to determine whether unaligned accesses are supported right now in >> the current process, and [2] adds an (inherited, IIUC) prctl for >> unaligned access which doesn't affect the return value of hwprobe and >> would break this code as written. 
>>
>> (There is nothing in either the privileged spec or the SBI spec to
>> prohibit an implementation which provides FAST unaligned access from
>> supporting an optional strict alignment checking mode and making it
>> available through fw_feature.)
>
> I think your point about prctls() muddying the waters of what hwprobe
> is trying to report is generally valid. However in this case, the
> prctl() proposed in [2] only disables unaligned accesses for systems
> where Linux is handling the trapping/emulation of unaligned accesses.
> These systems would only ever report SLOW/EMULATED out of hwprobe.
> Since we're only making changes on FAST systems, the prctl()'s setting
> doesn't come into play.

Looking at other architectures with PR_SET_UNALIGN support, none of them
do anything to stop the status from affecting children after fork or
exec, so I think the proposed riscv handling is not special and we don't
need to do anything special to handle PR_GET_UNALIGN here; if user code
relies on fixups to handle mostly-aligned data, it will crash under
PR_UNALIGN_SIGBUS, and that's expected and not an ABI issue.

I wonder if we should take the same approach with
PR_RISCV_V_VSTATE_CTRL_OFF and say that it requires buyin from any
processes run under its influence and crashes are expected otherwise.

-s

> -Evan
>
>>
>> -s
>>
>> [1]: https://lore.kernel.org/linux-riscv/mhng-97928779-5d76-4390-a84c-398fdc6a0a4f@palmer-ri-x1c9/
>> [2]: https://lore.kernel.org/linux-riscv/20230624122049.7886-6-cleger@rivosinc.com/
* Re: [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-07-06 19:29 [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (2 preceding siblings ...)
  2023-07-06 19:29 ` [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green
@ 2023-07-06 20:11 ` Palmer Dabbelt
  2023-07-06 22:20   ` Jeff Law
  3 siblings, 1 reply; 17+ messages in thread
From: Palmer Dabbelt @ 2023-07-06 20:11 UTC (permalink / raw)
To: Evan Green, Jeff Law
Cc: libc-alpha, slewis, Vineet Gupta, fweimer, Evan Green

On Thu, 06 Jul 2023 12:29:43 PDT (-0700), Evan Green wrote:
>
> This series illustrates the use of a recently accepted Linux syscall that
> enumerates architectural information about the RISC-V cores the system
> is running on. In this series we expose a small wrapper function around
> the syscall. An ifunc selector for memcpy queries it to see if unaligned
> access is "fast" on this hardware. If it is, it selects a newly provided
> implementation of memcpy that doesn't work hard at aligning the src and
> destination buffers.
>
> I opted to spin the whole series, though it's perfectly safe to take
> just the first two patches for the hwprobe interface and abandon the
> third patch as a separate issue.

Thanks. Given that it has a meaningful performance increase on the
T-Head hardware it seems reasonable to take it for the next release. I
don't remember if I've looked super closely at the implementation, I'll
do so before testing and merging it -- certainly not this week, though,
as the merge window will probably eat all my spare cycles.

The only issue on my end is the assembly memcpy routine, which we were
generally trying to avoid. +Jeff, as IIUC the Ventana folks were
interested in memcpy on fast-misaligned systems. Do you guys happen to
have one lying around for the C implementation? It'd be nice to see if
we're getting any real performance benefit from the assembly.
> Performance numbers were compared using a small test program [1], run on
> a D1 Nezha board, which supports fast unaligned access. "Fast" here
> means copying unaligned words is faster than copying byte-wise, but
> still slower than copying aligned words. Here's the speed of various
> memcpy()s with the generic implementation:
>
> memcpy size 1 count 1000000 offset 0 took 109564 us
> memcpy size 3 count 1000000 offset 0 took 138425 us
> memcpy size 4 count 1000000 offset 0 took 148374 us
> memcpy size 7 count 1000000 offset 0 took 178433 us
> memcpy size 8 count 1000000 offset 0 took 188430 us
> memcpy size f count 1000000 offset 0 took 266118 us
> memcpy size f count 1000000 offset 1 took 265940 us
> memcpy size f count 1000000 offset 3 took 265934 us
> memcpy size f count 1000000 offset 7 took 266215 us
> memcpy size f count 1000000 offset 8 took 265954 us
> memcpy size f count 1000000 offset 9 took 265886 us
> memcpy size 10 count 1000000 offset 0 took 195308 us
> memcpy size 11 count 1000000 offset 0 took 205161 us
> memcpy size 17 count 1000000 offset 0 took 274376 us
> memcpy size 18 count 1000000 offset 0 took 199188 us
> memcpy size 19 count 1000000 offset 0 took 209258 us
> memcpy size 1f count 1000000 offset 0 took 278263 us
> memcpy size 20 count 1000000 offset 0 took 207364 us
> memcpy size 21 count 1000000 offset 0 took 217143 us
> memcpy size 3f count 1000000 offset 0 took 300023 us
> memcpy size 40 count 1000000 offset 0 took 231063 us
> memcpy size 41 count 1000000 offset 0 took 241259 us
> memcpy size 7c count 100000 offset 0 took 32807 us
> memcpy size 7f count 100000 offset 0 took 36274 us
> memcpy size ff count 100000 offset 0 took 47818 us
> memcpy size ff count 100000 offset 0 took 47932 us
> memcpy size 100 count 100000 offset 0 took 40468 us
> memcpy size 200 count 100000 offset 0 took 64245 us
> memcpy size 27f count 100000 offset 0 took 82549 us
> memcpy size 400 count 100000 offset 0 took 111254 us
> memcpy size 407 count 100000 offset 0 took 119364 us
> memcpy size 800 count 100000 offset 0 took 203899 us
> memcpy size 87f count 100000 offset 0 took 222465 us
> memcpy size 87f count 100000 offset 3 took 222289 us
> memcpy size 1000 count 100000 offset 0 took 388846 us
> memcpy size 1000 count 100000 offset 1 took 468827 us
> memcpy size 1000 count 100000 offset 3 took 397098 us
> memcpy size 1000 count 100000 offset 4 took 397379 us
> memcpy size 1000 count 100000 offset 5 took 397368 us
> memcpy size 1000 count 100000 offset 7 took 396867 us
> memcpy size 1000 count 100000 offset 8 took 389227 us
> memcpy size 1000 count 100000 offset 9 took 395949 us
> memcpy size 3000 count 50000 offset 0 took 674837 us
> memcpy size 3000 count 50000 offset 1 took 676944 us
> memcpy size 3000 count 50000 offset 3 took 679709 us
> memcpy size 3000 count 50000 offset 4 took 680829 us
> memcpy size 3000 count 50000 offset 5 took 678024 us
> memcpy size 3000 count 50000 offset 7 took 681097 us
> memcpy size 3000 count 50000 offset 8 took 670004 us
> memcpy size 3000 count 50000 offset 9 took 674553 us
>
> Here is that same test run with the assembly memcpy() in this series:
> memcpy size 1 count 1000000 offset 0 took 92703 us
> memcpy size 3 count 1000000 offset 0 took 112527 us
> memcpy size 4 count 1000000 offset 0 took 120481 us
> memcpy size 7 count 1000000 offset 0 took 149558 us
> memcpy size 8 count 1000000 offset 0 took 90617 us
> memcpy size f count 1000000 offset 0 took 174373 us
> memcpy size f count 1000000 offset 1 took 178615 us
> memcpy size f count 1000000 offset 3 took 178845 us
> memcpy size f count 1000000 offset 7 took 178636 us
> memcpy size f count 1000000 offset 8 took 174442 us
> memcpy size f count 1000000 offset 9 took 178660 us
> memcpy size 10 count 1000000 offset 0 took 99845 us
> memcpy size 11 count 1000000 offset 0 took 112522 us
> memcpy size 17 count 1000000 offset 0 took 179735 us
> memcpy size 18 count 1000000 offset 0 took 110870 us
> memcpy size 19 count 1000000 offset 0 took 121472 us
> memcpy size 1f count 1000000 offset 0 took 188231 us
> memcpy size 20 count 1000000 offset 0 took 119571 us
> memcpy size 21 count 1000000 offset 0 took 132429 us
> memcpy size 3f count 1000000 offset 0 took 227021 us
> memcpy size 40 count 1000000 offset 0 took 166416 us
> memcpy size 41 count 1000000 offset 0 took 180206 us
> memcpy size 7c count 100000 offset 0 took 28602 us
> memcpy size 7f count 100000 offset 0 took 31676 us
> memcpy size ff count 100000 offset 0 took 39257 us
> memcpy size ff count 100000 offset 0 took 39176 us
> memcpy size 100 count 100000 offset 0 took 21928 us
> memcpy size 200 count 100000 offset 0 took 35814 us
> memcpy size 27f count 100000 offset 0 took 60315 us
> memcpy size 400 count 100000 offset 0 took 63652 us
> memcpy size 407 count 100000 offset 0 took 73160 us
> memcpy size 800 count 100000 offset 0 took 121532 us
> memcpy size 87f count 100000 offset 0 took 147269 us
> memcpy size 87f count 100000 offset 3 took 144744 us
> memcpy size 1000 count 100000 offset 0 took 232057 us
> memcpy size 1000 count 100000 offset 1 took 254319 us
> memcpy size 1000 count 100000 offset 3 took 256973 us
> memcpy size 1000 count 100000 offset 4 took 257655 us
> memcpy size 1000 count 100000 offset 5 took 259456 us
> memcpy size 1000 count 100000 offset 7 took 260849 us
> memcpy size 1000 count 100000 offset 8 took 232347 us
> memcpy size 1000 count 100000 offset 9 took 254330 us
> memcpy size 3000 count 50000 offset 0 took 382376 us
> memcpy size 3000 count 50000 offset 1 took 389872 us
> memcpy size 3000 count 50000 offset 3 took 385310 us
> memcpy size 3000 count 50000 offset 4 took 389748 us
> memcpy size 3000 count 50000 offset 5 took 391707 us
> memcpy size 3000 count 50000 offset 7 took 386778 us
> memcpy size 3000 count 50000 offset 8 took 385691 us
> memcpy size 3000 count 50000 offset 9 took 392030 us
>
> The assembly routine is measurably better.
>
> [1] https://pastebin.com/DRyECNQW
>
>
> Changes in v4:
> - Remove __USE_GNU (Florian)
> - __nonnull, __wur, __THROW, and __fortified_attr_access decorations
>   (Florian)
> - change long to long int (Florian)
> - Fix comment formatting (Florian)
> - Update backup kernel header content copy.
> - Fix function declaration formatting (Florian)
> - Changed export versions to 2.38
> - Fixed comment style (Florian)
>
> Changes in v3:
> - Update argument types to match v4 kernel interface
> - Add the "return" to the vsyscall
> - Fix up vdso arg types to match kernel v4 version
> - Remove ifdef around INLINE_VSYSCALL (Adhemerval)
> - Word align dest for large memcpy()s.
> - Add tags
> - Remove spurious blank line from sysdeps/riscv/memcpy.c
>
> Changes in v2:
> - hwprobe.h: Use __has_include and duplicate Linux content to make
>   compilation work when Linux headers are absent (Adhemerval)
> - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
> - Use INLINE_SYSCALL_CALL (Adhemerval)
> - Update versions
> - Update UNALIGNED_MASK to match kernel v3 series.
> - Add vDSO interface
> - Used _MASK instead of _FAST value itself.
>
> Evan Green (3):
>   riscv: Add Linux hwprobe syscall support
>   riscv: Add hwprobe vdso call support
>   riscv: Add and use alignment-ignorant memcpy
>
>  sysdeps/riscv/memcopy.h                       |  26 ++++
>  sysdeps/riscv/memcpy.c                        |  64 +++++++++
>  sysdeps/riscv/memcpy_noalignment.S            | 121 ++++++++++++++++++
>  sysdeps/unix/sysv/linux/dl-vdso-setup.c       |  10 ++
>  sysdeps/unix/sysv/linux/dl-vdso-setup.h       |   3 +
>  sysdeps/unix/sysv/linux/riscv/Makefile        |   8 +-
>  sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
>  sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  31 +++++
>  .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   |  72 +++++++++++
>  sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
>  13 files changed, 363 insertions(+), 2 deletions(-)
>  create mode 100644 sysdeps/riscv/memcopy.h
>  create mode 100644 sysdeps/riscv/memcpy.c
>  create mode 100644 sysdeps/riscv/memcpy_noalignment.S
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
* Re: [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-07-06 20:11 ` [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
@ 2023-07-06 22:20   ` Jeff Law
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Law @ 2023-07-06 22:20 UTC (permalink / raw)
To: Palmer Dabbelt, Evan Green, Jeff Law
Cc: libc-alpha, slewis, Vineet Gupta, fweimer

On 7/6/23 14:11, Palmer Dabbelt wrote:
>
> Thanks. Given that it has a meaningful performance increase on the
> T-Head hardware it seems reasonable to take it for the next release. I
> don't remember if I've looked super closely at the implementation, I'll
> do so before testing and merging it -- certainly not this week, though,
> as the merge window will probably eat all my spare cycles.
>
> The only issue on my end is the assembly memcpy routine, which we were
> generally trying to avoid. +Jeff, as IIUC the Ventana folks were
> interested in memcpy on fast-misaligned systems. Do you guys happen to
> have one lying around for the C implementation? It'd be nice to see if
> we're getting any real performance benefit from the assembly.

It's just an assembly version from the VRULL team. It's a fairly
typical decision tree based on the amount of data being copied. Each of
the variants tries to avoid loops by unrolling them in a sensible way.

What's never been 100% clear to me is whether or not the full decision
tree is actually that profitable in practice.

With that in mind, I wouldn't object to Evan's implementation. It's a
bit simplistic, but I'm OK with that until someone proves additional
complexity is really needed. And I suspect we'll be using "V" based
copiers soon anyway.

Jeff
Thread overview: 17+ messages
2023-07-06 19:29 [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
2023-07-06 19:29 ` [PATCH v4 1/3] riscv: Add Linux hwprobe syscall support Evan Green
2023-07-07  8:15   ` Florian Weimer
2023-07-07 22:10     ` Evan Green
2023-07-10  9:17       ` Florian Weimer
2023-07-11 17:08         ` Evan Green
2023-07-06 19:29 ` [PATCH v4 2/3] riscv: Add hwprobe vdso call support Evan Green
2023-07-06 19:29 ` [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green
2023-07-07  9:22   ` Richard Henderson
2023-07-07 15:25     ` Jeff Law
2023-07-07 21:37     ` Evan Green
2023-07-07 22:15       ` Jeff Law
2023-07-08  2:16   ` Stefan O'Rear
2023-07-10 16:19     ` Evan Green
2023-07-12  5:22       ` Stefan O'Rear
2023-07-06 20:11 ` [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
2023-07-06 22:20   ` Jeff Law