* [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
@ 2023-02-21 19:15 Evan Green
  2023-02-21 19:15 ` [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support Evan Green
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Evan Green @ 2023-02-21 19:15 UTC
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Evan Green

This series illustrates the use of a proposed Linux syscall that
enumerates architectural information about the RISC-V cores the system
is running on. In this series we expose a small wrapper function around
the syscall. An ifunc selector for memcpy queries it to see if unaligned
access is "fast" on this hardware. If it is, it selects a newly provided
implementation of memcpy that doesn't work hard at aligning the src and
destination buffers.

This is somewhat of a proof of concept for the syscall itself, but I do
find that in my goofy memcpy test [1], the unaligned memcpy performed at
least as well as the generic C version. This is, however, on QEMU on an
M1 Mac, so not a test of any real hardware (more a smoke test that the
implementation isn't silly).

v3 of the Linux series can be found at [2].

[1] https://pastebin.com/Nj8ixpkX
[2] https://lore.kernel.org/lkml/20230221190858.3159617-1-evan@rivosinc.com/T/#t

Changes in v2:
 - hwprobe.h: Use __has_include and duplicate Linux content to make
   compilation work when Linux headers are absent (Adhemerval)
 - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
 - Use INLINE_SYSCALL_CALL (Adhemerval)
 - Update versions
 - Update UNALIGNED_MASK to match kernel v3 series.
 - Add vDSO interface
 - Used _MASK instead of _FAST value itself.
Evan Green (3):
  riscv: Add Linux hwprobe syscall support
  riscv: Add hwprobe vdso call support
  riscv: Add and use alignment-ignorant memcpy

 sysdeps/riscv/memcopy.h                       |  28 +++++
 sysdeps/riscv/memcpy.c                        |  65 +++++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 103 ++++++++++++++++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.c       |  10 ++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h       |   3 +
 sysdeps/unix/sysv/linux/riscv/Makefile        |   8 +-
 sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  36 ++++++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 .../unix/sysv/linux/riscv/rv32/arch-syscall.h |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/arch-syscall.h |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   |  67 ++++++++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
 sysdeps/unix/sysv/linux/syscall-names.list    |   1 +
 16 files changed, 351 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

--
2.25.1
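
The selection rule described in the cover letter (pick the alignment-ignorant memcpy only when misaligned access is reported as "fast") can be sketched in isolation. This is a minimal illustration, not code from the series: the constants are copied from the fallback definitions in patch 1/3, while the helper name misaligned_access_is_fast is hypothetical.

```c
#include <stdint.h>

/* Values copied from the fallback definitions in patch 1/3.  */
#define RISCV_HWPROBE_MISALIGNED_UNKNOWN  (0 << 0)
#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
#define RISCV_HWPROBE_MISALIGNED_SLOW     (2 << 0)
#define RISCV_HWPROBE_MISALIGNED_FAST     (3 << 0)
#define RISCV_HWPROBE_MISALIGNED_MASK     (7 << 0)

/* Hypothetical helper: return 1 when a CPUPERF_0 value reports fast
   misaligned access, 0 otherwise.  Only the bits under the mask are
   compared, so unrelated bits in the value do not affect the result.  */
static int
misaligned_access_is_fast (uint64_t cpuperf_value)
{
  return (cpuperf_value & RISCV_HWPROBE_MISALIGNED_MASK)
         == RISCV_HWPROBE_MISALIGNED_FAST;
}
```

Comparing the masked field against _FAST, rather than testing a single bit, matters because the field is an enumeration: SLOW (2) and FAST (3) share a bit.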
* [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support
  2023-02-21 19:15 [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
@ 2023-02-21 19:15 ` Evan Green
  2023-03-29 18:38   ` Adhemerval Zanella Netto
  2023-02-21 19:15 ` [PATCH v2 2/3] riscv: Add hwprobe vdso call support Evan Green
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Evan Green @ 2023-02-21 19:15 UTC
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Evan Green

Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.

Signed-off-by: Evan Green <evan@rivosinc.com>
---

Changes in v2:
 - hwprobe.h: Use __has_include and duplicate Linux content to make
   compilation work when Linux headers are absent (Adhemerval)
 - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
 - Use INLINE_SYSCALL_CALL (Adhemerval)
 - Update versions
 - Update UNALIGNED_MASK to match kernel v3 series.

 sysdeps/unix/sysv/linux/riscv/Makefile        |  4 +-
 sysdeps/unix/sysv/linux/riscv/Versions        |  3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       | 30 +++++++++
 .../unix/sysv/linux/riscv/rv32/arch-syscall.h |  1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |  1 +
 .../unix/sysv/linux/riscv/rv64/arch-syscall.h |  1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |  1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 67 +++++++++++++++++++
 sysdeps/unix/sysv/linux/syscall-names.list    |  1 +
 9 files changed, 107 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 4b6eacb32f..45cc29e40d 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -1,6 +1,6 @@
 ifeq ($(subdir),misc)
-sysdep_headers += sys/cachectl.h
-sysdep_routines += flush-icache
+sysdep_headers += sys/cachectl.h sys/hwprobe.h
+sysdep_routines += flush-icache hwprobe
 endif

 ifeq ($(subdir),stdlib)
diff --git a/sysdeps/unix/sysv/linux/riscv/Versions b/sysdeps/unix/sysv/linux/riscv/Versions
index 5625d2a0b8..8717b62a4a 100644
--- a/sysdeps/unix/sysv/linux/riscv/Versions
+++ b/sysdeps/unix/sysv/linux/riscv/Versions
@@ -8,4 +8,7 @@ libc {
   GLIBC_2.27 {
     __riscv_flush_icache;
   }
+  GLIBC_2.39 {
+    __riscv_hwprobe;
+  }
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
new file mode 100644
index 0000000000..74f68889ca
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -0,0 +1,30 @@
+/* RISC-V hardware feature probing support on Linux
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#include <sys/syscall.h>
+#include <sys/hwprobe.h>
+#include <sysdep.h>
+
+int
+__riscv_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
+                 long cpu_count, unsigned long *cpus, unsigned long flags)
+{
+  return INLINE_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
+                              cpu_count, cpus, flags);
+}
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h b/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
index 202520ee25..2416e041c8 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
@@ -198,6 +198,7 @@
 #define __NR_request_key 218
 #define __NR_restart_syscall 128
 #define __NR_riscv_flush_icache 259
+#define __NR_riscv_hwprobe 258
 #define __NR_rseq 293
 #define __NR_rt_sigaction 134
 #define __NR_rt_sigpending 136
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 29be561b60..83b7932db7 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2416,3 +2416,4 @@ GLIBC_2.38 __isoc23_wcstoul_l F
 GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
+GLIBC_2.39 __riscv_hwprobe F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h b/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
index 4e65f337d4..a32bc82f60 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
@@ -205,6 +205,7 @@
 #define __NR_request_key 218
 #define __NR_restart_syscall 128
 #define __NR_riscv_flush_icache 259
+#define __NR_riscv_hwprobe 258
 #define __NR_rseq 293
 #define __NR_rt_sigaction 134
 #define __NR_rt_sigpending 136
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 506a4e6a65..6ddbcfb131 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2616,3 +2616,4 @@ GLIBC_2.38 __isoc23_wcstoul_l F
 GLIBC_2.38 __isoc23_wcstoull F
 GLIBC_2.38 __isoc23_wcstoull_l F
 GLIBC_2.38 __isoc23_wcstoumax F
+GLIBC_2.39 __riscv_hwprobe F
diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
new file mode 100644
index 0000000000..e619ea43b8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -0,0 +1,67 @@
+/* RISC-V architecture probe interface
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library. If not, see
+   <https://www.gnu.org/licenses/>. */
+
+#ifndef _SYS_HWPROBE_H
+#define _SYS_HWPROBE_H 1
+
+#include <features.h>
+#ifdef __has_include
+# if __has_include (<asm/hwprobe.h>)
+#  include <asm/hwprobe.h>
+# endif
+#endif
+
+/*
+ * Define a (probably stale) version of the interface if the Linux headers
+ * aren't present.
+ */
+#ifndef RISCV_HWPROBE_KEY_MVENDORID
+struct riscv_hwprobe {
+  signed long long key;
+  unsigned long long value;
+};
+
+#define RISCV_HWPROBE_KEY_MVENDORID 0
+#define RISCV_HWPROBE_KEY_MARCHID 1
+#define RISCV_HWPROBE_KEY_MIMPID 2
+#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR 3
+#define RISCV_HWPROBE_BASE_BEHAVIOR_IMA (1 << 0)
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define RISCV_HWPROBE_IMA_FD (1 << 0)
+#define RISCV_HWPROBE_IMA_C (1 << 1)
+#define RISCV_HWPROBE_KEY_CPUPERF_0 5
+#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
+#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
+#define RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
+#define RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
+#define RISCV_HWPROBE_MISALIGNED_MASK (7 << 0)
+
+#endif // RISCV_HWPROBE_KEY_MVENDORID
+
+__BEGIN_DECLS
+
+#ifdef __USE_GNU
+int
+__riscv_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
+                 long cpu_count, unsigned long *cpus, unsigned long flags);
+#endif
+
+__END_DECLS
+
+#endif /* sys/hwprobe.h */
diff --git a/sysdeps/unix/sysv/linux/syscall-names.list b/sysdeps/unix/sysv/linux/syscall-names.list
index 822498d3e3..4f4a62e91c 100644
--- a/sysdeps/unix/sysv/linux/syscall-names.list
+++ b/sysdeps/unix/sysv/linux/syscall-names.list
@@ -477,6 +477,7 @@ renameat2
 request_key
 restart_syscall
 riscv_flush_icache
+riscv_hwprobe
 rmdir
 rseq
 rt_sigaction
--
2.25.1
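
For readers unfamiliar with the key/value calling convention of the wrapper above, here is a hedged usage sketch. It cannot call the real syscall portably, so fake_hwprobe is a hypothetical stand-in with the same parameter list as __riscv_hwprobe, and the values it returns are invented for illustration. Marking an unrecognised key by setting it to -1 follows the kernel's documented behaviour for this syscall; probe_one is likewise a hypothetical helper.

```c
#include <stddef.h>

/* Local copies of the fallback definitions from the patch above.  */
struct riscv_hwprobe
{
  signed long long key;
  unsigned long long value;
};

#define RISCV_HWPROBE_KEY_MVENDORID 0
#define RISCV_HWPROBE_KEY_CPUPERF_0 5

/* Hypothetical stand-in for __riscv_hwprobe: fills in made-up values
   for two keys and marks every other key as unknown (key = -1).  */
static int
fake_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
              long cpu_count, unsigned long *cpus, unsigned long flags)
{
  (void) cpu_count;
  (void) cpus;
  (void) flags;
  for (long i = 0; i < pair_count; i++)
    switch (pairs[i].key)
      {
      case RISCV_HWPROBE_KEY_MVENDORID:
        pairs[i].value = 0x1234;   /* invented vendor ID */
        break;
      case RISCV_HWPROBE_KEY_CPUPERF_0:
        pairs[i].value = 3;        /* MISALIGNED_FAST */
        break;
      default:
        pairs[i].key = -1;
        pairs[i].value = 0;
        break;
      }
  return 0;
}

/* Hypothetical helper: query a single key for all CPUs.  Returns 0 and
   stores the value on success, -1 if the call fails or the key is not
   recognised.  */
static int
probe_one (long long key, unsigned long long *value)
{
  struct riscv_hwprobe pair = { .key = key, .value = 0 };

  if (fake_hwprobe (&pair, 1, 0, NULL, 0) != 0 || pair.key < 0)
    return -1;
  *value = pair.value;
  return 0;
}
```

A caller on real hardware would replace fake_hwprobe with __riscv_hwprobe from <sys/hwprobe.h>; the rest of the pattern is unchanged.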
* Re: [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support
  2023-02-21 19:15 ` [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support Evan Green
@ 2023-03-29 18:38   ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 27+ messages in thread
From: Adhemerval Zanella Netto @ 2023-03-29 18:38 UTC
  To: libc-alpha

On 21/02/23 16:15, Evan Green wrote:
> Add awareness and a thin wrapper function around a new Linux system call
> that allows callers to get architecture and microarchitecture
> information about the CPUs from the kernel. This can be used to
> do things like dynamically choose a memcpy implementation.
>
> Signed-off-by: Evan Green <evan@rivosinc.com>
> ---
>
> Changes in v2:
>  - hwprobe.h: Use __has_include and duplicate Linux content to make
>    compilation work when Linux headers are absent (Adhemerval)
>  - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
>  - Use INLINE_SYSCALL_CALL (Adhemerval)
>  - Update versions
>  - Update UNALIGNED_MASK to match kernel v3 series.
>
>  sysdeps/unix/sysv/linux/riscv/Makefile        |  4 +-
>  sysdeps/unix/sysv/linux/riscv/Versions        |  3 +
>  sysdeps/unix/sysv/linux/riscv/hwprobe.c       | 30 +++++++++
>  .../unix/sysv/linux/riscv/rv32/arch-syscall.h |  1 +
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |  1 +
>  .../unix/sysv/linux/riscv/rv64/arch-syscall.h |  1 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |  1 +
>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 67 +++++++++++++++++++
>  sysdeps/unix/sysv/linux/syscall-names.list    |  1 +
>  9 files changed, 107 insertions(+), 2 deletions(-)
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
>
> diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
> index 4b6eacb32f..45cc29e40d 100644
> --- a/sysdeps/unix/sysv/linux/riscv/Makefile
> +++ b/sysdeps/unix/sysv/linux/riscv/Makefile
> @@ -1,6 +1,6 @@
>  ifeq ($(subdir),misc)
> -sysdep_headers += sys/cachectl.h
> -sysdep_routines += flush-icache
> +sysdep_headers += sys/cachectl.h sys/hwprobe.h
> +sysdep_routines += flush-icache hwprobe
>  endif
>
>  ifeq ($(subdir),stdlib)
> diff --git a/sysdeps/unix/sysv/linux/riscv/Versions b/sysdeps/unix/sysv/linux/riscv/Versions
> index 5625d2a0b8..8717b62a4a 100644
> --- a/sysdeps/unix/sysv/linux/riscv/Versions
> +++ b/sysdeps/unix/sysv/linux/riscv/Versions
> @@ -8,4 +8,7 @@ libc {
>    GLIBC_2.27 {
>      __riscv_flush_icache;
>    }
> +  GLIBC_2.39 {
> +    __riscv_hwprobe;
> +  }
>  }
> diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> new file mode 100644
> index 0000000000..74f68889ca
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> @@ -0,0 +1,30 @@
> +/* RISC-V hardware feature probing support on Linux
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>. */
> +
> +#include <sys/syscall.h>
> +#include <sys/hwprobe.h>
> +#include <sysdep.h>
> +
> +int
> +__riscv_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
> +                 long cpu_count, unsigned long *cpus, unsigned long flags)
> +{
> +  return INLINE_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
> +                              cpu_count, cpus, flags);
> +}
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h b/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
> index 202520ee25..2416e041c8 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
> +++ b/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
> @@ -198,6 +198,7 @@
>  #define __NR_request_key 218
>  #define __NR_restart_syscall 128
>  #define __NR_riscv_flush_icache 259
> +#define __NR_riscv_hwprobe 258
>  #define __NR_rseq 293
>  #define __NR_rt_sigaction 134
>  #define __NR_rt_sigpending 136
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> index 29be561b60..83b7932db7 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> @@ -2416,3 +2416,4 @@ GLIBC_2.38 __isoc23_wcstoul_l F
>  GLIBC_2.38 __isoc23_wcstoull F
>  GLIBC_2.38 __isoc23_wcstoull_l F
>  GLIBC_2.38 __isoc23_wcstoumax F
> +GLIBC_2.39 __riscv_hwprobe F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h b/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
> index 4e65f337d4..a32bc82f60 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
> @@ -205,6 +205,7 @@
>  #define __NR_request_key 218
>  #define __NR_restart_syscall 128
>  #define __NR_riscv_flush_icache 259
> +#define __NR_riscv_hwprobe 258
>  #define __NR_rseq 293
>  #define __NR_rt_sigaction 134
>  #define __NR_rt_sigpending 136
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> index 506a4e6a65..6ddbcfb131 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> @@ -2616,3 +2616,4 @@ GLIBC_2.38 __isoc23_wcstoul_l F
>  GLIBC_2.38 __isoc23_wcstoull F
>  GLIBC_2.38 __isoc23_wcstoull_l F
>  GLIBC_2.38 __isoc23_wcstoumax F
> +GLIBC_2.39 __riscv_hwprobe F
> diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> new file mode 100644
> index 0000000000..e619ea43b8
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> @@ -0,0 +1,67 @@
> +/* RISC-V architecture probe interface
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library. If not, see
> +   <https://www.gnu.org/licenses/>. */
> +
> +#ifndef _SYS_HWPROBE_H
> +#define _SYS_HWPROBE_H 1
> +
> +#include <features.h>
> +#ifdef __has_include
> +# if __has_include (<asm/hwprobe.h>)
> +#  include <asm/hwprobe.h>
> +# endif
> +#endif
> +
> +/*
> + * Define a (probably stale) version of the interface if the Linux headers
> + * aren't present.
> + */
> +#ifndef RISCV_HWPROBE_KEY_MVENDORID
> +struct riscv_hwprobe {
> +  signed long long key;
> +  unsigned long long value;
> +};

Indentation seems OK here, but I think the curly bracket should be on the
next line.

> +
> +#define RISCV_HWPROBE_KEY_MVENDORID 0
> +#define RISCV_HWPROBE_KEY_MARCHID 1
> +#define RISCV_HWPROBE_KEY_MIMPID 2
> +#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR 3
> +#define RISCV_HWPROBE_BASE_BEHAVIOR_IMA (1 << 0)
> +#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
> +#define RISCV_HWPROBE_IMA_FD (1 << 0)
> +#define RISCV_HWPROBE_IMA_C (1 << 1)
> +#define RISCV_HWPROBE_KEY_CPUPERF_0 5
> +#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_MASK (7 << 0)

Indentation seems off here.

> +
> +#endif // RISCV_HWPROBE_KEY_MVENDORID

No C99/C++ style comments in public headers.

> +
> +__BEGIN_DECLS
> +

Could you add a minimal comment on how to use this interface, the
expected parameters, etc.?

> +#ifdef __USE_GNU
> +int
> +__riscv_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
> +                 long cpu_count, unsigned long *cpus, unsigned long flags);

Add 'extern' (for old -std= modes) and put the function on the same line
as the return type.

> +#endif
> +
> +__END_DECLS
> +
> +#endif /* sys/hwprobe.h */
> diff --git a/sysdeps/unix/sysv/linux/syscall-names.list b/sysdeps/unix/sysv/linux/syscall-names.list
> index 822498d3e3..4f4a62e91c 100644
> --- a/sysdeps/unix/sysv/linux/syscall-names.list
> +++ b/sysdeps/unix/sysv/linux/syscall-names.list
> @@ -477,6 +477,7 @@ renameat2
>  request_key
>  restart_syscall
>  riscv_flush_icache
> +riscv_hwprobe
>  rmdir
>  rseq
>  rt_sigaction
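
To make the review comments above concrete, the fallback declarations might look roughly as follows once they are applied: opening brace on its own line, aligned macro values, a C-style comment on the #endif, a short usage comment, and an 'extern' declaration on one line with its return type. This is only a sketch of the requested style, not the revised patch; the wording of the usage comment is an assumption based on how patch 3/3 calls the function, and __BEGIN_DECLS and __USE_GNU are left out so the fragment compiles on its own.

```c
#ifndef RISCV_HWPROBE_KEY_MVENDORID
struct riscv_hwprobe
{
  signed long long key;
  unsigned long long value;
};

#define RISCV_HWPROBE_KEY_CPUPERF_0        5
#define RISCV_HWPROBE_MISALIGNED_UNKNOWN   (0 << 0)
#define RISCV_HWPROBE_MISALIGNED_EMULATED  (1 << 0)
#define RISCV_HWPROBE_MISALIGNED_SLOW      (2 << 0)
#define RISCV_HWPROBE_MISALIGNED_FAST      (3 << 0)
#define RISCV_HWPROBE_MISALIGNED_MASK      (7 << 0)

#endif /* RISCV_HWPROBE_KEY_MVENDORID */

/* Illustrative usage comment (assumed wording): PAIRS points to
   PAIR_COUNT entries whose key fields are set by the caller and whose
   value fields are filled in on return.  CPU_COUNT and CPUS select the
   CPUs to query; patch 3/3 passes 0 and NULL.  FLAGS is passed as 0 in
   this series.  Returns 0 on success.  */
extern int __riscv_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
                            long cpu_count, unsigned long *cpus,
                            unsigned long flags);
```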
* [PATCH v2 2/3] riscv: Add hwprobe vdso call support
  2023-02-21 19:15 [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2023-02-21 19:15 ` [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support Evan Green
@ 2023-02-21 19:15 ` Evan Green
  2023-03-29 18:39   ` Adhemerval Zanella Netto
  2023-02-21 19:15 ` [PATCH v2 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green
  2023-03-28 22:54 ` [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
  3 siblings, 1 reply; 27+ messages in thread
From: Evan Green @ 2023-02-21 19:15 UTC
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Evan Green

The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!

Signed-off-by: Evan Green <evan@rivosinc.com>
---

Changes in v2:
 - Add vDSO interface

 sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++++++++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h |  3 +++
 sysdeps/unix/sysv/linux/riscv/hwprobe.c |  6 ++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h  |  1 +
 4 files changed, 20 insertions(+)

diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
index 68fa8de641..3fe304a0c7 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
@@ -71,6 +71,16 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
 # ifdef HAVE_GET_TBFREQ
 PROCINFO_CLASS uint64_t (*_dl_vdso_get_tbfreq)(void) RELRO;
 # endif
+
+/* RISC-V specific ones. */
+# ifdef HAVE_RISCV_HWPROBE
+PROCINFO_CLASS int (*_dl_vdso_riscv_hwprobe)(void *,
+                                             long,
+                                             long,
+                                             unsigned long *,
+                                             long) RELRO;
+# endif
+
 #endif

 #undef RELRO
diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
index 867072b897..39eafd5316 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
@@ -47,6 +47,9 @@ setup_vdso_pointers (void)
 #ifdef HAVE_GET_TBFREQ
   GLRO(dl_vdso_get_tbfreq) = dl_vdso_vsym (HAVE_GET_TBFREQ);
 #endif
+#ifdef HAVE_RISCV_HWPROBE
+  GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
+#endif
 }

 #endif
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
index 74f68889ca..2c61a67db7 100644
--- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -20,11 +20,17 @@
 #include <sys/syscall.h>
 #include <sys/hwprobe.h>
 #include <sysdep.h>
+#include <sysdep-vdso.h>

 int
 __riscv_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
                  long cpu_count, unsigned long *cpus, unsigned long flags)
 {
+  /* The vDSO may be able to provide the answer without a syscall. */
+#ifdef HAVE_RISCV_HWPROBE
+  return INLINE_VSYSCALL(riscv_hwprobe, 5, pairs, pair_count, cpu_count, cpus, flags);
+#else
   return INLINE_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
                               cpu_count, cpus, flags);
+#endif
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/sysdep.h b/sysdeps/unix/sysv/linux/riscv/sysdep.h
index 4af5fe5dbc..ba17aaaff2 100644
--- a/sysdeps/unix/sysv/linux/riscv/sysdep.h
+++ b/sysdeps/unix/sysv/linux/riscv/sysdep.h
@@ -155,6 +155,7 @@
 /* List of system calls which are supported as vsyscalls (for RV32 and
    RV64). */
 # define HAVE_GETCPU_VSYSCALL "__vdso_getcpu"
+# define HAVE_RISCV_HWPROBE "__vdso_riscv_hwprobe"

 # undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
 # define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1
--
2.25.1
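
The idea behind this patch (resolve the vDSO symbol into a function pointer at startup, call through it when it is present, and otherwise issue the syscall) can be modelled in a few lines. This is a simplified sketch, not a reproduction of glibc's INLINE_VSYSCALL macro: fake_vdso, fake_syscall, hwprobe_dispatch and the two counters are hypothetical names, and only the function-pointer type is taken from the patch.

```c
#include <stddef.h>

/* Same parameter list as the _dl_vdso_riscv_hwprobe pointer above.  */
typedef int (*hwprobe_fn) (void *pairs, long pair_count, long cpu_count,
                           unsigned long *cpus, long flags);

/* Counters so a caller can see which path was taken.  */
static int calls_vdso;
static int calls_syscall;

/* Hypothetical stand-in for the vDSO entry point.  */
static int
fake_vdso (void *pairs, long pair_count, long cpu_count,
           unsigned long *cpus, long flags)
{
  (void) pairs; (void) pair_count; (void) cpu_count;
  (void) cpus; (void) flags;
  calls_vdso++;
  return 0;
}

/* Hypothetical stand-in for the real system call.  */
static int
fake_syscall (void *pairs, long pair_count, long cpu_count,
              unsigned long *cpus, long flags)
{
  (void) pairs; (void) pair_count; (void) cpu_count;
  (void) cpus; (void) flags;
  calls_syscall++;
  return 0;
}

/* Prefer the vDSO pointer when the lookup found one; otherwise fall
   back to the syscall path.  */
static int
hwprobe_dispatch (hwprobe_fn vdso, void *pairs, long pair_count,
                  long cpu_count, unsigned long *cpus, long flags)
{
  if (vdso != NULL)
    return vdso (pairs, pair_count, cpu_count, cpus, flags);
  return fake_syscall (pairs, pair_count, cpu_count, cpus, flags);
}
```

Passing NULL for the pointer models a kernel whose vDSO does not export __vdso_riscv_hwprobe.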
* Re: [PATCH v2 2/3] riscv: Add hwprobe vdso call support
  2023-02-21 19:15 ` [PATCH v2 2/3] riscv: Add hwprobe vdso call support Evan Green
@ 2023-03-29 18:39   ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 27+ messages in thread
From: Adhemerval Zanella Netto @ 2023-03-29 18:39 UTC
  To: Evan Green, libc-alpha; +Cc: palmer, slewis, vineetg

On 21/02/23 16:15, Evan Green wrote:
> The new riscv_hwprobe syscall also comes with a vDSO for faster answers
> to your most common questions. Call in today to speak with a kernel
> representative near you!
>
> Signed-off-by: Evan Green <evan@rivosinc.com>
> ---
>
> Changes in v2:
>  - Add vDSO interface
>
>  sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++++++++++
>  sysdeps/unix/sysv/linux/dl-vdso-setup.h |  3 +++
>  sysdeps/unix/sysv/linux/riscv/hwprobe.c |  6 ++++++
>  sysdeps/unix/sysv/linux/riscv/sysdep.h  |  1 +
>  4 files changed, 20 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> index 68fa8de641..3fe304a0c7 100644
> --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> @@ -71,6 +71,16 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
>  # ifdef HAVE_GET_TBFREQ
>  PROCINFO_CLASS uint64_t (*_dl_vdso_get_tbfreq)(void) RELRO;
>  # endif
> +
> +/* RISC-V specific ones. */
> +# ifdef HAVE_RISCV_HWPROBE
> +PROCINFO_CLASS int (*_dl_vdso_riscv_hwprobe)(void *,
> +                                             long,
> +                                             long,
> +                                             unsigned long *,
> +                                             long) RELRO;
> +# endif
> +
>  #endif
>
>  #undef RELRO
> diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> index 867072b897..39eafd5316 100644
> --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> @@ -47,6 +47,9 @@ setup_vdso_pointers (void)
>  #ifdef HAVE_GET_TBFREQ
>    GLRO(dl_vdso_get_tbfreq) = dl_vdso_vsym (HAVE_GET_TBFREQ);
>  #endif
> +#ifdef HAVE_RISCV_HWPROBE
> +  GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
> +#endif
>  }
>
>  #endif
> diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> index 74f68889ca..2c61a67db7 100644
> --- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> +++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> @@ -20,11 +20,17 @@
>  #include <sys/syscall.h>
>  #include <sys/hwprobe.h>
>  #include <sysdep.h>
> +#include <sysdep-vdso.h>
>
>  int
>  __riscv_hwprobe (struct riscv_hwprobe *pairs, long pair_count,
>                   long cpu_count, unsigned long *cpus, unsigned long flags)
>  {
> +  /* The vDSO may be able to provide the answer without a syscall. */
> +#ifdef HAVE_RISCV_HWPROBE
> +  return INLINE_VSYSCALL(riscv_hwprobe, 5, pairs, pair_count, cpu_count, cpus, flags);
> +#else
>    return INLINE_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
>                                cpu_count, cpus, flags);
> +#endif
>  }

HAVE_RISCV_HWPROBE is always defined for RISC-V, so there is no need for
the fallback (INLINE_VSYSCALL already issues the syscall if
dl_vdso_riscv_hwprobe is NULL).

> diff --git a/sysdeps/unix/sysv/linux/riscv/sysdep.h b/sysdeps/unix/sysv/linux/riscv/sysdep.h
> index 4af5fe5dbc..ba17aaaff2 100644
> --- a/sysdeps/unix/sysv/linux/riscv/sysdep.h
> +++ b/sysdeps/unix/sysv/linux/riscv/sysdep.h
> @@ -155,6 +155,7 @@
>  /* List of system calls which are supported as vsyscalls (for RV32 and
>     RV64). */
>  # define HAVE_GETCPU_VSYSCALL "__vdso_getcpu"
> +# define HAVE_RISCV_HWPROBE "__vdso_riscv_hwprobe"
>
>  # undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
>  # define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1
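
Patch 3/3, which follows, describes a copy that proceeds in three phases: blocks of 16 registers, then single registers, then individual bytes. As an aid to reading the assembly, here is a hedged C model of that structure. It is a sketch only: copy_noalign_model is a hypothetical name, the word size is taken to be sizeof (unsigned long), and fixed-size memcpy calls stand in for the unaligned register loads and stores, since C has no portable way to express those directly.

```c
#include <stddef.h>
#include <string.h>

#define WORD  (sizeof (unsigned long))
#define BLOCK (16 * WORD)

/* Hypothetical C model of the three-phase copy; never aligns either
   pointer before copying.  Returns DST, as memcpy does.  */
static void *
copy_noalign_model (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Phase 1: whole blocks of 16 words.  */
  for (; n >= BLOCK; n -= BLOCK, d += BLOCK, s += BLOCK)
    memcpy (d, s, BLOCK);

  /* Phase 2: one word at a time.  */
  for (; n >= WORD; n -= WORD, d += WORD, s += WORD)
    {
      unsigned long w;
      memcpy (&w, s, WORD);   /* models one unaligned load */
      memcpy (d, &w, WORD);   /* models one unaligned store */
    }

  /* Phase 3: the last few bytes.  */
  for (; n > 0; n--)
    *d++ = *s++;

  return dst;
}
```

With an 8-byte word, a 283-byte copy exercises all three phases: two 128-byte blocks, three words, and three trailing bytes.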
* [PATCH v2 3/3] riscv: Add and use alignment-ignorant memcpy
  2023-02-21 19:15 [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2023-02-21 19:15 ` [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support Evan Green
  2023-02-21 19:15 ` [PATCH v2 2/3] riscv: Add hwprobe vdso call support Evan Green
@ 2023-02-21 19:15 ` Evan Green
  2023-03-28 22:54 ` [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
  3 siblings, 0 replies; 27+ messages in thread
From: Evan Green @ 2023-02-21 19:15 UTC
  To: libc-alpha; +Cc: palmer, slewis, vineetg, Evan Green

For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.

Signed-off-by: Evan Green <evan@rivosinc.com>

---

Changes in v2:
 - Used _MASK instead of _FAST value itself.

---
 sysdeps/riscv/memcopy.h                       |  28 +++++
 sysdeps/riscv/memcpy.c                        |  65 +++++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 103 ++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/Makefile        |   4 +
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 5 files changed, 224 insertions(+)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c

diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h
new file mode 100644
index 0000000000..21f6081b5f
--- /dev/null
+++ b/sysdeps/riscv/memcopy.h
@@ -0,0 +1,28 @@
+/* memcopy.h -- definitions for memory copy functions. RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#include <sysdeps/generic/memcopy.h>
+
+/*
+ * Redefine the generic memcpy implementation to __memcpy_generic, so
+ * the memcpy ifunc can select between generic and special versions.
+ * In rtld, don't bother with all the ifunciness.
+ */
+#if IS_IN (libc)
+#define MEMCPY __memcpy_generic
+#endif
diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c
new file mode 100644
index 0000000000..9a72a487da
--- /dev/null
+++ b/sysdeps/riscv/memcpy.c
@@ -0,0 +1,65 @@
+/* Multiple versions of memcpy.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#if IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below. */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <string.h>
+#include <ifunc-init.h>
+#include <sys/hwprobe.h>
+
+#define INIT_ARCH()
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden;
+
+static inline __typeof (__redirect_memcpy) *
+select_memcpy_ifunc (void)
+{
+  INIT_ARCH ();
+
+  struct riscv_hwprobe pair;
+
+  pair.key = RISCV_HWPROBE_KEY_CPUPERF_0;
+  if (__riscv_hwprobe(&pair, 1, 0, NULL, 0) != 0)
+    return __memcpy_generic;
+
+  if ((pair.key > 0) &&
+      (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) ==
+       RISCV_HWPROBE_MISALIGNED_FAST)
+    return __memcpy_noalignment;
+
+  return __memcpy_generic;
+}
+
+libc_ifunc (__libc_memcpy, select_memcpy_ifunc ());
+
+# undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+# ifdef SHARED
+__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy);
+# endif
+
+#endif
+
diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S
new file mode 100644
index 0000000000..fe1d9213c4
--- /dev/null
+++ b/sysdeps/riscv/memcpy_noalignment.S
@@ -0,0 +1,103 @@
+/* memcpy for RISC-V, ignoring buffer alignment
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library. If not, see
+   <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+/* void *memcpy(void *, const void *, size_t) */
+ENTRY (__memcpy_noalignment)
+	move t6, a0 /* Preserve return value */
+
+	/* Round down to the nearest "page" size */
+	andi a4, a2, ~((16*SZREG)-1)
+	beqz a4, 2f
+	add a3, a1, a4
+1:
+	/* Copy "pages" (chunks of 16 registers) */
+	REG_L a4, 0(a1)
+	REG_L a5, SZREG(a1)
+	REG_L a6, 2*SZREG(a1)
+	REG_L a7, 3*SZREG(a1)
+	REG_L t0, 4*SZREG(a1)
+	REG_L t1, 5*SZREG(a1)
+	REG_L t2, 6*SZREG(a1)
+	REG_L t3, 7*SZREG(a1)
+	REG_L t4, 8*SZREG(a1)
+	REG_L t5, 9*SZREG(a1)
+	REG_S a4, 0(t6)
+	REG_S a5, SZREG(t6)
+	REG_S a6, 2*SZREG(t6)
+	REG_S a7, 3*SZREG(t6)
+	REG_S t0, 4*SZREG(t6)
+	REG_S t1, 5*SZREG(t6)
+	REG_S t2, 6*SZREG(t6)
+	REG_S t3, 7*SZREG(t6)
+	REG_S t4, 8*SZREG(t6)
+	REG_S t5, 9*SZREG(t6)
+	REG_L a4, 10*SZREG(a1)
+	REG_L a5, 11*SZREG(a1)
+	REG_L a6, 12*SZREG(a1)
+	REG_L a7, 13*SZREG(a1)
+	REG_L t0, 14*SZREG(a1)
+	REG_L t1, 15*SZREG(a1)
+	addi a1, a1, 16*SZREG
+	REG_S a4, 10*SZREG(t6)
+	REG_S a5, 11*SZREG(t6)
+	REG_S a6, 12*SZREG(t6)
+	REG_S a7, 13*SZREG(t6)
+	REG_S t0, 14*SZREG(t6)
+	REG_S t1, 15*SZREG(t6)
+	addi t6, t6, 16*SZREG
+	bltu a1, a3, 1b
+	andi a2, a2, (16*SZREG)-1 /* Update count */
+
+2:
+	/* Remainder is smaller than a page, compute native word count */
+	beqz a2, 6f
+	andi a5, a2, ~(SZREG-1)
+	andi a2, a2, (SZREG-1)
+	add a3, a1, a5
+	/* Jump directly to byte copy if no words. */
+	beqz a5, 4f
+
+3:
+	/* Use single native register copy */
+	REG_L a4, 0(a1)
+	addi a1, a1, SZREG
+	REG_S a4, 0(t6)
+	addi t6, t6, SZREG
+	bltu a1, a3, 3b
+
+	/* Jump directly out if no more bytes */
+	beqz a2, 6f
+
+4:
+	/* Copy the last few individual bytes */
+	add a3, a1, a2
+5:
+	lb a4, 0(a1)
+	addi a1, a1, 1
+	sb a4, 0(t6)
+	addi t6, t6, 1
+	bltu a1, a3, 5b
+6:
+	ret
+
+END (__memcpy_noalignment)
+
+hidden_def (__memcpy_noalignment)
diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 45cc29e40d..aa9ea443d6 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -7,6 +7,10 @@ ifeq ($(subdir),stdlib)
 gen-as-const-headers += ucontext_i.sym
 endif
 
+ifeq ($(subdir),string)
+sysdep_routines += memcpy memcpy-generic memcpy_noalignment
+endif
+
 abi-variants := ilp32 ilp32d lp64 lp64d
 
 ifeq (,$(filter $(default-abi),$(abi-variants)))
diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
new file mode 100644
index 0000000000..0abe03f7f5
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
@@ -0,0 +1,24 @@
+/* Re-include the default memcpy implementation.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#include <string.h>
+
+extern __typeof (memcpy) __memcpy_generic;
+hidden_proto(__memcpy_generic)
+
+#include <string/memcpy.c>
-- 
2.25.1
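Editor's sketch, not part of the patch: the selection logic in select_memcpy_ifunc can be exercised on any host by passing the probe in as a function pointer. Everything prefixed demo_ or DEMO_ is hypothetical, and the numeric constants are placeholders rather than the values from <sys/hwprobe.h>. The decision mirrors the patch: fall back to the generic routine if the probe fails, and pick the alignment-ignorant routine only when the masked value equals the "fast" encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for <sys/hwprobe.h>; values are placeholders.  */
#define DEMO_KEY_CPUPERF_0    5
#define DEMO_MISALIGNED_SLOW  0x2
#define DEMO_MISALIGNED_FAST  0x3
#define DEMO_MISALIGNED_MASK  0x7

struct demo_hwprobe { int64_t key; uint64_t value; };

/* A probe returns 0 on success and fills in PAIR.  */
typedef int (*demo_probe_fn) (struct demo_hwprobe *pair);
typedef void *(*demo_memcpy_fn) (void *, const void *, size_t);

/* Two candidate implementations; both simply forward to memcpy here.  */
static void *
demo_memcpy_generic (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

static void *
demo_memcpy_noalignment (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

/* Same decision structure as the patch's select_memcpy_ifunc.  */
static demo_memcpy_fn
demo_select_memcpy (demo_probe_fn probe)
{
  struct demo_hwprobe pair = { .key = DEMO_KEY_CPUPERF_0, .value = 0 };

  assert (probe != NULL);
  if (probe (&pair) != 0)
    return demo_memcpy_generic;

  if (pair.key > 0
      && (pair.value & DEMO_MISALIGNED_MASK) == DEMO_MISALIGNED_FAST)
    return demo_memcpy_noalignment;

  return demo_memcpy_generic;
}

/* Fake probes standing in for the kernel's answers.  */
static int
demo_probe_fast (struct demo_hwprobe *pair)
{
  pair->value = DEMO_MISALIGNED_FAST;
  return 0;
}

static int
demo_probe_slow (struct demo_hwprobe *pair)
{
  pair->value = DEMO_MISALIGNED_SLOW;
  return 0;
}

static int
demo_probe_fail (struct demo_hwprobe *pair)
{
  (void) pair;
  return -1;
}
```

Comparing the masked field against the FAST value, rather than testing a single bit, matches the "Used _MASK instead of _FAST value itself" item in the v2 changelog.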
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-02-21 19:15 [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green ` (2 preceding siblings ...) 2023-02-21 19:15 ` [PATCH v2 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green @ 2023-03-28 22:54 ` Palmer Dabbelt 2023-03-28 23:41 ` Adhemerval Zanella Netto 3 siblings, 1 reply; 27+ messages in thread From: Palmer Dabbelt @ 2023-03-28 22:54 UTC (permalink / raw) To: Evan Green; +Cc: libc-alpha, slewis, Vineet Gupta, Evan Green On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: > > This series illustrates the use of a proposed Linux syscall that > enumerates architectural information about the RISC-V cores the system > is running on. In this series we expose a small wrapper function around > the syscall. An ifunc selector for memcpy queries it to see if unaligned > access is "fast" on this hardware. If it is, it selects a newly provided > implementation of memcpy that doesn't work hard at aligning the src and > destination buffers. > > This is somewhat of a proof of concept for the syscall itself, but I do > find that in my goofy memcpy test [1], the unaligned memcpy performed at > least as well as the generic C version. This is however on Qemu on an M1 > mac, so not a test of any real hardware (more a smoke test that the > implementation isn't silly). QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. 
Aside from those comments, Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos... Unless anyone else has comments? > v3 of the Linux series can be found at [2]. > > [1] https://pastebin.com/Nj8ixpkX > [2] https://lore.kernel.org/lkml/20230221190858.3159617-1-evan@rivosinc.com/T/#t > > Changes in v2: > - hwprobe.h: Use __has_include and duplicate Linux content to make > compilation work when Linux headers are absent (Adhemerval) > - hwprobe.h: Put declaration under __USE_GNU (Adhemerval) > - Use INLINE_SYSCALL_CALL (Adhemerval) > - Update versions > - Update UNALIGNED_MASK to match kernel v3 series. > - Add vDSO interface > - Used _MASK instead of _FAST value itself. > > Evan Green (3): > riscv: Add Linux hwprobe syscall support > riscv: Add hwprobe vdso call support > riscv: Add and use alignment-ignorant memcpy > > sysdeps/riscv/memcopy.h | 28 +++++ > sysdeps/riscv/memcpy.c | 65 +++++++++++ > sysdeps/riscv/memcpy_noalignment.S | 103 ++++++++++++++++++ > sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++ > sysdeps/unix/sysv/linux/dl-vdso-setup.h | 3 + > sysdeps/unix/sysv/linux/riscv/Makefile | 8 +- > sysdeps/unix/sysv/linux/riscv/Versions | 3 + > sysdeps/unix/sysv/linux/riscv/hwprobe.c | 36 ++++++ > .../unix/sysv/linux/riscv/memcpy-generic.c | 24 ++++ > .../unix/sysv/linux/riscv/rv32/arch-syscall.h | 1 + > .../unix/sysv/linux/riscv/rv32/libc.abilist | 1 + > .../unix/sysv/linux/riscv/rv64/arch-syscall.h | 1 + > .../unix/sysv/linux/riscv/rv64/libc.abilist | 1 + > sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 67 ++++++++++++ > sysdeps/unix/sysv/linux/riscv/sysdep.h | 1 + > sysdeps/unix/sysv/linux/syscall-names.list | 1 + > 16 files changed, 351 insertions(+), 2 deletions(-) > create mode 100644 sysdeps/riscv/memcopy.h > create mode 100644 sysdeps/riscv/memcpy.c > 
create mode 100644 sysdeps/riscv/memcpy_noalignment.S > create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c > create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c > create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h Thanks! ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-28 22:54 ` [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt @ 2023-03-28 23:41 ` Adhemerval Zanella Netto 2023-03-29 0:01 ` Palmer Dabbelt 0 siblings, 1 reply; 27+ messages in thread From: Adhemerval Zanella Netto @ 2023-03-28 23:41 UTC (permalink / raw) To: Palmer Dabbelt, Evan Green; +Cc: libc-alpha, slewis, Vineet Gupta On 28/03/23 19:54, Palmer Dabbelt wrote: > On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: >> >> This series illustrates the use of a proposed Linux syscall that >> enumerates architectural information about the RISC-V cores the system >> is running on. In this series we expose a small wrapper function around >> the syscall. An ifunc selector for memcpy queries it to see if unaligned >> access is "fast" on this hardware. If it is, it selects a newly provided >> implementation of memcpy that doesn't work hard at aligning the src and >> destination buffers. >> >> This is somewhat of a proof of concept for the syscall itself, but I do >> find that in my goofy memcpy test [1], the unaligned memcpy performed at >> least as well as the generic C version. This is however on Qemu on an M1 >> mac, so not a test of any real hardware (more a smoke test that the >> implementation isn't silly). > > QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. > > Aside from those comments, > > Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> > > There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos... 
>
> Unless anyone else has comments?

Until riscv_hwprobe is in Linus' tree as official Linux ABI, this patchset
cannot be installed. We failed to enforce this on some occasions (like Intel
CET) and it turned into a complete mess after some years...
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-28 23:41 ` Adhemerval Zanella Netto @ 2023-03-29 0:01 ` Palmer Dabbelt 2023-03-29 19:16 ` Adhemerval Zanella Netto 0 siblings, 1 reply; 27+ messages in thread From: Palmer Dabbelt @ 2023-03-29 0:01 UTC (permalink / raw) To: adhemerval.zanella; +Cc: Evan Green, libc-alpha, slewis, Vineet Gupta On Tue, 28 Mar 2023 16:41:10 PDT (-0700), adhemerval.zanella@linaro.org wrote: > > > On 28/03/23 19:54, Palmer Dabbelt wrote: >> On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: >>> >>> This series illustrates the use of a proposed Linux syscall that >>> enumerates architectural information about the RISC-V cores the system >>> is running on. In this series we expose a small wrapper function around >>> the syscall. An ifunc selector for memcpy queries it to see if unaligned >>> access is "fast" on this hardware. If it is, it selects a newly provided >>> implementation of memcpy that doesn't work hard at aligning the src and >>> destination buffers. >>> >>> This is somewhat of a proof of concept for the syscall itself, but I do >>> find that in my goofy memcpy test [1], the unaligned memcpy performed at >>> least as well as the generic C version. This is however on Qemu on an M1 >>> mac, so not a test of any real hardware (more a smoke test that the >>> implementation isn't silly). >> >> QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. 
>> >> Aside from those comments, >> >> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> >> >> There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos... >> >> Unless anyone else has comments? > > Until riscv_hwprobe is not on Linus tree as official Linux ABI this patchset > can not be installed. We failed to enforce it on some occasion (like Intel > CET) and it turned out a complete mess after some years... Sorry if that wasn't clear, I was asking if there were any more comments from the glibc side of things before merging the Linux code. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-29 0:01 ` Palmer Dabbelt @ 2023-03-29 19:16 ` Adhemerval Zanella Netto 2023-03-29 19:45 ` Palmer Dabbelt 0 siblings, 1 reply; 27+ messages in thread From: Adhemerval Zanella Netto @ 2023-03-29 19:16 UTC (permalink / raw) To: Palmer Dabbelt; +Cc: Evan Green, libc-alpha, slewis, Vineet Gupta On 28/03/23 21:01, Palmer Dabbelt wrote: > On Tue, 28 Mar 2023 16:41:10 PDT (-0700), adhemerval.zanella@linaro.org wrote: >> >> >> On 28/03/23 19:54, Palmer Dabbelt wrote: >>> On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: >>>> >>>> This series illustrates the use of a proposed Linux syscall that >>>> enumerates architectural information about the RISC-V cores the system >>>> is running on. In this series we expose a small wrapper function around >>>> the syscall. An ifunc selector for memcpy queries it to see if unaligned >>>> access is "fast" on this hardware. If it is, it selects a newly provided >>>> implementation of memcpy that doesn't work hard at aligning the src and >>>> destination buffers. >>>> >>>> This is somewhat of a proof of concept for the syscall itself, but I do >>>> find that in my goofy memcpy test [1], the unaligned memcpy performed at >>>> least as well as the generic C version. This is however on Qemu on an M1 >>>> mac, so not a test of any real hardware (more a smoke test that the >>>> implementation isn't silly). >>> >>> QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. 
>>>
>>> Aside from those comments,
>>>
>>> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
>>>
>>> There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos...
>>>
>>> Unless anyone else has comments?
>>
>> Until riscv_hwprobe is not on Linus tree as official Linux ABI this patchset
>> can not be installed. We failed to enforce it on some occasion (like Intel
>> CET) and it turned out a complete mess after some years...
>
> Sorry if that wasn't clear, I was asking if there were any more comments from the glibc side of things before merging the Linux code.

Right, so is this already settled as the de facto ABI to query for system
information on RISC-V? Or is it still being discussed? Is it in a next branch
already, and/or has it been tested with a patched glibc?

In any case I added some minimal comments. With the vDSO approach I think
there is no need to cache the result at startup, as aarch64 and x86 do.
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-29 19:16 ` Adhemerval Zanella Netto @ 2023-03-29 19:45 ` Palmer Dabbelt 2023-03-29 20:13 ` Adhemerval Zanella Netto 2023-03-30 6:20 ` Jeff Law 0 siblings, 2 replies; 27+ messages in thread From: Palmer Dabbelt @ 2023-03-29 19:45 UTC (permalink / raw) To: adhemerval.zanella; +Cc: Evan Green, libc-alpha, slewis, Vineet Gupta On Wed, 29 Mar 2023 12:16:39 PDT (-0700), adhemerval.zanella@linaro.org wrote: > > > On 28/03/23 21:01, Palmer Dabbelt wrote: >> On Tue, 28 Mar 2023 16:41:10 PDT (-0700), adhemerval.zanella@linaro.org wrote: >>> >>> >>> On 28/03/23 19:54, Palmer Dabbelt wrote: >>>> On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: >>>>> >>>>> This series illustrates the use of a proposed Linux syscall that >>>>> enumerates architectural information about the RISC-V cores the system >>>>> is running on. In this series we expose a small wrapper function around >>>>> the syscall. An ifunc selector for memcpy queries it to see if unaligned >>>>> access is "fast" on this hardware. If it is, it selects a newly provided >>>>> implementation of memcpy that doesn't work hard at aligning the src and >>>>> destination buffers. >>>>> >>>>> This is somewhat of a proof of concept for the syscall itself, but I do >>>>> find that in my goofy memcpy test [1], the unaligned memcpy performed at >>>>> least as well as the generic C version. This is however on Qemu on an M1 >>>>> mac, so not a test of any real hardware (more a smoke test that the >>>>> implementation isn't silly). >>>> >>>> QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. 
>>>> >>>> Aside from those comments, >>>> >>>> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> >>>> >>>> There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos... >>>> >>>> Unless anyone else has comments? >>> >>> Until riscv_hwprobe is not on Linus tree as official Linux ABI this patchset >>> can not be installed. We failed to enforce it on some occasion (like Intel >>> CET) and it turned out a complete mess after some years... >> >> Sorry if that wasn't clear, I was asking if there were any more comments from the glibc side of things before merging the Linux code. > > Right, so is this already settle to be the de-factor ABI to query for system > information in RISCV? Or is it still being discussed? Is it in a next branch > already, and/or have been tested with a patch glibc? It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so). IIUC Evan's been testing the kernel+glibc stuff on QEMU, but he should be able to ack that explicitly (it's a little vague in the cover letter). There's also a glibc-independent kselftest as part of the kernel patch set: https://lore.kernel.org/all/20230327163203.2918455-6-evan@rivosinc.com/ . > > In any case I added some minimal comments. With the vDSO approach I think > there is no need to cache the result at startup, as aarch64 and x86 does. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-29 19:45 ` Palmer Dabbelt @ 2023-03-29 20:13 ` Adhemerval Zanella Netto 2023-03-30 18:31 ` Evan Green 2023-03-30 6:20 ` Jeff Law 1 sibling, 1 reply; 27+ messages in thread From: Adhemerval Zanella Netto @ 2023-03-29 20:13 UTC (permalink / raw) To: Palmer Dabbelt Cc: Evan Green, libc-alpha, slewis, Vineet Gupta, Arnd Bergmann On 29/03/23 16:45, Palmer Dabbelt wrote: > On Wed, 29 Mar 2023 12:16:39 PDT (-0700), adhemerval.zanella@linaro.org wrote: >> >> >> On 28/03/23 21:01, Palmer Dabbelt wrote: >>> On Tue, 28 Mar 2023 16:41:10 PDT (-0700), adhemerval.zanella@linaro.org wrote: >>>> >>>> >>>> On 28/03/23 19:54, Palmer Dabbelt wrote: >>>>> On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: >>>>>> >>>>>> This series illustrates the use of a proposed Linux syscall that >>>>>> enumerates architectural information about the RISC-V cores the system >>>>>> is running on. In this series we expose a small wrapper function around >>>>>> the syscall. An ifunc selector for memcpy queries it to see if unaligned >>>>>> access is "fast" on this hardware. If it is, it selects a newly provided >>>>>> implementation of memcpy that doesn't work hard at aligning the src and >>>>>> destination buffers. >>>>>> >>>>>> This is somewhat of a proof of concept for the syscall itself, but I do >>>>>> find that in my goofy memcpy test [1], the unaligned memcpy performed at >>>>>> least as well as the generic C version. This is however on Qemu on an M1 >>>>>> mac, so not a test of any real hardware (more a smoke test that the >>>>>> implementation isn't silly). >>>>> >>>>> QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. 
>>>>>
>>>>> Aside from those comments,
>>>>>
>>>>> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
>>>>>
>>>>> There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos...
>>>>>
>>>>> Unless anyone else has comments?
>>>>
>>>> Until riscv_hwprobe is not on Linus tree as official Linux ABI this patchset
>>>> can not be installed. We failed to enforce it on some occasion (like Intel
>>>> CET) and it turned out a complete mess after some years...
>>>
>>> Sorry if that wasn't clear, I was asking if there were any more comments from the glibc side of things before merging the Linux code.
>>
>> Right, so is this already settle to be the de-factor ABI to query for system
>> information in RISCV? Or is it still being discussed? Is it in a next branch
>> already, and/or have been tested with a patch glibc?
>
> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so).
>
> IIUC Evan's been testing the kernel+glibc stuff on QEMU, but he should be able to ack that explicitly (it's a little vague in the cover letter). There's also a glibc-independent kselftest as part of the kernel patch set: https://lore.kernel.org/all/20230327163203.2918455-6-evan@rivosinc.com/ .

I am not sure if this is the latest thread, but it seems from the cover letter link
that Arnd has raised some concerns about the interface [1] that have not been fully
addressed.
From the libc perspective, the need to specify the query key on riscv_hwprobe should
not be a problem (libc must know what to handle, unknown tags are of no use) and it
simplifies the buffer management (so there is no need to query for an unknown set of
keys or to allocate a large buffer to handle multiple non-required pairs).

However, I agree with Arnd that there should be no need to optimize for hardware
that has an asymmetric set of features and, at least for glibc usage and most
runtime feature selection, it does not make sense to query per-cpu information
(unless you do some very specific programming, like pinning the process to specific
cores and enabling core-specific code).

I also wonder how hotplug or cpusets would play with the vDSO support, and how
the kernel would synchronize the update, if any, to the private vDSO data.

[1] https://lore.kernel.org/lkml/20230221190858.3159617-1-evan@rivosinc.com/T/#m452cffd9f60684e9d6d6dccf595f33ecfbc99be2
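Editor's sketch of the buffer-management point above, under stated assumptions: the caller names the keys it understands, so the array is sized by the caller and never by the kernel. demo_probe is a mock, not the real syscall; the convention that an unrecognized key comes back with key set to -1 is modeled on the proposed kernel interface and should be treated as an assumption here, as should the key numbers and values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_pair { int64_t key; uint64_t value; };

/* Mock of a key-driven probe: fills in values for the keys it knows and
   marks every other entry with key = -1 (assumed convention).  */
static int
demo_probe (struct demo_pair *pairs, size_t count)
{
  for (size_t i = 0; i < count; i++)
    switch (pairs[i].key)
      {
      case 0:
	pairs[i].value = 0x1234;
	break;
      case 5:
	pairs[i].value = 0x3;
	break;
      default:
	pairs[i].key = -1;
	pairs[i].value = 0;
	break;
      }
  return 0;
}

/* Query a caller-sized array and return how many keys were recognized.  */
static size_t
demo_query (struct demo_pair *pairs, size_t count)
{
  size_t known = 0;

  assert (pairs != NULL);
  if (demo_probe (pairs, count) != 0)
    return 0;

  for (size_t i = 0; i < count; i++)
    if (pairs[i].key >= 0)
      known++;
  return known;
}
```

A caller would typically declare a small fixed array on the stack, one entry per key it cares about, and skip entries whose key was rewritten to -1.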
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-29 20:13 ` Adhemerval Zanella Netto @ 2023-03-30 18:31 ` Evan Green 2023-03-30 19:43 ` Adhemerval Zanella Netto 0 siblings, 1 reply; 27+ messages in thread From: Evan Green @ 2023-03-30 18:31 UTC (permalink / raw) To: Adhemerval Zanella Netto Cc: Palmer Dabbelt, libc-alpha, slewis, Vineet Gupta, Arnd Bergmann Hi Adhemerval, On Wed, Mar 29, 2023 at 1:13 PM Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote: > > > > On 29/03/23 16:45, Palmer Dabbelt wrote: > > On Wed, 29 Mar 2023 12:16:39 PDT (-0700), adhemerval.zanella@linaro.org wrote: > >> > >> > >> On 28/03/23 21:01, Palmer Dabbelt wrote: > >>> On Tue, 28 Mar 2023 16:41:10 PDT (-0700), adhemerval.zanella@linaro.org wrote: > >>>> > >>>> > >>>> On 28/03/23 19:54, Palmer Dabbelt wrote: > >>>>> On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: > >>>>>> > >>>>>> This series illustrates the use of a proposed Linux syscall that > >>>>>> enumerates architectural information about the RISC-V cores the system > >>>>>> is running on. In this series we expose a small wrapper function around > >>>>>> the syscall. An ifunc selector for memcpy queries it to see if unaligned > >>>>>> access is "fast" on this hardware. If it is, it selects a newly provided > >>>>>> implementation of memcpy that doesn't work hard at aligning the src and > >>>>>> destination buffers. > >>>>>> > >>>>>> This is somewhat of a proof of concept for the syscall itself, but I do > >>>>>> find that in my goofy memcpy test [1], the unaligned memcpy performed at > >>>>>> least as well as the generic C version. This is however on Qemu on an M1 > >>>>>> mac, so not a test of any real hardware (more a smoke test that the > >>>>>> implementation isn't silly). > >>>>> > >>>>> QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. 
There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. > >>>>> > >>>>> Aside from those comments, > >>>>> > >>>>> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> > >>>>> > >>>>> There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos... > >>>>> > >>>>> Unless anyone else has comments? > >>>> > >>>> Until riscv_hwprobe is not on Linus tree as official Linux ABI this patchset > >>>> can not be installed. We failed to enforce it on some occasion (like Intel > >>>> CET) and it turned out a complete mess after some years... > >>> > >>> Sorry if that wasn't clear, I was asking if there were any more comments from the glibc side of things before merging the Linux code. > >> > >> Right, so is this already settle to be the de-factor ABI to query for system > >> information in RISCV? Or is it still being discussed? Is it in a next branch > >> already, and/or have been tested with a patch glibc? > > > > It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so). > > > > IIUC Evan's been testing the kernel+glibc stuff on QEMU, but he should be able to ack that explicitly (it's a little vague in the cover letter). There's also a glibc-independent kselftest as part of the kernel patch set: https://lore.kernel.org/all/20230327163203.2918455-6-evan@rivosinc.com/ . 
> > I am not sure if this is latest thread, but it seems that from cover letter link > Arnd has raised some concerns about the interface [1] that has not been fully > addressed. I've replied to that thread. > > From libc perspective, the need to specify the query key on riscv_hwprobe should > not be a problem (libc must know what tohandle, unknown tags are no use) and it > simplifies the buffer management (so there is no need to query for unknown set of > keys of a allocate a large buffer to handle multiple non-required pairs). > > However, I agree with Arnd that there should be no need to optimize for hardware > that has an asymmetric set of features and, at least for glibc usage and most > runtime feature selection, it does not make sense to query per-cpu information > (unless you some very specific programming, like pine the process to specific > cores and enable core-specific code). I pushed back on that in my reply upstream, feel free to jump in there. I think you're right that glibc probably wouldn't ever use the cpuset aspect of the interface, but the gist of my reply upstream is that more specialized apps may. > > I also wonder how hotplug or cpusets would play with the vDSO support, and how > kernel would synchronize the update, if any, to the prive vDSO data. The good news is that the cached data in the vDSO is not ABI, it's hidden behind the vDSO function. So as things like hotplug start evolving and interacting with the vDSO cache data, we can update what data we cache and when we fall back to the syscall. -Evan > > [1] https://lore.kernel.org/lkml/20230221190858.3159617-1-evan@rivosinc.com/T/#m452cffd9f60684e9d6d6dccf595f33ecfbc99be2 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-30 18:31 ` Evan Green @ 2023-03-30 19:43 ` Adhemerval Zanella Netto 0 siblings, 0 replies; 27+ messages in thread From: Adhemerval Zanella Netto @ 2023-03-30 19:43 UTC (permalink / raw) To: Evan Green Cc: Palmer Dabbelt, libc-alpha, slewis, Vineet Gupta, Arnd Bergmann On 30/03/23 15:31, Evan Green wrote: > Hi Adhemerval, > > On Wed, Mar 29, 2023 at 1:13 PM Adhemerval Zanella Netto > <adhemerval.zanella@linaro.org> wrote: >> >> >> >> On 29/03/23 16:45, Palmer Dabbelt wrote: >>> On Wed, 29 Mar 2023 12:16:39 PDT (-0700), adhemerval.zanella@linaro.org wrote: >>>> >>>> >>>> On 28/03/23 21:01, Palmer Dabbelt wrote: >>>>> On Tue, 28 Mar 2023 16:41:10 PDT (-0700), adhemerval.zanella@linaro.org wrote: >>>>>> >>>>>> >>>>>> On 28/03/23 19:54, Palmer Dabbelt wrote: >>>>>>> On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote: >>>>>>>> >>>>>>>> This series illustrates the use of a proposed Linux syscall that >>>>>>>> enumerates architectural information about the RISC-V cores the system >>>>>>>> is running on. In this series we expose a small wrapper function around >>>>>>>> the syscall. An ifunc selector for memcpy queries it to see if unaligned >>>>>>>> access is "fast" on this hardware. If it is, it selects a newly provided >>>>>>>> implementation of memcpy that doesn't work hard at aligning the src and >>>>>>>> destination buffers. >>>>>>>> >>>>>>>> This is somewhat of a proof of concept for the syscall itself, but I do >>>>>>>> find that in my goofy memcpy test [1], the unaligned memcpy performed at >>>>>>>> least as well as the generic C version. This is however on Qemu on an M1 >>>>>>>> mac, so not a test of any real hardware (more a smoke test that the >>>>>>>> implementation isn't silly). >>>>>>> >>>>>>> QEMU isn't a good enough benchmark to justify a new memcpy routine in glibc. Evan has a D1, which does support misaligned access and runs some simple benchmarks faster. 
There's also been some minor changes to the Linux side of things that warrant a v3 anyway, so he'll just post some benchmarks on HW along with that. >>>>>>> >>>>>>> Aside from those comments, >>>>>>> >>>>>>> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> >>>>>>> >>>>>>> There's a lot more stuff to probe for, but I think we've got enough of a proof of concept for the hwprobe stuff that we can move forward with the core interface bits in Linux/glibc and then unleash the chaos... >>>>>>> >>>>>>> Unless anyone else has comments? >>>>>> >>>>>> Until riscv_hwprobe is not on Linus tree as official Linux ABI this patchset >>>>>> can not be installed. We failed to enforce it on some occasion (like Intel >>>>>> CET) and it turned out a complete mess after some years... >>>>> >>>>> Sorry if that wasn't clear, I was asking if there were any more comments from the glibc side of things before merging the Linux code. >>>> >>>> Right, so is this already settle to be the de-factor ABI to query for system >>>> information in RISCV? Or is it still being discussed? Is it in a next branch >>>> already, and/or have been tested with a patch glibc? >>> >>> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so). >>> >>> IIUC Evan's been testing the kernel+glibc stuff on QEMU, but he should be able to ack that explicitly (it's a little vague in the cover letter). There's also a glibc-independent kselftest as part of the kernel patch set: https://lore.kernel.org/all/20230327163203.2918455-6-evan@rivosinc.com/ . 
>> >> I am not sure if this is latest thread, but it seems that from cover letter link >> Arnd has raised some concerns about the interface [1] that has not been fully >> addressed. > > I've replied to that thread. > >> >> From libc perspective, the need to specify the query key on riscv_hwprobe should >> not be a problem (libc must know what tohandle, unknown tags are no use) and it >> simplifies the buffer management (so there is no need to query for unknown set of >> keys of a allocate a large buffer to handle multiple non-required pairs). >> >> However, I agree with Arnd that there should be no need to optimize for hardware >> that has an asymmetric set of features and, at least for glibc usage and most >> runtime feature selection, it does not make sense to query per-cpu information >> (unless you some very specific programming, like pine the process to specific >> cores and enable core-specific code). > > I pushed back on that in my reply upstream, feel free to jump in > there. I think you're right that glibc probably wouldn't ever use the > cpuset aspect of the interface, but the gist of my reply upstream is > that more specialized apps may. Well, I still think providing the userland with asymmetric set of features is a complexity that does not pay off, but at least the interface does allow to return a concise view of the supported features. > >> >> I also wonder how hotplug or cpusets would play with the vDSO support, and how >> kernel would synchronize the update, if any, to the prive vDSO data. > > The good news is that the cached data in the vDSO is not ABI, it's > hidden behind the vDSO function. So as things like hotplug start > evolving and interacting with the vDSO cache data, we can update what > data we cache and when we fall back to the syscall. Right, I was just curious how one would synchronize the vDSO code with the concurrent update from kernel. 
Some time ago, I was working with another kernel developer on a vDSO getrandom; it required a lot of boilerplate, and even then we did not come up with a good interface for concurrent access to a structure that the kernel might change concurrently. ^ permalink raw reply [flat|nested] 27+ messages in thread
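The synchronization question raised above (how vDSO code can read data that the kernel may update concurrently) is often answered with a sequence counter. The following is only a minimal sketch of that general pattern in portable C11; it is not the hwprobe vDSO implementation, and the structure layout and the names cached_probe, cache_update and cache_try_read are hypothetical, chosen for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical cached record; NOT the kernel's actual vDSO data layout. */
struct cached_probe {
    atomic_uint seq;              /* odd while a writer is mid-update */
    _Atomic uint64_t value;       /* the cached answer */
    _Atomic uint64_t generation;  /* second field, must stay in step with value */
};

/* Writer side (the kernel's role in this sketch; single writer assumed). */
void cache_update(struct cached_probe *c, uint64_t value, uint64_t generation)
{
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    /* Make the counter odd so readers know an update is in flight. */
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&c->value, value, memory_order_relaxed);
    atomic_store_explicit(&c->generation, generation, memory_order_relaxed);

    /* Even again: the new contents are published. */
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side (the vDSO function's role in this sketch).
   Returns 1 and fills the outputs if a consistent snapshot was read,
   0 if a writer was active or raced with the read.  */
int cache_try_read(struct cached_probe *c, uint64_t *value, uint64_t *generation)
{
    unsigned s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
    if (s1 & 1u)
        return 0;  /* update in progress */

    uint64_t v = atomic_load_explicit(&c->value, memory_order_relaxed);
    uint64_t g = atomic_load_explicit(&c->generation, memory_order_relaxed);

    /* Order the data loads before the second counter load. */
    atomic_thread_fence(memory_order_acquire);
    unsigned s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    if (s1 != s2)
        return 0;  /* the data changed underneath us */

    *value = v;
    *generation = g;
    return 1;
}
```

A caller would typically retry cache_try_read a few times and then fall back to the real syscall, which matches the point made in the thread that the cached data sits behind the vDSO function and is not itself ABI.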
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-29 19:45 ` Palmer Dabbelt 2023-03-29 20:13 ` Adhemerval Zanella Netto @ 2023-03-30 6:20 ` Jeff Law 2023-03-30 18:43 ` Evan Green 2023-03-30 19:38 ` Adhemerval Zanella Netto 1 sibling, 2 replies; 27+ messages in thread From: Jeff Law @ 2023-03-30 6:20 UTC (permalink / raw) To: Palmer Dabbelt, adhemerval.zanella Cc: Evan Green, libc-alpha, slewis, Vineet Gupta On 3/29/23 13:45, Palmer Dabbelt wrote: > It's not in for-next yet, but various patch sets / proposals have been > on the lists for a few months and it seems like discussion on the kernel > side has pretty much died down. That's why I was pinging the glibc side > of things, if anyone here has comments on the interface then it's time > to chime in. If there's no comments then we're likely to end up with > this in the next release (so queue into for-next soon, Linus' master in > a month or so). Right. And I've suggested that we at least try to settle on the various mem* and str* implementations independently of the kernel->glibc interface question. I don't much care how we break down the problem of selecting implementations, just that we get started. That can and probably should be happening in parallel with the kernel->glibc API work. I've got some performance testing to do in this space (primarily of the VRULL implementations). It's just going to take a long time to get the data. And that implementation probably needs some revamping after all the work on the mem* and str* infrastructure that landed earlier this year. jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-30 6:20 ` Jeff Law @ 2023-03-30 18:43 ` Evan Green 2023-03-31 5:09 ` Jeff Law 2023-03-30 19:38 ` Adhemerval Zanella Netto 1 sibling, 1 reply; 27+ messages in thread From: Evan Green @ 2023-03-30 18:43 UTC (permalink / raw) To: Jeff Law Cc: Palmer Dabbelt, adhemerval.zanella, libc-alpha, slewis, Vineet Gupta On Wed, Mar 29, 2023 at 11:20 PM Jeff Law <jeffreyalaw@gmail.com> wrote: > > > > On 3/29/23 13:45, Palmer Dabbelt wrote: > > > It's not in for-next yet, but various patch sets / proposals have been > > on the lists for a few months and it seems like discussion on the kernel > > side has pretty much died down. That's why I was pinging the glibc side > > of things, if anyone here has comments on the interface then it's time > > to chime in. If there's no comments then we're likely to end up with > > this in the next release (so queue into for-next soon, Linus' master in > > a month or so). > Right. And I've suggested that we at least try to settle on the various > mem* and str* implementations independently of the kernel->glibc > interface question. This works for me. As we talked about off-list, this series cleaves pretty cleanly. One option would be to take this series now(ish, whenever the kernel series lands), then cleave off my memcpy and replace it with Vrull's when it's ready. The hope being that two incremental improvements go faster than waiting to try and land everything perfectly all at once. -Evan > > I don't much care how we break down the problem of selecting > implementations, just that we get started. That can and probably > should be happening in parallel with the kernel->glibc API work. > > I've got some performance testing to do in this space (primarily of the > VRULL implementations). It's just going to take a long time to get the > data. 
And that implementation probably needs some revamping after all > the work on the mem* and str* infrastructure that landed earlier this year. > > jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-30 18:43 ` Evan Green @ 2023-03-31 5:09 ` Jeff Law 0 siblings, 0 replies; 27+ messages in thread From: Jeff Law @ 2023-03-31 5:09 UTC (permalink / raw) To: Evan Green Cc: Palmer Dabbelt, adhemerval.zanella, libc-alpha, slewis, Vineet Gupta On 3/30/23 12:43, Evan Green wrote: > On Wed, Mar 29, 2023 at 11:20 PM Jeff Law <jeffreyalaw@gmail.com> wrote: >> >> >> >> On 3/29/23 13:45, Palmer Dabbelt wrote: >> >>> It's not in for-next yet, but various patch sets / proposals have been >>> on the lists for a few months and it seems like discussion on the kernel >>> side has pretty much died down. That's why I was pinging the glibc side >>> of things, if anyone here has comments on the interface then it's time >>> to chime in. If there's no comments then we're likely to end up with >>> this in the next release (so queue into for-next soon, Linus' master in >>> a month or so). >> Right. And I've suggested that we at least try to settle on the various >> mem* and str* implementations independently of the kernel->glibc >> interface question. > > This works for me. As we talked about off-list, this series cleaves > pretty cleanly. One option would be to take this series now(ish, > whenever the kernel series lands), then cleave off my memcpy and > replace it with Vrull's when it's ready. The hope being that two > incremental improvements go faster than waiting to try and land > everything perfectly all at once. No idea at this point if VRULL's is better or worse ;-) Right now I'm focused on their cboz implementation of memset. Assuming no uarch quirks it should be a slam dunk. But of course there's a quirk in our uarch, so testing testing testing. I did just spend a fair amount of time in the hottest path of their strcmp. It seems quite reasonable. Jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-30 6:20 ` Jeff Law 2023-03-30 18:43 ` Evan Green @ 2023-03-30 19:38 ` Adhemerval Zanella Netto 2023-03-31 18:07 ` Jeff Law 1 sibling, 1 reply; 27+ messages in thread From: Adhemerval Zanella Netto @ 2023-03-30 19:38 UTC (permalink / raw) To: Jeff Law, Palmer Dabbelt; +Cc: Evan Green, libc-alpha, slewis, Vineet Gupta On 30/03/23 03:20, Jeff Law wrote: > > > On 3/29/23 13:45, Palmer Dabbelt wrote: > >> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so). > Right. And I've suggested that we at least try to settle on the various mem* and str* implementations independently of the kernel->glibc interface question. > > I don't much care how we break down the problem of selecting implementations, just that we get started. That can and probably should be happening in parallel with the kernel->glibc API work. > > I've got some performance testing to do in this space (primarily of the VRULL implementations). It's just going to take a long time to get the data. And that implementation probably needs some revamping after all the work on the mem* and str* infrastructure that landed earlier this year. > I don't think glibc is the right place for a code dump, especially for implementations that do not have representative performance numbers on real hardware and might require further tuning.
It can be even trickier if you require a different build config for testing, as we used to have for some ABIs (for instance on powerpc with --with-cpu); at least for ifunc we have some mechanism to test multiple variants, assuming the chip supports them (which should be the case for unaligned). For ARM we have the optimized-routines [1] project, which we use as a testbed for multiple implementations and which, due to its licensing, also makes it easier to use the optimized routines in different projects. We used to have a similar project, cortex-strings, at Linaro. So for experimental routines, where you expect frequent tuning as you test and benchmark on different chips, an external project might be a better idea, syncing with glibc once the routines are tested and validated. And these RISCV routines still seem very experimental, with performance numbers that are still synthetic ones from emulators. Another possibility might be to improve the generic implementation, as we have done recently, where supporting RISCV bitmanip was a matter of adding just 2 files and 4 functions to optimize multiple string functions [2]. I have some WIP patches to add support for unaligned memcpy/memmove with a very simple strategy. [1] https://github.com/ARM-software/optimized-routines [2] https://sourceware.org/git/?p=glibc.git;a=commit;h=25788431c0f5264c4830415de0cdd4d9926cbad9 ^ permalink raw reply [flat|nested] 27+ messages in thread
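For readers unfamiliar with what an alignment-ignorant copy looks like, here is a minimal sketch in portable C. It is an assumption-laden illustration only: it is neither the memcpy_noalignment.S from this series nor the WIP generic patches mentioned above, and the name copy_words_noalign is hypothetical. The idea is simply to move one word per iteration without first aligning either pointer, then finish the tail bytewise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper; a sketch, not the memcpy from this series or glibc. */
void *copy_words_noalign(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bulk loop: one 64-bit word per iteration, with no attempt to align
       d or s first.  The fixed-size memcpy calls are how portable C spells
       an unaligned load and store; compilers typically lower them to
       single instructions when the target permits misaligned access.  */
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }

    /* Tail: fewer than 8 bytes remain, copy them one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Whether this beats an aligning copy depends entirely on how the hardware handles misaligned accesses, which is the property the series proposes to query at run time.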
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-30 19:38 ` Adhemerval Zanella Netto @ 2023-03-31 18:07 ` Jeff Law 2023-03-31 18:34 ` Palmer Dabbelt 0 siblings, 1 reply; 27+ messages in thread From: Jeff Law @ 2023-03-31 18:07 UTC (permalink / raw) To: Adhemerval Zanella Netto, Palmer Dabbelt Cc: Evan Green, libc-alpha, slewis, Vineet Gupta On 3/30/23 13:38, Adhemerval Zanella Netto wrote: > > > On 30/03/23 03:20, Jeff Law wrote: >> >> >> On 3/29/23 13:45, Palmer Dabbelt wrote: >> >>> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so). >> Right. And I've suggested that we at least try to settle on the various mem* and str* implementations independently of the kernel->glibc interface question. >> >> I don't much care how we break down the problem of selecting implementations, just that we get started. That can and probably should be happening in parallel with the kernel->glibc API work. >> >> I've got some performance testing to do in this space (primarily of the VRULL implementations). It's just going to take a long time to get the data. And that implementation probably needs some revamping after all the work on the mem* and str* infrastructure that landed earlier this year. >> > > I don't think glibc is the right place for code dump, specially for implementations > that does not have representative performance numbers in real hardware and might > require further tuning. 
It can be even tricky if you require different build config > to testing as used to have for some ABI (for instance on powerpc with --with-cpu), > at least for ifunc we have some mechanism to test multiple variants assuming the > chips at least support (which should be case for unaligned). It's not meant to be a "code dump". It's "these are the recommended implementations and we're just waiting for the final ifunc wiring to use them automatically." But I understand your point. Even just agreeing on the implementations, without committing until the ifunc interface is settled, would be a major step forward. My larger point is that we need to work through the str* and mem* implementations and settle on those implementations, and that can happen independently of the interface discussion with the kernel team. If we've settled on specific implementations, why not go ahead and put them into the repo with the expectation that we can trivially wire them into the ifunc resolver once the abi interface is sorted out. > > So for experimental routines, where you expect to have frequent tuning based on > once you have tested and benchmarks on different chips; an external project > might a better idea; and sync with glibc once the routines are tested and validate. > And these RISCV does seemed to be still very experimental, where performance numbers > are still synthetic ones from emulators. I think we're actually a lot closer than you might think :-) My goal would be that we're not doing frequent tuning and avoid uarch specific versions if we at all can. There's a reasonable chance we can do that if we have good baseline, zbb and vector versions. I'm not including cboz memory clear right now -- there's already evidence that uarch considerations around cboz may be significant. > > Another possibility might to improve the generic implementation, as we have done > recently where RISCV bitmanip was a matter to add just 2 files and 4 functions > to optimize multiple string functions [2].
I have some WIP patches to add support > for unaligned memcpy/memmove with a very simple strategy. As I noted elsewhere. I was on the fence with pushing for improvements to the generic strcmp bits, but could be easily swayed to that position. jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-31 18:07 ` Jeff Law @ 2023-03-31 18:34 ` Palmer Dabbelt 2023-03-31 19:32 ` Adhemerval Zanella Netto 0 siblings, 1 reply; 27+ messages in thread From: Palmer Dabbelt @ 2023-03-31 18:34 UTC (permalink / raw) To: jeffreyalaw Cc: adhemerval.zanella, Evan Green, libc-alpha, slewis, Vineet Gupta On Fri, 31 Mar 2023 11:07:02 PDT (-0700), jeffreyalaw@gmail.com wrote: > > > On 3/30/23 13:38, Adhemerval Zanella Netto wrote: >> >> >> On 30/03/23 03:20, Jeff Law wrote: >>> >>> >>> On 3/29/23 13:45, Palmer Dabbelt wrote: >>> >>>> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so). >>> Right. And I've suggested that we at least try to settle on the various mem* and str* implementations independently of the kernel->glibc interface question. >>> >>> I don't much care how we break down the problem of selecting implementations, just that we get started. That can and probably should be happening in parallel with the kernel->glibc API work. >>> >>> I've got some performance testing to do in this space (primarily of the VRULL implementations). It's just going to take a long time to get the data. And that implementation probably needs some revamping after all the work on the mem* and str* infrastructure that landed earlier this year. >>> >> >> I don't think glibc is the right place for code dump, specially for implementations >> that does not have representative performance numbers in real hardware and might >> require further tuning. 
It can be even tricky if you require different build config >> to testing as used to have for some ABI (for instance on powerpc with --with-cpu), >> at least for ifunc we have some mechanism to test multiple variants assuming the >> chips at least support (which should be case for unaligned). > It's not meant to be "code dump". It's "these are the recommended > implementation and we're just waiting for the final ifunc wiring to use > them automatically." > > But I understand your point. Even if we just agree on the > implementations without committing until the ifunc interface is settled > is a major step forward. > > My larger point is that we need to work through the str* and mem* > implementations and settle on those implementations and that can happen > in independently of the interface discussion with the kernel team. If > we've settled on specific implementations, why not go ahead and put them > into the repo with the expectation that we can trivially wire them into > the ifunc resolver once the abi interface is sorted out. IMO that's fine: we've got a bunch of other infrastructure around these optimized routines that will need to get built (glibc_hwcaps, for example) so it's not like just having hwprobe means we're done. The only issue I see with having these in tree is that we'll end up with glibc binaries that have vendor-specific tunings, but no way to provide those with generic binaries. That means vendors will end up shipping these non-portable binaries. We've historically tried to avoid that wherever possible, but it's probably time to call that a pipe dream -- the only base we could really have is rv64gc, and that's going to be so slow it's essentially useless for any real systems. 
So if you guys have actual performance gain numbers to talk about, then I'm happy taking the optimized glibc routines (or at least whatever bits of them are in RISC-V land) for that hardware -- even if it means there's a build-time configuration that results in Ventana-specific binaries. I think we do want to keep pushing on the dynamic flavors of stuff, just so we can try to dig out of this hole at some point, but we're going to have a mess until the ISA get sorted out. My guess is that will take years, and blocking the optimizations until then is just going to lead to a bunch of out-of-tree ports from vendors and an even bigger mess. >> So for experimental routines, where you expect to have frequent tuning based on >> once you have tested and benchmarks on different chips; an external project >> might a better idea; and sync with glibc once the routines are tested and validate. >> And these RISCV does seemed to be still very experimental, where performance numbers >> are still synthetic ones from emulators. > I think we're actually a lot closer than you might think :-) My goal > would be that we're not doing frequent tuning and avoid uarch specific > versions if we at all can. There's a reasonable chance we can do that > if we have good baseline, zbb and vector versions. I'm not including Unfortunately there's going to be very wide variation in performance between vendors for the vector extension, we're going to have at least 3 flavors of anything there (plus whatever Allwinner/T-Head ends up needing, but that's a whole can of worms). So I think at this point we'd be better off just calling these vendor-specific routines, if there's some commonality between them we can sort it out later. > cboz memory clear right now -- there's already evidence that uarch > considerations around cboz may be significant. Yep, again there's at least 3 ways of implementing CBOZ that I've seen floating around so we're going to have a vendor-specific mess there. 
>> Another possibility might to improve the generic implementation, as we have done >> recently where RISCV bitmanip was a matter to add just 2 files and 4 functions >> to optimize multiple string functions [2]. I have some WIP patches to add support >> for unaligned memcpy/memmove with a very simple strategy. > As I noted elsewhere. I was on the fence with pushing for improvements > to the generic strcmp bits, but could be easily swayed to that position. > > jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface 2023-03-31 18:34 ` Palmer Dabbelt @ 2023-03-31 19:32 ` Adhemerval Zanella Netto 2023-03-31 20:19 ` Jeff Law 0 siblings, 1 reply; 27+ messages in thread From: Adhemerval Zanella Netto @ 2023-03-31 19:32 UTC (permalink / raw) To: Palmer Dabbelt, jeffreyalaw; +Cc: Evan Green, libc-alpha, slewis, Vineet Gupta On 31/03/23 15:34, Palmer Dabbelt wrote: > On Fri, 31 Mar 2023 11:07:02 PDT (-0700), jeffreyalaw@gmail.com wrote: >> >> >> On 3/30/23 13:38, Adhemerval Zanella Netto wrote: >>> >>> >>> On 30/03/23 03:20, Jeff Law wrote: >>>> >>>> >>>> On 3/29/23 13:45, Palmer Dabbelt wrote: >>>> >>>>> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down. That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in. If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so). >>>> Right. And I've suggested that we at least try to settle on the various mem* and str* implementations independently of the kernel->glibc interface question. >>>> >>>> I don't much care how we break down the problem of selecting implementations, just that we get started. That can and probably should be happening in parallel with the kernel->glibc API work. >>>> >>>> I've got some performance testing to do in this space (primarily of the VRULL implementations). It's just going to take a long time to get the data. And that implementation probably needs some revamping after all the work on the mem* and str* infrastructure that landed earlier this year. >>>> >>> >>> I don't think glibc is the right place for code dump, specially for implementations >>> that does not have representative performance numbers in real hardware and might >>> require further tuning. 
It can be even tricky if you require different build config >>> to testing as used to have for some ABI (for instance on powerpc with --with-cpu), >>> at least for ifunc we have some mechanism to test multiple variants assuming the >>> chips at least support (which should be case for unaligned). >> It's not meant to be "code dump". It's "these are the recommended >> implementation and we're just waiting for the final ifunc wiring to use >> them automatically." >> >> But I understand your point. Even if we just agree on the >> implementations without committing until the ifunc interface is settled >> is a major step forward. >> >> My larger point is that we need to work through the str* and mem* >> implementations and settle on those implementations and that can happen >> in independently of the interface discussion with the kernel team. If >> we've settled on specific implementations, why not go ahead and put them >> into the repo with the expectation that we can trivially wire them into >> the ifunc resolver once the abi interface is sorted out. > > IMO that's fine: we've got a bunch of other infrastructure around these optimized routines that will need to get built (glibc_hwcaps, for example) so it's not like just having hwprobe means we're done. > > The only issue I see with having these in tree is that we'll end up with glibc binaries that have vendor-specific tunings, but no way to provide those with generic binaries. That means vendors will end up shipping these non-portable binaries. We've historically tried to avoid that wherever possible, but it's probably time to call that a pipe dream -- the only base we could really have is rv64gc, and that's going to be so slow it's essentially useless for any real systems. 
> > So if you guys have actual performance gain numbers to talk about, then I'm happy taking the optimized glibc routines (or at least whatever bits of them are in RISC-V land) for that hardware -- even if it means there's a build-time configuration that results in Ventana-specific binaries. > > I think we do want to keep pushing on the dynamic flavors of stuff, just so we can try to dig out of this hole at some point, but we're going to have a mess until the ISA get sorted out. My guess is that will take years, and blocking the optimizations until then is just going to lead to a bunch of out-of-tree ports from vendors and an even bigger mess. It is still not clear to me what RISCV, as ABI and not as an specific vendor, wants to provide arch and vendor specific str* and mem* routines. Christophe has hinted that the focus is not compile-only approach, so I take --with-cpu support (similar to what some old ABI used to provide, like powerpc) is not an option. However, this is not what the RVV proposal does [3], which is to enable RVV iff you target glibc to rvv (so compile-only). And that's why I asked you guys to first define on how you want to approach it. So I take that RISCV want to follow what x86_64 and aarch64 do, which is provide optimized routines for a minimum abi (say rv64gc), and then provide runtime selection through ifunc for either ABI or vendor specific routines (including variant like the unaligned optimization). You can still follow what x86_64 and s390 recently did, which is if you define a minimum ABI version, you default the optimized version and either skip ifunc selection or setup a more restrict set (so in future, you can have a rvv-only build that does not need to provide old zbb or rv64gc support). Which then leads to how to actually test and provide such support. The str* and mem* tests consult which ifunc variant are support (ifunc-impl-list.c) on the underlying hardware; while the selector returns the best option. 
Both rely on a way to query the hardware, or at least which versions are supported, so I think RISCV should first figure out this part (unless you do want to follow the compile-only approach...) So it does not make sense to me to have ifunc variants in the repo that are neither selected nor tested, only to be enabled at some point in the future. [1] https://sourceware.org/pipermail/libc-alpha/2023-February/145392.html [2] https://sourceware.org/pipermail/libc-alpha/2023-February/145414.html [3] https://sourceware.org/pipermail/libc-alpha/2023-March/thread.html > >>> So for experimental routines, where you expect to have frequent tuning based on >>> once you have tested and benchmarks on different chips; an external project >>> might a better idea; and sync with glibc once the routines are tested and validate. >>> And these RISCV does seemed to be still very experimental, where performance numbers >>> are still synthetic ones from emulators. >> I think we're actually a lot closer than you might think :-) My goal >> would be that we're not doing frequent tuning and avoid uarch specific >> versions if we at all can. There's a reasonable chance we can do that >> if we have good baseline, zbb and vector versions. I'm not including > > Unfortunately there's going to be very wide variation in performance between vendors for the vector extension, we're going to have at least 3 flavors of anything there (plus whatever Allwinner/T-Head ends up needing, but that's a whole can of worms). So I think at this point we'd be better off just calling these vendor-specific routines, if there's some commonality between them we can sort it out later. > >> cboz memory clear right now -- there's already evidence that uarch >> considerations around cboz may be significant. > > Yep, again there's at least 3 ways of implementing CBOZ that I've seen floating around so we're going to have a vendor-specific mess there.
> >>> Another possibility might to improve the generic implementation, as we have done >>> recently where RISCV bitmanip was a matter to add just 2 files and 4 functions >>> to optimize multiple string functions [2]. I have some WIP patches to add support >>> for unaligned memcpy/memmove with a very simple strategy. >> As I noted elsewhere. I was on the fence with pushing for improvements >> to the generic strcmp bits, but could be easily swayed to that position. >> >> jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-03-31 19:32 ` Adhemerval Zanella Netto
@ 2023-03-31 20:19 ` Jeff Law
  2023-03-31 21:03 ` Palmer Dabbelt
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Law @ 2023-03-31 20:19 UTC
To: Adhemerval Zanella Netto, Palmer Dabbelt
Cc: Evan Green, libc-alpha, slewis, Vineet Gupta

On 3/31/23 13:32, Adhemerval Zanella Netto wrote:
>
> It is still not clear to me what RISCV, as ABI and not as an specific vendor,
> wants to provide arch and vendor specific str* and mem* routines. Christophe
> has hinted that the focus is not compile-only approach, so I take --with-cpu
> support (similar to what some old ABI used to provide, like powerpc) is not
> an option. However, this is not what the RVV proposal does [3], which is to
> enable RVV iff you target glibc to rvv (so compile-only).

I believe there is consensus on the desire to use dynamic dispatch via
an ifunc resolver.

>
> And that's why I asked you guys to first define on how you want to approach
> it.

I think that's already done. I don't really see any confusion in this
space. The patch from the SiFive team has static dispatch, though they
made it clear they want dynamic dispatch. Static dispatch is just a
stopgap until the dynamic dispatch work is ready AFAICT.

Rivos had a dynamic dispatch mechanism based on riscv_hwprobe.

VRULL had a dynamic dispatch based on an environment variable. This was
acknowledged to be a hack which would be dropped once the kernel->glibc
interface bits were sorted out.

Ventana doesn't have patches in this space, but had been using the VRULL
bits.

I don't really have a preference as far as implementations. I just want
to define good ones that cover the most important cases, particularly
with regard to ISA extensions, but I'm even willing to narrow the
immediate focus down further (see below).

>
> So I take that RISCV want to follow what x86_64 and aarch64 do, which is
> provide optimized routines for a minimum abi (say rv64gc), and then provide
> runtime selection through ifunc for either ABI or vendor specific routines
> (including variant like the unaligned optimization).

Right. That's basically what I think we're trying to do. Find a
suitable implementation we can agree upon for a given ISA architecture.
The belief right now is that we need one for the baseline architecture,
one for architectures implementing ZBB and another for architectures
that implement RVV.

ZBB and RVV are not uarch variants; they are standardized, but optional
ISA features.

I don't think anyone is (yet!) pushing for uarch variants. In fact, I
would very much like to avoid that as much as I can. Palmer might see
uarch variants as inevitable; I don't (and maybe I'm being naive).

> You can still follow
> what x86_64 and s390 recently did, which is if you define a minimum ABI
> version, you default the optimized version and either skip ifunc selection
> or setup a more restrict set (so in future, you can have a rvv-only build
> that does not need to provide old zbb or rv64gc support).

I'm focused on defining an implementation for the baseline architecture
as well as one for ZBB and RVV ISAs.

>
> Which then leads to how to actually test and provide such support. The
> str* and mem* tests consult which ifunc variant are support
> (ifunc-impl-list.c) on the underlying hardware; while the selector returns
> the best option. Both rely on how to query the hardware at or least which
> version are supported, so I think RISCV should first figure out this part
> (unless you do want to follow the compile-only approach...)
>
> So it does not make sense to me to have ifunc variants not selected or
> tested in repo, only to be enabled in a foreseen future.

I think this is the core point we disagree on. I understand your
position, respectfully disagree, but I'm willing to set it aside.

So perhaps we can narrow down the scope right now even further. Can we
agree to try and settle on a base implementation with no ISA extensions
and no uarch variants? ISTM if we can settle on those implementations
that it should be usable immediately by the RV community at large and
doesn't depend on the kernel->glibc interface work.

Jeff
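
[Editorial sketch] The scheme discussed in the message above (one baseline variant plus ZBB and RVV variants, a selector that returns the best option, and tests that enumerate every variant the hardware supports) might look roughly like the following. This is only an illustration under stated assumptions, not glibc code: the feature bits, `struct variant`, `select_variant` and `list_supported` are hypothetical names, the three strlen stand-ins simply forward to the C library, and a plain table lookup stands in for a real ifunc resolver so the example stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits. In a real port these would be filled in
   from a kernel interface such as riscv_hwprobe. */
enum { FEAT_ZBB = 1u << 0, FEAT_RVV = 1u << 1 };

typedef size_t (*strlen_fn)(const char *);

/* Stand-ins for the three implementations; each just forwards to the
   C library so the example runs on any machine. */
static size_t strlen_base(const char *s) { return strlen(s); }
static size_t strlen_zbb(const char *s)  { return strlen(s); }
static size_t strlen_rvv(const char *s)  { return strlen(s); }

struct variant {
  const char *name;
  unsigned required;   /* feature bits this variant needs */
  strlen_fn fn;
};

/* Ordered from most to least preferred. The baseline comes last and
   requires nothing, so it is usable on any hardware. */
static const struct variant variants[] = {
  { "rvv",  FEAT_RVV, strlen_rvv  },
  { "zbb",  FEAT_ZBB, strlen_zbb  },
  { "base", 0,        strlen_base },
};
#define N_VARIANTS (sizeof variants / sizeof variants[0])

/* Selector: the first (most preferred) variant whose requirements are
   all present in `features`. */
const struct variant *select_variant(unsigned features)
{
  assert(variants[N_VARIANTS - 1].required == 0);
  for (size_t i = 0; i < N_VARIANTS; i++)
    if ((features & variants[i].required) == variants[i].required)
      return &variants[i];
  return &variants[N_VARIANTS - 1];  /* not reached: baseline always matches */
}

/* Test-side enumeration: every variant usable with `features`, so a
   test suite can exercise all of them rather than only the selected one. */
size_t list_supported(unsigned features, const struct variant **out, size_t max)
{
  size_t n = 0;
  for (size_t i = 0; i < N_VARIANTS && n < max; i++)
    if ((features & variants[i].required) == variants[i].required)
      out[n++] = &variants[i];
  return n;
}
```

With no feature bits set, `select_variant(0)` returns the baseline entry, which is the case the narrowed proposal is concerned with; both functions depend on the same hardware query, which is why the query interface matters for the selector and the tests alike.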
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-03-31 20:19 ` Jeff Law
@ 2023-03-31 21:03 ` Palmer Dabbelt
  2023-03-31 21:35 ` Jeff Law
  0 siblings, 1 reply; 27+ messages in thread
From: Palmer Dabbelt @ 2023-03-31 21:03 UTC
To: jeffreyalaw
Cc: adhemerval.zanella, Evan Green, libc-alpha, slewis, Vineet Gupta

On Fri, 31 Mar 2023 13:19:19 PDT (-0700), jeffreyalaw@gmail.com wrote:

[just snipping the rest so we can focus on Jeff's ask, the other stuff
is interesting but a longer reply and we'd probably want to fork the
thread anyway...]

> So perhaps we can narrow down the scope right now even further. Can we
> agree to try and settle on a base implementation with no ISA extensions
> and no uarch variants? ISTM if we can settle on those implementations
> that it should be usable immediately by the RV community at large and
> doesn't depend on the kernel->glibc interface work.

That base includes V and ZBB? In that case we'd be dropping support for
all existing hardware, which I would be very much against.
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-03-31 21:03 ` Palmer Dabbelt
@ 2023-03-31 21:35 ` Jeff Law
  2023-03-31 21:38 ` Palmer Dabbelt
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Law @ 2023-03-31 21:35 UTC
To: Palmer Dabbelt
Cc: adhemerval.zanella, Evan Green, libc-alpha, slewis, Vineet Gupta

On 3/31/23 15:03, Palmer Dabbelt wrote:
> On Fri, 31 Mar 2023 13:19:19 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
> [just snipping the rest so we can focus on Jeff's ask, the other stuff
> is interesting but a longer reply and we'd probably want to fork the
> thread anyway...]
>
>> So perhaps we can narrow down the scope right now even further. Can we
>> agree to try and settle on a base implementation with no ISA extensions
>> and no uarch variants? ISTM if we can settle on those implementations
>> that it should be usable immediately by the RV community at large and
>> doesn't depend on the kernel->glibc interface work.
>
> That base includes V and ZBB? In that case we'd be dropping support for
> all existing hardware, which I would be very much against.

No, it would not include V or ZBB. It would be something that could
work on any risc-v hardware. Sorry if I wasn't clear about that.

jeff
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-03-31 21:35 ` Jeff Law
@ 2023-03-31 21:38 ` Palmer Dabbelt
  2023-03-31 22:10 ` Jeff Law
  0 siblings, 1 reply; 27+ messages in thread
From: Palmer Dabbelt @ 2023-03-31 21:38 UTC
To: jeffreyalaw
Cc: adhemerval.zanella, Evan Green, libc-alpha, slewis, Vineet Gupta

On Fri, 31 Mar 2023 14:35:36 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 3/31/23 15:03, Palmer Dabbelt wrote:
>> On Fri, 31 Mar 2023 13:19:19 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>> [just snipping the rest so we can focus on Jeff's ask, the other stuff
>> is interesting but a longer reply and we'd probably want to fork the
>> thread anyway...]
>>
>>> So perhaps we can narrow down the scope right now even further. Can we
>>> agree to try and settle on a base implementation with no ISA extensions
>>> and no uarch variants? ISTM if we can settle on those implementations
>>> that it should be usable immediately by the RV community at large and
>>> doesn't depend on the kernel->glibc interface work.
>>
>> That base includes V and ZBB? In that case we'd be dropping support for
>> all existing hardware, which I would be very much against.
> No, it would not include V or ZBB. It would be something that could
> work on any risc-v hardware. Sorry if I wasn't clear about that.

I'm still kind of confused then, maybe it's just too abstract? Is there
something you could propose as being the base?
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-03-31 21:38 ` Palmer Dabbelt
@ 2023-03-31 22:10 ` Jeff Law
  2023-04-07 15:36 ` Palmer Dabbelt
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Law @ 2023-03-31 22:10 UTC
To: Palmer Dabbelt
Cc: adhemerval.zanella, Evan Green, libc-alpha, slewis, Vineet Gupta

On 3/31/23 15:38, Palmer Dabbelt wrote:
> On Fri, 31 Mar 2023 14:35:36 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>>
>> On 3/31/23 15:03, Palmer Dabbelt wrote:
>>> On Fri, 31 Mar 2023 13:19:19 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>>
>>> [just snipping the rest so we can focus on Jeff's ask, the other stuff
>>> is interesting but a longer reply and we'd probably want to fork the
>>> thread anyway...]
>>>
>>>> So perhaps we can narrow down the scope right now even further. Can we
>>>> agree to try and settle on a base implementation with no ISA extensions
>>>> and no uarch variants? ISTM if we can settle on those implementations
>>>> that it should be usable immediately by the RV community at large and
>>>> doesn't depend on the kernel->glibc interface work.
>>>
>>> That base includes V and ZBB? In that case we'd be dropping support for
>>> all existing hardware, which I would be very much against.
>> No, it would not include V or ZBB. It would be something that could
>> work on any risc-v hardware. Sorry if I wasn't clear about that.
>
> I'm still kind of confused then, maybe it's just too abstract? Is there
> something you could propose as being the base?

So right now we use the generic (architecture independent) routines for
str* and mem*.

If we look at (for example) strcmp, there are hand-written variants out
there that are purported to have better performance than the generic
code in glibc.

Note that any such performance claims likely predate the work from
Adhemerval and others earlier this year to reduce the reliance on
hand-coded assembly.

So the first step is to answer the question: for any str* or mem* where
we've received a patch submission of a hand-coded assembly variant
(which isn't using ZBB or V), does that hand-coded assembly variant
significantly outperform the generic code currently in glibc? If yes
and the generic code can't be significantly improved, then we should
declare that hand-written variant as the standard baseline for risc-v in
glibc. Review, adjust, commit and move on.

My hope would be that many (most, all?) of the base architecture
hand-coded assembly variants no longer provide any significant benefit
over the current generic versions.

That's my minimal proposal for now. It's not meant to solve everything
in this space, but at least carve out a chunk of the work and get it
resolved one way or the other.

Does that help clarify what I'm suggesting?

Jeff
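
[Editorial sketch] The "first step" described above, checking whether a submitted variant significantly outperforms the generic code, could be approached along the following lines. This is a rough sketch under assumptions, not a glibc benchmark: `candidate_strcmp` is a hypothetical placeholder for a submitted routine, the C library's `strcmp` stands in for the generic code, `clock()` is used only because it is standard C, and the `min_speedup` threshold is an assumed knob since the thread does not say what counts as significant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef int (*strcmp_fn)(const char *, const char *);

/* Hypothetical candidate under evaluation: a plain byte-at-a-time
   strcmp standing in for a submitted hand-coded variant. */
int candidate_strcmp(const char *a, const char *b)
{
  const unsigned char *p = (const unsigned char *)a;
  const unsigned char *q = (const unsigned char *)b;
  while (*p != 0 && *p == *q) {
    p++;
    q++;
  }
  return (int)*p - (int)*q;
}

static int sign(int v) { return (v > 0) - (v < 0); }

/* Correctness gate: the candidate must agree with the reference strcmp
   in sign, since only the sign of the result is specified. */
int agrees(strcmp_fn cand, const char *a, const char *b)
{
  return sign(cand(a, b)) == sign(strcmp(a, b));
}

/* CPU time in seconds spent on `iters` calls of `fn`. */
double time_calls(strcmp_fn fn, const char *a, const char *b, long iters)
{
  assert(iters > 0);
  volatile int sink = 0;   /* keeps the calls from being optimized away */
  clock_t start = clock();
  for (long i = 0; i < iters; i++)
    sink += fn(a, b);
  (void)sink;
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Decision rule: adopt the candidate only if it is at least
   `min_speedup` times faster than the generic code. */
int should_adopt(double generic_secs, double candidate_secs, double min_speedup)
{
  return candidate_secs > 0.0 && generic_secs / candidate_secs >= min_speedup;
}
```

A real evaluation would use many input lengths and alignments and would run on the hardware of interest; the point here is only the shape of the comparison: verify agreement first, time both, then apply an explicit threshold.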
* Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-03-31 22:10 ` Jeff Law
@ 2023-04-07 15:36 ` Palmer Dabbelt
  0 siblings, 0 replies; 27+ messages in thread
From: Palmer Dabbelt @ 2023-04-07 15:36 UTC
To: jeffreyalaw
Cc: adhemerval.zanella, Evan Green, libc-alpha, slewis, Vineet Gupta

On Fri, 31 Mar 2023 15:10:24 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 3/31/23 15:38, Palmer Dabbelt wrote:
>> On Fri, 31 Mar 2023 14:35:36 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>>
>>>
>>> On 3/31/23 15:03, Palmer Dabbelt wrote:
>>>> On Fri, 31 Mar 2023 13:19:19 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>>>
>>>> [just snipping the rest so we can focus on Jeff's ask, the other stuff
>>>> is interesting but a longer reply and we'd probably want to fork the
>>>> thread anyway...]
>>>>
>>>>> So perhaps we can narrow down the scope right now even further. Can we
>>>>> agree to try and settle on a base implementation with no ISA extensions
>>>>> and no uarch variants? ISTM if we can settle on those implementations
>>>>> that it should be usable immediately by the RV community at large and
>>>>> doesn't depend on the kernel->glibc interface work.
>>>>
>>>> That base includes V and ZBB? In that case we'd be dropping support for
>>>> all existing hardware, which I would be very much against.
>>> No, it would not include V or ZBB. It would be something that could
>>> work on any risc-v hardware. Sorry if I wasn't clear about that.
>>
>> I'm still kind of confused then, maybe it's just too abstract? Is there
>> something you could propose as being the base?
> So right now we use the generic (architecture independent) routines for
> str* and mem*.
>
> If we look at (for example) strcmp there's hand written variants out
> there are are purported to have better performance than the generic code
> in glibc.
>
> Note that any such performance claims likely predate the work from
> Adhemerval and others earlier this year to reduce the reliance on
> hand-coded assembly.
>
> So the first step is to answer the question, for any str* or mem* where
> we've received a patch submission of a hand coded assembly variant
> (which isn't using ZBB or V), does that hand coded assembly variant
> significantly out perform the generic code currently in glibc. If yes
> and the generic code can't be significantly improved, then we should
> declare that hand written variant as the standard baseline for risc-v in
> glibc. Review, adjust, commit and move on.
>
> My hope would be that many (most, all?) of the base architecture hand
> coded assembly variants no longer provide any significant benefit over
> the current generic versions.
>
> That's my minimal proposal for now. It's not meant to solve everything
> in this space, but at least carve out a chunk of the work and get it
> resolved one way or the other.
>
> Does that help clarify what I'm suggesting?

Sorry for being slow here, this fell off the queue.

I think this proposal is in theory what we've done, it's just that
nobody's posted patches like that -- unless I missed something?
Certainly the original port had some assembly routines and we tossed
those because we didn't care enough to justify them.

If someone's got code then I'm happy to look, but we'd also need some
benchmarks (on real HW that's publicly available) and that's usually the
sticking point.

That said, I'd guess that anyone trying to ship a real product is going
to need at least V (or some other explicitly data parallel instructions)
before the performance of these routines matters.
end of thread, other threads:[~2023-04-07 15:36 UTC]

Thread overview: 27+ messages
2023-02-21 19:15 [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
2023-02-21 19:15 ` [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support Evan Green
2023-03-29 18:38 ` Adhemerval Zanella Netto
2023-02-21 19:15 ` [PATCH v2 2/3] riscv: Add hwprobe vdso call support Evan Green
2023-03-29 18:39 ` Adhemerval Zanella Netto
2023-02-21 19:15 ` [PATCH v2 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green
2023-03-28 22:54 ` [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
2023-03-28 23:41 ` Adhemerval Zanella Netto
2023-03-29 0:01 ` Palmer Dabbelt
2023-03-29 19:16 ` Adhemerval Zanella Netto
2023-03-29 19:45 ` Palmer Dabbelt
2023-03-29 20:13 ` Adhemerval Zanella Netto
2023-03-30 18:31 ` Evan Green
2023-03-30 19:43 ` Adhemerval Zanella Netto
2023-03-30 6:20 ` Jeff Law
2023-03-30 18:43 ` Evan Green
2023-03-31 5:09 ` Jeff Law
2023-03-30 19:38 ` Adhemerval Zanella Netto
2023-03-31 18:07 ` Jeff Law
2023-03-31 18:34 ` Palmer Dabbelt
2023-03-31 19:32 ` Adhemerval Zanella Netto
2023-03-31 20:19 ` Jeff Law
2023-03-31 21:03 ` Palmer Dabbelt
2023-03-31 21:35 ` Jeff Law
2023-03-31 21:38 ` Palmer Dabbelt
2023-03-31 22:10 ` Jeff Law
2023-04-07 15:36 ` Palmer Dabbelt