public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface
@ 2023-11-30 18:32 Evan Green
  2023-11-30 18:32 ` [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support Evan Green
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Evan Green @ 2023-11-30 18:32 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis, Evan Green


This series illustrates the use of a recently accepted Linux syscall that
enumerates architectural information about the RISC-V cores the system
is running on. In this series we expose a small wrapper function around
the syscall. An ifunc selector for memcpy queries it to see if unaligned
access is "fast" on this hardware. If it is, it selects a newly provided
implementation of memcpy that doesn't work hard at aligning the src and
destination buffers.

For applications and libraries outside of glibc that want to use
__riscv_hwprobe() in ifunc selectors, this series also sends a pointer
to the riscv_hwprobe() function in as the second argument to ifunc
selectors. A new inline convenience function can help application and
library callers to check for validity and quickly probe a single key.

The memcpy implementation is independent enough from the rest of the
series that it can be omitted safely if desired.

Performance numbers were compared using a small test program [1], run on
a D1 Nezha board, which supports fast unaligned access. "Fast" here
means copying unaligned words is faster than copying byte-wise, but
still slower than copying aligned words. Here's the speed of various
memcpy()s with the generic implementation. The numbers before are using
v4's memcpy implementation, with the "copy last byte via overlapping
misaligned word" fix this should get slightly better.

memcpy size 1 count 1000000 offset 0 took 109564 us
memcpy size 3 count 1000000 offset 0 took 138425 us
memcpy size 4 count 1000000 offset 0 took 148374 us
memcpy size 7 count 1000000 offset 0 took 178433 us
memcpy size 8 count 1000000 offset 0 took 188430 us
memcpy size f count 1000000 offset 0 took 266118 us
memcpy size f count 1000000 offset 1 took 265940 us
memcpy size f count 1000000 offset 3 took 265934 us
memcpy size f count 1000000 offset 7 took 266215 us
memcpy size f count 1000000 offset 8 took 265954 us
memcpy size f count 1000000 offset 9 took 265886 us
memcpy size 10 count 1000000 offset 0 took 195308 us
memcpy size 11 count 1000000 offset 0 took 205161 us
memcpy size 17 count 1000000 offset 0 took 274376 us
memcpy size 18 count 1000000 offset 0 took 199188 us
memcpy size 19 count 1000000 offset 0 took 209258 us
memcpy size 1f count 1000000 offset 0 took 278263 us
memcpy size 20 count 1000000 offset 0 took 207364 us
memcpy size 21 count 1000000 offset 0 took 217143 us
memcpy size 3f count 1000000 offset 0 took 300023 us
memcpy size 40 count 1000000 offset 0 took 231063 us
memcpy size 41 count 1000000 offset 0 took 241259 us
memcpy size 7c count 100000 offset 0 took 32807 us
memcpy size 7f count 100000 offset 0 took 36274 us
memcpy size ff count 100000 offset 0 took 47818 us
memcpy size ff count 100000 offset 0 took 47932 us
memcpy size 100 count 100000 offset 0 took 40468 us
memcpy size 200 count 100000 offset 0 took 64245 us
memcpy size 27f count 100000 offset 0 took 82549 us
memcpy size 400 count 100000 offset 0 took 111254 us
memcpy size 407 count 100000 offset 0 took 119364 us
memcpy size 800 count 100000 offset 0 took 203899 us
memcpy size 87f count 100000 offset 0 took 222465 us
memcpy size 87f count 100000 offset 3 took 222289 us
memcpy size 1000 count 100000 offset 0 took 388846 us
memcpy size 1000 count 100000 offset 1 took 468827 us
memcpy size 1000 count 100000 offset 3 took 397098 us
memcpy size 1000 count 100000 offset 4 took 397379 us
memcpy size 1000 count 100000 offset 5 took 397368 us
memcpy size 1000 count 100000 offset 7 took 396867 us
memcpy size 1000 count 100000 offset 8 took 389227 us
memcpy size 1000 count 100000 offset 9 took 395949 us
memcpy size 3000 count 50000 offset 0 took 674837 us
memcpy size 3000 count 50000 offset 1 took 676944 us
memcpy size 3000 count 50000 offset 3 took 679709 us
memcpy size 3000 count 50000 offset 4 took 680829 us
memcpy size 3000 count 50000 offset 5 took 678024 us
memcpy size 3000 count 50000 offset 7 took 681097 us
memcpy size 3000 count 50000 offset 8 took 670004 us
memcpy size 3000 count 50000 offset 9 took 674553 us

Here is that same test run with the assembly memcpy() in this series:
memcpy size 1 count 1000000 offset 0 took 92703 us
memcpy size 3 count 1000000 offset 0 took 112527 us
memcpy size 4 count 1000000 offset 0 took 120481 us
memcpy size 7 count 1000000 offset 0 took 149558 us
memcpy size 8 count 1000000 offset 0 took 90617 us
memcpy size f count 1000000 offset 0 took 174373 us
memcpy size f count 1000000 offset 1 took 178615 us
memcpy size f count 1000000 offset 3 took 178845 us
memcpy size f count 1000000 offset 7 took 178636 us
memcpy size f count 1000000 offset 8 took 174442 us
memcpy size f count 1000000 offset 9 took 178660 us
memcpy size 10 count 1000000 offset 0 took 99845 us
memcpy size 11 count 1000000 offset 0 took 112522 us
memcpy size 17 count 1000000 offset 0 took 179735 us
memcpy size 18 count 1000000 offset 0 took 110870 us
memcpy size 19 count 1000000 offset 0 took 121472 us
memcpy size 1f count 1000000 offset 0 took 188231 us
memcpy size 20 count 1000000 offset 0 took 119571 us
memcpy size 21 count 1000000 offset 0 took 132429 us
memcpy size 3f count 1000000 offset 0 took 227021 us
memcpy size 40 count 1000000 offset 0 took 166416 us
memcpy size 41 count 1000000 offset 0 took 180206 us
memcpy size 7c count 100000 offset 0 took 28602 us
memcpy size 7f count 100000 offset 0 took 31676 us
memcpy size ff count 100000 offset 0 took 39257 us
memcpy size ff count 100000 offset 0 took 39176 us
memcpy size 100 count 100000 offset 0 took 21928 us
memcpy size 200 count 100000 offset 0 took 35814 us
memcpy size 27f count 100000 offset 0 took 60315 us
memcpy size 400 count 100000 offset 0 took 63652 us
memcpy size 407 count 100000 offset 0 took 73160 us
memcpy size 800 count 100000 offset 0 took 121532 us
memcpy size 87f count 100000 offset 0 took 147269 us
memcpy size 87f count 100000 offset 3 took 144744 us
memcpy size 1000 count 100000 offset 0 took 232057 us
memcpy size 1000 count 100000 offset 1 took 254319 us
memcpy size 1000 count 100000 offset 3 took 256973 us
memcpy size 1000 count 100000 offset 4 took 257655 us
memcpy size 1000 count 100000 offset 5 took 259456 us
memcpy size 1000 count 100000 offset 7 took 260849 us
memcpy size 1000 count 100000 offset 8 took 232347 us
memcpy size 1000 count 100000 offset 9 took 254330 us
memcpy size 3000 count 50000 offset 0 took 382376 us
memcpy size 3000 count 50000 offset 1 took 389872 us
memcpy size 3000 count 50000 offset 3 took 385310 us
memcpy size 3000 count 50000 offset 4 took 389748 us
memcpy size 3000 count 50000 offset 5 took 391707 us
memcpy size 3000 count 50000 offset 7 took 386778 us
memcpy size 3000 count 50000 offset 8 took 385691 us
memcpy size 3000 count 50000 offset 9 took 392030 us

The assembly routine is measurably better.

[1] https://pastebin.com/DRyECNQW


Changes in v9:
 - Alphabetize new entries in libc.abilist (to pass checks)
 - Fix a couple of typos causing powerpc not to build
   (build-many-glibcs)
 - Use __inline rather than inline so c89 compiles (build-many-glibcs)

Changes in v8:
 - Fix missed 2.39 in abilists (Joseph)
 - Just return -r (Florian)

Changes in v7:
 - Bumped Versions up to 2.39 (Joseph)
 - Used INTERNAL_SYSCALL_CALL, and return positive errno to match
   pthreads API (Florian).
 - Remove __THROW since it creates a warning in combination with the
   fortified access decorators.
 - Use INTERNAL_VSYSCALL_CALL (Florian)
 - Remove __THROW from function pointer type, as it creates warnings
   together with __fortified_attr_access.
 - Introduced static inline helper (Richard)
 - Use new helper function in memcpy ifunc selector (Richard)

Changes in v6:
 - Prefixed __riscv_hwprobe() parameters names with __ to avoid user
   macro namespace pollution (Joseph)
 - Introduced riscv-ifunc.h for multi-arg ifunc selectors.
 - Fix a couple regressions in the assembly from v5 :/
 - Use passed hwprobe pointer in memcpy ifunc selector.

Changes in v5:
 - Do unaligned word access for final trailing bytes (Richard)

Changes in v4:
 - Remove __USE_GNU (Florian)
 - __nonnull, __wur, __THROW, and  __fortified_attr_access decorations
  (Florian)
 - change long to long int (Florian)
 - Fix comment formatting (Florian)
 - Update backup kernel header content copy.
 - Fix function declaration formatting (Florian)
 - Changed export versions to 2.38
 - Fixed comment style (Florian)

Changes in v3:
 - Update argument types to match v4 kernel interface
 - Add the "return" to the vsyscall
 - Fix up vdso arg types to match kernel v4 version
 - Remove ifdef around INLINE_VSYSCALL (Adhemerval)
 - Word align dest for large memcpy()s.
 - Add tags
 - Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
 - hwprobe.h: Use __has_include and duplicate Linux content to make
   compilation work when Linux headers are absent (Adhemerval)
 - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
 - Use INLINE_SYSCALL_CALL (Adhemerval)
 - Update versions
 - Update UNALIGNED_MASK to match kernel v3 series.
 - Add vDSO interface
 - Used _MASK instead of _FAST value itself.

Evan Green (6):
  riscv: Add Linux hwprobe syscall support
  riscv: Add hwprobe vdso call support
  riscv: Add __riscv_hwprobe pointer to ifunc calls
  riscv: Enable multi-arg ifunc resolvers
  riscv: Add ifunc helper method to hwprobe.h
  riscv: Add and use alignment-ignorant memcpy

 include/libc-symbols.h                        |  28 ++--
 sysdeps/riscv/dl-irel.h                       |   8 +-
 sysdeps/riscv/memcopy.h                       |  26 ++++
 sysdeps/riscv/memcpy.c                        |  63 ++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 138 ++++++++++++++++++
 sysdeps/riscv/riscv-ifunc.h                   |  27 ++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.c       |  10 ++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h       |   3 +
 sysdeps/unix/sysv/linux/riscv/Makefile        |   8 +-
 sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  47 ++++++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 +++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 107 ++++++++++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
 16 files changed, 478 insertions(+), 17 deletions(-)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/riscv/riscv-ifunc.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support
  2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
@ 2023-11-30 18:32 ` Evan Green
  2023-12-06 17:22   ` Adhemerval Zanella Netto
  2023-11-30 18:32 ` [PATCH v9 2/6] riscv: Add hwprobe vdso call support Evan Green
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Evan Green @ 2023-11-30 18:32 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis, Evan Green

Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
---

Changes in v9:
 - Alphabetize new entries in libc.abilist (to pass checks)

Changes in v8:
 - Fix missed 2.39 in abilists (Joseph)
 - Just return -r (Florian)

Changes in v7:
 - Bumped Versions up to 2.39 (Joseph)
 - Used INTERNAL_SYSCALL_CALL, and return positive errno to match
   pthreads API (Florian).
 - Remove __THROW since it creates a warning in combination with the
   fortified access decorators.

Changes in v6:
 - Prefixed __riscv_hwprobe() parameters names with __ to avoid user
   macro namespace pollution (Joseph)

Changes in v4:
 - Remove __USE_GNU (Florian)
 - __nonnull, __wur, __THROW, and  __fortified_attr_access decorations
  (Florian)
 - change long to long int (Florian)
 - Fix comment formatting (Florian)
 - Update backup kernel header content copy.
 - Fix function declaration formatting (Florian)
 - Changed export versions to 2.38

Changes in v3:
 - Update argument types to match v4 kernel interface

Changes in v2:
 - hwprobe.h: Use __has_include and duplicate Linux content to make
   compilation work when Linux headers are absent (Adhemerval)
 - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
 - Use INLINE_SYSCALL_CALL (Adhemerval)
 - Update versions
 - Update UNALIGNED_MASK to match kernel v3 series.

 sysdeps/unix/sysv/linux/riscv/Makefile        |  4 +-
 sysdeps/unix/sysv/linux/riscv/Versions        |  3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       | 36 ++++++++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |  1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |  1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 72 +++++++++++++++++++
 6 files changed, 115 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 4b6eacb32f..45cc29e40d 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -1,6 +1,6 @@
 ifeq ($(subdir),misc)
-sysdep_headers += sys/cachectl.h
-sysdep_routines += flush-icache
+sysdep_headers += sys/cachectl.h sys/hwprobe.h
+sysdep_routines += flush-icache hwprobe
 endif
 
 ifeq ($(subdir),stdlib)
diff --git a/sysdeps/unix/sysv/linux/riscv/Versions b/sysdeps/unix/sysv/linux/riscv/Versions
index 5625d2a0b8..8717b62a4a 100644
--- a/sysdeps/unix/sysv/linux/riscv/Versions
+++ b/sysdeps/unix/sysv/linux/riscv/Versions
@@ -8,4 +8,7 @@ libc {
   GLIBC_2.27 {
     __riscv_flush_icache;
   }
+  GLIBC_2.39 {
+    __riscv_hwprobe;
+  }
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
new file mode 100644
index 0000000000..e28194e344
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -0,0 +1,36 @@
+/* RISC-V hardware feature probing support on Linux
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/syscall.h>
+#include <sys/hwprobe.h>
+#include <sysdep.h>
+#include <sysdep-vdso.h>
+
+int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
+		     size_t __cpu_count, unsigned long int *__cpus,
+		     unsigned int __flags)
+{
+  int r;
+
+  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
+                             __cpu_count, __cpus, __flags);
+
+  /* Negate negative errno values to match pthreads API. */
+  return -r;
+}
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index a13c484582..6c96e25552 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2436,6 +2436,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 __riscv_hwprobe F
 GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index cf65d8d6d4..225278807b 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2636,6 +2636,7 @@ GLIBC_2.38 strlcat F
 GLIBC_2.38 strlcpy F
 GLIBC_2.38 wcslcat F
 GLIBC_2.38 wcslcpy F
+GLIBC_2.39 __riscv_hwprobe F
 GLIBC_2.39 pidfd_getpid F
 GLIBC_2.39 pidfd_spawn F
 GLIBC_2.39 pidfd_spawnp F
diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
new file mode 100644
index 0000000000..aa189a4818
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -0,0 +1,72 @@
+/* RISC-V architecture probe interface
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_HWPROBE_H
+#define _SYS_HWPROBE_H 1
+
+#include <features.h>
+#include <stddef.h>
+#ifdef __has_include
+# if __has_include (<asm/hwprobe.h>)
+#  include <asm/hwprobe.h>
+# endif
+#endif
+
+/* Define a (probably stale) version of the interface if the Linux headers
+   aren't present.  */
+#ifndef RISCV_HWPROBE_KEY_MVENDORID
+struct riscv_hwprobe {
+	signed long long int key;
+	unsigned long long int value;
+};
+
+#define RISCV_HWPROBE_KEY_MVENDORID	0
+#define RISCV_HWPROBE_KEY_MARCHID	1
+#define RISCV_HWPROBE_KEY_MIMPID	2
+#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR	3
+#define		RISCV_HWPROBE_BASE_BEHAVIOR_IMA	(1 << 0)
+#define RISCV_HWPROBE_KEY_IMA_EXT_0	4
+#define		RISCV_HWPROBE_IMA_FD		(1 << 0)
+#define		RISCV_HWPROBE_IMA_C		(1 << 1)
+#define		RISCV_HWPROBE_IMA_V		(1 << 2)
+#define		RISCV_HWPROBE_EXT_ZBA		(1 << 3)
+#define		RISCV_HWPROBE_EXT_ZBB		(1 << 4)
+#define		RISCV_HWPROBE_EXT_ZBS		(1 << 5)
+#define RISCV_HWPROBE_KEY_CPUPERF_0	5
+#define		RISCV_HWPROBE_MISALIGNED_UNKNOWN	(0 << 0)
+#define		RISCV_HWPROBE_MISALIGNED_EMULATED	(1 << 0)
+#define		RISCV_HWPROBE_MISALIGNED_SLOW		(2 << 0)
+#define		RISCV_HWPROBE_MISALIGNED_FAST		(3 << 0)
+#define		RISCV_HWPROBE_MISALIGNED_UNSUPPORTED	(4 << 0)
+#define		RISCV_HWPROBE_MISALIGNED_MASK		(7 << 0)
+
+#endif /* RISCV_HWPROBE_KEY_MVENDORID */
+
+__BEGIN_DECLS
+
+extern int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
+			    size_t __cpu_count, unsigned long int *__cpus,
+			    unsigned int __flags)
+     __nonnull ((1)) __wur
+     __fortified_attr_access (__read_write__, 1, 2)
+     __fortified_attr_access (__read_only__, 4, 3);
+
+__END_DECLS
+
+#endif /* sys/hwprobe.h */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v9 2/6] riscv: Add hwprobe vdso call support
  2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2023-11-30 18:32 ` [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support Evan Green
@ 2023-11-30 18:32 ` Evan Green
  2023-12-06 17:34   ` Adhemerval Zanella Netto
  2023-11-30 18:32 ` [PATCH v9 3/6] riscv: Add __riscv_hwprobe pointer to ifunc calls Evan Green
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Evan Green @ 2023-11-30 18:32 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis, Evan Green

The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
---

(no changes since v7)

Changes in v7:
 - Use INTERNAL_VSYSCALL_CALL (Florian)

Changes in v3:
 - Add the "return" to the vsyscall
 - Fix up vdso arg types to match kernel v4 version
 - Remove ifdef around INLINE_VSYSCALL (Adhemerval)

Changes in v2:
 - Add vDSO interface

 sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++++++++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h |  3 +++
 sysdeps/unix/sysv/linux/riscv/hwprobe.c | 17 ++++++++++++++---
 sysdeps/unix/sysv/linux/riscv/sysdep.h  |  1 +
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
index 97eaaeac37..ed8b1ef426 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
@@ -71,6 +71,16 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
 # ifdef HAVE_GET_TBFREQ
 PROCINFO_CLASS uint64_t (*_dl_vdso_get_tbfreq)(void) RELRO;
 # endif
+
+/* RISC-V specific ones.  */
+# ifdef HAVE_RISCV_HWPROBE
+PROCINFO_CLASS int (*_dl_vdso_riscv_hwprobe)(void *,
+                                             size_t,
+                                             size_t,
+                                             unsigned long *,
+                                             unsigned int) RELRO;
+# endif
+
 #endif
 
 #undef RELRO
diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
index 867072b897..39eafd5316 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
@@ -47,6 +47,9 @@ setup_vdso_pointers (void)
 #ifdef HAVE_GET_TBFREQ
   GLRO(dl_vdso_get_tbfreq) = dl_vdso_vsym (HAVE_GET_TBFREQ);
 #endif
+#ifdef HAVE_RISCV_HWPROBE
+  GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
+#endif
 }
 
 #endif
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
index e28194e344..6a9a44657f 100644
--- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -27,9 +27,20 @@ int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
 		     unsigned int __flags)
 {
   int r;
-
-  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
-                             __cpu_count, __cpus, __flags);
+  __riscv_hwprobe_t vdso_hwprobe =
+    (__riscv_hwprobe_t)GLRO(dl_vdso_riscv_hwprobe);
+
+  if (vdso_hwprobe != NULL)
+    {
+      r = INTERNAL_VSYSCALL_CALL (vdso_hwprobe, 5, __pairs, __pair_count,
+                                  __cpu_count, __cpus, __flags);
+    }
+  else
+    {
+      r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
+                                 __cpu_count, __cpus, __flags);
+
+    }
 
   /* Negate negative errno values to match pthreads API. */
   return -r;
diff --git a/sysdeps/unix/sysv/linux/riscv/sysdep.h b/sysdeps/unix/sysv/linux/riscv/sysdep.h
index 5583b96d23..ee015dfeb6 100644
--- a/sysdeps/unix/sysv/linux/riscv/sysdep.h
+++ b/sysdeps/unix/sysv/linux/riscv/sysdep.h
@@ -156,6 +156,7 @@
 /* List of system calls which are supported as vsyscalls (for RV32 and
    RV64).  */
 # define HAVE_GETCPU_VSYSCALL		"__vdso_getcpu"
+# define HAVE_RISCV_HWPROBE		"__vdso_riscv_hwprobe"
 
 # undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
 # define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v9 3/6] riscv: Add __riscv_hwprobe pointer to ifunc calls
  2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2023-11-30 18:32 ` [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support Evan Green
  2023-11-30 18:32 ` [PATCH v9 2/6] riscv: Add hwprobe vdso call support Evan Green
@ 2023-11-30 18:32 ` Evan Green
  2023-12-06  0:31   ` enh
  2023-11-30 18:32 ` [PATCH v9 4/6] riscv: Enable multi-arg ifunc resolvers Evan Green
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Evan Green @ 2023-11-30 18:32 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis, Evan Green

The new __riscv_hwprobe() function is designed to be used by ifunc
selector functions. This presents a challenge for applications and
libraries, as ifunc selectors are invoked before all relocations have
been performed, so an external call to __riscv_hwprobe() from an ifunc
selector won't work. To address this, pass a pointer to the
__riscv_hwprobe() vDSO function into ifunc selectors as the second
argument (alongside dl_hwcap, which was already being passed).

Include a typedef as well for convenience, so that ifunc users don't
have to go through contortions to call this routine. Users will need to
remember to check the second argument for NULL, both to account for
older glibcs that don't pass the function, and older kernels that don't
have the vDSO pointer.

Signed-off-by: Evan Green <evan@rivosinc.com>

---

(no changes since v7)

Changes in v7:
 - Remove __THROW from function pointer type, as it creates warnings
   together with __fortified_attr_access.

 sysdeps/riscv/dl-irel.h                     |  8 ++++----
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 10 ++++++++++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/sysdeps/riscv/dl-irel.h b/sysdeps/riscv/dl-irel.h
index eaeec5467c..2147504458 100644
--- a/sysdeps/riscv/dl-irel.h
+++ b/sysdeps/riscv/dl-irel.h
@@ -31,10 +31,10 @@ static inline ElfW(Addr)
 __attribute ((always_inline))
 elf_ifunc_invoke (ElfW(Addr) addr)
 {
-  /* The second argument is a void pointer to preserve the extension
-     fexibility.  */
-  return ((ElfW(Addr) (*) (uint64_t, void *)) (addr))
-	 (GLRO(dl_hwcap), NULL);
+  /* The third argument is a void pointer to preserve the extension
+     flexibility.  */
+  return ((ElfW(Addr) (*) (uint64_t, void *, void *)) (addr))
+	 (GLRO(dl_hwcap), GLRO(dl_vdso_riscv_hwprobe), NULL);
 }
 
 static inline void
diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
index aa189a4818..fd3be5a411 100644
--- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -67,6 +67,16 @@ extern int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
      __fortified_attr_access (__read_write__, 1, 2)
      __fortified_attr_access (__read_only__, 4, 3);
 
+/* A pointer to the __riscv_hwprobe vDSO function is passed as the second
+   argument to ifunc selector routines. Include a function pointer type for
+   convenience in calling the function in those settings. */
+typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_count,
+				  size_t __cpu_count, unsigned long int *__cpus,
+				  unsigned int __flags)
+     __nonnull ((1)) __wur
+     __fortified_attr_access (__read_write__, 1, 2)
+     __fortified_attr_access (__read_only__, 4, 3);
+
 __END_DECLS
 
 #endif /* sys/hwprobe.h */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v9 4/6] riscv: Enable multi-arg ifunc resolvers
  2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (2 preceding siblings ...)
  2023-11-30 18:32 ` [PATCH v9 3/6] riscv: Add __riscv_hwprobe pointer to ifunc calls Evan Green
@ 2023-11-30 18:32 ` Evan Green
  2023-11-30 18:32 ` [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h Evan Green
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Evan Green @ 2023-11-30 18:32 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis, Evan Green

RISC-V is apparently the first architecture to pass more than one
argument to ifunc resolvers. The helper macros in libc-symbols.h,
__ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible
with this. These macros have an "arg" (non-final) parameter that
represents the parameter signature of the ifunc resolver. The result is
an inability to pass the required comma through in a single preprocessor
argument.

Rearrange the __ifunc_resolver() macro to be variadic, and pass the
types as those variable parameters. Move the guts of __ifunc() and
__ifunc_hidden() into new macros, __ifunc_args(), and
__ifunc_args_hidden(), that pass the variable arguments down through to
__ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which
are used in a bunch of places, to simply shuffle the arguments down into
__ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which
provides convenience macros to those looking to write ifunc selectors
that use both arguments.

Signed-off-by: Evan Green <evan@rivosinc.com>

---

Changes in v9:
 - Fix a couple of typos causing powerpc not to build
   (build-many-glibcs)

Changes in v6:
 - Introduced riscv-ifunc.h for multi-arg ifunc selectors.

---
 include/libc-symbols.h      | 28 +++++++++++++++++-----------
 sysdeps/riscv/riscv-ifunc.h | 27 +++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 11 deletions(-)
 create mode 100644 sysdeps/riscv/riscv-ifunc.h

diff --git a/include/libc-symbols.h b/include/libc-symbols.h
index 5794614488..b45fad72b3 100644
--- a/include/libc-symbols.h
+++ b/include/libc-symbols.h
@@ -665,9 +665,9 @@ for linking")
 #endif
 
 /* Helper / base  macros for indirect function symbols.  */
-#define __ifunc_resolver(type_name, name, expr, arg, init, classifier)	\
+#define __ifunc_resolver(type_name, name, expr, init, classifier, ...)	\
   classifier inhibit_stack_protector					\
-  __typeof (type_name) *name##_ifunc (arg)				\
+  __typeof (type_name) *name##_ifunc (__VA_ARGS__)			\
   {									\
     init ();								\
     __typeof (type_name) *res = expr;					\
@@ -675,13 +675,13 @@ for linking")
   }
 
 #ifdef HAVE_GCC_IFUNC
-# define __ifunc(type_name, name, expr, arg, init)			\
+# define __ifunc_args(type_name, name, expr, init, ...)			\
   extern __typeof (type_name) name __attribute__			\
 			      ((ifunc (#name "_ifunc")));		\
-  __ifunc_resolver (type_name, name, expr, arg, init, static)
+  __ifunc_resolver (type_name, name, expr, init, static, __VA_ARGS__)
 
-# define __ifunc_hidden(type_name, name, expr, arg, init)	\
-  __ifunc (type_name, name, expr, arg, init)
+# define __ifunc_args_hidden(type_name, name, expr, init, ...)		\
+  __ifunc_args (type_name, name, expr, init, __VA_ARGS__)
 #else
 /* Gcc does not support __attribute__ ((ifunc (...))).  Use the old behaviour
    as fallback.  But keep in mind that the debug information for the ifunc
@@ -692,18 +692,24 @@ for linking")
    different signatures.  (Gcc support is disabled at least on a ppc64le
    Ubuntu 14.04 system.)  */
 
-# define __ifunc(type_name, name, expr, arg, init)			\
+# define __ifunc_args(type_name, name, expr, init, ...)			\
   extern __typeof (type_name) name;					\
-  __typeof (type_name) *name##_ifunc (arg) __asm__ (#name);		\
-  __ifunc_resolver (type_name, name, expr, arg, init,)			\
+  __typeof (type_name) *name##_ifunc (__VA_ARGS__) __asm__ (#name);	\
+  __ifunc_resolver (type_name, name, expr, init, , __VA_ARGS__)		\
  __asm__ (".type " #name ", %gnu_indirect_function");
 
-# define __ifunc_hidden(type_name, name, expr, arg, init)		\
+# define __ifunc_args_hidden(type_name, name, expr, init, ...)		\
   extern __typeof (type_name) __libc_##name;				\
-  __ifunc (type_name, __libc_##name, expr, arg, init)			\
+  __ifunc (type_name, __libc_##name, expr, __VA_ARGS__, init)		\
   strong_alias (__libc_##name, name);
 #endif /* !HAVE_GCC_IFUNC  */
 
+#define __ifunc(type_name, name, expr, arg, init)			\
+  __ifunc_args (type_name, name, expr, init, arg)
+
+#define __ifunc_hidden(type_name, name, expr, arg, init)		\
+  __ifunc_args_hidden (type_name, name, expr, init, arg)
+
 /* The following macros are used for indirect function symbols in libc.so.
    First of all, you need to have the function prototyped somewhere,
    say in foo.h:
diff --git a/sysdeps/riscv/riscv-ifunc.h b/sysdeps/riscv/riscv-ifunc.h
new file mode 100644
index 0000000000..7bff591d1e
--- /dev/null
+++ b/sysdeps/riscv/riscv-ifunc.h
@@ -0,0 +1,27 @@
+/* Common definition for ifunc resolvers.  Linux/RISC-V version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <ifunc-init.h>
+#include <sys/hwprobe.h>
+
+#define INIT_ARCH()
+
+#define riscv_libc_ifunc(name, expr)				\
+  __ifunc_args (name, name, expr(hwcap, hwprobe), INIT_ARCH,	\
+                uint64_t hwcap, __riscv_hwprobe_t hwprobe)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (3 preceding siblings ...)
  2023-11-30 18:32 ` [PATCH v9 4/6] riscv: Enable multi-arg ifunc resolvers Evan Green
@ 2023-11-30 18:32 ` Evan Green
  2023-12-06 17:39   ` Adhemerval Zanella Netto
  2023-12-06 18:17   ` enh
  2023-11-30 18:32 ` [PATCH v9 6/6] riscv: Add and use alignment-ignorant memcpy Evan Green
  2023-12-06 17:07 ` [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
  6 siblings, 2 replies; 20+ messages in thread
From: Evan Green @ 2023-11-30 18:32 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis, Evan Green

Add a little helper method so it's easier to fetch a single value from
the hwprobe function when used within an ifunc selector.

Signed-off-by: Evan Green <evan@rivosinc.com>

---

Changes in v9:
 - Use __inline rather than inline so c89 compiles (build-many-glibcs)

Changes in v7:
 - Introduced static inline helper (Richard)

 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 25 +++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
index fd3be5a411..ee7eed3960 100644
--- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -22,6 +22,7 @@
 
 #include <features.h>
 #include <stddef.h>
+#include <errno.h>
 #ifdef __has_include
 # if __has_include (<asm/hwprobe.h>)
 #  include <asm/hwprobe.h>
@@ -79,4 +80,28 @@ typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_c
 
 __END_DECLS
 
+/* Helper function usable from ifunc selectors that probes a single key. */
+static __inline int
+__riscv_hwprobe_one(__riscv_hwprobe_t hwprobe_func,
+                    signed long long int key,
+                    unsigned long long int *value)
+{
+  struct riscv_hwprobe pair;
+  int rc;
+
+  if (!hwprobe_func)
+    return ENOSYS;
+
+  pair.key = key;
+  rc = hwprobe_func(&pair, 1, 0, NULL, 0);
+  if (rc)
+    return rc;
+
+  if (pair.key < 0)
+    return ENOENT;
+
+  *value = pair.value;
+  return 0;
+}
+
 #endif /* sys/hwprobe.h */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v9 6/6] riscv: Add and use alignment-ignorant memcpy
  2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (4 preceding siblings ...)
  2023-11-30 18:32 ` [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h Evan Green
@ 2023-11-30 18:32 ` Evan Green
  2023-12-06 17:07 ` [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
  6 siblings, 0 replies; 20+ messages in thread
From: Evan Green @ 2023-11-30 18:32 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis, Evan Green

For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

---

(no changes since v7)

Changes in v7:
 - Use new helper function in memcpy ifunc selector (Richard)

Changes in v6:
 - Fix a couple regressions in the assembly from v5 :/
 - Use passed hwprobe pointer in memcpy ifunc selector.

Changes in v5:
 - Do unaligned word access for final trailing bytes (Richard)

Changes in v4:
 - Fixed comment style (Florian)

Changes in v3:
 - Word align dest for large memcpy()s.
 - Add tags
 - Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
 - Used _MASK instead of _FAST value itself.


---
 sysdeps/riscv/memcopy.h                       |  26 ++++
 sysdeps/riscv/memcpy.c                        |  63 ++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 138 ++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/Makefile        |   4 +
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 +++
 5 files changed, 255 insertions(+)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c

diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h
new file mode 100644
index 0000000000..2b685c8aa0
--- /dev/null
+++ b/sysdeps/riscv/memcopy.h
@@ -0,0 +1,26 @@
+/* memcopy.h -- definitions for memory copy functions. RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdeps/generic/memcopy.h>
+
+/* Redefine the generic memcpy implementation to __memcpy_generic, so
+   the memcpy ifunc can select between generic and special versions.
+   In rtld, don't bother with all the ifunciness. */
+#if IS_IN (libc)
+#define MEMCPY __memcpy_generic
+#endif
diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c
new file mode 100644
index 0000000000..285ca85f8e
--- /dev/null
+++ b/sysdeps/riscv/memcpy.c
@@ -0,0 +1,63 @@
+/* Multiple versions of memcpy.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <stdint.h>
+# include <string.h>
+# include <ifunc-init.h>
+# include <riscv-ifunc.h>
+# include <sys/hwprobe.h>
+
+# define INIT_ARCH()
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden;
+
+static inline __typeof (__redirect_memcpy) *
+select_memcpy_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
+{
+  unsigned long long int value;
+
+  INIT_ARCH ();
+
+  if (__riscv_hwprobe_one(hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value))
+    return __memcpy_generic;
+
+  if ((value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST)
+    return __memcpy_noalignment;
+
+  return __memcpy_generic;
+}
+
+riscv_libc_ifunc (__libc_memcpy, select_memcpy_ifunc);
+
+# undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+# ifdef SHARED
+__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy);
+# endif
+
+#endif
diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S
new file mode 100644
index 0000000000..f3bf8e5867
--- /dev/null
+++ b/sysdeps/riscv/memcpy_noalignment.S
@@ -0,0 +1,138 @@
+/* memcpy for RISC-V, ignoring buffer alignment
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+/* void *memcpy(void *, const void *, size_t) */
+ENTRY (__memcpy_noalignment)
+	move t6, a0  /* Preserve return value */
+
+	/* Bail if 0 */
+	beqz a2, 7f
+
+	/* Jump to byte copy if size < SZREG */
+	li a4, SZREG
+	bltu a2, a4, 5f
+
+	/* Round down to the nearest "page" size */
+	andi a4, a2, ~((16*SZREG)-1)
+	beqz a4, 2f
+	add a3, a1, a4
+
+	/* Copy the first word to get dest word aligned */
+	andi a5, t6, SZREG-1
+	beqz a5, 1f
+	REG_L a6, (a1)
+	REG_S a6, (t6)
+
+	/* Align dst up to a word, move src and size as well. */
+	addi t6, t6, SZREG-1
+	andi t6, t6, ~(SZREG-1)
+	sub a5, t6, a0
+	add a1, a1, a5
+	sub a2, a2, a5
+
+	/* Recompute page count */
+	andi a4, a2, ~((16*SZREG)-1)
+	beqz a4, 2f
+
+1:
+	/* Copy "pages" (chunks of 16 registers) */
+	REG_L a4,       0(a1)
+	REG_L a5,   SZREG(a1)
+	REG_L a6, 2*SZREG(a1)
+	REG_L a7, 3*SZREG(a1)
+	REG_L t0, 4*SZREG(a1)
+	REG_L t1, 5*SZREG(a1)
+	REG_L t2, 6*SZREG(a1)
+	REG_L t3, 7*SZREG(a1)
+	REG_L t4, 8*SZREG(a1)
+	REG_L t5, 9*SZREG(a1)
+	REG_S a4,       0(t6)
+	REG_S a5,   SZREG(t6)
+	REG_S a6, 2*SZREG(t6)
+	REG_S a7, 3*SZREG(t6)
+	REG_S t0, 4*SZREG(t6)
+	REG_S t1, 5*SZREG(t6)
+	REG_S t2, 6*SZREG(t6)
+	REG_S t3, 7*SZREG(t6)
+	REG_S t4, 8*SZREG(t6)
+	REG_S t5, 9*SZREG(t6)
+	REG_L a4, 10*SZREG(a1)
+	REG_L a5, 11*SZREG(a1)
+	REG_L a6, 12*SZREG(a1)
+	REG_L a7, 13*SZREG(a1)
+	REG_L t0, 14*SZREG(a1)
+	REG_L t1, 15*SZREG(a1)
+	addi a1, a1, 16*SZREG
+	REG_S a4, 10*SZREG(t6)
+	REG_S a5, 11*SZREG(t6)
+	REG_S a6, 12*SZREG(t6)
+	REG_S a7, 13*SZREG(t6)
+	REG_S t0, 14*SZREG(t6)
+	REG_S t1, 15*SZREG(t6)
+	addi t6, t6, 16*SZREG
+	bltu a1, a3, 1b
+	andi a2, a2, (16*SZREG)-1  /* Update count */
+
+2:
+	/* Remainder is smaller than a page, compute native word count */
+	beqz a2, 7f
+	andi a5, a2, ~(SZREG-1)
+	andi a2, a2, (SZREG-1)
+	add a3, a1, a5
+	/* Jump directly to last word if no words. */
+	beqz a5, 4f
+
+3:
+	/* Use single native register copy */
+	REG_L a4, 0(a1)
+	addi a1, a1, SZREG
+	REG_S a4, 0(t6)
+	addi t6, t6, SZREG
+	bltu a1, a3, 3b
+
+	/* Jump directly out if no more bytes */
+	beqz a2, 7f
+
+4:
+	/* Copy the last word unaligned */
+	add a3, a1, a2
+	add a4, t6, a2
+	REG_L a5, -SZREG(a3)
+	REG_S a5, -SZREG(a4)
+	ret
+
+5:
+	/* Copy bytes when the total copy is <SZREG */
+	add a3, a1, a2
+
+6:
+	lb a4, 0(a1)
+	addi a1, a1, 1
+	sb a4, 0(t6)
+	addi t6, t6, 1
+	bltu a1, a3, 6b
+
+7:
+	ret
+
+END (__memcpy_noalignment)
+
+hidden_def (__memcpy_noalignment)
diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 45cc29e40d..aa9ea443d6 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -7,6 +7,10 @@ ifeq ($(subdir),stdlib)
 gen-as-const-headers += ucontext_i.sym
 endif
 
+ifeq ($(subdir),string)
+sysdep_routines += memcpy memcpy-generic memcpy_noalignment
+endif
+
 abi-variants := ilp32 ilp32d lp64 lp64d
 
 ifeq (,$(filter $(default-abi),$(abi-variants)))
diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
new file mode 100644
index 0000000000..0abe03f7f5
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
@@ -0,0 +1,24 @@
+/* Re-include the default memcpy implementation.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+extern __typeof (memcpy) __memcpy_generic;
+hidden_proto(__memcpy_generic)
+
+#include <string/memcpy.c>
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 3/6] riscv: Add __riscv_hwprobe pointer to ifunc calls
  2023-11-30 18:32 ` [PATCH v9 3/6] riscv: Add __riscv_hwprobe pointer to ifunc calls Evan Green
@ 2023-12-06  0:31   ` enh
  0 siblings, 0 replies; 20+ messages in thread
From: enh @ 2023-12-06  0:31 UTC (permalink / raw)
  To: Evan Green; +Cc: libc-alpha, Florian Weimer, palmer, vineetg, slewis

makes sense to me. i've made the equivalent change to bionic for
source compatibility in
https://android-review.googlesource.com/c/platform/bionic/+/2860689
(and thanks to your earlier "lets pass a null void* as our last
argument", it's more backwards compatible than any other ifunc
resolver change we've ever made).

On Thu, Nov 30, 2023 at 10:33 AM Evan Green <evan@rivosinc.com> wrote:
>
> The new __riscv_hwprobe() function is designed to be used by ifunc
> selector functions. This presents a challenge for applications and
> libraries, as ifunc selectors are invoked before all relocations have
> been performed, so an external call to __riscv_hwprobe() from an ifunc
> selector won't work. To address this, pass a pointer to the
> __riscv_hwprobe() vDSO function into ifunc selectors as the second
> argument (alongside dl_hwcap, which was already being passed).
>
> Include a typedef as well for convenience, so that ifunc users don't
> have to go through contortions to call this routine. Users will need to
> remember to check the second argument for NULL, both to account for
> older glibcs that don't pass the function, and older kernels that don't
> have the vDSO pointer.
>
> Signed-off-by: Evan Green <evan@rivosinc.com>
>
> ---
>
> (no changes since v7)
>
> Changes in v7:
>  - Remove __THROW from function pointer type, as it creates warnings
>    together with __fortified_attr_access.
>
>  sysdeps/riscv/dl-irel.h                     |  8 ++++----
>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 10 ++++++++++
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/sysdeps/riscv/dl-irel.h b/sysdeps/riscv/dl-irel.h
> index eaeec5467c..2147504458 100644
> --- a/sysdeps/riscv/dl-irel.h
> +++ b/sysdeps/riscv/dl-irel.h
> @@ -31,10 +31,10 @@ static inline ElfW(Addr)
>  __attribute ((always_inline))
>  elf_ifunc_invoke (ElfW(Addr) addr)
>  {
> -  /* The second argument is a void pointer to preserve the extension
> -     fexibility.  */
> -  return ((ElfW(Addr) (*) (uint64_t, void *)) (addr))
> -        (GLRO(dl_hwcap), NULL);
> +  /* The third argument is a void pointer to preserve the extension
> +     flexibility.  */
> +  return ((ElfW(Addr) (*) (uint64_t, void *, void *)) (addr))
> +        (GLRO(dl_hwcap), GLRO(dl_vdso_riscv_hwprobe), NULL);
>  }
>
>  static inline void
> diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> index aa189a4818..fd3be5a411 100644
> --- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> @@ -67,6 +67,16 @@ extern int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
>       __fortified_attr_access (__read_write__, 1, 2)
>       __fortified_attr_access (__read_only__, 4, 3);
>
> +/* A pointer to the __riscv_hwprobe vDSO function is passed as the second
> +   argument to ifunc selector routines. Include a function pointer type for
> +   convenience in calling the function in those settings. */
> +typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_count,
> +                                 size_t __cpu_count, unsigned long int *__cpus,
> +                                 unsigned int __flags)
> +     __nonnull ((1)) __wur
> +     __fortified_attr_access (__read_write__, 1, 2)
> +     __fortified_attr_access (__read_only__, 4, 3);
> +
>  __END_DECLS
>
>  #endif /* sys/hwprobe.h */
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (5 preceding siblings ...)
  2023-11-30 18:32 ` [PATCH v9 6/6] riscv: Add and use alignment-ignorant memcpy Evan Green
@ 2023-12-06 17:07 ` Palmer Dabbelt
  6 siblings, 0 replies; 20+ messages in thread
From: Palmer Dabbelt @ 2023-12-06 17:07 UTC (permalink / raw)
  To: Evan Green; +Cc: libc-alpha, fweimer, Vineet Gupta, slewis, Evan Green

On Thu, 30 Nov 2023 10:32:33 PST (-0800), Evan Green wrote:
>
> This series illustrates the use of a recently accepted Linux syscall that
> enumerates architectural information about the RISC-V cores the system
> is running on. In this series we expose a small wrapper function around
> the syscall. An ifunc selector for memcpy queries it to see if unaligned
> access is "fast" on this hardware. If it is, it selects a newly provided
> implementation of memcpy that doesn't work hard at aligning the src and
> destination buffers.
>
> For applications and libraries outside of glibc that want to use
> __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
> to the riscv_hwprobe() function in as the second argument to ifunc
> selectors. A new inline convenience function can help application and
> library callers to check for validity and quickly probe a single key.
>
> The memcpy implementation is independent enough from the rest of the
> series that it can be omitted safely if desired.
>
> Performance numbers were compared using a small test program [1], run on
> a D1 Nezha board, which supports fast unaligned access. "Fast" here
> means copying unaligned words is faster than copying byte-wise, but
> still slower than copying aligned words. Here's the speed of various
> memcpy()s with the generic implementation. The numbers before are using
> v4's memcpy implementation, with the "copy last byte via overlapping
> misaligned word" fix this should get slightly better.
>
> memcpy size 1 count 1000000 offset 0 took 109564 us
> memcpy size 3 count 1000000 offset 0 took 138425 us
> memcpy size 4 count 1000000 offset 0 took 148374 us
> memcpy size 7 count 1000000 offset 0 took 178433 us
> memcpy size 8 count 1000000 offset 0 took 188430 us
> memcpy size f count 1000000 offset 0 took 266118 us
> memcpy size f count 1000000 offset 1 took 265940 us
> memcpy size f count 1000000 offset 3 took 265934 us
> memcpy size f count 1000000 offset 7 took 266215 us
> memcpy size f count 1000000 offset 8 took 265954 us
> memcpy size f count 1000000 offset 9 took 265886 us
> memcpy size 10 count 1000000 offset 0 took 195308 us
> memcpy size 11 count 1000000 offset 0 took 205161 us
> memcpy size 17 count 1000000 offset 0 took 274376 us
> memcpy size 18 count 1000000 offset 0 took 199188 us
> memcpy size 19 count 1000000 offset 0 took 209258 us
> memcpy size 1f count 1000000 offset 0 took 278263 us
> memcpy size 20 count 1000000 offset 0 took 207364 us
> memcpy size 21 count 1000000 offset 0 took 217143 us
> memcpy size 3f count 1000000 offset 0 took 300023 us
> memcpy size 40 count 1000000 offset 0 took 231063 us
> memcpy size 41 count 1000000 offset 0 took 241259 us
> memcpy size 7c count 100000 offset 0 took 32807 us
> memcpy size 7f count 100000 offset 0 took 36274 us
> memcpy size ff count 100000 offset 0 took 47818 us
> memcpy size ff count 100000 offset 0 took 47932 us
> memcpy size 100 count 100000 offset 0 took 40468 us
> memcpy size 200 count 100000 offset 0 took 64245 us
> memcpy size 27f count 100000 offset 0 took 82549 us
> memcpy size 400 count 100000 offset 0 took 111254 us
> memcpy size 407 count 100000 offset 0 took 119364 us
> memcpy size 800 count 100000 offset 0 took 203899 us
> memcpy size 87f count 100000 offset 0 took 222465 us
> memcpy size 87f count 100000 offset 3 took 222289 us
> memcpy size 1000 count 100000 offset 0 took 388846 us
> memcpy size 1000 count 100000 offset 1 took 468827 us
> memcpy size 1000 count 100000 offset 3 took 397098 us
> memcpy size 1000 count 100000 offset 4 took 397379 us
> memcpy size 1000 count 100000 offset 5 took 397368 us
> memcpy size 1000 count 100000 offset 7 took 396867 us
> memcpy size 1000 count 100000 offset 8 took 389227 us
> memcpy size 1000 count 100000 offset 9 took 395949 us
> memcpy size 3000 count 50000 offset 0 took 674837 us
> memcpy size 3000 count 50000 offset 1 took 676944 us
> memcpy size 3000 count 50000 offset 3 took 679709 us
> memcpy size 3000 count 50000 offset 4 took 680829 us
> memcpy size 3000 count 50000 offset 5 took 678024 us
> memcpy size 3000 count 50000 offset 7 took 681097 us
> memcpy size 3000 count 50000 offset 8 took 670004 us
> memcpy size 3000 count 50000 offset 9 took 674553 us
>
> Here is that same test run with the assembly memcpy() in this series:
> memcpy size 1 count 1000000 offset 0 took 92703 us
> memcpy size 3 count 1000000 offset 0 took 112527 us
> memcpy size 4 count 1000000 offset 0 took 120481 us
> memcpy size 7 count 1000000 offset 0 took 149558 us
> memcpy size 8 count 1000000 offset 0 took 90617 us
> memcpy size f count 1000000 offset 0 took 174373 us
> memcpy size f count 1000000 offset 1 took 178615 us
> memcpy size f count 1000000 offset 3 took 178845 us
> memcpy size f count 1000000 offset 7 took 178636 us
> memcpy size f count 1000000 offset 8 took 174442 us
> memcpy size f count 1000000 offset 9 took 178660 us
> memcpy size 10 count 1000000 offset 0 took 99845 us
> memcpy size 11 count 1000000 offset 0 took 112522 us
> memcpy size 17 count 1000000 offset 0 took 179735 us
> memcpy size 18 count 1000000 offset 0 took 110870 us
> memcpy size 19 count 1000000 offset 0 took 121472 us
> memcpy size 1f count 1000000 offset 0 took 188231 us
> memcpy size 20 count 1000000 offset 0 took 119571 us
> memcpy size 21 count 1000000 offset 0 took 132429 us
> memcpy size 3f count 1000000 offset 0 took 227021 us
> memcpy size 40 count 1000000 offset 0 took 166416 us
> memcpy size 41 count 1000000 offset 0 took 180206 us
> memcpy size 7c count 100000 offset 0 took 28602 us
> memcpy size 7f count 100000 offset 0 took 31676 us
> memcpy size ff count 100000 offset 0 took 39257 us
> memcpy size ff count 100000 offset 0 took 39176 us
> memcpy size 100 count 100000 offset 0 took 21928 us
> memcpy size 200 count 100000 offset 0 took 35814 us
> memcpy size 27f count 100000 offset 0 took 60315 us
> memcpy size 400 count 100000 offset 0 took 63652 us
> memcpy size 407 count 100000 offset 0 took 73160 us
> memcpy size 800 count 100000 offset 0 took 121532 us
> memcpy size 87f count 100000 offset 0 took 147269 us
> memcpy size 87f count 100000 offset 3 took 144744 us
> memcpy size 1000 count 100000 offset 0 took 232057 us
> memcpy size 1000 count 100000 offset 1 took 254319 us
> memcpy size 1000 count 100000 offset 3 took 256973 us
> memcpy size 1000 count 100000 offset 4 took 257655 us
> memcpy size 1000 count 100000 offset 5 took 259456 us
> memcpy size 1000 count 100000 offset 7 took 260849 us
> memcpy size 1000 count 100000 offset 8 took 232347 us
> memcpy size 1000 count 100000 offset 9 took 254330 us
> memcpy size 3000 count 50000 offset 0 took 382376 us
> memcpy size 3000 count 50000 offset 1 took 389872 us
> memcpy size 3000 count 50000 offset 3 took 385310 us
> memcpy size 3000 count 50000 offset 4 took 389748 us
> memcpy size 3000 count 50000 offset 5 took 391707 us
> memcpy size 3000 count 50000 offset 7 took 386778 us
> memcpy size 3000 count 50000 offset 8 took 385691 us
> memcpy size 3000 count 50000 offset 9 took 392030 us
>
> The assembly routine is measurably better.
>
> [1] https://pastebin.com/DRyECNQW
>
>
> Changes in v9:
>  - Alphabetize new entries in libc.abilist (to pass checks)
>  - Fix a couple of typos causing powerpc not to build
>    (build-many-glibcs)
>  - Use __inline rather than inline so c89 compiles (build-many-glibcs)

Ah, sorry, I missed the v9 here -- I'd made a hwprobe-v9 testing branch 
with the intent of fixing the failures, but forgot it was actually your 
v8.  So I thought I'd already looked at this.

I'm testing this again on top of master, just to make sure.  What I'm 
testing is here 
<https://github.com/palmer-dabbelt/glibc/tree/hwprobe-v9>, I'll push it 
assuming it passes the test on my end too.

Thanks!

>
> Changes in v8:
>  - Fix missed 2.39 in abilists (Joseph)
>  - Just return -r (Florian)
>
> Changes in v7:
>  - Bumped Versions up to 2.39 (Joseph)
>  - Used INTERNAL_SYSCALL_CALL, and return positive errno to match
>    pthreads API (Florian).
>  - Remove __THROW since it creates a warning in combination with the
>    fortified access decorators.
>  - Use INTERNAL_VSYSCALL_CALL (Florian)
>  - Remove __THROW from function pointer type, as it creates warnings
>    together with __fortified_attr_access.
>  - Introduced static inline helper (Richard)
>  - Use new helper function in memcpy ifunc selector (Richard)
>
> Changes in v6:
>  - Prefixed __riscv_hwprobe() parameters names with __ to avoid user
>    macro namespace pollution (Joseph)
>  - Introduced riscv-ifunc.h for multi-arg ifunc selectors.
>  - Fix a couple regressions in the assembly from v5 :/
>  - Use passed hwprobe pointer in memcpy ifunc selector.
>
> Changes in v5:
>  - Do unaligned word access for final trailing bytes (Richard)
>
> Changes in v4:
>  - Remove __USE_GNU (Florian)
>  - __nonnull, __wur, __THROW, and  __fortified_attr_access decorations
>   (Florian)
>  - change long to long int (Florian)
>  - Fix comment formatting (Florian)
>  - Update backup kernel header content copy.
>  - Fix function declaration formatting (Florian)
>  - Changed export versions to 2.38
>  - Fixed comment style (Florian)
>
> Changes in v3:
>  - Update argument types to match v4 kernel interface
>  - Add the "return" to the vsyscall
>  - Fix up vdso arg types to match kernel v4 version
>  - Remove ifdef around INLINE_VSYSCALL (Adhemerval)
>  - Word align dest for large memcpy()s.
>  - Add tags
>  - Remove spurious blank line from sysdeps/riscv/memcpy.c
>
> Changes in v2:
>  - hwprobe.h: Use __has_include and duplicate Linux content to make
>    compilation work when Linux headers are absent (Adhemerval)
>  - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
>  - Use INLINE_SYSCALL_CALL (Adhemerval)
>  - Update versions
>  - Update UNALIGNED_MASK to match kernel v3 series.
>  - Add vDSO interface
>  - Used _MASK instead of _FAST value itself.
>
> Evan Green (6):
>   riscv: Add Linux hwprobe syscall support
>   riscv: Add hwprobe vdso call support
>   riscv: Add __riscv_hwprobe pointer to ifunc calls
>   riscv: Enable multi-arg ifunc resolvers
>   riscv: Add ifunc helper method to hwprobe.h
>   riscv: Add and use alignment-ignorant memcpy
>
>  include/libc-symbols.h                        |  28 ++--
>  sysdeps/riscv/dl-irel.h                       |   8 +-
>  sysdeps/riscv/memcopy.h                       |  26 ++++
>  sysdeps/riscv/memcpy.c                        |  63 ++++++++
>  sysdeps/riscv/memcpy_noalignment.S            | 138 ++++++++++++++++++
>  sysdeps/riscv/riscv-ifunc.h                   |  27 ++++
>  sysdeps/unix/sysv/linux/dl-vdso-setup.c       |  10 ++
>  sysdeps/unix/sysv/linux/dl-vdso-setup.h       |   3 +
>  sysdeps/unix/sysv/linux/riscv/Makefile        |   8 +-
>  sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
>  sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  47 ++++++
>  .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 +++
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 107 ++++++++++++++
>  sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
>  16 files changed, 478 insertions(+), 17 deletions(-)
>  create mode 100644 sysdeps/riscv/memcopy.h
>  create mode 100644 sysdeps/riscv/memcpy.c
>  create mode 100644 sysdeps/riscv/memcpy_noalignment.S
>  create mode 100644 sysdeps/riscv/riscv-ifunc.h
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support
  2023-11-30 18:32 ` [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support Evan Green
@ 2023-12-06 17:22   ` Adhemerval Zanella Netto
  2023-12-06 17:24     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 20+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-06 17:22 UTC (permalink / raw)
  To: Evan Green, libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis



On 30/11/23 15:32, Evan Green wrote:
> Add awareness and a thin wrapper function around a new Linux system call
> that allows callers to get architecture and microarchitecture
> information about the CPUs from the kernel. This can be used to
> do things like dynamically choose a memcpy implementation.
> 
> Signed-off-by: Evan Green <evan@rivosinc.com>
> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
> 
> Changes in v9:
>  - Alphabetize new entries in libc.abilist (to pass checks)
> 
> Changes in v8:
>  - Fix missed 2.39 in abilists (Joseph)
>  - Just return -r (Florian)
> 
> Changes in v7:
>  - Bumped Versions up to 2.39 (Joseph)
>  - Used INTERNAL_SYSCALL_CALL, and return positive errno to match
>    pthreads API (Florian).
>  - Remove __THROW since it creates a warning in combination with the
>    fortified access decorators.
> 
> Changes in v6:
>  - Prefixed __riscv_hwprobe() parameters names with __ to avoid user
>    macro namespace pollution (Joseph)
> 
> Changes in v4:
>  - Remove __USE_GNU (Florian)
>  - __nonnull, __wur, __THROW, and  __fortified_attr_access decorations
>   (Florian)
>  - change long to long int (Florian)
>  - Fix comment formatting (Florian)
>  - Update backup kernel header content copy.
>  - Fix function declaration formatting (Florian)
>  - Changed export versions to 2.38
> 
> Changes in v3:
>  - Update argument types to match v4 kernel interface
> 
> Changes in v2:
>  - hwprobe.h: Use __has_include and duplicate Linux content to make
>    compilation work when Linux headers are absent (Adhemerval)
>  - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
>  - Use INLINE_SYSCALL_CALL (Adhemerval)
>  - Update versions
>  - Update UNALIGNED_MASK to match kernel v3 series.
> 
>  sysdeps/unix/sysv/linux/riscv/Makefile        |  4 +-
>  sysdeps/unix/sysv/linux/riscv/Versions        |  3 +
>  sysdeps/unix/sysv/linux/riscv/hwprobe.c       | 36 ++++++++++
>  .../unix/sysv/linux/riscv/rv32/libc.abilist   |  1 +
>  .../unix/sysv/linux/riscv/rv64/libc.abilist   |  1 +
>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 72 +++++++++++++++++++
>  6 files changed, 115 insertions(+), 2 deletions(-)
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> 
> diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
> index 4b6eacb32f..45cc29e40d 100644
> --- a/sysdeps/unix/sysv/linux/riscv/Makefile
> +++ b/sysdeps/unix/sysv/linux/riscv/Makefile
> @@ -1,6 +1,6 @@
>  ifeq ($(subdir),misc)
> -sysdep_headers += sys/cachectl.h
> -sysdep_routines += flush-icache
> +sysdep_headers += sys/cachectl.h sys/hwprobe.h
> +sysdep_routines += flush-icache hwprobe

For newer Makefile definitions, add one entr per line:

sysdep_headers += \
  sys/cachectl.h \
  sys/hwprobe.h \
  # sysdep_headers
sysdep_routines += \
  flush-icache \
  hwprobe \
  # sysdep_routines


>  endif
>  
>  ifeq ($(subdir),stdlib)
> diff --git a/sysdeps/unix/sysv/linux/riscv/Versions b/sysdeps/unix/sysv/linux/riscv/Versions
> index 5625d2a0b8..8717b62a4a 100644
> --- a/sysdeps/unix/sysv/linux/riscv/Versions
> +++ b/sysdeps/unix/sysv/linux/riscv/Versions
> @@ -8,4 +8,7 @@ libc {
>    GLIBC_2.27 {
>      __riscv_flush_icache;
>    }
> +  GLIBC_2.39 {
> +    __riscv_hwprobe;
> +  }
>  }
> diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> new file mode 100644
> index 0000000000..e28194e344
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> @@ -0,0 +1,36 @@
> +/* RISC-V hardware feature probing support on Linux
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <sys/syscall.h>
> +#include <sys/hwprobe.h>
> +#include <sysdep.h>
> +#include <sysdep-vdso.h>
> +
> +int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
> +		     size_t __cpu_count, unsigned long int *__cpus,
> +		     unsigned int __flags)
> +{
> +  int r;
> +
> +  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
> +                             __cpu_count, __cpus, __flags);

You do not need to pass the number of argument with INTERNAL_SYSCALL_CALL
(that why the macro exists).  Just issue:

  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, __pairs, __pair_count,
                             __cpu_count, __cpus, __flags);

The code above will pass '5' as '__pairs' argument and so forth, which
also makes me wonder why it has not failed in your testing.

Also, there is no need to double underscore in the implementation itself. 

> +
> +  /* Negate negative errno values to match pthreads API. */
> +  return -r;
> +}
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> index a13c484582..6c96e25552 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> @@ -2436,6 +2436,7 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 __riscv_hwprobe F
>  GLIBC_2.39 pidfd_getpid F
>  GLIBC_2.39 pidfd_spawn F
>  GLIBC_2.39 pidfd_spawnp F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> index cf65d8d6d4..225278807b 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> @@ -2636,6 +2636,7 @@ GLIBC_2.38 strlcat F
>  GLIBC_2.38 strlcpy F
>  GLIBC_2.38 wcslcat F
>  GLIBC_2.38 wcslcpy F
> +GLIBC_2.39 __riscv_hwprobe F
>  GLIBC_2.39 pidfd_getpid F
>  GLIBC_2.39 pidfd_spawn F
>  GLIBC_2.39 pidfd_spawnp F
> diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> new file mode 100644
> index 0000000000..aa189a4818
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> @@ -0,0 +1,72 @@
> +/* RISC-V architecture probe interface
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_HWPROBE_H
> +#define _SYS_HWPROBE_H 1
> +
> +#include <features.h>
> +#include <stddef.h>
> +#ifdef __has_include
> +# if __has_include (<asm/hwprobe.h>)
> +#  include <asm/hwprobe.h>
> +# endif
> +#endif
> +
> +/* Define a (probably stale) version of the interface if the Linux headers
> +   aren't present.  */
> +#ifndef RISCV_HWPROBE_KEY_MVENDORID
> +struct riscv_hwprobe {
> +	signed long long int key;
> +	unsigned long long int value;
> +};
> +
> +#define RISCV_HWPROBE_KEY_MVENDORID	0
> +#define RISCV_HWPROBE_KEY_MARCHID	1
> +#define RISCV_HWPROBE_KEY_MIMPID	2
> +#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR	3
> +#define		RISCV_HWPROBE_BASE_BEHAVIOR_IMA	(1 << 0)
> +#define RISCV_HWPROBE_KEY_IMA_EXT_0	4
> +#define		RISCV_HWPROBE_IMA_FD		(1 << 0)
> +#define		RISCV_HWPROBE_IMA_C		(1 << 1)
> +#define		RISCV_HWPROBE_IMA_V		(1 << 2)
> +#define		RISCV_HWPROBE_EXT_ZBA		(1 << 3)
> +#define		RISCV_HWPROBE_EXT_ZBB		(1 << 4)
> +#define		RISCV_HWPROBE_EXT_ZBS		(1 << 5)
> +#define RISCV_HWPROBE_KEY_CPUPERF_0	5
> +#define		RISCV_HWPROBE_MISALIGNED_UNKNOWN	(0 << 0)
> +#define		RISCV_HWPROBE_MISALIGNED_EMULATED	(1 << 0)
> +#define		RISCV_HWPROBE_MISALIGNED_SLOW		(2 << 0)
> +#define		RISCV_HWPROBE_MISALIGNED_FAST		(3 << 0)
> +#define		RISCV_HWPROBE_MISALIGNED_UNSUPPORTED	(4 << 0)
> +#define		RISCV_HWPROBE_MISALIGNED_MASK		(7 << 0)

It has some inconsistent space/tab usage, I would recoment to just use
spaces in macro definitions.

> +
> +#endif /* RISCV_HWPROBE_KEY_MVENDORID */
> +
> +__BEGIN_DECLS
> +
> +extern int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
> +			    size_t __cpu_count, unsigned long int *__cpus,
> +			    unsigned int __flags)
> +     __nonnull ((1)) __wur
> +     __fortified_attr_access (__read_write__, 1, 2)
> +     __fortified_attr_access (__read_only__, 4, 3);
> +
> +__END_DECLS
> +
> +#endif /* sys/hwprobe.h */

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support
  2023-12-06 17:22   ` Adhemerval Zanella Netto
@ 2023-12-06 17:24     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 20+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-06 17:24 UTC (permalink / raw)
  To: Evan Green, libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis



On 06/12/23 14:22, Adhemerval Zanella Netto wrote:

> You do not need to pass the number of argument with INTERNAL_SYSCALL_CALL
> (that why the macro exists).  Just issue:
> 
>   r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, __pairs, __pair_count,
>                              __cpu_count, __cpus, __flags);
> 
> The code above will pass '5' as '__pairs' argument and so forth, which
> also makes me wonder why it has not failed in your testing.

In the next patch you replaced with INTERNAL_VSYSCALL_CALL, so it answers
why it has not failed in your tests.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 2/6] riscv: Add hwprobe vdso call support
  2023-11-30 18:32 ` [PATCH v9 2/6] riscv: Add hwprobe vdso call support Evan Green
@ 2023-12-06 17:34   ` Adhemerval Zanella Netto
  2023-12-11 21:02     ` Evan Green
  0 siblings, 1 reply; 20+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-06 17:34 UTC (permalink / raw)
  To: Evan Green, libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis



On 30/11/23 15:32, Evan Green wrote:
> The new riscv_hwprobe syscall also comes with a vDSO for faster answers
> to your most common questions. Call in today to speak with a kernel
> representative near you!
> 
> Signed-off-by: Evan Green <evan@rivosinc.com>
> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
> ---
> 
> (no changes since v7)
> 
> Changes in v7:
>  - Use INTERNAL_VSYSCALL_CALL (Florian)
> 
> Changes in v3:
>  - Add the "return" to the vsyscall
>  - Fix up vdso arg types to match kernel v4 version
>  - Remove ifdef around INLINE_VSYSCALL (Adhemerval)
> 
> Changes in v2:
>  - Add vDSO interface
> 
>  sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++++++++++
>  sysdeps/unix/sysv/linux/dl-vdso-setup.h |  3 +++
>  sysdeps/unix/sysv/linux/riscv/hwprobe.c | 17 ++++++++++++++---
>  sysdeps/unix/sysv/linux/riscv/sysdep.h  |  1 +
>  4 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> index 97eaaeac37..ed8b1ef426 100644
> --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> @@ -71,6 +71,16 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
>  # ifdef HAVE_GET_TBFREQ
>  PROCINFO_CLASS uint64_t (*_dl_vdso_get_tbfreq)(void) RELRO;
>  # endif
> +
> +/* RISC-V specific ones.  */
> +# ifdef HAVE_RISCV_HWPROBE
> +PROCINFO_CLASS int (*_dl_vdso_riscv_hwprobe)(void *,
> +                                             size_t,
> +                                             size_t,
> +                                             unsigned long *,
> +                                             unsigned int) RELRO;
> +# endif
> +
>  #endif
>  
>  #undef RELRO
> diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> index 867072b897..39eafd5316 100644
> --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> @@ -47,6 +47,9 @@ setup_vdso_pointers (void)
>  #ifdef HAVE_GET_TBFREQ
>    GLRO(dl_vdso_get_tbfreq) = dl_vdso_vsym (HAVE_GET_TBFREQ);
>  #endif
> +#ifdef HAVE_RISCV_HWPROBE
> +  GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
> +#endif
>  }
>  
>  #endif
> diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> index e28194e344..6a9a44657f 100644
> --- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> +++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> @@ -27,9 +27,20 @@ int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
>  		     unsigned int __flags)
>  {
>    int r;
> -
> -  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
> -                             __cpu_count, __cpus, __flags);
> +  __riscv_hwprobe_t vdso_hwprobe =
> +    (__riscv_hwprobe_t)GLRO(dl_vdso_riscv_hwprobe);
> +
> +  if (vdso_hwprobe != NULL)
> +    {
> +      r = INTERNAL_VSYSCALL_CALL (vdso_hwprobe, 5, __pairs, __pair_count,
> +                                  __cpu_count, __cpus, __flags);
> +    }
> +  else
> +    {
> +      r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
> +                                 __cpu_count, __cpus, __flags);
> +
> +    }
>  
>    /* Negate negative errno values to match pthreads API. */
>    return -r;

Maybe add a INTERNAL_VSYSCALL macro instead, similar to the INLINE_VSYSCALL?

Something like:

#define INTERNAL_VSYSCALL(name, nr, args...)                                  \
  ({                                                                          \
    __label__ out;                                                            \
    __label__ iserr;                                                          \
    long int sc_ret;                                                          \
                                                                              \
    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);             \
    if (vdsop != NULL)                                                        \
      {                                                                       \
        sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);                  \
        if (!INTERNAL_SYSCALL_ERROR_P (sc_ret))                               \
          goto out;                                                           \
        if (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS)                        \
          goto iserr;                                                         \
      }                                                                       \
                                                                              \
    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                            \
    if (INTERNAL_SYSCALL_ERROR_P (sc_ret))                                    \
      {                                                                       \
      iserr:                                                                  \
        sc_ret = -1L;                                                         \
      }                                                                       \
  out:                                                                        \
    sc_ret;                                                                   \
  })

It might be redundant for the riscv case, but it would be helpful for some
archs that require either to fallback to syscall (it is just for some kernel
version on specific archs) or require some extra care to call the vDSO
(powerpc).

> diff --git a/sysdeps/unix/sysv/linux/riscv/sysdep.h b/sysdeps/unix/sysv/linux/riscv/sysdep.h
> index 5583b96d23..ee015dfeb6 100644
> --- a/sysdeps/unix/sysv/linux/riscv/sysdep.h
> +++ b/sysdeps/unix/sysv/linux/riscv/sysdep.h
> @@ -156,6 +156,7 @@
>  /* List of system calls which are supported as vsyscalls (for RV32 and
>     RV64).  */
>  # define HAVE_GETCPU_VSYSCALL		"__vdso_getcpu"
> +# define HAVE_RISCV_HWPROBE		"__vdso_riscv_hwprobe"
>  
>  # undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
>  # define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-11-30 18:32 ` [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h Evan Green
@ 2023-12-06 17:39   ` Adhemerval Zanella Netto
  2023-12-06 18:17   ` enh
  1 sibling, 0 replies; 20+ messages in thread
From: Adhemerval Zanella Netto @ 2023-12-06 17:39 UTC (permalink / raw)
  To: Evan Green, libc-alpha; +Cc: Florian Weimer, palmer, vineetg, slewis



On 30/11/23 15:32, Evan Green wrote:
> Add a little helper method so it's easier to fetch a single value from
> the hwprobe function when used within an ifunc selector.
> 
> Signed-off-by: Evan Green <evan@rivosinc.com>
> 
> ---
> 
> Changes in v9:
>  - Use __inline rather than inline so c89 compiles (build-many-glibcs)
> 
> Changes in v7:
>  - Introduced static inline helper (Richard)
> 
>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 25 +++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> index fd3be5a411..ee7eed3960 100644
> --- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> @@ -22,6 +22,7 @@
>  
>  #include <features.h>
>  #include <stddef.h>
> +#include <errno.h>
>  #ifdef __has_include
>  # if __has_include (<asm/hwprobe.h>)
>  #  include <asm/hwprobe.h>
> @@ -79,4 +80,28 @@ typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_c
>  
>  __END_DECLS
>  
> +/* Helper function usable from ifunc selectors that probes a single key. */
> +static __inline int
> +__riscv_hwprobe_one(__riscv_hwprobe_t hwprobe_func,
> +                    signed long long int key,
> +                    unsigned long long int *value)
> +{
> +  struct riscv_hwprobe pair;
> +  int rc;
> +
> +  if (!hwprobe_func)
> +    return ENOSYS;

The convention is no implicit checks:

  if (hwprobe_func == NULL)
    return ENOSYS;

> +
> +  pair.key = key;
> +  rc = hwprobe_func(&pair, 1, 0, NULL, 0);

Missing space after function name.

> +  if (rc)
> +    return rc;

Same as before for function that return 'int':

  if (rc != 0)
    return rc;

> +
> +  if (pair.key < 0)
> +    return ENOENT;
> +
> +  *value = pair.value;
> +  return 0;
> +}
> +
>  #endif /* sys/hwprobe.h */

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-11-30 18:32 ` [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h Evan Green
  2023-12-06 17:39   ` Adhemerval Zanella Netto
@ 2023-12-06 18:17   ` enh
  2023-12-06 19:42     ` Palmer Dabbelt
  2023-12-07  9:46     ` Florian Weimer
  1 sibling, 2 replies; 20+ messages in thread
From: enh @ 2023-12-06 18:17 UTC (permalink / raw)
  To: Evan Green; +Cc: libc-alpha, Florian Weimer, palmer, vineetg, slewis

fwiw, this is the one part of the ifunc proposal i haven't aimed for
source compatibility with in bionic --- i'm not _against_ it exactly,
i'm just unconvinced it's much use. and it's trivial to add after the
fact if it does ship in glibc and does actually turn out to be useful.

specifically, my anecdata from Android [both open-source libraries and
apps] is that basically no-one uses ifuncs beyond libc and
compiler-generated fmv ifuncs [though for riscv64 that's still "future
work"], so there's no market for helpers for hand-written ifuncs?

i think i also worry that people aren't going to pay much attention,
and will prefer this even when they have multiple keys to query
because it's a nicer api, and i'm not sure i want to encourage that
kind of thing :-)

On Thu, Nov 30, 2023 at 10:33 AM Evan Green <evan@rivosinc.com> wrote:
>
> Add a little helper method so it's easier to fetch a single value from
> the hwprobe function when used within an ifunc selector.
>
> Signed-off-by: Evan Green <evan@rivosinc.com>
>
> ---
>
> Changes in v9:
>  - Use __inline rather than inline so c89 compiles (build-many-glibcs)
>
> Changes in v7:
>  - Introduced static inline helper (Richard)
>
>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 25 +++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> index fd3be5a411..ee7eed3960 100644
> --- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
> @@ -22,6 +22,7 @@
>
>  #include <features.h>
>  #include <stddef.h>
> +#include <errno.h>
>  #ifdef __has_include
>  # if __has_include (<asm/hwprobe.h>)
>  #  include <asm/hwprobe.h>
> @@ -79,4 +80,28 @@ typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_c
>
>  __END_DECLS
>
> +/* Helper function usable from ifunc selectors that probes a single key. */
> +static __inline int
> +__riscv_hwprobe_one(__riscv_hwprobe_t hwprobe_func,
> +                    signed long long int key,
> +                    unsigned long long int *value)
> +{
> +  struct riscv_hwprobe pair;
> +  int rc;
> +
> +  if (!hwprobe_func)
> +    return ENOSYS;
> +
> +  pair.key = key;
> +  rc = hwprobe_func(&pair, 1, 0, NULL, 0);
> +  if (rc)
> +    return rc;
> +
> +  if (pair.key < 0)
> +    return ENOENT;
> +
> +  *value = pair.value;
> +  return 0;
> +}
> +
>  #endif /* sys/hwprobe.h */
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-12-06 18:17   ` enh
@ 2023-12-06 19:42     ` Palmer Dabbelt
  2023-12-07  9:46     ` Florian Weimer
  1 sibling, 0 replies; 20+ messages in thread
From: Palmer Dabbelt @ 2023-12-06 19:42 UTC (permalink / raw)
  To: enh; +Cc: Evan Green, libc-alpha, fweimer, Vineet Gupta, slewis

On Wed, 06 Dec 2023 10:17:15 PST (-0800), enh@google.com wrote:
> fwiw, this is the one part of the ifunc proposal i haven't aimed for
> source compatibility with in bionic --- i'm not _against_ it exactly,
> i'm just unconvinced it's much use. and it's trivial to add after the
> fact if it does ship in glibc and does actually turn out to be useful.
>
> specifically, my anecdata from Android [both open-source libraries and
> apps] is that basically no-one uses ifuncs beyond libc and
> compiler-generated fmv ifuncs [though for riscv64 that's still "future
> work"], so there's no market for helpers for hand-written ifuncs?

After seeing how complicated the IFUNC rules are, I'm not super 
surprised that we've only got core toolchain types using them -- even 
getting this done has been a multi-month saga of trying to figure out 
how things work, so I couldn't imagine what a regular user would do.

> i think i also worry that people aren't going to pay much attention,
> and will prefer this even when they have multiple keys to query
> because it's a nicer api, and i'm not sure i want to encourage that
> kind of thing :-)

Ya, makes sense, I could see this one going either way.  I don't really 
have a strong feeling here, so I'm going to defer to Evan on the glibc 
side as he's been poking the hwprobe stuff more than I have lately.

If anyone else has a stronger opinion then now's the time to speak up, 
though ;)

> On Thu, Nov 30, 2023 at 10:33 AM Evan Green <evan@rivosinc.com> wrote:
>>
>> Add a little helper method so it's easier to fetch a single value from
>> the hwprobe function when used within an ifunc selector.
>>
>> Signed-off-by: Evan Green <evan@rivosinc.com>
>>
>> ---
>>
>> Changes in v9:
>>  - Use __inline rather than inline so c89 compiles (build-many-glibcs)
>>
>> Changes in v7:
>>  - Introduced static inline helper (Richard)
>>
>>  sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 25 +++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
>> index fd3be5a411..ee7eed3960 100644
>> --- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
>> +++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
>> @@ -22,6 +22,7 @@
>>
>>  #include <features.h>
>>  #include <stddef.h>
>> +#include <errno.h>
>>  #ifdef __has_include
>>  # if __has_include (<asm/hwprobe.h>)
>>  #  include <asm/hwprobe.h>
>> @@ -79,4 +80,28 @@ typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_c
>>
>>  __END_DECLS
>>
>> +/* Helper function usable from ifunc selectors that probes a single key. */
>> +static __inline int
>> +__riscv_hwprobe_one(__riscv_hwprobe_t hwprobe_func,
>> +                    signed long long int key,
>> +                    unsigned long long int *value)
>> +{
>> +  struct riscv_hwprobe pair;
>> +  int rc;
>> +
>> +  if (!hwprobe_func)
>> +    return ENOSYS;
>> +
>> +  pair.key = key;
>> +  rc = hwprobe_func(&pair, 1, 0, NULL, 0);
>> +  if (rc)
>> +    return rc;
>> +
>> +  if (pair.key < 0)
>> +    return ENOENT;
>> +
>> +  *value = pair.value;
>> +  return 0;
>> +}
>> +
>>  #endif /* sys/hwprobe.h */
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-12-06 18:17   ` enh
  2023-12-06 19:42     ` Palmer Dabbelt
@ 2023-12-07  9:46     ` Florian Weimer
  2023-12-07 16:32       ` enh
  1 sibling, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2023-12-07  9:46 UTC (permalink / raw)
  To: enh; +Cc: Evan Green, libc-alpha, palmer, vineetg, slewis

> specifically, my anecdata from Android [both open-source libraries and
> apps] is that basically no-one uses ifuncs beyond libc and
> compiler-generated fmv ifuncs [though for riscv64 that's still "future
> work"], so there's no market for helpers for hand-written ifuncs?

It used to be more widely used on GNU/Linux, for example:

  implemented enabling sized-delete support at runtime

  Under gcc 4.5 or greater we're using ifunc function attribute to
  resolve sized delete operator to either plain delete implementation
  (default) or to sized delete (if enabled via environment variable
  TCMALLOC_ENABLE_SIZED_DELETE).
   
  <https://github.com/gperftools/gperftools/commit/6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7>

After distributions turned on BIND_NOW, using IFUNCs for purposes like
that became much more difficult.  We are getting closer to re-enabling
getenv for use in IFUNC resolvers if the ELF dependencies express a
usable relocatoion order, but until my delayed IFUNC resolution patch is
accepted, IFUNC resolvers won't interoperate will with symbol
interposition or underlinking, so compatibility problems remain.

Outside GCC and glibc, I only see an IFUNC resolver in zlib (which is
known to be problematic because it's a frequently interposed library,
and it will go away with the switch to zlib-ng), and nauty
<https://pallini.di.uniroma1.it/>.  I don't know how that was produced.
I do not evidence of other uses for global symbols, neither from
function multi-versioning nor from manually written clones.  (There
might be some additional use in the form of IRELATIVE relocations.)

Thanks,
Florian


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-12-07  9:46     ` Florian Weimer
@ 2023-12-07 16:32       ` enh
  2023-12-07 16:47         ` Florian Weimer
  0 siblings, 1 reply; 20+ messages in thread
From: enh @ 2023-12-07 16:32 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Evan Green, libc-alpha, palmer, vineetg, slewis

On Thu, Dec 7, 2023 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> > specifically, my anecdata from Android [both open-source libraries and
> > apps] is that basically no-one uses ifuncs beyond libc and
> > compiler-generated fmv ifuncs [though for riscv64 that's still "future
> > work"], so there's no market for helpers for hand-written ifuncs?
>
> It used to be more widely used on GNU/Linux, for example:
>
>   implemented enabling sized-delete support at runtime
>
>   Under gcc 4.5 or greater we're using ifunc function attribute to
>   resolve sized delete operator to either plain delete implementation
>   (default) or to sized delete (if enabled via environment variable
>   TCMALLOC_ENABLE_SIZED_DELETE).
>
>   <https://github.com/gperftools/gperftools/commit/6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7>
>
> After distributions turned on BIND_NOW, using IFUNCs for purposes like
> that became much more difficult.  We are getting closer to re-enabling
> getenv for use in IFUNC resolvers if the ELF dependencies express a
> usable relocatoion order, but until my delayed IFUNC resolution patch is
> accepted, IFUNC resolvers won't interoperate will with symbol
> interposition or underlinking, so compatibility problems remain.
>
> Outside GCC and glibc, I only see an IFUNC resolver in zlib (which is
> known to be problematic because it's a frequently interposed library,
> and it will go away with the switch to zlib-ng),

interesting... which zlib was that? i didn't see an ifunc in any of
the zlib forks i looked at, most (all?) of which exist because madler
zlib won't accept architecture-specific optimizations :-)

because macOS/iOS doesn't have ifuncs (and presumably Windows doesn't
either?), the zlib forks i'm aware of "roll their own" with
pthread_once() and function pointers or equivalents, since that works
everywhere.

> and nauty
> <https://pallini.di.uniroma1.it/>.  I don't know how that was produced.
> I do not evidence of other uses for global symbols, neither from
> function multi-versioning nor from manually written clones.  (There
> might be some additional use in the form of IRELATIVE relocations.)
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-12-07 16:32       ` enh
@ 2023-12-07 16:47         ` Florian Weimer
  2023-12-07 17:09           ` enh
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2023-12-07 16:47 UTC (permalink / raw)
  To: enh; +Cc: Evan Green, libc-alpha, palmer, vineetg, slewis

> On Thu, Dec 7, 2023 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> > specifically, my anecdata from Android [both open-source libraries and
>> > apps] is that basically no-one uses ifuncs beyond libc and
>> > compiler-generated fmv ifuncs [though for riscv64 that's still "future
>> > work"], so there's no market for helpers for hand-written ifuncs?
>>
>> It used to be more widely used on GNU/Linux, for example:
>>
>>   implemented enabling sized-delete support at runtime
>>
>>   Under gcc 4.5 or greater we're using ifunc function attribute to
>>   resolve sized delete operator to either plain delete implementation
>>   (default) or to sized delete (if enabled via environment variable
>>   TCMALLOC_ENABLE_SIZED_DELETE).
>>
>>   <https://github.com/gperftools/gperftools/commit/6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7>
>>
>> After distributions turned on BIND_NOW, using IFUNCs for purposes like
>> that became much more difficult.  We are getting closer to re-enabling
>> getenv for use in IFUNC resolvers if the ELF dependencies express a
>> usable relocatoion order, but until my delayed IFUNC resolution patch is
>> accepted, IFUNC resolvers won't interoperate will with symbol
>> interposition or underlinking, so compatibility problems remain.
>>
>> Outside GCC and glibc, I only see an IFUNC resolver in zlib (which is
>> known to be problematic because it's a frequently interposed library,
>> and it will go away with the switch to zlib-ng),
>
> interesting... which zlib was that? i didn't see an ifunc in any of
> the zlib forks i looked at, most (all?) of which exist because madler
> zlib won't accept architecture-specific optimizations :-)

It's our own fork I guess.

  <https://gitlab.com/redhat/centos-stream/rpms/zlib/-/blob/cec3445f7b544f6562f9ce52be6de97af7dc142b/zlib-1.2.11-Add-Power8-optimized-crc32.patch>

It's not even necessary for us because on ppc64le (as defined by
upstream/OpenPOWER), we always have at least POWER8 CPUs with support
for this instruction. 8-/

Thanks,
Florian


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h
  2023-12-07 16:47         ` Florian Weimer
@ 2023-12-07 17:09           ` enh
  0 siblings, 0 replies; 20+ messages in thread
From: enh @ 2023-12-07 17:09 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Evan Green, libc-alpha, palmer, vineetg, slewis

On Thu, Dec 7, 2023 at 8:47 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> > On Thu, Dec 7, 2023 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> > specifically, my anecdata from Android [both open-source libraries and
> >> > apps] is that basically no-one uses ifuncs beyond libc and
> >> > compiler-generated fmv ifuncs [though for riscv64 that's still "future
> >> > work"], so there's no market for helpers for hand-written ifuncs?
> >>
> >> It used to be more widely used on GNU/Linux, for example:
> >>
> >>   implemented enabling sized-delete support at runtime
> >>
> >>   Under gcc 4.5 or greater we're using ifunc function attribute to
> >>   resolve sized delete operator to either plain delete implementation
> >>   (default) or to sized delete (if enabled via environment variable
> >>   TCMALLOC_ENABLE_SIZED_DELETE).
> >>
> >>   <https://github.com/gperftools/gperftools/commit/6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7>
> >>
> >> After distributions turned on BIND_NOW, using IFUNCs for purposes like
> >> that became much more difficult.  We are getting closer to re-enabling
> >> getenv for use in IFUNC resolvers if the ELF dependencies express a
> >> usable relocatoion order, but until my delayed IFUNC resolution patch is
> >> accepted, IFUNC resolvers won't interoperate will with symbol
> >> interposition or underlinking, so compatibility problems remain.
> >>
> >> Outside GCC and glibc, I only see an IFUNC resolver in zlib (which is
> >> known to be problematic because it's a frequently interposed library,
> >> and it will go away with the switch to zlib-ng),
> >
> > interesting... which zlib was that? i didn't see an ifunc in any of
> > the zlib forks i looked at, most (all?) of which exist because madler
> > zlib won't accept architecture-specific optimizations :-)
>
> It's our own fork I guess.

lol... yeah, i should have guessed that all distros probably hit this
problem before the "usual" forks even existed!

(/me really wishes the consensus around zlib-ng had formed before i
moved Android from madler to chromium-zlib a few years ago...)

>   <https://gitlab.com/redhat/centos-stream/rpms/zlib/-/blob/cec3445f7b544f6562f9ce52be6de97af7dc142b/zlib-1.2.11-Add-Power8-optimized-crc32.patch>
>
> It's not even necessary for us because on ppc64le (as defined by
> upstream/OpenPOWER), we always have at least POWER8 CPUs with support
> for this instruction. 8-/
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9 2/6] riscv: Add hwprobe vdso call support
  2023-12-06 17:34   ` Adhemerval Zanella Netto
@ 2023-12-11 21:02     ` Evan Green
  0 siblings, 0 replies; 20+ messages in thread
From: Evan Green @ 2023-12-11 21:02 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, Florian Weimer, palmer, vineetg, slewis

On Wed, Dec 6, 2023 at 9:34 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 30/11/23 15:32, Evan Green wrote:
> > The new riscv_hwprobe syscall also comes with a vDSO for faster answers
> > to your most common questions. Call in today to speak with a kernel
> > representative near you!
> >
> > Signed-off-by: Evan Green <evan@rivosinc.com>
> > Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
> > ---
> >
> > (no changes since v7)
> >
> > Changes in v7:
> >  - Use INTERNAL_VSYSCALL_CALL (Florian)
> >
> > Changes in v3:
> >  - Add the "return" to the vsyscall
> >  - Fix up vdso arg types to match kernel v4 version
> >  - Remove ifdef around INLINE_VSYSCALL (Adhemerval)
> >
> > Changes in v2:
> >  - Add vDSO interface
> >
> >  sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++++++++++
> >  sysdeps/unix/sysv/linux/dl-vdso-setup.h |  3 +++
> >  sysdeps/unix/sysv/linux/riscv/hwprobe.c | 17 ++++++++++++++---
> >  sysdeps/unix/sysv/linux/riscv/sysdep.h  |  1 +
> >  4 files changed, 28 insertions(+), 3 deletions(-)
> >
> > diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> > index 97eaaeac37..ed8b1ef426 100644
> > --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> > +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
> > @@ -71,6 +71,16 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
> >  # ifdef HAVE_GET_TBFREQ
> >  PROCINFO_CLASS uint64_t (*_dl_vdso_get_tbfreq)(void) RELRO;
> >  # endif
> > +
> > +/* RISC-V specific ones.  */
> > +# ifdef HAVE_RISCV_HWPROBE
> > +PROCINFO_CLASS int (*_dl_vdso_riscv_hwprobe)(void *,
> > +                                             size_t,
> > +                                             size_t,
> > +                                             unsigned long *,
> > +                                             unsigned int) RELRO;
> > +# endif
> > +
> >  #endif
> >
> >  #undef RELRO
> > diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> > index 867072b897..39eafd5316 100644
> > --- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> > +++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
> > @@ -47,6 +47,9 @@ setup_vdso_pointers (void)
> >  #ifdef HAVE_GET_TBFREQ
> >    GLRO(dl_vdso_get_tbfreq) = dl_vdso_vsym (HAVE_GET_TBFREQ);
> >  #endif
> > +#ifdef HAVE_RISCV_HWPROBE
> > +  GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
> > +#endif
> >  }
> >
> >  #endif
> > diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> > index e28194e344..6a9a44657f 100644
> > --- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> > +++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
> > @@ -27,9 +27,20 @@ int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
> >                    unsigned int __flags)
> >  {
> >    int r;
> > -
> > -  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
> > -                             __cpu_count, __cpus, __flags);
> > +  __riscv_hwprobe_t vdso_hwprobe =
> > +    (__riscv_hwprobe_t)GLRO(dl_vdso_riscv_hwprobe);
> > +
> > +  if (vdso_hwprobe != NULL)
> > +    {
> > +      r = INTERNAL_VSYSCALL_CALL (vdso_hwprobe, 5, __pairs, __pair_count,
> > +                                  __cpu_count, __cpus, __flags);
> > +    }
> > +  else
> > +    {
> > +      r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, 5, __pairs, __pair_count,
> > +                                 __cpu_count, __cpus, __flags);
> > +
> > +    }
> >
> >    /* Negate negative errno values to match pthreads API. */
> >    return -r;
>
> Maybe add a INTERNAL_VSYSCALL macro instead, similar to the INLINE_VSYSCALL?
>
> Something like:
>
> #define INTERNAL_VSYSCALL(name, nr, args...)                                  \
>   ({                                                                          \
>     __label__ out;                                                            \
>     __label__ iserr;                                                          \
>     long int sc_ret;                                                          \
>                                                                               \
>     __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);             \
>     if (vdsop != NULL)                                                        \
>       {                                                                       \
>         sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);                  \
>         if (!INTERNAL_SYSCALL_ERROR_P (sc_ret))                               \
>           goto out;                                                           \
>         if (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS)                        \
>           goto iserr;                                                         \
>       }                                                                       \
>                                                                               \
>     sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                            \
>     if (INTERNAL_SYSCALL_ERROR_P (sc_ret))                                    \
>       {                                                                       \
>       iserr:                                                                  \
>         sc_ret = -1L;                                                         \
>       }                                                                       \
>   out:                                                                        \
>     sc_ret;                                                                   \
>   })
>
> It might be redundant for the riscv case, but it would be helpful for some
> archs that require either to fallback to syscall (it is just for some kernel
> version on specific archs) or require some extra care to call the vDSO
> (powerpc).
>

I think I see what you're going for. As per Florian's suggestion, I'd
like to return the positive error value on failure though, so it's
better if the macro returns the errno rather than discarding it. How
about this:

diff --git a/sysdeps/unix/sysv/linux/sysdep-vdso.h
b/sysdeps/unix/sysv/linux/sysdep-vdso.h
index a51b5deb37..f63fcc510a 100644
--- a/sysdeps/unix/sysv/linux/sysdep-vdso.h
+++ b/sysdeps/unix/sysv/linux/sysdep-vdso.h
@@ -53,4 +53,24 @@
     sc_ret;                                                                  \
   })

+#define INTERNAL_VSYSCALL(name, nr, args...)                                 \
+  ({                                                                         \
+    __label__ out;                                                           \
+    __label__ iserr;                                                         \
+    long int sc_ret;                                                         \
+                                                                             \
+    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);            \
+    if (vdsop != NULL)                                                       \
+      {
               \
+       sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);                  \
+       if ((!INTERNAL_SYSCALL_ERROR_P (sc_ret)) ||                           \
+           (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS))                      \
+         goto out;                                                           \
+      }
               \
+                                                                             \
+    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                           \
+  out:                                                                       \
+    sc_ret;                                                                  \
+  })
+
 #endif /* SYSDEP_VDSO_LINUX_H  */

Holler if you see some changes you want (or are not a fan), otherwise
I think I've got the rest of your comments incorporated, and can send
out a v10 tomorrow.

-Evan

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-12-11 21:02 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-30 18:32 [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
2023-11-30 18:32 ` [PATCH v9 1/6] riscv: Add Linux hwprobe syscall support Evan Green
2023-12-06 17:22   ` Adhemerval Zanella Netto
2023-12-06 17:24     ` Adhemerval Zanella Netto
2023-11-30 18:32 ` [PATCH v9 2/6] riscv: Add hwprobe vdso call support Evan Green
2023-12-06 17:34   ` Adhemerval Zanella Netto
2023-12-11 21:02     ` Evan Green
2023-11-30 18:32 ` [PATCH v9 3/6] riscv: Add __riscv_hwprobe pointer to ifunc calls Evan Green
2023-12-06  0:31   ` enh
2023-11-30 18:32 ` [PATCH v9 4/6] riscv: Enable multi-arg ifunc resolvers Evan Green
2023-11-30 18:32 ` [PATCH v9 5/6] riscv: Add ifunc helper method to hwprobe.h Evan Green
2023-12-06 17:39   ` Adhemerval Zanella Netto
2023-12-06 18:17   ` enh
2023-12-06 19:42     ` Palmer Dabbelt
2023-12-07  9:46     ` Florian Weimer
2023-12-07 16:32       ` enh
2023-12-07 16:47         ` Florian Weimer
2023-12-07 17:09           ` enh
2023-11-30 18:32 ` [PATCH v9 6/6] riscv: Add and use alignment-ignorant memcpy Evan Green
2023-12-06 17:07 ` [PATCH v9 0/6] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).