public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
@ 2024-02-14 14:31 Evan Green
  2024-02-14 14:31 ` [PATCH v12 1/7] riscv: Add Linux hwprobe syscall support Evan Green
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green


This series illustrates the use of a recently accepted Linux syscall that
enumerates architectural information about the RISC-V cores the system
is running on. In this series we expose a small wrapper function around
the syscall. An ifunc selector for memcpy queries it to see if unaligned
access is "fast" on this hardware. If it is, it selects a newly provided
implementation of memcpy that doesn't work hard at aligning the src and
destination buffers.

For applications and libraries outside of glibc that want to use
__riscv_hwprobe() in ifunc selectors, this series also sends a pointer
to the riscv_hwprobe() function in as the second argument to ifunc
selectors. A new inline convenience function can help application and
library callers to check for validity and quickly probe a single key.

The memcpy implementation is independent enough from the rest of the
series that it can be omitted safely if desired.

Performance numbers were compared using a small test program [1], run on
a D1 Nezha board, which supports fast unaligned access. "Fast" here
means copying unaligned words is faster than copying byte-wise, but
still slower than copying aligned words. Here's the speed of various
memcpy()s with the generic implementation. The numbers before are using
v4's memcpy implementation, with the "copy last byte via overlapping
misaligned word" fix this should get slightly better.

memcpy size 1 count 1000000 offset 0 took 109564 us
memcpy size 3 count 1000000 offset 0 took 138425 us
memcpy size 4 count 1000000 offset 0 took 148374 us
memcpy size 7 count 1000000 offset 0 took 178433 us
memcpy size 8 count 1000000 offset 0 took 188430 us
memcpy size f count 1000000 offset 0 took 266118 us
memcpy size f count 1000000 offset 1 took 265940 us
memcpy size f count 1000000 offset 3 took 265934 us
memcpy size f count 1000000 offset 7 took 266215 us
memcpy size f count 1000000 offset 8 took 265954 us
memcpy size f count 1000000 offset 9 took 265886 us
memcpy size 10 count 1000000 offset 0 took 195308 us
memcpy size 11 count 1000000 offset 0 took 205161 us
memcpy size 17 count 1000000 offset 0 took 274376 us
memcpy size 18 count 1000000 offset 0 took 199188 us
memcpy size 19 count 1000000 offset 0 took 209258 us
memcpy size 1f count 1000000 offset 0 took 278263 us
memcpy size 20 count 1000000 offset 0 took 207364 us
memcpy size 21 count 1000000 offset 0 took 217143 us
memcpy size 3f count 1000000 offset 0 took 300023 us
memcpy size 40 count 1000000 offset 0 took 231063 us
memcpy size 41 count 1000000 offset 0 took 241259 us
memcpy size 7c count 100000 offset 0 took 32807 us
memcpy size 7f count 100000 offset 0 took 36274 us
memcpy size ff count 100000 offset 0 took 47818 us
memcpy size ff count 100000 offset 0 took 47932 us
memcpy size 100 count 100000 offset 0 took 40468 us
memcpy size 200 count 100000 offset 0 took 64245 us
memcpy size 27f count 100000 offset 0 took 82549 us
memcpy size 400 count 100000 offset 0 took 111254 us
memcpy size 407 count 100000 offset 0 took 119364 us
memcpy size 800 count 100000 offset 0 took 203899 us
memcpy size 87f count 100000 offset 0 took 222465 us
memcpy size 87f count 100000 offset 3 took 222289 us
memcpy size 1000 count 100000 offset 0 took 388846 us
memcpy size 1000 count 100000 offset 1 took 468827 us
memcpy size 1000 count 100000 offset 3 took 397098 us
memcpy size 1000 count 100000 offset 4 took 397379 us
memcpy size 1000 count 100000 offset 5 took 397368 us
memcpy size 1000 count 100000 offset 7 took 396867 us
memcpy size 1000 count 100000 offset 8 took 389227 us
memcpy size 1000 count 100000 offset 9 took 395949 us
memcpy size 3000 count 50000 offset 0 took 674837 us
memcpy size 3000 count 50000 offset 1 took 676944 us
memcpy size 3000 count 50000 offset 3 took 679709 us
memcpy size 3000 count 50000 offset 4 took 680829 us
memcpy size 3000 count 50000 offset 5 took 678024 us
memcpy size 3000 count 50000 offset 7 took 681097 us
memcpy size 3000 count 50000 offset 8 took 670004 us
memcpy size 3000 count 50000 offset 9 took 674553 us

Here is that same test run with the assembly memcpy() in this series:
memcpy size 1 count 1000000 offset 0 took 92703 us
memcpy size 3 count 1000000 offset 0 took 112527 us
memcpy size 4 count 1000000 offset 0 took 120481 us
memcpy size 7 count 1000000 offset 0 took 149558 us
memcpy size 8 count 1000000 offset 0 took 90617 us
memcpy size f count 1000000 offset 0 took 174373 us
memcpy size f count 1000000 offset 1 took 178615 us
memcpy size f count 1000000 offset 3 took 178845 us
memcpy size f count 1000000 offset 7 took 178636 us
memcpy size f count 1000000 offset 8 took 174442 us
memcpy size f count 1000000 offset 9 took 178660 us
memcpy size 10 count 1000000 offset 0 took 99845 us
memcpy size 11 count 1000000 offset 0 took 112522 us
memcpy size 17 count 1000000 offset 0 took 179735 us
memcpy size 18 count 1000000 offset 0 took 110870 us
memcpy size 19 count 1000000 offset 0 took 121472 us
memcpy size 1f count 1000000 offset 0 took 188231 us
memcpy size 20 count 1000000 offset 0 took 119571 us
memcpy size 21 count 1000000 offset 0 took 132429 us
memcpy size 3f count 1000000 offset 0 took 227021 us
memcpy size 40 count 1000000 offset 0 took 166416 us
memcpy size 41 count 1000000 offset 0 took 180206 us
memcpy size 7c count 100000 offset 0 took 28602 us
memcpy size 7f count 100000 offset 0 took 31676 us
memcpy size ff count 100000 offset 0 took 39257 us
memcpy size ff count 100000 offset 0 took 39176 us
memcpy size 100 count 100000 offset 0 took 21928 us
memcpy size 200 count 100000 offset 0 took 35814 us
memcpy size 27f count 100000 offset 0 took 60315 us
memcpy size 400 count 100000 offset 0 took 63652 us
memcpy size 407 count 100000 offset 0 took 73160 us
memcpy size 800 count 100000 offset 0 took 121532 us
memcpy size 87f count 100000 offset 0 took 147269 us
memcpy size 87f count 100000 offset 3 took 144744 us
memcpy size 1000 count 100000 offset 0 took 232057 us
memcpy size 1000 count 100000 offset 1 took 254319 us
memcpy size 1000 count 100000 offset 3 took 256973 us
memcpy size 1000 count 100000 offset 4 took 257655 us
memcpy size 1000 count 100000 offset 5 took 259456 us
memcpy size 1000 count 100000 offset 7 took 260849 us
memcpy size 1000 count 100000 offset 8 took 232347 us
memcpy size 1000 count 100000 offset 9 took 254330 us
memcpy size 3000 count 50000 offset 0 took 382376 us
memcpy size 3000 count 50000 offset 1 took 389872 us
memcpy size 3000 count 50000 offset 3 took 385310 us
memcpy size 3000 count 50000 offset 4 took 389748 us
memcpy size 3000 count 50000 offset 5 took 391707 us
memcpy size 3000 count 50000 offset 7 took 386778 us
memcpy size 3000 count 50000 offset 8 took 385691 us
memcpy size 3000 count 50000 offset 9 took 392030 us

The assembly routine is measurably better.

[1] https://pastebin.com/DRyECNQW


Changes in v12:
 - Updated version to 2.40

Changes in v11:
 - Update copyright year (Adhemerval)
 - Update copyright year (Adhemerval)
 - Remove superfluous 'signed' (Adhemerval)
 - Put helper before __END_DECLS (Adhemerval)
 - Add comment about +1 on non-Linux systems (Stefan)
 - Update copyrights to 2024 (Adhemerval)
 - Avoid implicit check in select_memcpy_ifunc (Adhemerval)
 - Remove hidden_def (__memcpy_noalignment) (Adhemerval)

Changes in v10:
 - Remove spurious 5 from syscall patch (Adhemerval)
 - Use one item per line in Makefile (Adhemerval)
 - Remove double underscores from __riscv_hwprobe definition
   (Adhemerval)
 - Use only spaces in macro definitions of hwprobe.h (Adhemerval)
 - Introduced INTERNAL_VSYSCALL patch
 - Remove leading underscores in definition (Adhemerval)
 - Remove spurious 5 from INTERNAL_SYSCALL_CALL (Adhemerval)
 - Use new INTERNAL_VSYSCALL macro instead (Adhemerval)
 - Hand in pointer to __riscv_hwprobe(), not vDSO function
 - Avoid implicit comparisons (Adhemerval)
 - One line per function in Makefile for memcpy (Adhemerval)
 - Space before argument-like things (Adhemerval)

Changes in v9:
 - Alphabetize new entries in libc.abilist (to pass checks)
 - Fix a couple of typos causing powerpc not to build
   (build-many-glibcs)
 - Use __inline rather than inline so c89 compiles (build-many-glibcs)

Changes in v8:
 - Fix missed 2.39 in abilists (Joseph)
 - Just return -r (Florian)

Changes in v7:
 - Bumped Versions up to 2.39 (Joseph)
 - Used INTERNAL_SYSCALL_CALL, and return positive errno to match
   pthreads API (Florian).
 - Remove __THROW since it creates a warning in combination with the
   fortified access decorators.
 - Use INTERNAL_VSYSCALL_CALL (Florian)
 - Remove __THROW from function pointer type, as it creates warnings
   together with __fortified_attr_access.
 - Introduced static inline helper (Richard)
 - Use new helper function in memcpy ifunc selector (Richard)

Changes in v6:
 - Prefixed __riscv_hwprobe() parameters names with __ to avoid user
   macro namespace pollution (Joseph)
 - Introduced riscv-ifunc.h for multi-arg ifunc selectors.
 - Fix a couple regressions in the assembly from v5 :/
 - Use passed hwprobe pointer in memcpy ifunc selector.

Changes in v5:
 - Do unaligned word access for final trailing bytes (Richard)

Changes in v4:
 - Remove __USE_GNU (Florian)
 - __nonnull, __wur, __THROW, and  __fortified_attr_access decorations
  (Florian)
 - change long to long int (Florian)
 - Fix comment formatting (Florian)
 - Update backup kernel header content copy.
 - Fix function declaration formatting (Florian)
 - Changed export versions to 2.38
 - Fixed comment style (Florian)

Changes in v3:
 - Update argument types to match v4 kernel interface
 - Add the "return" to the vsyscall
 - Fix up vdso arg types to match kernel v4 version
 - Remove ifdef around INLINE_VSYSCALL (Adhemerval)
 - Word align dest for large memcpy()s.
 - Add tags
 - Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
 - hwprobe.h: Use __has_include and duplicate Linux content to make
   compilation work when Linux headers are absent (Adhemerval)
 - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
 - Use INLINE_SYSCALL_CALL (Adhemerval)
 - Update versions
 - Update UNALIGNED_MASK to match kernel v3 series.
 - Add vDSO interface
 - Used _MASK instead of _FAST value itself.

Evan Green (7):
  riscv: Add Linux hwprobe syscall support
  linux: Introduce INTERNAL_VSYSCALL
  riscv: Add hwprobe vdso call support
  riscv: Add __riscv_hwprobe pointer to ifunc calls
  riscv: Enable multi-arg ifunc resolvers
  riscv: Add ifunc helper method to hwprobe.h
  riscv: Add and use alignment-ignorant memcpy

 include/libc-symbols.h                        |  28 ++--
 sysdeps/riscv/dl-irel.h                       |   9 +-
 sysdeps/riscv/memcopy.h                       |  26 ++++
 sysdeps/riscv/memcpy.c                        |  63 ++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 136 ++++++++++++++++++
 sysdeps/riscv/riscv-ifunc.h                   |  27 ++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.c       |  10 ++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h       |   3 +
 sysdeps/unix/sysv/linux/riscv/Makefile        |  21 ++-
 sysdeps/unix/sysv/linux/riscv/Versions        |   3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       |  36 +++++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 113 +++++++++++++++
 sysdeps/unix/sysv/linux/riscv/sysdep.h        |   1 +
 sysdeps/unix/sysv/linux/sysdep-vdso.h         |  19 +++
 17 files changed, 504 insertions(+), 17 deletions(-)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/riscv/riscv-ifunc.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v12 1/7] riscv: Add Linux hwprobe syscall support
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
@ 2024-02-14 14:31 ` Evan Green
  2024-02-14 14:31 ` [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL Evan Green
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green

Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.

Signed-off-by: Evan Green <evan@rivosinc.com>
---

Changes in v12:
 - Updated version to 2.40

Changes in v11:
 - Update copyright year (Adhemerval)

Changes in v10:
 - Remove spurious 5 from syscall patch (Adhemerval)
 - Use one item per line in Makefile (Adhemerval)
 - Remove double underscores from __riscv_hwprobe definition
   (Adhemerval)
 - Use only spaces in macro definitions of hwprobe.h (Adhemerval)

Changes in v9:
 - Alphabetize new entries in libc.abilist (to pass checks)

Changes in v8:
 - Fix missed 2.39 in abilists (Joseph)
 - Just return -r (Florian)

Changes in v7:
 - Bumped Versions up to 2.39 (Joseph)
 - Used INTERNAL_SYSCALL_CALL, and return positive errno to match
   pthreads API (Florian).
 - Remove __THROW since it creates a warning in combination with the
   fortified access decorators.

Changes in v6:
 - Prefixed __riscv_hwprobe() parameters names with __ to avoid user
   macro namespace pollution (Joseph)

Changes in v4:
 - Remove __USE_GNU (Florian)
 - __nonnull, __wur, __THROW, and  __fortified_attr_access decorations
  (Florian)
 - change long to long int (Florian)
 - Fix comment formatting (Florian)
 - Update backup kernel header content copy.
 - Fix function declaration formatting (Florian)
 - Changed export versions to 2.38

Changes in v3:
 - Update argument types to match v4 kernel interface

Changes in v2:
 - hwprobe.h: Use __has_include and duplicate Linux content to make
   compilation work when Linux headers are absent (Adhemerval)
 - hwprobe.h: Put declaration under __USE_GNU (Adhemerval)
 - Use INLINE_SYSCALL_CALL (Adhemerval)
 - Update versions
 - Update UNALIGNED_MASK to match kernel v3 series.

 sysdeps/unix/sysv/linux/riscv/Makefile        | 12 ++-
 sysdeps/unix/sysv/linux/riscv/Versions        |  3 +
 sysdeps/unix/sysv/linux/riscv/hwprobe.c       | 36 +++++++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |  1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |  1 +
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h   | 74 +++++++++++++++++++
 6 files changed, 125 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hwprobe.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h

diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 4b6eacb32f..04abf226ad 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -1,6 +1,14 @@
 ifeq ($(subdir),misc)
-sysdep_headers += sys/cachectl.h
-sysdep_routines += flush-icache
+sysdep_headers += \
+  sys/cachectl.h \
+  sys/hwprobe.h \
+  # sysdep_headers
+
+sysdep_routines += \
+  flush-icache \
+  hwprobe \
+  # sysdep_routines
+
 endif
 
 ifeq ($(subdir),stdlib)
diff --git a/sysdeps/unix/sysv/linux/riscv/Versions b/sysdeps/unix/sysv/linux/riscv/Versions
index 5625d2a0b8..8183e9feb6 100644
--- a/sysdeps/unix/sysv/linux/riscv/Versions
+++ b/sysdeps/unix/sysv/linux/riscv/Versions
@@ -8,4 +8,7 @@ libc {
   GLIBC_2.27 {
     __riscv_flush_icache;
   }
+  GLIBC_2.40 {
+    __riscv_hwprobe;
+  }
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
new file mode 100644
index 0000000000..28a448175b
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -0,0 +1,36 @@
+/* RISC-V hardware feature probing support on Linux
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sys/syscall.h>
+#include <sys/hwprobe.h>
+#include <sysdep.h>
+#include <sysdep-vdso.h>
+
+int __riscv_hwprobe (struct riscv_hwprobe *pairs, size_t pair_count,
+		     size_t cpu_count, unsigned long int *cpus,
+		     unsigned int flags)
+{
+  int r;
+
+  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
+                             cpu_count, cpus, flags);
+
+  /* Negate negative errno values to match pthreads API. */
+  return -r;
+}
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index f90c94bc35..6397a9cb91 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2511,3 +2511,4 @@ GLIBC_2.39 stdc_trailing_zeros_ui F
 GLIBC_2.39 stdc_trailing_zeros_ul F
 GLIBC_2.39 stdc_trailing_zeros_ull F
 GLIBC_2.39 stdc_trailing_zeros_us F
+GLIBC_2.40 __riscv_hwprobe F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index e04ff93bd2..71bbf94f66 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2711,3 +2711,4 @@ GLIBC_2.39 stdc_trailing_zeros_ui F
 GLIBC_2.39 stdc_trailing_zeros_ul F
 GLIBC_2.39 stdc_trailing_zeros_ull F
 GLIBC_2.39 stdc_trailing_zeros_us F
+GLIBC_2.40 __riscv_hwprobe F
diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
new file mode 100644
index 0000000000..5592b9e100
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -0,0 +1,74 @@
+/* RISC-V architecture probe interface
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_HWPROBE_H
+#define _SYS_HWPROBE_H 1
+
+#include <features.h>
+#include <stddef.h>
+#ifdef __has_include
+# if __has_include (<asm/hwprobe.h>)
+#  include <asm/hwprobe.h>
+# endif
+#endif
+
+/* Define a (probably stale) version of the interface if the Linux headers
+   aren't present.  */
+#ifndef RISCV_HWPROBE_KEY_MVENDORID
+struct riscv_hwprobe {
+	signed long long int key;
+	unsigned long long int value;
+};
+
+#define RISCV_HWPROBE_KEY_MVENDORID 0
+#define RISCV_HWPROBE_KEY_MARCHID 1
+#define RISCV_HWPROBE_KEY_MIMPID 2
+#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR 3
+#define  RISCV_HWPROBE_BASE_BEHAVIOR_IMA (1 << 0)
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define  RISCV_HWPROBE_IMA_FD (1 << 0)
+#define  RISCV_HWPROBE_IMA_C (1 << 1)
+#define  RISCV_HWPROBE_IMA_V (1 << 2)
+#define  RISCV_HWPROBE_EXT_ZBA (1 << 3)
+#define  RISCV_HWPROBE_EXT_ZBB (1 << 4)
+#define  RISCV_HWPROBE_EXT_ZBS (1 << 5)
+#define  RISCV_HWPROBE_EXT_ZICBOZ (1 << 6)
+#define RISCV_HWPROBE_KEY_CPUPERF_0 5
+#define  RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
+#define  RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
+#define  RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
+#define  RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
+#define  RISCV_HWPROBE_MISALIGNED_UNSUPPORTED (4 << 0)
+#define  RISCV_HWPROBE_MISALIGNED_MASK (7 << 0)
+#define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE 6
+
+#endif /* RISCV_HWPROBE_KEY_MVENDORID */
+
+__BEGIN_DECLS
+
+extern int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
+			    size_t __cpu_count, unsigned long int *__cpus,
+			    unsigned int __flags)
+     __nonnull ((1)) __wur
+     __fortified_attr_access (__read_write__, 1, 2)
+     __fortified_attr_access (__read_only__, 4, 3);
+
+__END_DECLS
+
+#endif /* sys/hwprobe.h */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2024-02-14 14:31 ` [PATCH v12 1/7] riscv: Add Linux hwprobe syscall support Evan Green
@ 2024-02-14 14:31 ` Evan Green
  2024-02-16  7:44   ` Florian Weimer
  2024-02-14 14:31 ` [PATCH v12 3/7] riscv: Add hwprobe vdso call support Evan Green
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green

Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
a regular syscall, but without setting errno. Instead, the return value
is plumbed straight out of the macro.

Signed-off-by: Evan Green <evan@rivosinc.com>
---

(no changes since v10)

Changes in v10:
 - Introduced INTERNAL_VSYSCALL patch

 sysdeps/unix/sysv/linux/sysdep-vdso.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/sysdeps/unix/sysv/linux/sysdep-vdso.h b/sysdeps/unix/sysv/linux/sysdep-vdso.h
index 189319ad98..f7d4b29b25 100644
--- a/sysdeps/unix/sysv/linux/sysdep-vdso.h
+++ b/sysdeps/unix/sysv/linux/sysdep-vdso.h
@@ -53,4 +53,23 @@
     sc_ret;								      \
   })
 
+#define INTERNAL_VSYSCALL(name, nr, args...)				      \
+  ({									      \
+    __label__ out;							      \
+    long int sc_ret;							      \
+									      \
+    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);	      \
+    if (vdsop != NULL)							      \
+      {									      \
+	sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);	      	      \
+	if ((!INTERNAL_SYSCALL_ERROR_P (sc_ret)) ||			      \
+	    (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS))		      \
+	  goto out;							      \
+      }									      \
+									      \
+    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);		      	      \
+  out:									      \
+    sc_ret;								      \
+  })
+
 #endif /* SYSDEP_VDSO_LINUX_H  */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v12 3/7] riscv: Add hwprobe vdso call support
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
  2024-02-14 14:31 ` [PATCH v12 1/7] riscv: Add Linux hwprobe syscall support Evan Green
  2024-02-14 14:31 ` [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL Evan Green
@ 2024-02-14 14:31 ` Evan Green
  2024-02-14 14:31 ` [PATCH v12 4/7] riscv: Add __riscv_hwprobe pointer to ifunc calls Evan Green
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green

The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!

Signed-off-by: Evan Green <evan@rivosinc.com>
---

(no changes since v10)

Changes in v10:
 - Remove leading underscores in definition (Adhemerval)
 - Remove spurious 5 from INTERNAL_SYSCALL_CALL (Adhemerval)
 - Use new INTERNAL_VSYSCALL macro instead (Adhemerval)

Changes in v7:
 - Use INTERNAL_VSYSCALL_CALL (Florian)

Changes in v3:
 - Add the "return" to the vsyscall
 - Fix up vdso arg types to match kernel v4 version
 - Remove ifdef around INLINE_VSYSCALL (Adhemerval)

Changes in v2:
 - Add vDSO interface

 sysdeps/unix/sysv/linux/dl-vdso-setup.c | 10 ++++++++++
 sysdeps/unix/sysv/linux/dl-vdso-setup.h |  3 +++
 sysdeps/unix/sysv/linux/riscv/hwprobe.c |  4 ++--
 sysdeps/unix/sysv/linux/riscv/sysdep.h  |  1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
index 5dd7ed9d12..3a44944dbb 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
@@ -71,6 +71,16 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
 # ifdef HAVE_GET_TBFREQ
 PROCINFO_CLASS uint64_t (*_dl_vdso_get_tbfreq)(void) RELRO;
 # endif
+
+/* RISC-V specific ones.  */
+# ifdef HAVE_RISCV_HWPROBE
+PROCINFO_CLASS int (*_dl_vdso_riscv_hwprobe)(void *,
+                                             size_t,
+                                             size_t,
+                                             unsigned long *,
+                                             unsigned int) RELRO;
+# endif
+
 #endif
 
 #undef RELRO
diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
index e87d886940..8aee5a8212 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
@@ -47,6 +47,9 @@ setup_vdso_pointers (void)
 #ifdef HAVE_GET_TBFREQ
   GLRO(dl_vdso_get_tbfreq) = dl_vdso_vsym (HAVE_GET_TBFREQ);
 #endif
+#ifdef HAVE_RISCV_HWPROBE
+  GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
+#endif
 }
 
 #endif
diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
index 28a448175b..e64c159eb3 100644
--- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c
+++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c
@@ -28,8 +28,8 @@ int __riscv_hwprobe (struct riscv_hwprobe *pairs, size_t pair_count,
 {
   int r;
 
-  r = INTERNAL_SYSCALL_CALL (riscv_hwprobe, pairs, pair_count,
-                             cpu_count, cpus, flags);
+  r = INTERNAL_VSYSCALL (riscv_hwprobe, 5, pairs, pair_count,
+                         cpu_count, cpus, flags);
 
   /* Negate negative errno values to match pthreads API. */
   return -r;
diff --git a/sysdeps/unix/sysv/linux/riscv/sysdep.h b/sysdeps/unix/sysv/linux/riscv/sysdep.h
index 5583b96d23..ee015dfeb6 100644
--- a/sysdeps/unix/sysv/linux/riscv/sysdep.h
+++ b/sysdeps/unix/sysv/linux/riscv/sysdep.h
@@ -156,6 +156,7 @@
 /* List of system calls which are supported as vsyscalls (for RV32 and
    RV64).  */
 # define HAVE_GETCPU_VSYSCALL		"__vdso_getcpu"
+# define HAVE_RISCV_HWPROBE		"__vdso_riscv_hwprobe"
 
 # undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
 # define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v12 4/7] riscv: Add __riscv_hwprobe pointer to ifunc calls
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (2 preceding siblings ...)
  2024-02-14 14:31 ` [PATCH v12 3/7] riscv: Add hwprobe vdso call support Evan Green
@ 2024-02-14 14:31 ` Evan Green
  2024-02-14 14:31 ` [PATCH v12 5/7] riscv: Enable multi-arg ifunc resolvers Evan Green
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green

The new __riscv_hwprobe() function is designed to be used by ifunc
selector functions. This presents a challenge for applications and
libraries, as ifunc selectors are invoked before all relocations have
been performed, so an external call to __riscv_hwprobe() from an ifunc
selector won't work. To address this, pass a pointer to the
__riscv_hwprobe() function into ifunc selectors as the second
argument (alongside dl_hwcap, which was already being passed).

Include a typedef as well for convenience, so that ifunc users don't
have to go through contortions to call this routine. Users will need to
remember to check the second argument for NULL, to account for older
glibcs that don't pass the function.

Signed-off-by: Evan Green <evan@rivosinc.com>

---

(no changes since v10)

Changes in v10:
 - Hand in pointer to __riscv_hwprobe(), not vDSO function

Changes in v7:
 - Remove __THROW from function pointer type, as it creates warnings
   together with __fortified_attr_access.

 sysdeps/riscv/dl-irel.h                     |  9 +++++----
 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 10 ++++++++++
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/sysdeps/riscv/dl-irel.h b/sysdeps/riscv/dl-irel.h
index e6ab51ccd4..61b3511c96 100644
--- a/sysdeps/riscv/dl-irel.h
+++ b/sysdeps/riscv/dl-irel.h
@@ -24,6 +24,7 @@
 #include <unistd.h>
 #include <ldsodefs.h>
 #include <sysdep.h>
+#include <sys/hwprobe.h>
 
 #define ELF_MACHINE_IRELA	1
 
@@ -31,10 +32,10 @@ static inline ElfW(Addr)
 __attribute ((always_inline))
 elf_ifunc_invoke (ElfW(Addr) addr)
 {
-  /* The second argument is a void pointer to preserve the extension
-     fexibility.  */
-  return ((ElfW(Addr) (*) (uint64_t, void *)) (addr))
-	 (GLRO(dl_hwcap), NULL);
+  /* The third argument is a void pointer to preserve the extension
+     flexibility.  */
+  return ((ElfW(Addr) (*) (uint64_t, void *, void *)) (addr))
+	 (GLRO(dl_hwcap), __riscv_hwprobe, NULL);
 }
 
 static inline void
diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
index 5592b9e100..34a2e3dbc2 100644
--- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -69,6 +69,16 @@ extern int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
      __fortified_attr_access (__read_write__, 1, 2)
      __fortified_attr_access (__read_only__, 4, 3);
 
+/* A pointer to the __riscv_hwprobe vDSO function is passed as the second
+   argument to ifunc selector routines. Include a function pointer type for
+   convenience in calling the function in those settings. */
+typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_count,
+				  size_t __cpu_count, unsigned long int *__cpus,
+				  unsigned int __flags)
+     __nonnull ((1)) __wur
+     __fortified_attr_access (__read_write__, 1, 2)
+     __fortified_attr_access (__read_only__, 4, 3);
+
 __END_DECLS
 
 #endif /* sys/hwprobe.h */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v12 5/7] riscv: Enable multi-arg ifunc resolvers
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (3 preceding siblings ...)
  2024-02-14 14:31 ` [PATCH v12 4/7] riscv: Add __riscv_hwprobe pointer to ifunc calls Evan Green
@ 2024-02-14 14:31 ` Evan Green
  2024-02-16  7:45   ` Florian Weimer
  2024-02-14 14:31 ` [PATCH v12 6/7] riscv: Add ifunc helper method to hwprobe.h Evan Green
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green

RISC-V is apparently the first architecture to pass more than one
argument to ifunc resolvers. The helper macros in libc-symbols.h,
__ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible
with this. These macros have an "arg" (non-final) parameter that
represents the parameter signature of the ifunc resolver. The result is
an inability to pass the required comma through in a single preprocessor
argument.

Rearrange the __ifunc_resolver() macro to be variadic, and pass the
types as those variable parameters. Move the guts of __ifunc() and
__ifunc_hidden() into new macros, __ifunc_args(), and
__ifunc_args_hidden(), that pass the variable arguments down through to
__ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which
are used in a bunch of places, to simply shuffle the arguments down into
__ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which
provides convenience macros to those looking to write ifunc selectors
that use both arguments.

Signed-off-by: Evan Green <evan@rivosinc.com>

---

(no changes since v11)

Changes in v11:
 - Update copyright year (Adhemerval)

Changes in v9:
 - Fix a couple of typos causing powerpc not to build
   (build-many-glibcs)

Changes in v6:
 - Introduced riscv-ifunc.h for multi-arg ifunc selectors.

---
 include/libc-symbols.h      | 28 +++++++++++++++++-----------
 sysdeps/riscv/riscv-ifunc.h | 27 +++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 11 deletions(-)
 create mode 100644 sysdeps/riscv/riscv-ifunc.h

diff --git a/include/libc-symbols.h b/include/libc-symbols.h
index e21bb599b3..4367aa6740 100644
--- a/include/libc-symbols.h
+++ b/include/libc-symbols.h
@@ -667,9 +667,9 @@ for linking")
 #endif
 
 /* Helper / base  macros for indirect function symbols.  */
-#define __ifunc_resolver(type_name, name, expr, arg, init, classifier)	\
+#define __ifunc_resolver(type_name, name, expr, init, classifier, ...)	\
   classifier inhibit_stack_protector					\
-  __typeof (type_name) *name##_ifunc (arg)				\
+  __typeof (type_name) *name##_ifunc (__VA_ARGS__)			\
   {									\
     init ();								\
     __typeof (type_name) *res = expr;					\
@@ -677,13 +677,13 @@ for linking")
   }
 
 #ifdef HAVE_GCC_IFUNC
-# define __ifunc(type_name, name, expr, arg, init)			\
+# define __ifunc_args(type_name, name, expr, init, ...)			\
   extern __typeof (type_name) name __attribute__			\
 			      ((ifunc (#name "_ifunc")));		\
-  __ifunc_resolver (type_name, name, expr, arg, init, static)
+  __ifunc_resolver (type_name, name, expr, init, static, __VA_ARGS__)
 
-# define __ifunc_hidden(type_name, name, expr, arg, init)	\
-  __ifunc (type_name, name, expr, arg, init)
+# define __ifunc_args_hidden(type_name, name, expr, init, ...)		\
+  __ifunc_args (type_name, name, expr, init, __VA_ARGS__)
 #else
 /* Gcc does not support __attribute__ ((ifunc (...))).  Use the old behaviour
    as fallback.  But keep in mind that the debug information for the ifunc
@@ -694,18 +694,24 @@ for linking")
    different signatures.  (Gcc support is disabled at least on a ppc64le
    Ubuntu 14.04 system.)  */
 
-# define __ifunc(type_name, name, expr, arg, init)			\
+# define __ifunc_args(type_name, name, expr, init, ...)			\
   extern __typeof (type_name) name;					\
-  __typeof (type_name) *name##_ifunc (arg) __asm__ (#name);		\
-  __ifunc_resolver (type_name, name, expr, arg, init,)			\
+  __typeof (type_name) *name##_ifunc (__VA_ARGS__) __asm__ (#name);	\
+  __ifunc_resolver (type_name, name, expr, init, , __VA_ARGS__)		\
  __asm__ (".type " #name ", %gnu_indirect_function");
 
-# define __ifunc_hidden(type_name, name, expr, arg, init)		\
+# define __ifunc_args_hidden(type_name, name, expr, init, ...)		\
   extern __typeof (type_name) __libc_##name;				\
-  __ifunc (type_name, __libc_##name, expr, arg, init)			\
+  __ifunc (type_name, __libc_##name, expr, __VA_ARGS__, init)		\
   strong_alias (__libc_##name, name);
 #endif /* !HAVE_GCC_IFUNC  */
 
+#define __ifunc(type_name, name, expr, arg, init)			\
+  __ifunc_args (type_name, name, expr, init, arg)
+
+#define __ifunc_hidden(type_name, name, expr, arg, init)		\
+  __ifunc_args_hidden (type_name, name, expr, init, arg)
+
 /* The following macros are used for indirect function symbols in libc.so.
    First of all, you need to have the function prototyped somewhere,
    say in foo.h:
diff --git a/sysdeps/riscv/riscv-ifunc.h b/sysdeps/riscv/riscv-ifunc.h
new file mode 100644
index 0000000000..c77ab51548
--- /dev/null
+++ b/sysdeps/riscv/riscv-ifunc.h
@@ -0,0 +1,27 @@
+/* Common definition for ifunc resolvers.  Linux/RISC-V version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <ifunc-init.h>
+#include <sys/hwprobe.h>
+
+#define INIT_ARCH()
+
+#define riscv_libc_ifunc(name, expr)				\
+  __ifunc_args (name, name, expr(hwcap, hwprobe), INIT_ARCH,	\
+                uint64_t hwcap, __riscv_hwprobe_t hwprobe)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v12 6/7] riscv: Add ifunc helper method to hwprobe.h
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (4 preceding siblings ...)
  2024-02-14 14:31 ` [PATCH v12 5/7] riscv: Enable multi-arg ifunc resolvers Evan Green
@ 2024-02-14 14:31 ` Evan Green
  2024-02-14 14:31 ` [PATCH v12 7/7] riscv: Add and use alignment-ignorant memcpy Evan Green
  2024-02-14 15:16 ` [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Adhemerval Zanella Netto
  7 siblings, 0 replies; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green

Add a little helper method so it's easier to fetch a single value from
the hwprobe function when used within an ifunc selector.

Signed-off-by: Evan Green <evan@rivosinc.com>

---

(no changes since v11)

Changes in v11:
 - Remove superfluous 'signed' (Adhemerval)
 - Put helper before __END_DECLS (Adhemerval)
 - Add comment about +1 on non-Linux systems (Stefan)

Changes in v10:
 - Avoid implicit comparisons (Adhemerval)

Changes in v9:
 - Use __inline rather than inline so c89 compiles (build-many-glibcs)

Changes in v7:
 - Introduced static inline helper (Richard)

 sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h | 29 +++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
index 34a2e3dbc2..8ecb43bb69 100644
--- a/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
+++ b/sysdeps/unix/sysv/linux/riscv/sys/hwprobe.h
@@ -22,6 +22,7 @@
 
 #include <features.h>
 #include <stddef.h>
+#include <errno.h>
 #ifdef __has_include
 # if __has_include (<asm/hwprobe.h>)
 #  include <asm/hwprobe.h>
@@ -79,6 +80,34 @@ typedef int (*__riscv_hwprobe_t) (struct riscv_hwprobe *__pairs, size_t __pair_c
      __fortified_attr_access (__read_write__, 1, 2)
      __fortified_attr_access (__read_only__, 4, 3);
 
+/* Helper function usable from ifunc selectors that probes a single key. */
+static __inline int
+__riscv_hwprobe_one(__riscv_hwprobe_t hwprobe_func,
+                    long long int key,
+                    unsigned long long int *value)
+{
+  struct riscv_hwprobe pair;
+  int rc;
+
+  /* Earlier versions of glibc pass NULL as the second ifunc parameter. Other C
+     libraries on non-Linux systems may pass +1 as this function pointer to
+     indicate no support. Users copying this function to exotic worlds
+     (non-Linux non-glibc) may want to do additional validity checks here. */
+  if (hwprobe_func == NULL)
+    return ENOSYS;
+
+  pair.key = key;
+  rc = hwprobe_func (&pair, 1, 0, NULL, 0);
+  if (rc != 0)
+    return rc;
+
+  if (pair.key < 0)
+    return ENOENT;
+
+  *value = pair.value;
+  return 0;
+}
+
 __END_DECLS
 
 #endif /* sys/hwprobe.h */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v12 7/7] riscv: Add and use alignment-ignorant memcpy
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (5 preceding siblings ...)
  2024-02-14 14:31 ` [PATCH v12 6/7] riscv: Add ifunc helper method to hwprobe.h Evan Green
@ 2024-02-14 14:31 ` Evan Green
  2024-02-14 15:16 ` [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Adhemerval Zanella Netto
  7 siblings, 0 replies; 31+ messages in thread
From: Evan Green @ 2024-02-14 14:31 UTC (permalink / raw)
  To: libc-alpha; +Cc: vineetg, Florian Weimer, slewis, palmer, Evan Green

For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

---

(no changes since v11)

Changes in v11:
 - Update copyrights to 2024 (Adhemerval)
 - Avoid implicit check in select_memcpy_ifunc (Adhemerval)
 - Remove hidden_def (__memcpy_noalignment) (Adhemerval)

Changes in v10:
 - One line per function in Makefile for memcpy (Adhemerval)
 - Space before argument-like things (Adhemerval)

Changes in v7:
 - Use new helper function in memcpy ifunc selector (Richard)

Changes in v6:
 - Fix a couple regressions in the assembly from v5 :/
 - Use passed hwprobe pointer in memcpy ifunc selector.

Changes in v5:
 - Do unaligned word access for final trailing bytes (Richard)

Changes in v4:
 - Fixed comment style (Florian)

Changes in v3:
 - Word align dest for large memcpy()s.
 - Add tags
 - Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
 - Used _MASK instead of _FAST value itself.


---
 sysdeps/riscv/memcopy.h                       |  26 ++++
 sysdeps/riscv/memcpy.c                        |  63 ++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 136 ++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/Makefile        |   9 ++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 5 files changed, 258 insertions(+)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c

diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h
new file mode 100644
index 0000000000..27675964b0
--- /dev/null
+++ b/sysdeps/riscv/memcopy.h
@@ -0,0 +1,26 @@
+/* memcopy.h -- definitions for memory copy functions. RISC-V version.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdeps/generic/memcopy.h>
+
+/* Redefine the generic memcpy implementation to __memcpy_generic, so
+   the memcpy ifunc can select between generic and special versions.
+   In rtld, don't bother with all the ifunciness. */
+#if IS_IN (libc)
+#define MEMCPY __memcpy_generic
+#endif
diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c
new file mode 100644
index 0000000000..20f9548c44
--- /dev/null
+++ b/sysdeps/riscv/memcpy.c
@@ -0,0 +1,63 @@
+/* Multiple versions of memcpy.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <stdint.h>
+# include <string.h>
+# include <ifunc-init.h>
+# include <riscv-ifunc.h>
+# include <sys/hwprobe.h>
+
+# define INIT_ARCH()
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden;
+
+static inline __typeof (__redirect_memcpy) *
+select_memcpy_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
+{
+  unsigned long long int value;
+
+  INIT_ARCH ();
+
+  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value) != 0)
+    return __memcpy_generic;
+
+  if ((value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST)
+    return __memcpy_noalignment;
+
+  return __memcpy_generic;
+}
+
+riscv_libc_ifunc (__libc_memcpy, select_memcpy_ifunc);
+
+# undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+# ifdef SHARED
+__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy);
+# endif
+
+#endif
diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S
new file mode 100644
index 0000000000..621f8d028f
--- /dev/null
+++ b/sysdeps/riscv/memcpy_noalignment.S
@@ -0,0 +1,136 @@
+/* memcpy for RISC-V, ignoring buffer alignment
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+/* void *memcpy(void *, const void *, size_t) */
+ENTRY (__memcpy_noalignment)
+	move t6, a0  /* Preserve return value */
+
+	/* Bail if 0 */
+	beqz a2, 7f
+
+	/* Jump to byte copy if size < SZREG */
+	li a4, SZREG
+	bltu a2, a4, 5f
+
+	/* Round down to the nearest "page" size */
+	andi a4, a2, ~((16*SZREG)-1)
+	beqz a4, 2f
+	add a3, a1, a4
+
+	/* Copy the first word to get dest word aligned */
+	andi a5, t6, SZREG-1
+	beqz a5, 1f
+	REG_L a6, (a1)
+	REG_S a6, (t6)
+
+	/* Align dst up to a word, move src and size as well. */
+	addi t6, t6, SZREG-1
+	andi t6, t6, ~(SZREG-1)
+	sub a5, t6, a0
+	add a1, a1, a5
+	sub a2, a2, a5
+
+	/* Recompute page count */
+	andi a4, a2, ~((16*SZREG)-1)
+	beqz a4, 2f
+
+1:
+	/* Copy "pages" (chunks of 16 registers) */
+	REG_L a4,       0(a1)
+	REG_L a5,   SZREG(a1)
+	REG_L a6, 2*SZREG(a1)
+	REG_L a7, 3*SZREG(a1)
+	REG_L t0, 4*SZREG(a1)
+	REG_L t1, 5*SZREG(a1)
+	REG_L t2, 6*SZREG(a1)
+	REG_L t3, 7*SZREG(a1)
+	REG_L t4, 8*SZREG(a1)
+	REG_L t5, 9*SZREG(a1)
+	REG_S a4,       0(t6)
+	REG_S a5,   SZREG(t6)
+	REG_S a6, 2*SZREG(t6)
+	REG_S a7, 3*SZREG(t6)
+	REG_S t0, 4*SZREG(t6)
+	REG_S t1, 5*SZREG(t6)
+	REG_S t2, 6*SZREG(t6)
+	REG_S t3, 7*SZREG(t6)
+	REG_S t4, 8*SZREG(t6)
+	REG_S t5, 9*SZREG(t6)
+	REG_L a4, 10*SZREG(a1)
+	REG_L a5, 11*SZREG(a1)
+	REG_L a6, 12*SZREG(a1)
+	REG_L a7, 13*SZREG(a1)
+	REG_L t0, 14*SZREG(a1)
+	REG_L t1, 15*SZREG(a1)
+	addi a1, a1, 16*SZREG
+	REG_S a4, 10*SZREG(t6)
+	REG_S a5, 11*SZREG(t6)
+	REG_S a6, 12*SZREG(t6)
+	REG_S a7, 13*SZREG(t6)
+	REG_S t0, 14*SZREG(t6)
+	REG_S t1, 15*SZREG(t6)
+	addi t6, t6, 16*SZREG
+	bltu a1, a3, 1b
+	andi a2, a2, (16*SZREG)-1  /* Update count */
+
+2:
+	/* Remainder is smaller than a page, compute native word count */
+	beqz a2, 7f
+	andi a5, a2, ~(SZREG-1)
+	andi a2, a2, (SZREG-1)
+	add a3, a1, a5
+	/* Jump directly to last word if no words. */
+	beqz a5, 4f
+
+3:
+	/* Use single native register copy */
+	REG_L a4, 0(a1)
+	addi a1, a1, SZREG
+	REG_S a4, 0(t6)
+	addi t6, t6, SZREG
+	bltu a1, a3, 3b
+
+	/* Jump directly out if no more bytes */
+	beqz a2, 7f
+
+4:
+	/* Copy the last word unaligned */
+	add a3, a1, a2
+	add a4, t6, a2
+	REG_L a5, -SZREG(a3)
+	REG_S a5, -SZREG(a4)
+	ret
+
+5:
+	/* Copy bytes when the total copy is <SZREG */
+	add a3, a1, a2
+
+6:
+	lb a4, 0(a1)
+	addi a1, a1, 1
+	sb a4, 0(t6)
+	addi t6, t6, 1
+	bltu a1, a3, 6b
+
+7:
+	ret
+
+END (__memcpy_noalignment)
diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 04abf226ad..398ff7418b 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -15,6 +15,15 @@ ifeq ($(subdir),stdlib)
 gen-as-const-headers += ucontext_i.sym
 endif
 
+ifeq ($(subdir),string)
+sysdep_routines += \
+  memcpy \
+  memcpy-generic \
+  memcpy_noalignment \
+  # sysdep_routines
+
+endif
+
 abi-variants := ilp32 ilp32d lp64 lp64d
 
 ifeq (,$(filter $(default-abi),$(abi-variants)))
diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
new file mode 100644
index 0000000000..f06f4bda15
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
@@ -0,0 +1,24 @@
+/* Re-include the default memcpy implementation.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+extern __typeof (memcpy) __memcpy_generic;
+hidden_proto (__memcpy_generic)
+
+#include <string/memcpy.c>
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
                   ` (6 preceding siblings ...)
  2024-02-14 14:31 ` [PATCH v12 7/7] riscv: Add and use alignment-ignorant memcpy Evan Green
@ 2024-02-14 15:16 ` Adhemerval Zanella Netto
  2024-02-14 15:24   ` Andreas Schwab
  2024-02-15 15:48   ` Evan Green
  7 siblings, 2 replies; 31+ messages in thread
From: Adhemerval Zanella Netto @ 2024-02-14 15:16 UTC (permalink / raw)
  To: Evan Green, libc-alpha
  Cc: vineetg, Florian Weimer, slewis, palmer, Jessica Clarke



On 14/02/24 11:31, Evan Green wrote:
> 
> This series illustrates the use of a recently accepted Linux syscall that
> enumerates architectural information about the RISC-V cores the system
> is running on. In this series we expose a small wrapper function around
> the syscall. An ifunc selector for memcpy queries it to see if unaligned
> access is "fast" on this hardware. If it is, it selects a newly provided
> implementation of memcpy that doesn't work hard at aligning the src and
> destination buffers.
> 
> For applications and libraries outside of glibc that want to use
> __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
> to the riscv_hwprobe() function in as the second argument to ifunc
> selectors. A new inline convenience function can help application and
> library callers to check for validity and quickly probe a single key.

I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
I recall that Florian has tried to address the ifunc ordering and that
Jessica proposed solutions was not fully sufficient to address all the
ifunc corner cases.

[1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-14 15:16 ` [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Adhemerval Zanella Netto
@ 2024-02-14 15:24   ` Andreas Schwab
  2024-02-22 18:44     ` Palmer Dabbelt
  2024-02-15 15:48   ` Evan Green
  1 sibling, 1 reply; 31+ messages in thread
From: Andreas Schwab @ 2024-02-14 15:24 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Evan Green, libc-alpha, vineetg, Florian Weimer, slewis, palmer,
	Jessica Clarke

On Feb 14 2024, Adhemerval Zanella Netto wrote:

> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
> I recall that Florian has tried to address the ifunc ordering and that
> Jessica proposed solutions was not fully sufficient to address all the
> ifunc corner cases.

That will also help resolving bug 31317.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-14 15:16 ` [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Adhemerval Zanella Netto
  2024-02-14 15:24   ` Andreas Schwab
@ 2024-02-15 15:48   ` Evan Green
  2024-02-15 15:57     ` enh
  2024-02-15 16:50     ` Palmer Dabbelt
  1 sibling, 2 replies; 31+ messages in thread
From: Evan Green @ 2024-02-15 15:48 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: libc-alpha, vineetg, Florian Weimer, slewis, palmer, Jessica Clarke

On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 14/02/24 11:31, Evan Green wrote:
> >
> > This series illustrates the use of a recently accepted Linux syscall that
> > enumerates architectural information about the RISC-V cores the system
> > is running on. In this series we expose a small wrapper function around
> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
> > access is "fast" on this hardware. If it is, it selects a newly provided
> > implementation of memcpy that doesn't work hard at aligning the src and
> > destination buffers.
> >
> > For applications and libraries outside of glibc that want to use
> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
> > to the riscv_hwprobe() function in as the second argument to ifunc
> > selectors. A new inline convenience function can help application and
> > library callers to check for validity and quickly probe a single key.
>
> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
> I recall that Florian has tried to address the ifunc ordering and that
> Jessica proposed solutions was not fully sufficient to address all the
> ifunc corner cases.
>
> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html

I haven't invested the time yet in studying the resolver to understand
how feasible Jessica's suggestion is. I was sort of hoping Florian
would chime in with an "oh yeah let's do that" or "no, it doesn't
work". I suppose I still am :)

Alternatively, patches 1-3 of this series stand on their own. If the
ifunc aspect of this is gated on me doing a bunch of research, it
might at least make sense to land the first half now, to get Linux
users easy access to the __riscv_hwprobe() syscall and vDSO.

-Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 15:48   ` Evan Green
@ 2024-02-15 15:57     ` enh
  2024-02-15 16:50     ` Palmer Dabbelt
  1 sibling, 0 replies; 31+ messages in thread
From: enh @ 2024-02-15 15:57 UTC (permalink / raw)
  To: Evan Green
  Cc: Adhemerval Zanella Netto, libc-alpha, vineetg, Florian Weimer,
	slewis, palmer, Jessica Clarke

On Thu, Feb 15, 2024 at 7:49 AM Evan Green <evan@rivosinc.com> wrote:
>
> On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
> >
> >
> >
> > On 14/02/24 11:31, Evan Green wrote:
> > >
> > > This series illustrates the use of a recently accepted Linux syscall that
> > > enumerates architectural information about the RISC-V cores the system
> > > is running on. In this series we expose a small wrapper function around
> > > the syscall. An ifunc selector for memcpy queries it to see if unaligned
> > > access is "fast" on this hardware. If it is, it selects a newly provided
> > > implementation of memcpy that doesn't work hard at aligning the src and
> > > destination buffers.
> > >
> > > For applications and libraries outside of glibc that want to use
> > > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
> > > to the riscv_hwprobe() function in as the second argument to ifunc
> > > selectors. A new inline convenience function can help application and
> > > library callers to check for validity and quickly probe a single key.
> >
> > I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
> > I recall that Florian has tried to address the ifunc ordering and that
> > Jessica proposed solutions was not fully sufficient to address all the
> > ifunc corner cases.
> >
> > [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
>
> I haven't invested the time yet in studying the resolver to understand
> how feasible Jessica's suggestion is. I was sort of hoping Florian
> would chime in with an "oh yeah let's do that" or "no, it doesn't
> work". I suppose I still am :)

certainly on Android we looked at the amount of work that would
require (and the likely slowdown to dynamic linking, which is already
problematic for us, given that it's a large component of app launch
time that our zygote trick can't magic away) and decided that we're
unlikely to do it, even though in principle it's a nice idea.

> Alternatively, patches 1-3 of this series stand on their own. If the
> ifunc aspect of this is gated on me doing a bunch of research, it
> might at least make sense to land the first half now, to get Linux
> users easy access to the __riscv_hwprobe() syscall and vDSO.
>
> -Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 15:48   ` Evan Green
  2024-02-15 15:57     ` enh
@ 2024-02-15 16:50     ` Palmer Dabbelt
  2024-02-15 17:00       ` enh
  2024-02-15 17:42       ` Jessica Clarke
  1 sibling, 2 replies; 31+ messages in thread
From: Palmer Dabbelt @ 2024-02-15 16:50 UTC (permalink / raw)
  To: Evan Green
  Cc: adhemerval.zanella, libc-alpha, Vineet Gupta, fweimer, slewis, jrtc27

On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
> On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 14/02/24 11:31, Evan Green wrote:
>> >
>> > This series illustrates the use of a recently accepted Linux syscall that
>> > enumerates architectural information about the RISC-V cores the system
>> > is running on. In this series we expose a small wrapper function around
>> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
>> > access is "fast" on this hardware. If it is, it selects a newly provided
>> > implementation of memcpy that doesn't work hard at aligning the src and
>> > destination buffers.
>> >
>> > For applications and libraries outside of glibc that want to use
>> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
>> > to the riscv_hwprobe() function in as the second argument to ifunc
>> > selectors. A new inline convenience function can help application and
>> > library callers to check for validity and quickly probe a single key.
>>
>> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
>> I recall that Florian has tried to address the ifunc ordering and that
>> Jessica proposed solutions was not fully sufficient to address all the
>> ifunc corner cases.
>>
>> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
>
> I haven't invested the time yet in studying the resolver to understand
> how feasible Jessica's suggestion is. I was sort of hoping Florian
> would chime in with an "oh yeah let's do that" or "no, it doesn't
> work". I suppose I still am :)

The discussion's over here: 
https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/ 

I was inclined to just ignore it: Florian explained what's going on with 
the multiple libraries constraint, so we're kind of just going around in 
circles at that point.

I'm also not sure the argument makes a whole lot of sense in the first 
place.  The core of the argument seems to be around binary compatibility 
with other kernels/libcs, but there's a ton of reasons why binaries 
aren't compatible between those systems.  So if glibc just has a more 
complex IFUNC resolving scheme and that ends up requiring the extra 
agrument, then users of glibc have to go deal with that -- other 
systems/libcs might make different decisions, but that's just how these 
things go.

Looks like Maskray is proposing making glibc's IFUNC resolving.  That's 
a generic glibc decision and I don't really understand this stuff well 
enough to have much of an opinion -- sure maybe the multi-library IFUNC 
resolving is overkill, but aside from doing some similar grep of all the 
sources (or trying some builds and such) I'm not sure how to tell if 
we're safe breaking the ABI there.

That said, we're not comitting to ABI stability until the release.  So 
if we decide to make a generic glibc change we can always go drop the 
argument, that should be easy.  Maybe we even just throw out a RFC patch 
to do it, unless I'm missing something the code is the easy part here 
(we're essentially just going back to one of the early versions, from 
before we knew about this resolving order complexity).

> Alternatively, patches 1-3 of this series stand on their own. If the
> ifunc aspect of this is gated on me doing a bunch of research, it
> might at least make sense to land the first half now, to get Linux
> users easy access to the __riscv_hwprobe() syscall and vDSO.

I'm OK with that, too.

>
> -Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 16:50     ` Palmer Dabbelt
@ 2024-02-15 17:00       ` enh
  2024-02-15 17:22         ` Palmer Dabbelt
  2024-02-15 17:42       ` Jessica Clarke
  1 sibling, 1 reply; 31+ messages in thread
From: enh @ 2024-02-15 17:00 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Evan Green, adhemerval.zanella, libc-alpha, Vineet Gupta,
	fweimer, slewis, jrtc27

On Thu, Feb 15, 2024 at 8:50 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
> > On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 14/02/24 11:31, Evan Green wrote:
> >> >
> >> > This series illustrates the use of a recently accepted Linux syscall that
> >> > enumerates architectural information about the RISC-V cores the system
> >> > is running on. In this series we expose a small wrapper function around
> >> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
> >> > access is "fast" on this hardware. If it is, it selects a newly provided
> >> > implementation of memcpy that doesn't work hard at aligning the src and
> >> > destination buffers.
> >> >
> >> > For applications and libraries outside of glibc that want to use
> >> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
> >> > to the riscv_hwprobe() function in as the second argument to ifunc
> >> > selectors. A new inline convenience function can help application and
> >> > library callers to check for validity and quickly probe a single key.
> >>
> >> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
> >> I recall that Florian has tried to address the ifunc ordering and that
> >> Jessica proposed solutions was not fully sufficient to address all the
> >> ifunc corner cases.
> >>
> >> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
> >
> > I haven't invested the time yet in studying the resolver to understand
> > how feasible Jessica's suggestion is. I was sort of hoping Florian
> > would chime in with an "oh yeah let's do that" or "no, it doesn't
> > work". I suppose I still am :)
>
> The discussion's over here:
> https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
>
> I was inclined to just ignore it: Florian explained what's going on with
> the multiple libraries constraint, so we're kind of just going around in
> circles at that point.
>
> I'm also not sure the argument makes a whole lot of sense in the first
> place.  The core of the argument seems to be around binary compatibility
> with other kernels/libcs, but there's a ton of reasons why binaries
> aren't compatible between those systems.  So if glibc just has a more
> complex IFUNC resolving scheme and that ends up requiring the extra
> agrument, then users of glibc have to go deal with that -- other
> systems/libcs might make different decisions, but that's just how these
> things go.

yeah, i struggle with the portability premise too. not least because
macOS/iOS (which is obviously the most common "other platform" for me
with my Android hat on) doesn't have ifuncs at all. that's quite a big
blocker to any dream of portability.

as i've said before, in a survey of all the open source libraries that
go into Android, there are none that don't support macOS/iOS, and so
there are none that actually rely on ifuncs. ifunc usage is limited to
libc (which is why we libc maintainers get bikeshedded by stuff like
this) and compiler-generated FMV stuff. i'd argue that "real people"
(app developers) should probably be looking at the latter anyway, and
it's our job to make that work (which happens once^Wtwice, in llvm and
gcc).

also, the fact that Android is doing what's proposed here (with the
extra argument) means there'd be _some_ incompatibility even if glibc
and FreeBSD didn't do that.

(Android's only source incompatibility with glibc at the moment is the
lack of the single-probe helper inline, which i fear will encourage
_worse_ code, but suspect will actually be unused in practice anyway
for the same "ifuncs are 'assembler' for libc/toolchain, and
library/app developers doing stuff themselves will continue to use
regular function pointers like they do today" reasons.)

> Looks like Maskray is proposing making glibc's IFUNC resolving.  That's
> a generic glibc decision and I don't really understand this stuff well
> enough to have much of an opinion -- sure maybe the multi-library IFUNC
> resolving is overkill, but aside from doing some similar grep of all the
> sources (or trying some builds and such) I'm not sure how to tell if
> we're safe breaking the ABI there.
>
> That said, we're not comitting to ABI stability until the release.  So
> if we decide to make a generic glibc change we can always go drop the
> argument, that should be easy.  Maybe we even just throw out a RFC patch
> to do it, unless I'm missing something the code is the easy part here
> (we're essentially just going back to one of the early versions, from
> before we knew about this resolving order complexity).
>
> > Alternatively, patches 1-3 of this series stand on their own. If the
> > ifunc aspect of this is gated on me doing a bunch of research, it
> > might at least make sense to land the first half now, to get Linux
> > users easy access to the __riscv_hwprobe() syscall and vDSO.
>
> I'm OK with that, too.
>
> >
> > -Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 17:00       ` enh
@ 2024-02-15 17:22         ` Palmer Dabbelt
  2024-02-15 18:45           ` enh
  0 siblings, 1 reply; 31+ messages in thread
From: Palmer Dabbelt @ 2024-02-15 17:22 UTC (permalink / raw)
  To: enh
  Cc: Evan Green, adhemerval.zanella, libc-alpha, Vineet Gupta,
	fweimer, slewis, jrtc27

On Thu, 15 Feb 2024 09:00:03 PST (-0800), enh@google.com wrote:
> On Thu, Feb 15, 2024 at 8:50 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>>
>> On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
>> > On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
>> > <adhemerval.zanella@linaro.org> wrote:
>> >>
>> >>
>> >>
>> >> On 14/02/24 11:31, Evan Green wrote:
>> >> >
>> >> > This series illustrates the use of a recently accepted Linux syscall that
>> >> > enumerates architectural information about the RISC-V cores the system
>> >> > is running on. In this series we expose a small wrapper function around
>> >> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
>> >> > access is "fast" on this hardware. If it is, it selects a newly provided
>> >> > implementation of memcpy that doesn't work hard at aligning the src and
>> >> > destination buffers.
>> >> >
>> >> > For applications and libraries outside of glibc that want to use
>> >> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
>> >> > to the riscv_hwprobe() function in as the second argument to ifunc
>> >> > selectors. A new inline convenience function can help application and
>> >> > library callers to check for validity and quickly probe a single key.
>> >>
>> >> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
>> >> I recall that Florian has tried to address the ifunc ordering and that
>> >> Jessica proposed solutions was not fully sufficient to address all the
>> >> ifunc corner cases.
>> >>
>> >> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
>> >
>> > I haven't invested the time yet in studying the resolver to understand
>> > how feasible Jessica's suggestion is. I was sort of hoping Florian
>> > would chime in with an "oh yeah let's do that" or "no, it doesn't
>> > work". I suppose I still am :)
>>
>> The discussion's over here:
>> https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
>>
>> I was inclined to just ignore it: Florian explained what's going on with
>> the multiple libraries constraint, so we're kind of just going around in
>> circles at that point.
>>
>> I'm also not sure the argument makes a whole lot of sense in the first
>> place.  The core of the argument seems to be around binary compatibility
>> with other kernels/libcs, but there's a ton of reasons why binaries
>> aren't compatible between those systems.  So if glibc just has a more
>> complex IFUNC resolving scheme and that ends up requiring the extra
>> agrument, then users of glibc have to go deal with that -- other
>> systems/libcs might make different decisions, but that's just how these
>> things go.
>
> yeah, i struggle with the portability premise too. not least because
> macOS/iOS (which is obviously the most common "other platform" for me
> with my Android hat on) doesn't have ifuncs at all. that's quite a big
> blocker to any dream of portability.
>
> as i've said before, in a survey of all the open source libraries that
> go into Android, there are none that don't support macOS/iOS, and so
> there are none that actually rely on ifuncs. ifunc usage is limited to
> libc (which is why we libc maintainers get bikeshedded by stuff like
> this) and compiler-generated FMV stuff. i'd argue that "real people"
> (app developers) should probably be looking at the latter anyway, and
> it's our job to make that work (which happens once^Wtwice, in llvm and
> gcc).
>
> also, the fact that Android is doing what's proposed here (with the
> extra argument) means there'd be _some_ incompatibility even if glibc
> and FreeBSD didn't do that.

So you're already committed to that ABI?

> (Android's only source incompatibility with glibc at the moment is the
> lack of the single-probe helper inline, which i fear will encourage
> _worse_ code, but suspect will actually be unused in practice anyway
> for the same "ifuncs are 'assembler' for libc/toolchain, and
> library/app developers doing stuff themselves will continue to use
> regular function pointers like they do today" reasons.)
>
>> Looks like Maskray is proposing making glibc's IFUNC resolving.  That's
>> a generic glibc decision and I don't really understand this stuff well
>> enough to have much of an opinion -- sure maybe the multi-library IFUNC
>> resolving is overkill, but aside from doing some similar grep of all the
>> sources (or trying some builds and such) I'm not sure how to tell if
>> we're safe breaking the ABI there.
>>
>> That said, we're not comitting to ABI stability until the release.  So
>> if we decide to make a generic glibc change we can always go drop the
>> argument, that should be easy.  Maybe we even just throw out a RFC patch
>> to do it, unless I'm missing something the code is the easy part here
>> (we're essentially just going back to one of the early versions, from
>> before we knew about this resolving order complexity).
>>
>> > Alternatively, patches 1-3 of this series stand on their own. If the
>> > ifunc aspect of this is gated on me doing a bunch of research, it
>> > might at least make sense to land the first half now, to get Linux
>> > users easy access to the __riscv_hwprobe() syscall and vDSO.
>>
>> I'm OK with that, too.
>>
>> >
>> > -Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 16:50     ` Palmer Dabbelt
  2024-02-15 17:00       ` enh
@ 2024-02-15 17:42       ` Jessica Clarke
  2024-02-15 18:52         ` enh
  1 sibling, 1 reply; 31+ messages in thread
From: Jessica Clarke @ 2024-02-15 17:42 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Evan Green, adhemerval.zanella, libc-alpha, Vineet Gupta,
	Florian Weimer, slewis

On 15 Feb 2024, at 16:50, Palmer Dabbelt <palmer@rivosinc.com> wrote:
> On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
>> On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
>> <adhemerval.zanella@linaro.org> wrote:
>>> 
>>> 
>>> 
>>> On 14/02/24 11:31, Evan Green wrote:
>>> >
>>> > This series illustrates the use of a recently accepted Linux syscall that
>>> > enumerates architectural information about the RISC-V cores the system
>>> > is running on. In this series we expose a small wrapper function around
>>> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
>>> > access is "fast" on this hardware. If it is, it selects a newly provided
>>> > implementation of memcpy that doesn't work hard at aligning the src and
>>> > destination buffers.
>>> >
>>> > For applications and libraries outside of glibc that want to use
>>> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
>>> > to the riscv_hwprobe() function in as the second argument to ifunc
>>> > selectors. A new inline convenience function can help application and
>>> > library callers to check for validity and quickly probe a single key.
>>> 
>>> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
>>> I recall that Florian has tried to address the ifunc ordering and that
>>> Jessica proposed solutions was not fully sufficient to address all the
>>> ifunc corner cases.
>>> 
>>> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
>> 
>> I haven't invested the time yet in studying the resolver to understand
>> how feasible Jessica's suggestion is. I was sort of hoping Florian
>> would chime in with an "oh yeah let's do that" or "no, it doesn't
>> work". I suppose I still am :)
> 
> The discussion's over here: https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
> I was inclined to just ignore it: Florian explained what's going on with the multiple libraries constraint, so we're kind of just going around in circles at that point.
> 
> I'm also not sure the argument makes a whole lot of sense in the first place.  The core of the argument seems to be around binary compatibility with other kernels/libcs,

Not binary compatibility, source compatibility. Yes, it’s part of that
platform’s ABI, but it leaks into the source code by means of the
resolver’s type and arguments. I would much rather see a standard IFUNC
resolver interface that can be specified in the psABI as a
cross-platform standard for any ELF-using OS that implements IFUNCs,
since the moment glibc adopts this as its implementation of choice
there’s no going back (without ugly “ignore that magic -1 argument,
it’s there for historical reasons” kinds of situations). Adopting the
stronger guarantees provided by FreeBSD’s rtld is one way to achieve
that, and seems the simplest to me, but I’m not wedded to that, other
suggestions that achieve the same end goal are just as welcome. Of
course glibc is free to ignore me, too, with a “not our problem, deal
with it, we’re the dominant OS”, but that would seem sad.

Jess

> but there's a ton of reasons why binaries aren't compatible between those systems.  So if glibc just has a more complex IFUNC resolving scheme and that ends up requiring the extra agrument, then users of glibc have to go deal with that -- other systems/libcs might make different decisions, but that's just how these things go.
> 
> Looks like Maskray is proposing making glibc's IFUNC resolving.  That's a generic glibc decision and I don't really understand this stuff well enough to have much of an opinion -- sure maybe the multi-library IFUNC resolving is overkill, but aside from doing some similar grep of all the sources (or trying some builds and such) I'm not sure how to tell if we're safe breaking the ABI there.
> 
> That said, we're not comitting to ABI stability until the release.  So if we decide to make a generic glibc change we can always go drop the argument, that should be easy.  Maybe we even just throw out a RFC patch to do it, unless I'm missing something the code is the easy part here (we're essentially just going back to one of the early versions, from before we knew about this resolving order complexity).
> 
>> Alternatively, patches 1-3 of this series stand on their own. If the
>> ifunc aspect of this is gated on me doing a bunch of research, it
>> might at least make sense to land the first half now, to get Linux
>> users easy access to the __riscv_hwprobe() syscall and vDSO.
> 
> I'm OK with that, too.
> 
>> 
>> -Evan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 17:22         ` Palmer Dabbelt
@ 2024-02-15 18:45           ` enh
  2024-02-15 18:56             ` Palmer Dabbelt
  0 siblings, 1 reply; 31+ messages in thread
From: enh @ 2024-02-15 18:45 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Evan Green, adhemerval.zanella, libc-alpha, Vineet Gupta,
	fweimer, slewis, jrtc27

On Thu, Feb 15, 2024 at 9:22 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Thu, 15 Feb 2024 09:00:03 PST (-0800), enh@google.com wrote:
> > On Thu, Feb 15, 2024 at 8:50 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> >>
> >> On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
> >> > On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
> >> > <adhemerval.zanella@linaro.org> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On 14/02/24 11:31, Evan Green wrote:
> >> >> >
> >> >> > This series illustrates the use of a recently accepted Linux syscall that
> >> >> > enumerates architectural information about the RISC-V cores the system
> >> >> > is running on. In this series we expose a small wrapper function around
> >> >> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
> >> >> > access is "fast" on this hardware. If it is, it selects a newly provided
> >> >> > implementation of memcpy that doesn't work hard at aligning the src and
> >> >> > destination buffers.
> >> >> >
> >> >> > For applications and libraries outside of glibc that want to use
> >> >> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
> >> >> > to the riscv_hwprobe() function in as the second argument to ifunc
> >> >> > selectors. A new inline convenience function can help application and
> >> >> > library callers to check for validity and quickly probe a single key.
> >> >>
> >> >> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
> >> >> I recall that Florian has tried to address the ifunc ordering and that
> >> >> Jessica proposed solutions was not fully sufficient to address all the
> >> >> ifunc corner cases.
> >> >>
> >> >> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
> >> >
> >> > I haven't invested the time yet in studying the resolver to understand
> >> > how feasible Jessica's suggestion is. I was sort of hoping Florian
> >> > would chime in with an "oh yeah let's do that" or "no, it doesn't
> >> > work". I suppose I still am :)
> >>
> >> The discussion's over here:
> >> https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
> >>
> >> I was inclined to just ignore it: Florian explained what's going on with
> >> the multiple libraries constraint, so we're kind of just going around in
> >> circles at that point.
> >>
> >> I'm also not sure the argument makes a whole lot of sense in the first
> >> place.  The core of the argument seems to be around binary compatibility
> >> with other kernels/libcs, but there's a ton of reasons why binaries
> >> aren't compatible between those systems.  So if glibc just has a more
> >> complex IFUNC resolving scheme and that ends up requiring the extra
> >> agrument, then users of glibc have to go deal with that -- other
> >> systems/libcs might make different decisions, but that's just how these
> >> things go.
> >
> > yeah, i struggle with the portability premise too. not least because
> > macOS/iOS (which is obviously the most common "other platform" for me
> > with my Android hat on) doesn't have ifuncs at all. that's quite a big
> > blocker to any dream of portability.
> >
> > as i've said before, in a survey of all the open source libraries that
> > go into Android, there are none that don't support macOS/iOS, and so
> > there are none that actually rely on ifuncs. ifunc usage is limited to
> > libc (which is why we libc maintainers get bikeshedded by stuff like
> > this) and compiler-generated FMV stuff. i'd argue that "real people"
> > (app developers) should probably be looking at the latter anyway, and
> > it's our job to make that work (which happens once^Wtwice, in llvm and
> > gcc).
> >
> > also, the fact that Android is doing what's proposed here (with the
> > extra argument) means there'd be _some_ incompatibility even if glibc
> > and FreeBSD didn't do that.
>
> So you're already committed to that ABI?

"pretty much" given glibc release cycles and Android release cycles...

the current API addresses a real problem with current dynamic linkers,
and -- though it would have been great to make a different decision 16
years ago -- between the already stated reasons and the potential for
app compat issues (and a distaste for having the riscv64 linker
behavior differ from the already-shipped architectures), i think we'll
want something like that...

> > (Android's only source incompatibility with glibc at the moment is the
> > lack of the single-probe helper inline, which i fear will encourage
> > _worse_ code, but suspect will actually be unused in practice anyway
> > for the same "ifuncs are 'assembler' for libc/toolchain, and
> > library/app developers doing stuff themselves will continue to use
> > regular function pointers like they do today" reasons.)
> >
> >> Looks like Maskray is proposing making glibc's IFUNC resolving.  That's
> >> a generic glibc decision and I don't really understand this stuff well
> >> enough to have much of an opinion -- sure maybe the multi-library IFUNC
> >> resolving is overkill, but aside from doing some similar grep of all the
> >> sources (or trying some builds and such) I'm not sure how to tell if
> >> we're safe breaking the ABI there.
> >>
> >> That said, we're not comitting to ABI stability until the release.  So
> >> if we decide to make a generic glibc change we can always go drop the
> >> argument, that should be easy.  Maybe we even just throw out a RFC patch
> >> to do it, unless I'm missing something the code is the easy part here
> >> (we're essentially just going back to one of the early versions, from
> >> before we knew about this resolving order complexity).
> >>
> >> > Alternatively, patches 1-3 of this series stand on their own. If the
> >> > ifunc aspect of this is gated on me doing a bunch of research, it
> >> > might at least make sense to land the first half now, to get Linux
> >> > users easy access to the __riscv_hwprobe() syscall and vDSO.
> >>
> >> I'm OK with that, too.
> >>
> >> >
> >> > -Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 17:42       ` Jessica Clarke
@ 2024-02-15 18:52         ` enh
  2024-02-15 19:09           ` Jessica Clarke
  0 siblings, 1 reply; 31+ messages in thread
From: enh @ 2024-02-15 18:52 UTC (permalink / raw)
  To: Jessica Clarke
  Cc: Palmer Dabbelt, Evan Green, adhemerval.zanella, libc-alpha,
	Vineet Gupta, Florian Weimer, slewis

On Thu, Feb 15, 2024 at 9:42 AM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>
> On 15 Feb 2024, at 16:50, Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
> >> On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
> >> <adhemerval.zanella@linaro.org> wrote:
> >>>
> >>>
> >>>
> >>> On 14/02/24 11:31, Evan Green wrote:
> >>> >
> >>> > This series illustrates the use of a recently accepted Linux syscall that
> >>> > enumerates architectural information about the RISC-V cores the system
> >>> > is running on. In this series we expose a small wrapper function around
> >>> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
> >>> > access is "fast" on this hardware. If it is, it selects a newly provided
> >>> > implementation of memcpy that doesn't work hard at aligning the src and
> >>> > destination buffers.
> >>> >
> >>> > For applications and libraries outside of glibc that want to use
> >>> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
> >>> > to the riscv_hwprobe() function in as the second argument to ifunc
> >>> > selectors. A new inline convenience function can help application and
> >>> > library callers to check for validity and quickly probe a single key.
> >>>
> >>> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
> >>> I recall that Florian has tried to address the ifunc ordering and that
> >>> Jessica proposed solutions was not fully sufficient to address all the
> >>> ifunc corner cases.
> >>>
> >>> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
> >>
> >> I haven't invested the time yet in studying the resolver to understand
> >> how feasible Jessica's suggestion is. I was sort of hoping Florian
> >> would chime in with an "oh yeah let's do that" or "no, it doesn't
> >> work". I suppose I still am :)
> >
> > The discussion's over here: https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
> > I was inclined to just ignore it: Florian explained what's going on with the multiple libraries constraint, so we're kind of just going around in circles at that point.
> >
> > I'm also not sure the argument makes a whole lot of sense in the first place.  The core of the argument seems to be around binary compatibility with other kernels/libcs,
>
> Not binary compatibility, source compatibility.

oh, yeah, obviously on the Android side i'm only talking about API,
not ABI. but i think Android-glibc source compatibility is more
meaningful -- by virtue of them both using Linux -- than anything
else.

> Yes, it’s part of that
> platform’s ABI, but it leaks into the source code by means of the
> resolver’s type and arguments. I would much rather see a standard IFUNC
> resolver interface that can be specified in the psABI as a
> cross-platform standard for any ELF-using OS that implements IFUNCs,
> since the moment glibc adopts this as its implementation of choice
> there’s no going back (without ugly “ignore that magic -1 argument,
> it’s there for historical reasons” kinds of situations). Adopting the
> stronger guarantees provided by FreeBSD’s rtld is one way to achieve
> that, and seems the simplest to me, but I’m not wedded to that, other
> suggestions that achieve the same end goal are just as welcome. Of
> course glibc is free to ignore me, too, with a “not our problem, deal
> with it, we’re the dominant OS”, but that would seem sad.

_not_ passing the function pointer to the ifunc resolver only
addresses source compatibility as far as the function _signature_
goes, but does nothing for the _body_ of the function unless BSD
copies the Linux __riscv_hwprobe() syscall exactly. and tracks every
change after that. that seems unlikely?

plus, like i keep saying --- iOS and macOS don't have ifuncs (and i
don't think Windows does either?), so for any code that actually cares
about portability, handling your own function pointer is the only
truly portable game in town, and all this fussing about a niche
alternative is just noise.

(and we see that in practice, not just theory: open source libraries
have voted with their feet, and nothing we do here will affect that.)

i don't see that anyone actually gains anything from your proposal,
whereas the "pass the function pointer to the ifunc resolver"
workaround addresses actual problems with currently-shipping systems
in a way that doesn't require large and complex behavior changes (for
no gain _in the context of ifuncs_).

> Jess
>
> > but there's a ton of reasons why binaries aren't compatible between those systems.  So if glibc just has a more complex IFUNC resolving scheme and that ends up requiring the extra agrument, then users of glibc have to go deal with that -- other systems/libcs might make different decisions, but that's just how these things go.
> >
> > Looks like Maskray is proposing making glibc's IFUNC resolving.  That's a generic glibc decision and I don't really understand this stuff well enough to have much of an opinion -- sure maybe the multi-library IFUNC resolving is overkill, but aside from doing some similar grep of all the sources (or trying some builds and such) I'm not sure how to tell if we're safe breaking the ABI there.
> >
> > That said, we're not comitting to ABI stability until the release.  So if we decide to make a generic glibc change we can always go drop the argument, that should be easy.  Maybe we even just throw out a RFC patch to do it, unless I'm missing something the code is the easy part here (we're essentially just going back to one of the early versions, from before we knew about this resolving order complexity).
> >
> >> Alternatively, patches 1-3 of this series stand on their own. If the
> >> ifunc aspect of this is gated on me doing a bunch of research, it
> >> might at least make sense to land the first half now, to get Linux
> >> users easy access to the __riscv_hwprobe() syscall and vDSO.
> >
> > I'm OK with that, too.
> >
> >>
> >> -Evan
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 18:45           ` enh
@ 2024-02-15 18:56             ` Palmer Dabbelt
  0 siblings, 0 replies; 31+ messages in thread
From: Palmer Dabbelt @ 2024-02-15 18:56 UTC (permalink / raw)
  To: enh
  Cc: Evan Green, adhemerval.zanella, libc-alpha, Vineet Gupta,
	fweimer, slewis, jrtc27

On Thu, 15 Feb 2024 10:45:10 PST (-0800), enh@google.com wrote:
> On Thu, Feb 15, 2024 at 9:22 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>>
>> On Thu, 15 Feb 2024 09:00:03 PST (-0800), enh@google.com wrote:
>> > On Thu, Feb 15, 2024 at 8:50 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>> >>
>> >> On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
>> >> > On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
>> >> > <adhemerval.zanella@linaro.org> wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 14/02/24 11:31, Evan Green wrote:
>> >> >> >
>> >> >> > This series illustrates the use of a recently accepted Linux syscall that
>> >> >> > enumerates architectural information about the RISC-V cores the system
>> >> >> > is running on. In this series we expose a small wrapper function around
>> >> >> > the syscall. An ifunc selector for memcpy queries it to see if unaligned
>> >> >> > access is "fast" on this hardware. If it is, it selects a newly provided
>> >> >> > implementation of memcpy that doesn't work hard at aligning the src and
>> >> >> > destination buffers.
>> >> >> >
>> >> >> > For applications and libraries outside of glibc that want to use
>> >> >> > __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
>> >> >> > to the riscv_hwprobe() function in as the second argument to ifunc
>> >> >> > selectors. A new inline convenience function can help application and
>> >> >> > library callers to check for validity and quickly probe a single key.
>> >> >>
>> >> >> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
>> >> >> I recall that Florian has tried to address the ifunc ordering and that
>> >> >> Jessica proposed solutions was not fully sufficient to address all the
>> >> >> ifunc corner cases.
>> >> >>
>> >> >> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
>> >> >
>> >> > I haven't invested the time yet in studying the resolver to understand
>> >> > how feasible Jessica's suggestion is. I was sort of hoping Florian
>> >> > would chime in with an "oh yeah let's do that" or "no, it doesn't
>> >> > work". I suppose I still am :)
>> >>
>> >> The discussion's over here:
>> >> https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
>> >>
>> >> I was inclined to just ignore it: Florian explained what's going on with
>> >> the multiple libraries constraint, so we're kind of just going around in
>> >> circles at that point.
>> >>
>> >> I'm also not sure the argument makes a whole lot of sense in the first
>> >> place.  The core of the argument seems to be around binary compatibility
>> >> with other kernels/libcs, but there's a ton of reasons why binaries
>> >> aren't compatible between those systems.  So if glibc just has a more
>> >> complex IFUNC resolving scheme and that ends up requiring the extra
>> >> agrument, then users of glibc have to go deal with that -- other
>> >> systems/libcs might make different decisions, but that's just how these
>> >> things go.
>> >
>> > yeah, i struggle with the portability premise too. not least because
>> > macOS/iOS (which is obviously the most common "other platform" for me
>> > with my Android hat on) doesn't have ifuncs at all. that's quite a big
>> > blocker to any dream of portability.
>> >
>> > as i've said before, in a survey of all the open source libraries that
>> > go into Android, there are none that don't support macOS/iOS, and so
>> > there are none that actually rely on ifuncs. ifunc usage is limited to
>> > libc (which is why we libc maintainers get bikeshedded by stuff like
>> > this) and compiler-generated FMV stuff. i'd argue that "real people"
>> > (app developers) should probably be looking at the latter anyway, and
>> > it's our job to make that work (which happens once^Wtwice, in llvm and
>> > gcc).
>> >
>> > also, the fact that Android is doing what's proposed here (with the
>> > extra argument) means there'd be _some_ incompatibility even if glibc
>> > and FreeBSD didn't do that.
>>
>> So you're already committed to that ABI?
>
> "pretty much" given glibc release cycles and Android release cycles...

Ya, cool -- I was just curious, I figured it'd be something along those 
lines (we also talked a bit off list).

> the current API addresses a real problem with current dynamic linkers,
> and -- though it would have been great to make a different decision 16
> years ago -- between the already stated reasons and the potential for
> app compat issues (and a distaste for having the riscv64 linker
> behavior differ from the already-shipped architectures), i think we'll
> want something like that...

Ya, I'm generally in favor of making RISC-V as unsurprising as possible.  
Given how complex the IFUNC stuff is it seems like a recipe for disaster 
to have different rules around linker ordering or cross-object symbol 
resoltion for RISC-V.  Maybe folks want to change that for all the 
ports, but that seems like a fairly complex project and I'm worried 
we'll miss another release waiting on it.

As far as I can tell the downside here is just that we have an extra 
argument pointing to a hwprobe function, that doesn't seem so painful to 
just support forever.  It's not like this possible change to the 
resolution rules would make calling via the pre-resolved function 
argument invalid, it'd just be unnecessary.

Thus I'm generally in favor of merging this, so

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

though I'm cool holding off a bit on merging it, in case anyone thinks 
that's a bad idea.

>> > (Android's only source incompatibility with glibc at the moment is the
>> > lack of the single-probe helper inline, which i fear will encourage
>> > _worse_ code, but suspect will actually be unused in practice anyway
>> > for the same "ifuncs are 'assembler' for libc/toolchain, and
>> > library/app developers doing stuff themselves will continue to use
>> > regular function pointers like they do today" reasons.)
>> >
>> >> Looks like Maskray is proposing making glibc's IFUNC resolving.  That's
>> >> a generic glibc decision and I don't really understand this stuff well
>> >> enough to have much of an opinion -- sure maybe the multi-library IFUNC
>> >> resolving is overkill, but aside from doing some similar grep of all the
>> >> sources (or trying some builds and such) I'm not sure how to tell if
>> >> we're safe breaking the ABI there.
>> >>
>> >> That said, we're not comitting to ABI stability until the release.  So
>> >> if we decide to make a generic glibc change we can always go drop the
>> >> argument, that should be easy.  Maybe we even just throw out a RFC patch
>> >> to do it, unless I'm missing something the code is the easy part here
>> >> (we're essentially just going back to one of the early versions, from
>> >> before we knew about this resolving order complexity).
>> >>
>> >> > Alternatively, patches 1-3 of this series stand on their own. If the
>> >> > ifunc aspect of this is gated on me doing a bunch of research, it
>> >> > might at least make sense to land the first half now, to get Linux
>> >> > users easy access to the __riscv_hwprobe() syscall and vDSO.
>> >>
>> >> I'm OK with that, too.
>> >>
>> >> >
>> >> > -Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 18:52         ` enh
@ 2024-02-15 19:09           ` Jessica Clarke
  2024-02-22 19:41             ` Palmer Dabbelt
  0 siblings, 1 reply; 31+ messages in thread
From: Jessica Clarke @ 2024-02-15 19:09 UTC (permalink / raw)
  To: enh
  Cc: Palmer Dabbelt, Evan Green, adhemerval.zanella, libc-alpha,
	Vineet Gupta, Florian Weimer, slewis

On 15 Feb 2024, at 18:52, enh <enh@google.com> wrote:
> 
> On Thu, Feb 15, 2024 at 9:42 AM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>> 
>> On 15 Feb 2024, at 16:50, Palmer Dabbelt <palmer@rivosinc.com> wrote:
>>> On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
>>>> On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 14/02/24 11:31, Evan Green wrote:
>>>>>> 
>>>>>> This series illustrates the use of a recently accepted Linux syscall that
>>>>>> enumerates architectural information about the RISC-V cores the system
>>>>>> is running on. In this series we expose a small wrapper function around
>>>>>> the syscall. An ifunc selector for memcpy queries it to see if unaligned
>>>>>> access is "fast" on this hardware. If it is, it selects a newly provided
>>>>>> implementation of memcpy that doesn't work hard at aligning the src and
>>>>>> destination buffers.
>>>>>> 
>>>>>> For applications and libraries outside of glibc that want to use
>>>>>> __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
>>>>>> to the riscv_hwprobe() function in as the second argument to ifunc
>>>>>> selectors. A new inline convenience function can help application and
>>>>>> library callers to check for validity and quickly probe a single key.
>>>>> 
>>>>> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
>>>>> I recall that Florian has tried to address the ifunc ordering and that
>>>>> Jessica proposed solutions was not fully sufficient to address all the
>>>>> ifunc corner cases.
>>>>> 
>>>>> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
>>>> 
>>>> I haven't invested the time yet in studying the resolver to understand
>>>> how feasible Jessica's suggestion is. I was sort of hoping Florian
>>>> would chime in with an "oh yeah let's do that" or "no, it doesn't
>>>> work". I suppose I still am :)
>>> 
>>> The discussion's over here: https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
>>> I was inclined to just ignore it: Florian explained what's going on with the multiple libraries constraint, so we're kind of just going around in circles at that point.
>>> 
>>> I'm also not sure the argument makes a whole lot of sense in the first place.  The core of the argument seems to be around binary compatibility with other kernels/libcs,
>> 
>> Not binary compatibility, source compatibility.
> 
> oh, yeah, obviously on the Android side i'm only talking about API,
> not ABI. but i think Android-glibc source compatibility is more
> meaningful -- by virtue of them both using Linux -- than anything
> else.
> 
>> Yes, it’s part of that
>> platform’s ABI, but it leaks into the source code by means of the
>> resolver’s type and arguments. I would much rather see a standard IFUNC
>> resolver interface that can be specified in the psABI as a
>> cross-platform standard for any ELF-using OS that implements IFUNCs,
>> since the moment glibc adopts this as its implementation of choice
>> there’s no going back (without ugly “ignore that magic -1 argument,
>> it’s there for historical reasons” kinds of situations). Adopting the
>> stronger guarantees provided by FreeBSD’s rtld is one way to achieve
>> that, and seems the simplest to me, but I’m not wedded to that, other
>> suggestions that achieve the same end goal are just as welcome. Of
>> course glibc is free to ignore me, too, with a “not our problem, deal
>> with it, we’re the dominant OS”, but that would seem sad.
> 
> _not_ passing the function pointer to the ifunc resolver only
> addresses source compatibility as far as the function _signature_
> goes, but does nothing for the _body_ of the function unless BSD
> copies the Linux __riscv_hwprobe() syscall exactly. and tracks every
> change after that. that seems unlikely?

Yes, nobody’s going to copy that, but that doesn’t mean it’s going to
forever be the only interface that you could use. One function existing
doesn’t stop another from doing so, but one argument being used for one
purpose does stop it being used for another. That is, in the proposed
world where you just call __riscv_hwprobe directly and it works, there
is a path towards having a cross-platform solution, namely defining a
simplified, OS-independent interface for querying what extensions you
have (which won’t cover all use cases, and doesn’t need to, so long as
it can deal with the ones that IFUNCs typically care about, i.e. simple
questions like “do I have V/Zb*/etc?”), which could just be a simple
wrapper around __riscv_hwprobe on Linux. But if you pass
__riscv_hwprobe as a function pointer as the solution to the problem,
you’re tied to the Linux interface, and specifically the
__riscv_hwprobe one at that, unless you want to add an argument every
time you come up with an interface that software might want to use in a
resolver for querying RISC-V information. I’m trying to look at the
bigger picture here rather than the very narrow view of “what’s the
easiest path to being able to write an IFUNCed memcpy in glibc for
Linux”.

> plus, like i keep saying --- iOS and macOS don't have ifuncs (and i
> don't think Windows does either?), so for any code that actually cares
> about portability, handling your own function pointer is the only
> truly portable game in town, and all this fussing about a niche
> alternative is just noise.

Firstly, none of those support RISC-V. Secondly, just because some OSes
are more different than others doesn’t mean we shouldn’t aim for as
cross-platform interfaces as possible. Otherwise why are we bothering
to have a RISC-V psABI at all rather than just letting glibc define
whatever it wants?

> (and we see that in practice, not just theory: open source libraries
> have voted with their feet, and nothing we do here will affect that.)
> 
> i don't see that anyone actually gains anything from your proposal,
> whereas the "pass the function pointer to the ifunc resolver"
> workaround addresses actual problems with currently-shipping systems
> in a way that doesn't require large and complex behavior changes (for
> no gain _in the context of ifuncs_).

Well, it’s a general solution to the specific problem seen here, which
arguably makes the glibc implementation of IFUNCs on RISC-V less
special, as you can just call the normal __riscv_hwprobe if you want to.
The gain is generality, which includes, but isn’t limited to, being
cross-platform.

Jess

>> Jess
>> 
>>> but there's a ton of reasons why binaries aren't compatible between those systems.  So if glibc just has a more complex IFUNC resolving scheme and that ends up requiring the extra agrument, then users of glibc have to go deal with that -- other systems/libcs might make different decisions, but that's just how these things go.
>>> 
>>> Looks like Maskray is proposing making glibc's IFUNC resolving.  That's a generic glibc decision and I don't really understand this stuff well enough to have much of an opinion -- sure maybe the multi-library IFUNC resolving is overkill, but aside from doing some similar grep of all the sources (or trying some builds and such) I'm not sure how to tell if we're safe breaking the ABI there.
>>> 
>>> That said, we're not comitting to ABI stability until the release.  So if we decide to make a generic glibc change we can always go drop the argument, that should be easy.  Maybe we even just throw out a RFC patch to do it, unless I'm missing something the code is the easy part here (we're essentially just going back to one of the early versions, from before we knew about this resolving order complexity).
>>> 
>>>> Alternatively, patches 1-3 of this series stand on their own. If the
>>>> ifunc aspect of this is gated on me doing a bunch of research, it
>>>> might at least make sense to land the first half now, to get Linux
>>>> users easy access to the __riscv_hwprobe() syscall and vDSO.
>>> 
>>> I'm OK with that, too.
>>> 
>>>> 
>>>> -Evan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-14 14:31 ` [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL Evan Green
@ 2024-02-16  7:44   ` Florian Weimer
  2024-02-23 23:12     ` Evan Green
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2024-02-16  7:44 UTC (permalink / raw)
  To: Evan Green; +Cc: libc-alpha, vineetg, slewis, palmer

* Evan Green:

> Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
> a regular syscall, but without setting errno. Instead, the return value
> is plumbed straight out of the macro.
>
> Signed-off-by: Evan Green <evan@rivosinc.com>
> ---
>
> (no changes since v10)
>
> Changes in v10:
>  - Introduced INTERNAL_VSYSCALL patch
>
>  sysdeps/unix/sysv/linux/sysdep-vdso.h | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/sysdeps/unix/sysv/linux/sysdep-vdso.h b/sysdeps/unix/sysv/linux/sysdep-vdso.h
> index 189319ad98..f7d4b29b25 100644
> --- a/sysdeps/unix/sysv/linux/sysdep-vdso.h
> +++ b/sysdeps/unix/sysv/linux/sysdep-vdso.h
> @@ -53,4 +53,23 @@
>      sc_ret;								      \
>    })
>  
> +#define INTERNAL_VSYSCALL(name, nr, args...)				      \
> +  ({									      \
> +    __label__ out;							      \
> +    long int sc_ret;							      \
> +									      \
> +    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);	      \
> +    if (vdsop != NULL)							      \
> +      {									      \
> +	sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);	      	      \
> +	if ((!INTERNAL_SYSCALL_ERROR_P (sc_ret)) ||			      \
> +	    (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS))		      \
> +	  goto out;							      \
> +      }									      \
> +									      \
> +    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);		      	      \
> +  out:									      \
> +    sc_ret;								      \
> +  })
> +
>  #endif /* SYSDEP_VDSO_LINUX_H  */

I wonder if this is clearer without the label and with a conditional
expression instead?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 5/7] riscv: Enable multi-arg ifunc resolvers
  2024-02-14 14:31 ` [PATCH v12 5/7] riscv: Enable multi-arg ifunc resolvers Evan Green
@ 2024-02-16  7:45   ` Florian Weimer
  0 siblings, 0 replies; 31+ messages in thread
From: Florian Weimer @ 2024-02-16  7:45 UTC (permalink / raw)
  To: Evan Green; +Cc: libc-alpha, vineetg, slewis, palmer

* Evan Green:

> diff --git a/include/libc-symbols.h b/include/libc-symbols.h
> index e21bb599b3..4367aa6740 100644
> --- a/include/libc-symbols.h
> +++ b/include/libc-symbols.h
> @@ -667,9 +667,9 @@ for linking")
>  #endif
>  
>  /* Helper / base  macros for indirect function symbols.  */
> -#define __ifunc_resolver(type_name, name, expr, arg, init, classifier)	\
> +#define __ifunc_resolver(type_name, name, expr, init, classifier, ...)	\
>    classifier inhibit_stack_protector					\
> -  __typeof (type_name) *name##_ifunc (arg)				\
> +  __typeof (type_name) *name##_ifunc (__VA_ARGS__)			\
>    {									\
>      init ();								\
>      __typeof (type_name) *res = expr;					\
> @@ -677,13 +677,13 @@ for linking")
>    }
>  
>  #ifdef HAVE_GCC_IFUNC
> -# define __ifunc(type_name, name, expr, arg, init)			\
> +# define __ifunc_args(type_name, name, expr, init, ...)			\
>    extern __typeof (type_name) name __attribute__			\
>  			      ((ifunc (#name "_ifunc")));		\
> -  __ifunc_resolver (type_name, name, expr, arg, init, static)
> +  __ifunc_resolver (type_name, name, expr, init, static, __VA_ARGS__)
>  
> -# define __ifunc_hidden(type_name, name, expr, arg, init)	\
> -  __ifunc (type_name, name, expr, arg, init)
> +# define __ifunc_args_hidden(type_name, name, expr, init, ...)		\
> +  __ifunc_args (type_name, name, expr, init, __VA_ARGS__)
>  #else
>  /* Gcc does not support __attribute__ ((ifunc (...))).  Use the old behaviour
>     as fallback.  But keep in mind that the debug information for the ifunc
> @@ -694,18 +694,24 @@ for linking")
>     different signatures.  (Gcc support is disabled at least on a ppc64le
>     Ubuntu 14.04 system.)  */
>  
> -# define __ifunc(type_name, name, expr, arg, init)			\
> +# define __ifunc_args(type_name, name, expr, init, ...)			\
>    extern __typeof (type_name) name;					\
> -  __typeof (type_name) *name##_ifunc (arg) __asm__ (#name);		\
> -  __ifunc_resolver (type_name, name, expr, arg, init,)			\
> +  __typeof (type_name) *name##_ifunc (__VA_ARGS__) __asm__ (#name);	\
> +  __ifunc_resolver (type_name, name, expr, init, , __VA_ARGS__)		\
>   __asm__ (".type " #name ", %gnu_indirect_function");
>  
> -# define __ifunc_hidden(type_name, name, expr, arg, init)		\
> +# define __ifunc_args_hidden(type_name, name, expr, init, ...)		\
>    extern __typeof (type_name) __libc_##name;				\
> -  __ifunc (type_name, __libc_##name, expr, arg, init)			\
> +  __ifunc (type_name, __libc_##name, expr, __VA_ARGS__, init)		\
>    strong_alias (__libc_##name, name);
>  #endif /* !HAVE_GCC_IFUNC  */
>  
> +#define __ifunc(type_name, name, expr, arg, init)			\
> +  __ifunc_args (type_name, name, expr, init, arg)
> +
> +#define __ifunc_hidden(type_name, name, expr, arg, init)		\
> +  __ifunc_args_hidden (type_name, name, expr, init, arg)
> +
>  /* The following macros are used for indirect function symbols in libc.so.
>     First of all, you need to have the function prototyped somewhere,
>     say in foo.h:

The generic parts look good to me.

Reviewed-by: Florian Weimer <fweimer@redhat.com>

Thanks,
Florian


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-14 15:24   ` Andreas Schwab
@ 2024-02-22 18:44     ` Palmer Dabbelt
  0 siblings, 0 replies; 31+ messages in thread
From: Palmer Dabbelt @ 2024-02-22 18:44 UTC (permalink / raw)
  To: schwab
  Cc: adhemerval.zanella, Evan Green, libc-alpha, Vineet Gupta,
	fweimer, slewis, jrtc27

On Wed, 14 Feb 2024 07:24:01 PST (-0800), schwab@suse.de wrote:
> On Feb 14 2024, Adhemerval Zanella Netto wrote:
>
>> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
>> I recall that Florian has tried to address the ifunc ordering and that
>> Jessica proposed solutions was not fully sufficient to address all the
>> ifunc corner cases.
>
> That will also help resolving bug 31317.

Sorry I missed that, I just commented on the bug.  I think this thread 
got stuck on some other stuff, so maybe let's talk over there?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface
  2024-02-15 19:09           ` Jessica Clarke
@ 2024-02-22 19:41             ` Palmer Dabbelt
  0 siblings, 0 replies; 31+ messages in thread
From: Palmer Dabbelt @ 2024-02-22 19:41 UTC (permalink / raw)
  To: jrtc27
  Cc: enh, Evan Green, adhemerval.zanella, libc-alpha, Vineet Gupta,
	fweimer, slewis

On Thu, 15 Feb 2024 11:09:13 PST (-0800), jrtc27@jrtc27.com wrote:
> On 15 Feb 2024, at 18:52, enh <enh@google.com> wrote:
>> 
>> On Thu, Feb 15, 2024 at 9:42 AM Jessica Clarke <jrtc27@jrtc27.com> wrote:
>>> 
>>> On 15 Feb 2024, at 16:50, Palmer Dabbelt <palmer@rivosinc.com> wrote:
>>>> On Thu, 15 Feb 2024 07:48:03 PST (-0800), Evan Green wrote:
>>>>> On Wed, Feb 14, 2024 at 10:16 AM Adhemerval Zanella Netto
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 14/02/24 11:31, Evan Green wrote:
>>>>>>> 
>>>>>>> This series illustrates the use of a recently accepted Linux syscall that
>>>>>>> enumerates architectural information about the RISC-V cores the system
>>>>>>> is running on. In this series we expose a small wrapper function around
>>>>>>> the syscall. An ifunc selector for memcpy queries it to see if unaligned
>>>>>>> access is "fast" on this hardware. If it is, it selects a newly provided
>>>>>>> implementation of memcpy that doesn't work hard at aligning the src and
>>>>>>> destination buffers.
>>>>>>> 
>>>>>>> For applications and libraries outside of glibc that want to use
>>>>>>> __riscv_hwprobe() in ifunc selectors, this series also sends a pointer
>>>>>>> to the riscv_hwprobe() function in as the second argument to ifunc
>>>>>>> selectors. A new inline convenience function can help application and
>>>>>>> library callers to check for validity and quickly probe a single key.
>>>>>> 
>>>>>> I still think we should address Jessica Clarke remarks for the ifunc ABI [1].
>>>>>> I recall that Florian has tried to address the ifunc ordering and that
>>>>>> Jessica proposed solutions was not fully sufficient to address all the
>>>>>> ifunc corner cases.
>>>>>> 
>>>>>> [1] https://sourceware.org/pipermail/libc-alpha/2024-January/154082.html
>>>>> 
>>>>> I haven't invested the time yet in studying the resolver to understand
>>>>> how feasible Jessica's suggestion is. I was sort of hoping Florian
>>>>> would chime in with an "oh yeah let's do that" or "no, it doesn't
>>>>> work". I suppose I still am :)
>>>> 
>>>> The discussion's over here: https://inbox.sourceware.org/libc-alpha/20231017044641.pw2ccr6exvhtmhkk@google.com/
>>>> I was inclined to just ignore it: Florian explained what's going on with the multiple libraries constraint, so we're kind of just going around in circles at that point.
>>>> 
>>>> I'm also not sure the argument makes a whole lot of sense in the first place.  The core of the argument seems to be around binary compatibility with other kernels/libcs,
>>> 
>>> Not binary compatibility, source compatibility.
>> 
>> oh, yeah, obviously on the Android side i'm only talking about API,
>> not ABI. but i think Android-glibc source compatibility is more
>> meaningful -- by virtue of them both using Linux -- than anything
>> else.
>> 
>>> Yes, it’s part of that
>>> platform’s ABI, but it leaks into the source code by means of the
>>> resolver’s type and arguments. I would much rather see a standard IFUNC
>>> resolver interface that can be specified in the psABI as a
>>> cross-platform standard for any ELF-using OS that implements IFUNCs,
>>> since the moment glibc adopts this as its implementation of choice
>>> there’s no going back (without ugly “ignore that magic -1 argument,
>>> it’s there for historical reasons” kinds of situations). Adopting the
>>> stronger guarantees provided by FreeBSD’s rtld is one way to achieve
>>> that, and seems the simplest to me, but I’m not wedded to that, other
>>> suggestions that achieve the same end goal are just as welcome. Of
>>> course glibc is free to ignore me, too, with a “not our problem, deal
>>> with it, we’re the dominant OS”, but that would seem sad.
>> 
>> _not_ passing the function pointer to the ifunc resolver only
>> addresses source compatibility as far as the function _signature_
>> goes, but does nothing for the _body_ of the function unless BSD
>> copies the Linux __riscv_hwprobe() syscall exactly. and tracks every
>> change after that. that seems unlikely?
>
> Yes, nobody’s going to copy that, but that doesn’t mean it’s going to

Why not?  It's about the simplest interface that's extensible, 
essentially just a list of key-value pairs and some scaffolding to get 
them past a protection domain.

> forever be the only interface that you could use. One function existing
> doesn’t stop another from doing so, but one argument being used for one
> purpose does stop it being used for another. That is, in the proposed
> world where you just call __riscv_hwprobe directly and it works, there
> is a path towards having a cross-platform solution, namely defining a
> simplified, OS-independent interface for querying what extensions you
> have (which won’t cover all use cases, and doesn’t need to, so long as
> it can deal with the ones that IFUNCs typically care about, i.e. simple
> questions like “do I have V/Zb*/etc?”), which could just be a simple
> wrapper around __riscv_hwprobe on Linux. But if you pass
> __riscv_hwprobe as a function pointer as the solution to the problem,
> you’re tied to the Linux interface, and specifically the
> __riscv_hwprobe one at that, unless you want to add an argument every
> time you come up with an interface that software might want to use in a
> resolver for querying RISC-V information. I’m trying to look at the
> bigger picture here rather than the very narrow view of “what’s the
> easiest path to being able to write an IFUNCed memcpy in glibc for
> Linux”.

I think this is just going around in circles, it's basically the same 
discussion we had when adding hwprobe to Linux.  We have a real concrete 
solution to real concrete problems, progress has been blocked for years 
without any concrete proposal aside from just doing something different.

So like I said in some other thread, I'm inclined to just merge this.  
There's some feedback from Florian that looks pretty easy to resolve, 
and looks like Evan should have a new version soon.

>> plus, like i keep saying --- iOS and macOS don't have ifuncs (and i
>> don't think Windows does either?), so for any code that actually cares
>> about portability, handling your own function pointer is the only
>> truly portable game in town, and all this fussing about a niche
>> alternative is just noise.
>
> Firstly, none of those support RISC-V. Secondly, just because some OSes
> are more different than others doesn’t mean we shouldn’t aim for as
> cross-platform interfaces as possible. Otherwise why are we bothering
> to have a RISC-V psABI at all rather than just letting glibc define
> whatever it wants?

We have already defacto forked from the psABI pretty much everywhere due 
to all the unresolved issues, we just try to keep things compatible in 
practice -- or at least where we can, some of it just gets ignored.

>> (and we see that in practice, not just theory: open source libraries
>> have voted with their feet, and nothing we do here will affect that.)
>> 
>> i don't see that anyone actually gains anything from your proposal,
>> whereas the "pass the function pointer to the ifunc resolver"
>> workaround addresses actual problems with currently-shipping systems
>> in a way that doesn't require large and complex behavior changes (for
>> no gain _in the context of ifuncs_).
>
> Well, it’s a general solution to the specific problem seen here, which
> arguably makes the glibc implementation of IFUNCs on RISC-V less
> special, as you can just call the normal __riscv_hwprobe if you want to.
> The gain is generality, which includes, but isn’t limited to, being
> cross-platform.

I guess it's kind of hard to evaluate hypothetical interface proposals, 
but it's not clear how merging this prevents a new interface from being 
added.  Unless I'm missing something we've already got a NULL at the end 
of the argument list, so we should be safe to extend it.  If users of 
this new interface can also call hwprobe then we'll need to maintain it 
forever, having an argument seems like a pretty straight-forward 
solution.

> Jess
>
>>> Jess
>>> 
>>>> but there's a ton of reasons why binaries aren't compatible between those systems.  So if glibc just has a more complex IFUNC resolving scheme and that ends up requiring the extra agrument, then users of glibc have to go deal with that -- other systems/libcs might make different decisions, but that's just how these things go.
>>>> 
>>>> Looks like Maskray is proposing making glibc's IFUNC resolving.  That's a generic glibc decision and I don't really understand this stuff well enough to have much of an opinion -- sure maybe the multi-library IFUNC resolving is overkill, but aside from doing some similar grep of all the sources (or trying some builds and such) I'm not sure how to tell if we're safe breaking the ABI there.
>>>> 
>>>> That said, we're not comitting to ABI stability until the release.  So if we decide to make a generic glibc change we can always go drop the argument, that should be easy.  Maybe we even just throw out a RFC patch to do it, unless I'm missing something the code is the easy part here (we're essentially just going back to one of the early versions, from before we knew about this resolving order complexity).
>>>> 
>>>>> Alternatively, patches 1-3 of this series stand on their own. If the
>>>>> ifunc aspect of this is gated on me doing a bunch of research, it
>>>>> might at least make sense to land the first half now, to get Linux
>>>>> users easy access to the __riscv_hwprobe() syscall and vDSO.
>>>> 
>>>> I'm OK with that, too.
>>>> 
>>>>> 
>>>>> -Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-16  7:44   ` Florian Weimer
@ 2024-02-23 23:12     ` Evan Green
  2024-02-23 23:29       ` Gabriel Ravier
  2024-02-24 11:40       ` Florian Weimer
  0 siblings, 2 replies; 31+ messages in thread
From: Evan Green @ 2024-02-23 23:12 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, vineetg, slewis, palmer

On Thu, Feb 15, 2024 at 11:44 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Evan Green:
>
> > Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
> > a regular syscall, but without setting errno. Instead, the return value
> > is plumbed straight out of the macro.
> >
> > Signed-off-by: Evan Green <evan@rivosinc.com>
> > ---
> >
> > (no changes since v10)
> >
> > Changes in v10:
> >  - Introduced INTERNAL_VSYSCALL patch
> >
> >  sysdeps/unix/sysv/linux/sysdep-vdso.h | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/sysdeps/unix/sysv/linux/sysdep-vdso.h b/sysdeps/unix/sysv/linux/sysdep-vdso.h
> > index 189319ad98..f7d4b29b25 100644
> > --- a/sysdeps/unix/sysv/linux/sysdep-vdso.h
> > +++ b/sysdeps/unix/sysv/linux/sysdep-vdso.h
> > @@ -53,4 +53,23 @@
> >      sc_ret;                                                                \
> >    })
> >
> > +#define INTERNAL_VSYSCALL(name, nr, args...)                               \
> > +  ({                                                                       \
> > +    __label__ out;                                                         \
> > +    long int sc_ret;                                                       \
> > +                                                                           \
> > +    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
> > +    if (vdsop != NULL)                                                             \
> > +      {                                                                            \
> > +     sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);                  \
> > +     if ((!INTERNAL_SYSCALL_ERROR_P (sc_ret)) ||                           \
> > +         (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS))                      \
> > +       goto out;                                                           \
> > +      }                                                                            \
> > +                                                                           \
> > +    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                         \
> > +  out:                                                                             \
> > +    sc_ret;                                                                \
> > +  })
> > +
> >  #endif /* SYSDEP_VDSO_LINUX_H  */
>
> I wonder if this is clearer without the label and with a conditional
> expression instead?

It's kinda awkward without the label because the regular syscall is
made either if vdsop is NULL or returns ENOSYS (which is two checks on
the return value, so doesn't lend itself to inlining in the if
statement). The best I could come up with without the label is to
duplicate the syscall_call:

#define INTERNAL_VSYSCALL(name, nr, args...)                      \
  ({                                          \
    long int sc_ret;                                  \
                                          \
    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
    if (vdsop != NULL)                                  \
      {                                          \
    sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);              \
    if ((INTERNAL_SYSCALL_ERROR_P (sc_ret)) &&                  \
        (INTERNAL_SYSCALL_ERRNO (sc_ret) == ENOSYS))              \
      sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);              \
      }                                          \
    else                                      \
      {                                          \
    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                  \
      }                                          \
    sc_ret;                                      \
  })

I think I prefer the version with the label, but I'm not attuned to
the glibc style, so let me know if you like this and I'll send out a
new version.
-Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-23 23:12     ` Evan Green
@ 2024-02-23 23:29       ` Gabriel Ravier
  2024-02-24  2:06         ` Richard Henderson
  2024-02-24 11:40       ` Florian Weimer
  1 sibling, 1 reply; 31+ messages in thread
From: Gabriel Ravier @ 2024-02-23 23:29 UTC (permalink / raw)
  To: Evan Green, Florian Weimer; +Cc: libc-alpha, vineetg, slewis, palmer

On 2/23/24 23:12, Evan Green wrote:
> On Thu, Feb 15, 2024 at 11:44 PM Florian Weimer <fweimer@redhat.com> wrote:
>> * Evan Green:
>>
>>> Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
>>> a regular syscall, but without setting errno. Instead, the return value
>>> is plumbed straight out of the macro.
>>>
>>> Signed-off-by: Evan Green <evan@rivosinc.com>
>>> ---
>>>
>>> (no changes since v10)
>>>
>>> Changes in v10:
>>>   - Introduced INTERNAL_VSYSCALL patch
>>>
>>>   sysdeps/unix/sysv/linux/sysdep-vdso.h | 19 +++++++++++++++++++
>>>   1 file changed, 19 insertions(+)
>>>
>>> diff --git a/sysdeps/unix/sysv/linux/sysdep-vdso.h b/sysdeps/unix/sysv/linux/sysdep-vdso.h
>>> index 189319ad98..f7d4b29b25 100644
>>> --- a/sysdeps/unix/sysv/linux/sysdep-vdso.h
>>> +++ b/sysdeps/unix/sysv/linux/sysdep-vdso.h
>>> @@ -53,4 +53,23 @@
>>>       sc_ret;                                                                \
>>>     })
>>>
>>> +#define INTERNAL_VSYSCALL(name, nr, args...)                               \
>>> +  ({                                                                       \
>>> +    __label__ out;                                                         \
>>> +    long int sc_ret;                                                       \
>>> +                                                                           \
>>> +    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
>>> +    if (vdsop != NULL)                                                             \
>>> +      {                                                                            \
>>> +     sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);                  \
>>> +     if ((!INTERNAL_SYSCALL_ERROR_P (sc_ret)) ||                           \
>>> +         (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS))                      \
>>> +       goto out;                                                           \
>>> +      }                                                                            \
>>> +                                                                           \
>>> +    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                         \
>>> +  out:                                                                             \
>>> +    sc_ret;                                                                \
>>> +  })
>>> +
>>>   #endif /* SYSDEP_VDSO_LINUX_H  */
>> I wonder if this is clearer without the label and with a conditional
>> expression instead?
> It's kinda awkward without the label because the regular syscall is
> made either if vdsop is NULL or returns ENOSYS (which is two checks on
> the return value, so doesn't lend itself to inlining in the if
> statement). The best I could come up with without the label is to
> duplicate the syscall_call:
>
> #define INTERNAL_VSYSCALL(name, nr, args...)                      \
>    ({                                          \
>      long int sc_ret;                                  \
>                                            \
>      __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
>      if (vdsop != NULL)                                  \
>        {                                          \
>      sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);              \
>      if ((INTERNAL_SYSCALL_ERROR_P (sc_ret)) &&                  \
>          (INTERNAL_SYSCALL_ERRNO (sc_ret) == ENOSYS))              \
>        sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);              \
>        }                                          \
>      else                                      \
>        {                                          \
>      sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                  \
>        }                                          \
>      sc_ret;                                      \
>    })
>
> I think I prefer the version with the label, but I'm not attuned to
> the glibc style, so let me know if you like this and I'll send out a
> new version.
> -Evan

Is GCC able to de-duplicate the duplicated syscall call ? The label 
version already seems readable enough in any case and if GCC is not able 
to de-duplicate the syscall it seems preferable from a performance/code 
size standpoint as well.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-23 23:29       ` Gabriel Ravier
@ 2024-02-24  2:06         ` Richard Henderson
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2024-02-24  2:06 UTC (permalink / raw)
  To: Gabriel Ravier, Evan Green, Florian Weimer
  Cc: libc-alpha, vineetg, slewis, palmer

On 2/23/24 13:29, Gabriel Ravier wrote:
> On 2/23/24 23:12, Evan Green wrote:
>> On Thu, Feb 15, 2024 at 11:44 PM Florian Weimer <fweimer@redhat.com> wrote:
>>> * Evan Green:
>>>
>>>> Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
>>>> a regular syscall, but without setting errno. Instead, the return value
>>>> is plumbed straight out of the macro.
>>>>
>>>> Signed-off-by: Evan Green <evan@rivosinc.com>
>>>> ---
>>>>
>>>> (no changes since v10)
>>>>
>>>> Changes in v10:
>>>>   - Introduced INTERNAL_VSYSCALL patch
>>>>
>>>>   sysdeps/unix/sysv/linux/sysdep-vdso.h | 19 +++++++++++++++++++
>>>>   1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/sysdeps/unix/sysv/linux/sysdep-vdso.h 
>>>> b/sysdeps/unix/sysv/linux/sysdep-vdso.h
>>>> index 189319ad98..f7d4b29b25 100644
>>>> --- a/sysdeps/unix/sysv/linux/sysdep-vdso.h
>>>> +++ b/sysdeps/unix/sysv/linux/sysdep-vdso.h
>>>> @@ -53,4 +53,23 @@
>>>>       sc_ret;                                                                \
>>>>     })
>>>>
>>>> +#define INTERNAL_VSYSCALL(name, nr, args...)                               \
>>>> +  ({                                                                       \
>>>> +    __label__ out;                                                         \
>>>> +    long int sc_ret;                                                       \
>>>> +                                                                           \
>>>> +    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
>>>> +    if (vdsop != NULL)                                                             \
>>>> +      {                                                                            \
>>>> +     sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);                  \
>>>> +     if ((!INTERNAL_SYSCALL_ERROR_P (sc_ret)) ||                           \
>>>> +         (INTERNAL_SYSCALL_ERRNO (sc_ret) != ENOSYS))                      \
>>>> +       goto out;                                                           \
>>>> +      }                                                                            \
>>>> +                                                                           \
>>>> +    sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                         \
>>>> +  out:                                                                             \
>>>> +    sc_ret;                                                                \
>>>> +  })
>>>> +
>>>>   #endif /* SYSDEP_VDSO_LINUX_H  */
>>> I wonder if this is clearer without the label and with a conditional
>>> expression instead?
>> It's kinda awkward without the label because the regular syscall is
>> made either if vdsop is NULL or returns ENOSYS (which is two checks on
>> the return value, so doesn't lend itself to inlining in the if
>> statement). The best I could come up with without the label is to
>> duplicate the syscall_call:
>>
>> #define INTERNAL_VSYSCALL(name, nr, args...)                      \
>>    ({                                          \
>>      long int sc_ret;                                  \
>>                                            \
>>      __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
>>      if (vdsop != NULL)                                  \
>>        {                                          \
>>      sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);              \
>>      if ((INTERNAL_SYSCALL_ERROR_P (sc_ret)) &&                  \
>>          (INTERNAL_SYSCALL_ERRNO (sc_ret) == ENOSYS))              \
>>        sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);              \
>>        }                                          \
>>      else                                      \
>>        {                                          \
>>      sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                  \
>>        }                                          \
>>      sc_ret;                                      \
>>    })
>>
>> I think I prefer the version with the label, but I'm not attuned to
>> the glibc style, so let me know if you like this and I'll send out a
>> new version.
>> -Evan
> 
> Is GCC able to de-duplicate the duplicated syscall call ? The label version already seems 
> readable enough in any case and if GCC is not able to de-duplicate the syscall it seems 
> preferable from a performance/code size standpoint as well.

Or a switch?

   switch (vdsop != NULL)
     {
     case 1:
       sc_ret = INTERNAL_VSYSCALL_CALL(...);
       if (!error_p (sc_ret) || errno (sc_ret) != ENOSYS)
         break;
       /* FALLTHRU */
     default:
       sc_ret = INTERVAL_SYSCALL_CALL(...);
       break;
     }

I haven't actually looked at the generated code, but I'd think it would be identical to 
the gotos.


r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-23 23:12     ` Evan Green
  2024-02-23 23:29       ` Gabriel Ravier
@ 2024-02-24 11:40       ` Florian Weimer
  2024-02-26 16:47         ` Evan Green
  1 sibling, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2024-02-24 11:40 UTC (permalink / raw)
  To: Evan Green; +Cc: Florian Weimer, libc-alpha, vineetg, slewis, palmer

* Evan Green:

> It's kinda awkward without the label because the regular syscall is
> made either if vdsop is NULL or returns ENOSYS (which is two checks on
> the return value, so doesn't lend itself to inlining in the if
> statement). The best I could come up with without the label is to
> duplicate the syscall_call:
>
> #define INTERNAL_VSYSCALL(name, nr, args...)                      \
>   ({                                          \
>     long int sc_ret;                                  \
>                                           \
>     __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
>     if (vdsop != NULL)                                  \
>       {                                          \
>     sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);              \
>     if ((INTERNAL_SYSCALL_ERROR_P (sc_ret)) &&                  \
>         (INTERNAL_SYSCALL_ERRNO (sc_ret) == ENOSYS))              \
>       sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);              \
>       }                                          \
>     else                                      \
>       {                                          \
>     sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                  \
>       }                                          \
>     sc_ret;                                      \
>   })

Indentation is off for me, for clarification:

#define INTERNAL_VSYSCALL(name, nr, args...)                            \
  ({                                                                    \
    long int sc_ret;                                                    \
                                                                        \
    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);       \
    if (vdsop != NULL)                                                  \
      {                                                                 \
        sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);            \
        if ((INTERNAL_SYSCALL_ERROR_P (sc_ret)) &&                      \
            (INTERNAL_SYSCALL_ERRNO (sc_ret) == ENOSYS))                \
          sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                \
      }                                                                 \
    else                                                                \
      {                                                                 \
        sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                  \
      }                                                                 \
    sc_ret;                                                             \
  })

?: does not work because the comparison with -ENOSYS discards the
return value.

Does this look correct?

#define INTERNAL_VSYSCALL(name, nr, args...)                            \
  ({                                                                    \
    long int sc_ret = -ENOSYS;                                          \
    __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);       \
    if (vdsop != NULL)                                                  \
      sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);              \
    if (sc_ret == -ENOSYS)                                              \
      sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                    \
    sc_ret;                                                             \
  })

I expect GCC to generate decent code for it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-24 11:40       ` Florian Weimer
@ 2024-02-26 16:47         ` Evan Green
  2024-02-26 17:07           ` Florian Weimer
  0 siblings, 1 reply; 31+ messages in thread
From: Evan Green @ 2024-02-26 16:47 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Florian Weimer, libc-alpha, vineetg, slewis, palmer

On Sat, Feb 24, 2024 at 3:40 AM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * Evan Green:
>
> > It's kinda awkward without the label because the regular syscall is
> > made either if vdsop is NULL or returns ENOSYS (which is two checks on
> > the return value, so doesn't lend itself to inlining in the if
> > statement). The best I could come up with without the label is to
> > duplicate the syscall_call:
> >
> > #define INTERNAL_VSYSCALL(name, nr, args...)                      \
> >   ({                                          \
> >     long int sc_ret;                                  \
> >                                           \
> >     __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);          \
> >     if (vdsop != NULL)                                  \
> >       {                                          \
> >     sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);              \
> >     if ((INTERNAL_SYSCALL_ERROR_P (sc_ret)) &&                  \
> >         (INTERNAL_SYSCALL_ERRNO (sc_ret) == ENOSYS))              \
> >       sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);              \
> >       }                                          \
> >     else                                      \
> >       {                                          \
> >     sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                  \
> >       }                                          \
> >     sc_ret;                                      \
> >   })
>
> Indentation is off for me, for clarification:

Yeah sorry I just pasted it into the gmail client, which mangles everything.

>
> #define INTERNAL_VSYSCALL(name, nr, args...)                            \
>   ({                                                                    \
>     long int sc_ret;                                                    \
>                                                                         \
>     __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);       \
>     if (vdsop != NULL)                                                  \
>       {                                                                 \
>         sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);            \
>         if ((INTERNAL_SYSCALL_ERROR_P (sc_ret)) &&                      \
>             (INTERNAL_SYSCALL_ERRNO (sc_ret) == ENOSYS))                \
>           sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                \
>       }                                                                 \
>     else                                                                \
>       {                                                                 \
>         sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                  \
>       }                                                                 \
>     sc_ret;                                                             \
>   })
>
> ?: does not work because the comparison with -ENOSYS discards the
> return value.
>
> Does this look correct?
>
> #define INTERNAL_VSYSCALL(name, nr, args...)                            \
>   ({                                                                    \
>     long int sc_ret = -ENOSYS;                                          \
>     __typeof (GLRO(dl_vdso_##name)) vdsop = GLRO(dl_vdso_##name);       \
>     if (vdsop != NULL)                                                  \
>       sc_ret = INTERNAL_VSYSCALL_CALL (vdsop, nr, ##args);              \
>     if (sc_ret == -ENOSYS)                                              \
>       sc_ret = INTERNAL_SYSCALL_CALL (name, ##args);                    \
>     sc_ret;                                                             \
>   })
>
> I expect GCC to generate decent code for it.

I like it! I'm guessing I should continue to use the
INTERNAL_SYSCALL_ERRNO_P() and INTERNAL_SYSCALL_ERRNO(), and you
omitted it for example's sake. Will plan to spin with this.
-Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-26 16:47         ` Evan Green
@ 2024-02-26 17:07           ` Florian Weimer
  2024-02-26 17:57             ` Evan Green
  0 siblings, 1 reply; 31+ messages in thread
From: Florian Weimer @ 2024-02-26 17:07 UTC (permalink / raw)
  To: Evan Green; +Cc: Florian Weimer, libc-alpha, vineetg, slewis, palmer

* Evan Green:

> I like it! I'm guessing I should continue to use the
> INTERNAL_SYSCALL_ERRNO_P() and INTERNAL_SYSCALL_ERRNO(), and you
> omitted it for example's sake. Will plan to spin with this.

We already hard-code the Linux convention in a few places, e.g.
sysdeps/unix/sysv/linux/fstatat64.c,
sysdeps/unix/sysv/linux/clock_gettime.c.  I don't think the macros add
value here.  We also don't have a macro to inject the error code, I
believe, so one -ENOSYS will stick around.  But I don't feel strongly
about it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL
  2024-02-26 17:07           ` Florian Weimer
@ 2024-02-26 17:57             ` Evan Green
  0 siblings, 0 replies; 31+ messages in thread
From: Evan Green @ 2024-02-26 17:57 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Florian Weimer, libc-alpha, vineetg, slewis, palmer

On Mon, Feb 26, 2024 at 9:07 AM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * Evan Green:
>
> > I like it! I'm guessing I should continue to use the
> > INTERNAL_SYSCALL_ERRNO_P() and INTERNAL_SYSCALL_ERRNO(), and you
> > omitted it for example's sake. Will plan to spin with this.
>
> We already hard-code the Linux convention in a few places, e.g.
> sysdeps/unix/sysv/linux/fstatat64.c,
> sysdeps/unix/sysv/linux/clock_gettime.c.  I don't think the macros add
> value here.  We also don't have a macro to inject the error code, I
> believe, so one -ENOSYS will stick around.  But I don't feel strongly
> about it.

Got it, will spin with this as-is then. Thanks for the clarification.
-Evan

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-02-26 17:58 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-14 14:31 [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Evan Green
2024-02-14 14:31 ` [PATCH v12 1/7] riscv: Add Linux hwprobe syscall support Evan Green
2024-02-14 14:31 ` [PATCH v12 2/7] linux: Introduce INTERNAL_VSYSCALL Evan Green
2024-02-16  7:44   ` Florian Weimer
2024-02-23 23:12     ` Evan Green
2024-02-23 23:29       ` Gabriel Ravier
2024-02-24  2:06         ` Richard Henderson
2024-02-24 11:40       ` Florian Weimer
2024-02-26 16:47         ` Evan Green
2024-02-26 17:07           ` Florian Weimer
2024-02-26 17:57             ` Evan Green
2024-02-14 14:31 ` [PATCH v12 3/7] riscv: Add hwprobe vdso call support Evan Green
2024-02-14 14:31 ` [PATCH v12 4/7] riscv: Add __riscv_hwprobe pointer to ifunc calls Evan Green
2024-02-14 14:31 ` [PATCH v12 5/7] riscv: Enable multi-arg ifunc resolvers Evan Green
2024-02-16  7:45   ` Florian Weimer
2024-02-14 14:31 ` [PATCH v12 6/7] riscv: Add ifunc helper method to hwprobe.h Evan Green
2024-02-14 14:31 ` [PATCH v12 7/7] riscv: Add and use alignment-ignorant memcpy Evan Green
2024-02-14 15:16 ` [PATCH v12 0/7] RISC-V: ifunced memcpy using new kernel hwprobe interface Adhemerval Zanella Netto
2024-02-14 15:24   ` Andreas Schwab
2024-02-22 18:44     ` Palmer Dabbelt
2024-02-15 15:48   ` Evan Green
2024-02-15 15:57     ` enh
2024-02-15 16:50     ` Palmer Dabbelt
2024-02-15 17:00       ` enh
2024-02-15 17:22         ` Palmer Dabbelt
2024-02-15 18:45           ` enh
2024-02-15 18:56             ` Palmer Dabbelt
2024-02-15 17:42       ` Jessica Clarke
2024-02-15 18:52         ` enh
2024-02-15 19:09           ` Jessica Clarke
2024-02-22 19:41             ` Palmer Dabbelt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).