public inbox for libc-alpha@sourceware.org
* [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines
@ 2023-02-07  0:15 Christoph Muellner
  2023-02-07  0:16 ` [RFC PATCH 01/19] Inhibit early libcalls before ifunc support is ready Christoph Muellner
                   ` (20 more replies)
  0 siblings, 21 replies; 42+ messages in thread
From: Christoph Muellner @ 2023-02-07  0:15 UTC (permalink / raw)
  To: libc-alpha, Palmer Dabbelt, Darius Rad, Andrew Waterman,
	DJ Delorie, Vineet Gupta, Kito Cheng, Jeff Law, Philipp Tomsich,
	Heiko Stuebner
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This RFC series introduces ifunc support for RISC-V and adds
optimized implementations of memset(), memcpy()/memmove(), strlen(),
strcmp(), strncmp(), and cpu_relax().

The ifunc mechanism decides based on the following hart features:
- Available extensions
- Cache block size
- Fast unaligned accesses

Since there is currently no interface to get this information from the
kernel, this series uses environment variables instead.
For that reason the series should not be considered for upstream
inclusion and is explicitly tagged as RFC.

The environment variables are:
- RISCV_RT_MARCH (e.g. "rv64gc_zicboz")
- RISCV_RT_CBOZ_BLOCKSIZE (e.g. "64")
- RISCV_RT_CBOM_BLOCKSIZE (e.g. "64")
- RISCV_RT_FAST_UNALIGNED (e.g. "1")
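
As a rough illustration of how a value such as RISCV_RT_MARCH could be
checked for a multi-letter extension, here is a minimal sketch. It is not
the parser from patch 06/19: the function name isa_string_has_ext() is
hypothetical, and the sketch assumes lower-case, underscore-separated
extension names without version suffixes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the series): return true if the
   multi-letter extension EXT appears as an underscore-separated token
   in MARCH, e.g. "zicboz" in "rv64gc_zicboz".
   Assumes lower-case names without version suffixes.  */
bool
isa_string_has_ext (const char *march, const char *ext)
{
  /* An empty extension name is a programming error in this sketch.  */
  assert (ext != NULL && *ext != '\0');

  /* An unset environment variable means "no extensions known".  */
  if (march == NULL)
    return false;

  size_t ext_len = strlen (ext);

  /* Skip the base part (e.g. "rv64gc"); multi-letter extensions
     follow the first underscore.  */
  const char *p = strchr (march, '_');
  while (p != NULL)
    {
      p++; /* Step over the '_'.  */
      const char *end = strchr (p, '_');
      size_t len = end != NULL ? (size_t) (end - p) : strlen (p);
      if (len == ext_len && strncmp (p, ext, len) == 0)
        return true;
      p = end;
    }
  return false;
}
```

A caller would pass the environment value directly, for example
isa_string_has_ext (getenv ("RISCV_RT_MARCH"), "zicboz").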

The environment variables are looked up and parsed early during
startup, where other architectures query similar properties from
the kernel or the CPU.
The ifunc implementation can use test macros to select a matching
implementation (e.g. HAVE_RV(zbb) or HAVE_FAST_UNALIGNED()).
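
The selection step can be pictured with the following sketch for memset.
The struct, the enum and the priority order are assumptions made for
illustration only; the variant names merely mirror the file names in the
diffstat, and the real series uses test macros such as HAVE_RV() rather
than this struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the hart features named in this cover
   letter; the series keeps this state in its own data structures.  */
struct hart_features_sketch
{
  bool have_zicboz;
  unsigned cboz_blocksize;
  bool fast_unaligned;
};

/* Variant names follow the memset_*.{c,S} files in the diffstat.  */
enum memset_variant_sketch
{
  MEMSET_GENERIC,
  MEMSET_RV64_UNALIGNED,
  MEMSET_RV64_UNALIGNED_CBOZ64
};

/* Pick the most specialized variant whose requirements are met.
   The ordering here is an assumption, not taken from the patches.  */
enum memset_variant_sketch
select_memset_sketch (const struct hart_features_sketch *hf)
{
  assert (hf != NULL);

  if (hf->fast_unaligned && hf->have_zicboz && hf->cboz_blocksize == 64)
    return MEMSET_RV64_UNALIGNED_CBOZ64;
  if (hf->fast_unaligned)
    return MEMSET_RV64_UNALIGNED;
  return MEMSET_GENERIC;
}
```

In an actual ifunc resolver the result would be a function pointer to the
chosen implementation rather than an enum value.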

The following optimized routines exist:
- memset
- memcpy/memmove
- strlen
- strcmp
- strncmp
- cpu_relax

The following optimizations have been applied:
- Aggressive loop unrolling
- Zbb's orc.b instruction
- Zbb's ctz instruction
- Zicboz/Zic64b's ability to zero a cache block in memory
- Fast unaligned accesses (while keeping exception guarantees intact)
- Fast overlapping accesses
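
To show why orc.b and ctz are useful for string routines, the sketch
below emulates both instructions in portable C and uses them to locate
the first zero byte in a 64-bit word. The function names are
hypothetical and the emulation loops stand in for single instructions;
this is not the assembly from the series. Mapping the returned index to
string order assumes a little-endian load.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates Zbb's orc.b: every non-zero byte becomes 0xff,
   every zero byte stays 0x00.  */
uint64_t
orc_b_emulated (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Emulates Zbb's ctz for a non-zero input: number of trailing
   zero bits.  */
unsigned
ctz64_emulated (uint64_t x)
{
  assert (x != 0);
  unsigned n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Returns the index (0..7, counted from the least significant byte)
   of the first zero byte in W, or 8 if W has no zero byte.  */
unsigned
first_zero_byte_index (uint64_t w)
{
  /* After inversion, exactly the zero bytes of W are 0xff.  */
  uint64_t zero_mask = ~orc_b_emulated (w);
  if (zero_mask == 0)
    return 8;
  /* The lowest set bit belongs to the first zero byte.  */
  return ctz64_emulated (zero_mask) / 8;
}
```

A word-at-a-time strlen would call such a helper on each aligned word
and stop at the first word for which the result is less than 8.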

The patch was developed more than a year ago and was tested as part
of a vendor SDK since then. One of the areas where this patchset
was used is benchmarking (e.g. SPEC CPU2017).
The optimized string functions have been tested with the glibc tests
for that purpose.

The first patch of the series does not strictly belong to this series,
but was required to build and test SPEC CPU2017 benchmarks.

To build a cross-toolchain that includes these patches,
the riscv-gnu-toolchain or any other cross-toolchain
builder can be used.

Christoph Müllner (19):
  Inhibit early libcalls before ifunc support is ready
  riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol
  riscv: Add ENTRY_ALIGN() macro
  riscv: Add hart feature run-time detection framework
  riscv: Introduction of ISA extensions
  riscv: Adding ISA string parser for environment variables
  riscv: hart-features: Add fast_unaligned property
  riscv: Add (empty) ifunc framework
  riscv: Add ifunc support for memset
  riscv: Add accelerated memset routines for RV64
  riscv: Add ifunc support for memcpy/memmove
  riscv: Add accelerated memcpy/memmove routines for RV64
  riscv: Add ifunc support for strlen
  riscv: Add accelerated strlen routine
  riscv: Add ifunc support for strcmp
  riscv: Add accelerated strcmp routines
  riscv: Add ifunc support for strncmp
  riscv: Add an optimized strncmp routine
  riscv: Add __riscv_cpu_relax() to allow yielding in busy loops

 csu/libc-start.c                              |   1 +
 elf/dl-support.c                              |   1 +
 sysdeps/riscv/dl-machine.h                    |  13 +
 sysdeps/riscv/ldsodefs.h                      |   1 +
 sysdeps/riscv/multiarch/Makefile              |  24 +
 sysdeps/riscv/multiarch/cpu_relax.c           |  36 ++
 sysdeps/riscv/multiarch/cpu_relax_impl.S      |  40 ++
 sysdeps/riscv/multiarch/ifunc-impl-list.c     |  70 +++
 sysdeps/riscv/multiarch/init-arch.h           |  24 +
 sysdeps/riscv/multiarch/memcpy.c              |  49 ++
 sysdeps/riscv/multiarch/memcpy_generic.c      |  32 ++
 .../riscv/multiarch/memcpy_rv64_unaligned.S   | 475 ++++++++++++++++++
 sysdeps/riscv/multiarch/memmove.c             |  49 ++
 sysdeps/riscv/multiarch/memmove_generic.c     |  32 ++
 sysdeps/riscv/multiarch/memset.c              |  52 ++
 sysdeps/riscv/multiarch/memset_generic.c      |  32 ++
 .../riscv/multiarch/memset_rv64_unaligned.S   |  31 ++
 .../multiarch/memset_rv64_unaligned_cboz64.S  | 217 ++++++++
 sysdeps/riscv/multiarch/strcmp.c              |  47 ++
 sysdeps/riscv/multiarch/strcmp_generic.c      |  32 ++
 sysdeps/riscv/multiarch/strcmp_zbb.S          | 104 ++++
 .../riscv/multiarch/strcmp_zbb_unaligned.S    | 213 ++++++++
 sysdeps/riscv/multiarch/strlen.c              |  44 ++
 sysdeps/riscv/multiarch/strlen_generic.c      |  32 ++
 sysdeps/riscv/multiarch/strlen_zbb.S          | 105 ++++
 sysdeps/riscv/multiarch/strncmp.c             |  44 ++
 sysdeps/riscv/multiarch/strncmp_generic.c     |  32 ++
 sysdeps/riscv/multiarch/strncmp_zbb.S         | 119 +++++
 sysdeps/riscv/sys/asm.h                       |  14 +-
 .../unix/sysv/linux/riscv/atomic-machine.h    |   3 +
 sysdeps/unix/sysv/linux/riscv/dl-procinfo.c   |  62 +++
 sysdeps/unix/sysv/linux/riscv/dl-procinfo.h   |  46 ++
 sysdeps/unix/sysv/linux/riscv/hart-features.c | 356 +++++++++++++
 sysdeps/unix/sysv/linux/riscv/hart-features.h |  58 +++
 .../unix/sysv/linux/riscv/isa-extensions.def  |  72 +++
 sysdeps/unix/sysv/linux/riscv/libc-start.c    |  29 ++
 .../unix/sysv/linux/riscv/macro-for-each.h    |  24 +
 37 files changed, 2610 insertions(+), 5 deletions(-)
 create mode 100644 sysdeps/riscv/multiarch/Makefile
 create mode 100644 sysdeps/riscv/multiarch/cpu_relax.c
 create mode 100644 sysdeps/riscv/multiarch/cpu_relax_impl.S
 create mode 100644 sysdeps/riscv/multiarch/ifunc-impl-list.c
 create mode 100644 sysdeps/riscv/multiarch/init-arch.h
 create mode 100644 sysdeps/riscv/multiarch/memcpy.c
 create mode 100644 sysdeps/riscv/multiarch/memcpy_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memcpy_rv64_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/memmove.c
 create mode 100644 sysdeps/riscv/multiarch/memmove_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memset.c
 create mode 100644 sysdeps/riscv/multiarch/memset_generic.c
 create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/memset_rv64_unaligned_cboz64.S
 create mode 100644 sysdeps/riscv/multiarch/strcmp.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb.S
 create mode 100644 sysdeps/riscv/multiarch/strcmp_zbb_unaligned.S
 create mode 100644 sysdeps/riscv/multiarch/strlen.c
 create mode 100644 sysdeps/riscv/multiarch/strlen_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strlen_zbb.S
 create mode 100644 sysdeps/riscv/multiarch/strncmp.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp_generic.c
 create mode 100644 sysdeps/riscv/multiarch/strncmp_zbb.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/dl-procinfo.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/hart-features.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/isa-extensions.def
 create mode 100644 sysdeps/unix/sysv/linux/riscv/libc-start.c
 create mode 100644 sysdeps/unix/sysv/linux/riscv/macro-for-each.h

-- 
2.39.1



end of thread, other threads:[~2023-03-31 17:19 UTC | newest]

Thread overview: 42+ messages
2023-02-07  0:15 [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 01/19] Inhibit early libcalls before ifunc support is ready Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 02/19] riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 03/19] riscv: Add ENTRY_ALIGN() macro Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 04/19] riscv: Add hart feature run-time detection framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 05/19] riscv: Introduction of ISA extensions Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 06/19] riscv: Adding ISA string parser for environment variables Christoph Muellner
2023-02-07  6:20   ` David Abdurachmanov
2023-02-07  0:16 ` [RFC PATCH 07/19] riscv: hart-features: Add fast_unaligned property Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 08/19] riscv: Add (empty) ifunc framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 09/19] riscv: Add ifunc support for memset Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 10/19] riscv: Add accelerated memset routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 11/19] riscv: Add ifunc support for memcpy/memmove Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 13/19] riscv: Add ifunc support for strlen Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 14/19] riscv: Add accelerated strlen routine Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 15/19] riscv: Add ifunc support for strcmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 16/19] riscv: Add accelerated strcmp routines Christoph Muellner
2023-02-07 11:57   ` Xi Ruoyao
2023-02-07 14:15     ` Christoph Müllner
2023-03-31  5:06       ` Jeff Law
2023-03-31 12:31         ` Adhemerval Zanella Netto
2023-03-31 14:30           ` Jeff Law
2023-03-31 14:48             ` Adhemerval Zanella Netto
2023-03-31 17:19               ` Palmer Dabbelt
2023-03-31 14:32       ` Jeff Law
2023-02-07  0:16 ` [RFC PATCH 17/19] riscv: Add ifunc support for strncmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 18/19] riscv: Add an optimized strncmp routine Christoph Muellner
2023-02-07  1:19   ` Noah Goldstein
2023-02-08 15:13     ` Philipp Tomsich
2023-02-08 17:55       ` Palmer Dabbelt
2023-02-08 19:48         ` Adhemerval Zanella Netto
2023-02-08 18:04       ` Noah Goldstein
2023-02-07  0:16 ` [RFC PATCH 19/19] riscv: Add __riscv_cpu_relax() to allow yielding in busy loops Christoph Muellner
2023-02-07  0:23   ` Andrew Waterman
2023-02-07  0:29     ` Christoph Müllner
2023-02-07  2:59 ` [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Kito Cheng
2023-02-07 16:40 ` Adhemerval Zanella Netto
2023-02-07 17:16   ` DJ Delorie
2023-02-07 19:32     ` Philipp Tomsich
2023-02-07 21:14       ` DJ Delorie
2023-02-08 11:26         ` Christoph Müllner
