* [PATCH v3 0/5] riscv: Vectorized mem*/str* function
@ 2023-05-04  7:48 Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 1/5] riscv: Enabling vectorized mem*/str* functions in build time Hau Hsu
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Hau Hsu @ 2023-05-04  7:48 UTC
  To: libc-alpha
  Cc: hau.hsu, kito.cheng, nick.knight, jerry.shih, vincent.chen, hongrong.hsu

This is v3 of the patchset adding vectorized mem*/str* functions for
RISC-V.

This patch proposes implementations of memchr, memcmp, memcpy, memmove,
memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp,
strncpy and strnlen that leverage the RISC-V V extension (RVV), version
1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These
routines are from https://github.com/sifive/sifive-libc, which we agree
to contribute to the Free Software Foundation. With regard to IFUNC,
some details concerning `hwcap` are still under discussion in the
community. For the purposes of reviewing this patch, we have temporarily
opted to select the RVV routines at compile time. Once the `hwcap`
mechanism is ready, we’ll rebase on it.

These routines assume VLEN is at least 32 bits, as is required by all
currently defined vector extensions, and they support arbitrarily large
VLEN. All implementations work for both RV32 and RV64 platforms, and
make no assumptions about page size.

The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code
size, while the `str*` (unknown-length) routines use LMUL=1 instead.
A larger LMUL would still reduce dynamic code size for the latter
routines, but it would also increase the cost of the remainder/tail
loop: more data is loaded and more comparisons are performed past the
`\0`. This overhead is particularly pronounced for short strings.

Measured performance improvements of the vectorized ("rvv")
implementations vs. the existing Glibc ("scalar") implementations are as
follows:
memchr: 85% time savings (i.e., if scalar is 100ms, then rvv is 15ms)
memcmp: 55%
memcpy: 88%
memmove: 80%
memset: 88%
strcmp: 85%
strlen: 70%
strcat: 53%
strchr: 85%
strcpy: 70%
strncmp: 90%
strncat: 50%
strncpy: 60%
strnlen: 80%
The above data were collected on a SiFive X280 (FPGA simulation), across
a wide range of problem sizes.


v1: https://sourceware.org/pipermail/libc-alpha/2023-March/145976.html
  * add RISC-V vectorized mem*/str* functions

v2: https://sourceware.org/pipermail/libc-alpha/2023-April/147519.html
  * include the __memcmpeq function
  * set lmul=1 for memcmp for generality

v3:
  * remove "Contributed by" comments
  * fix license headers
  * avoid camelCase variable names
  * avoid C99 one-line comments

Jerry Shih (2):
  riscv: vectorized mem* functions
  riscv: vectorized str* functions

Nick Knight (1):
  riscv: vectorized strchr and strnlen functions

Vincent Chen (1):
  riscv: Enabling vectorized mem*/str* functions in build time

Yun Hsiang (1):
  riscv: add vectorized __memcmpeq

 scripts/build-many-glibcs.py   | 10 ++++
 sysdeps/riscv/preconfigure     | 19 ++++++++
 sysdeps/riscv/preconfigure.ac  | 18 +++++++
 sysdeps/riscv/rv32/rvv/Implies |  2 +
 sysdeps/riscv/rv64/rvv/Implies |  2 +
 sysdeps/riscv/rvv/memchr.S     | 62 ++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcmp.S     | 70 +++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcmpeq.S   | 67 ++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcpy.S     | 50 +++++++++++++++++++
 sysdeps/riscv/rvv/memmove.S    | 71 +++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memset.S     | 49 +++++++++++++++++++
 sysdeps/riscv/rvv/strcat.S     | 71 +++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strchr.S     | 62 ++++++++++++++++++++++++
 sysdeps/riscv/rvv/strcmp.S     | 88 ++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strcpy.S     | 55 +++++++++++++++++++++
 sysdeps/riscv/rvv/strlen.S     | 53 ++++++++++++++++++++
 sysdeps/riscv/rvv/strncat.S    | 82 +++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strncmp.S    | 84 ++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strncpy.S    | 85 ++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strnlen.S    | 55 +++++++++++++++++++++
 20 files changed, 1055 insertions(+)
 create mode 100644 sysdeps/riscv/rv32/rvv/Implies
 create mode 100644 sysdeps/riscv/rv64/rvv/Implies
 create mode 100644 sysdeps/riscv/rvv/memchr.S
 create mode 100644 sysdeps/riscv/rvv/memcmp.S
 create mode 100644 sysdeps/riscv/rvv/memcmpeq.S
 create mode 100644 sysdeps/riscv/rvv/memcpy.S
 create mode 100644 sysdeps/riscv/rvv/memmove.S
 create mode 100644 sysdeps/riscv/rvv/memset.S
 create mode 100644 sysdeps/riscv/rvv/strcat.S
 create mode 100644 sysdeps/riscv/rvv/strchr.S
 create mode 100644 sysdeps/riscv/rvv/strcmp.S
 create mode 100644 sysdeps/riscv/rvv/strcpy.S
 create mode 100644 sysdeps/riscv/rvv/strlen.S
 create mode 100644 sysdeps/riscv/rvv/strncat.S
 create mode 100644 sysdeps/riscv/rvv/strncmp.S
 create mode 100644 sysdeps/riscv/rvv/strncpy.S
 create mode 100644 sysdeps/riscv/rvv/strnlen.S

-- 
2.38.1



* [PATCH v3 1/5] riscv: Enabling vectorized mem*/str* functions in build time
  2023-05-04  7:48 [PATCH v3 0/5] riscv: Vectorized mem*/str* function Hau Hsu
@ 2023-05-04  7:48 ` Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 2/5] riscv: vectorized mem* functions Hau Hsu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Hau Hsu @ 2023-05-04  7:48 UTC
  To: libc-alpha
  Cc: hau.hsu, kito.cheng, nick.knight, jerry.shih, vincent.chen, hongrong.hsu

From: Vincent Chen <vincent.chen@sifive.com>

Let the build select the vectorized mem*/str* functions when it detects
that the compiler supports the RISC-V V extension and that the extension
is enabled for this build.

We agree that these vectorized mem*/str* functions should be
selected by IFUNC. Therefore, this patch is intended as a
**temporary solution** to enable reviewers to evaluate the effectiveness
of these vectorized mem*/str* functions.
---
 scripts/build-many-glibcs.py   | 10 ++++++++++
 sysdeps/riscv/preconfigure     | 19 +++++++++++++++++++
 sysdeps/riscv/preconfigure.ac  | 18 ++++++++++++++++++
 sysdeps/riscv/rv32/rvv/Implies |  2 ++
 sysdeps/riscv/rv64/rvv/Implies |  2 ++
 5 files changed, 51 insertions(+)
 create mode 100644 sysdeps/riscv/rv32/rvv/Implies
 create mode 100644 sysdeps/riscv/rv64/rvv/Implies

diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py
index 95726c4a29..98688d6665 100755
--- a/scripts/build-many-glibcs.py
+++ b/scripts/build-many-glibcs.py
@@ -381,6 +381,11 @@ class Context(object):
                         variant='rv32imafdc-ilp32d',
                         gcc_cfg=['--with-arch=rv32imafdc', '--with-abi=ilp32d',
                                  '--disable-multilib'])
+        self.add_config(arch='riscv32',
+                        os_name='linux-gnu',
+                        variant='rv32imafdcv-ilp32d',
+                        gcc_cfg=['--with-arch=rv32imafdcv', '--with-abi=ilp32d',
+                                 '--disable-multilib'])
         self.add_config(arch='riscv64',
                         os_name='linux-gnu',
                         variant='rv64imac-lp64',
@@ -396,6 +401,11 @@ class Context(object):
                         variant='rv64imafdc-lp64d',
                         gcc_cfg=['--with-arch=rv64imafdc', '--with-abi=lp64d',
                                  '--disable-multilib'])
+        self.add_config(arch='riscv64',
+                        os_name='linux-gnu',
+                        variant='rv64imafdcv-lp64d',
+                        gcc_cfg=['--with-arch=rv64imafdcv', '--with-abi=lp64d',
+                                 '--disable-multilib'])
         self.add_config(arch='s390x',
                         os_name='linux-gnu',
                         glibcs=[{},
diff --git a/sysdeps/riscv/preconfigure b/sysdeps/riscv/preconfigure
index 4dedf4b0bb..5ddc195b46 100644
--- a/sysdeps/riscv/preconfigure
+++ b/sysdeps/riscv/preconfigure
@@ -7,6 +7,7 @@ riscv*)
     flen=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | sed -n 's/^#define __riscv_flen \(.*\)/\1/p'`
     float_abi=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | sed -n 's/^#define __riscv_float_abi_\([^ ]*\) .*/\1/p'`
     atomic=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | grep '#define __riscv_atomic' | cut -d' ' -f2`
+    vector=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | grep '#define __riscv_vector' | cut -d' ' -f2`
 
     case "$xlen" in
     64 | 32)
@@ -32,6 +33,24 @@ riscv*)
 	;;
     esac
 
+    case "$vector" in
+    __riscv_vector)
+        case "$flen" in
+        64)
+        float_machine=rvv
+        ;;
+        *)
+        # V 1.0 spec requires both F and D extensions, but this may be an older version. Degrade to scalar only.
+        ;;
+        esac
+    ;;
+    *)
+    ;;
+    esac
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: vector $vector flen $flen float_machine $float_machine" >&5
+$as_echo "$as_me: vector $vector flen $flen float_machine $float_machine" >&6;}
+
     case "$float_abi" in
     soft)
 	abi_flen=0
diff --git a/sysdeps/riscv/preconfigure.ac b/sysdeps/riscv/preconfigure.ac
index a5c30e0dbf..b6b8bb46e4 100644
--- a/sysdeps/riscv/preconfigure.ac
+++ b/sysdeps/riscv/preconfigure.ac
@@ -7,6 +7,7 @@ riscv*)
     flen=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | sed -n 's/^#define __riscv_flen \(.*\)/\1/p'`
     float_abi=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | sed -n 's/^#define __riscv_float_abi_\([^ ]*\) .*/\1/p'`
     atomic=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | grep '#define __riscv_atomic' | cut -d' ' -f2`
+    vector=`$CC $CFLAGS $CPPFLAGS -E -dM -xc /dev/null | grep '#define __riscv_vector' | cut -d' ' -f2`
 
     case "$xlen" in
     64 | 32)
@@ -32,6 +33,23 @@ riscv*)
 	;;
     esac
 
+    case "$vector" in
+    __riscv_vector)
+        case "$flen" in
+        64)
+        float_machine=rvv
+        ;;
+        *)
+        # V 1.0 spec requires both F and D extensions, but this may be an older version. Degrade to scalar only.
+        ;;
+        esac
+    ;;
+    *)
+    ;;
+    esac
+
+    AC_MSG_NOTICE([vector $vector flen $flen float_machine $float_machine])
+
     case "$float_abi" in
     soft)
 	abi_flen=0
diff --git a/sysdeps/riscv/rv32/rvv/Implies b/sysdeps/riscv/rv32/rvv/Implies
new file mode 100644
index 0000000000..25ce1df222
--- /dev/null
+++ b/sysdeps/riscv/rv32/rvv/Implies
@@ -0,0 +1,2 @@
+riscv/rv32/rvd
+riscv/rvv
diff --git a/sysdeps/riscv/rv64/rvv/Implies b/sysdeps/riscv/rv64/rvv/Implies
new file mode 100644
index 0000000000..9993bb30e3
--- /dev/null
+++ b/sysdeps/riscv/rv64/rvv/Implies
@@ -0,0 +1,2 @@
+riscv/rv64/rvd
+riscv/rvv
-- 
2.38.1



* [PATCH v3 2/5] riscv: vectorized mem* functions
  2023-05-04  7:48 [PATCH v3 0/5] riscv: Vectorized mem*/str* function Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 1/5] riscv: Enabling vectorized mem*/str* functions in build time Hau Hsu
@ 2023-05-04  7:48 ` Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 3/5] riscv: vectorized str* functions Hau Hsu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Hau Hsu @ 2023-05-04  7:48 UTC
  To: libc-alpha
  Cc: hau.hsu, kito.cheng, nick.knight, jerry.shih, vincent.chen, hongrong.hsu

From: Jerry Shih <jerry.shih@sifive.com>

This patch proposes implementations of memchr, memcmp, memcpy, memmove,
and memset that leverage the RISC-V V extension (RVV), version 1.0.
These routines assume VLEN is at least 32 bits, as is required by all
currently defined vector extensions, and they support arbitrarily large
VLEN. All implementations work for both RV32 and RV64 platforms, and
make no assumptions about page size.
---
 sysdeps/riscv/rvv/memchr.S  | 62 +++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcmp.S  | 74 +++++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memcpy.S  | 50 +++++++++++++++++++++++++
 sysdeps/riscv/rvv/memmove.S | 71 +++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/memset.S  | 49 ++++++++++++++++++++++++
 5 files changed, 306 insertions(+)
 create mode 100644 sysdeps/riscv/rvv/memchr.S
 create mode 100644 sysdeps/riscv/rvv/memcmp.S
 create mode 100644 sysdeps/riscv/rvv/memcpy.S
 create mode 100644 sysdeps/riscv/rvv/memmove.S
 create mode 100644 sysdeps/riscv/rvv/memset.S

diff --git a/sysdeps/riscv/rvv/memchr.S b/sysdeps/riscv/rvv/memchr.S
new file mode 100644
index 0000000000..a8273e9a55
--- /dev/null
+++ b/sysdeps/riscv/rvv/memchr.S
@@ -0,0 +1,62 @@
+/* RVV versions memchr.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define result a0
+
+#define src a0
+#define value a1
+#define num a2
+
+#define ivl a3
+#define temp a4
+
+#define ELEM_LMUL_SETTING m8
+#define vdata v0
+#define vmask v8
+
+ENTRY(memchr)
+
+L(loop):
+    vsetvli zero, num, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8ff.v vdata, (src)
+    /* Find the value inside the loaded data.  */
+    vmseq.vx vmask, vdata, value
+    vfirst.m temp, vmask
+
+    /* Skip the loop if we find the matched value.  */
+    bgez temp, L(found)
+
+    csrr ivl, vl
+    sub num, num, ivl
+    add src, src, ivl
+
+    bnez num, L(loop)
+
+    li result, 0
+    ret
+
+L(found):
+    add result, src, temp
+    ret
+
+END(memchr)
+libc_hidden_builtin_def (memchr)
diff --git a/sysdeps/riscv/rvv/memcmp.S b/sysdeps/riscv/rvv/memcmp.S
new file mode 100644
index 0000000000..fbf81acc2f
--- /dev/null
+++ b/sysdeps/riscv/rvv/memcmp.S
@@ -0,0 +1,74 @@
+/* RVV versions memcmp.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define result a0
+
+#define src1 a0
+#define src2 a1
+#define num a2
+
+#define ivl a3
+#define temp a4
+#define temp1 a5
+#define temp2 a6
+
+#define ELEM_LMUL_SETTING m8
+#define vdata1 v0
+#define vdata2 v8
+#define vmask v16
+
+ENTRY(memcmp)
+
+L(loop):
+    vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8.v vdata1, (src1)
+    vle8.v vdata2, (src2)
+
+    vmsne.vv vmask, vdata1, vdata2
+    sub num, num, ivl
+    vfirst.m temp, vmask
+
+    /* Skip the loop if we find the different value between src1 and src2.  */
+    bgez temp, L(found)
+
+    add src1, src1, ivl
+    add src2, src2, ivl
+
+    bnez num, L(loop)
+
+    li result, 0
+    ret
+
+L(found):
+    add src1, src1, temp
+    add src2, src2, temp
+    lbu temp1, 0(src1)
+    lbu temp2, 0(src2)
+    sub result, temp1, temp2
+    ret
+
+END(memcmp)
+libc_hidden_builtin_def (memcmp)
+weak_alias (memcmp,bcmp)
+strong_alias (memcmp, __memcmpeq)
+libc_hidden_def (__memcmpeq)
+
diff --git a/sysdeps/riscv/rvv/memcpy.S b/sysdeps/riscv/rvv/memcpy.S
new file mode 100644
index 0000000000..982c128370
--- /dev/null
+++ b/sysdeps/riscv/rvv/memcpy.S
@@ -0,0 +1,50 @@
+/* RVV versions memcpy.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define dst a0
+#define src a1
+#define num a2
+
+#define ivl a3
+#define dst_ptr a4
+
+#define ELEM_LMUL_SETTING m8
+#define vdata v0
+
+ENTRY(memcpy)
+
+    mv dst_ptr, dst
+
+L(loop):
+    vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8.v vdata, (src)
+    sub num, num, ivl
+    add src, src, ivl
+    vse8.v vdata, (dst_ptr)
+    add dst_ptr, dst_ptr, ivl
+
+    bnez num, L(loop)
+
+    ret
+
+END(memcpy)
+libc_hidden_builtin_def (memcpy)
diff --git a/sysdeps/riscv/rvv/memmove.S b/sysdeps/riscv/rvv/memmove.S
new file mode 100644
index 0000000000..492c0b65f7
--- /dev/null
+++ b/sysdeps/riscv/rvv/memmove.S
@@ -0,0 +1,71 @@
+/* RVV versions memmove.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define dst a0
+#define src a1
+#define num a2
+
+#define ivl a3
+#define dst_ptr a4
+#define src_backward_ptr a5
+#define dst_backward_ptr a6
+
+#define ELEM_LMUL_SETTING m8
+#define vdata v0
+
+ENTRY(memmove)
+
+    mv dst_ptr, dst
+
+    /* If src is at or after dst, all source data is loaded before it is
+       overwritten in the overlapping case, so use the faster forward copy.  */
+    bgeu src, dst, L(forward_copy_loop)
+    add src_backward_ptr, src, num
+    add dst_backward_ptr, dst, num
+    /* If dst is inside the source data range, we need `backward_copy_loop`
+       to handle the overlap.  */
+    bltu dst, src_backward_ptr, L(backward_copy_loop)
+
+L(forward_copy_loop):
+    vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8.v vdata, (src)
+    sub num, num, ivl
+    add src, src, ivl
+    vse8.v vdata, (dst_ptr)
+    add dst_ptr, dst_ptr, ivl
+
+    bnez num, L(forward_copy_loop)
+    ret
+
+L(backward_copy_loop):
+    vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+
+    sub src_backward_ptr, src_backward_ptr, ivl
+    vle8.v vdata, (src_backward_ptr)
+    sub num, num, ivl
+    sub dst_backward_ptr, dst_backward_ptr, ivl
+    vse8.v vdata, (dst_backward_ptr)
+    bnez num, L(backward_copy_loop)
+    ret
+
+END(memmove)
+libc_hidden_builtin_def (memmove)
diff --git a/sysdeps/riscv/rvv/memset.S b/sysdeps/riscv/rvv/memset.S
new file mode 100644
index 0000000000..ac3f88e492
--- /dev/null
+++ b/sysdeps/riscv/rvv/memset.S
@@ -0,0 +1,49 @@
+/* RVV versions memset.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define dst a0
+#define value a1
+#define num a2
+
+#define ivl a3
+#define dst_ptr a5
+
+#define ELEM_LMUL_SETTING m8
+#define vdata v0
+
+ENTRY(memset)
+
+    mv dst_ptr, dst
+
+    vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+    vmv.v.x vdata, value
+
+L(loop):
+    vse8.v vdata, (dst_ptr)
+    sub num, num, ivl
+    add dst_ptr, dst_ptr, ivl
+    vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+    bnez num, L(loop)
+
+    ret
+
+END(memset)
+libc_hidden_builtin_def (memset)
-- 
2.38.1



* [PATCH v3 3/5] riscv: vectorized str* functions
  2023-05-04  7:48 [PATCH v3 0/5] riscv: Vectorized mem*/str* function Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 1/5] riscv: Enabling vectorized mem*/str* functions in build time Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 2/5] riscv: vectorized mem* functions Hau Hsu
@ 2023-05-04  7:48 ` Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 4/5] riscv: vectorized strchr and strnlen functions Hau Hsu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Hau Hsu @ 2023-05-04  7:48 UTC
  To: libc-alpha
  Cc: hau.hsu, kito.cheng, nick.knight, jerry.shih, vincent.chen, hongrong.hsu

From: Jerry Shih <jerry.shih@sifive.com>

This patch proposes implementations of strcat, strcmp, strcpy, strlen,
strncat, strncmp and strncpy that leverage the RISC-V V extension (RVV),
version 1.0. These routines assume VLEN is at least 32 bits, as is
required by all currently defined vector extensions, and they support
arbitrarily large VLEN. All implementations work for both RV32 and RV64
platforms, and make no assumptions about page size.
---
 sysdeps/riscv/rvv/strcat.S  | 71 ++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strcmp.S  | 88 +++++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strcpy.S  | 55 +++++++++++++++++++++++
 sysdeps/riscv/rvv/strlen.S  | 53 ++++++++++++++++++++++
 sysdeps/riscv/rvv/strncat.S | 82 ++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strncmp.S | 84 +++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strncpy.S | 85 +++++++++++++++++++++++++++++++++++
 7 files changed, 518 insertions(+)
 create mode 100644 sysdeps/riscv/rvv/strcat.S
 create mode 100644 sysdeps/riscv/rvv/strcmp.S
 create mode 100644 sysdeps/riscv/rvv/strcpy.S
 create mode 100644 sysdeps/riscv/rvv/strlen.S
 create mode 100644 sysdeps/riscv/rvv/strncat.S
 create mode 100644 sysdeps/riscv/rvv/strncmp.S
 create mode 100644 sysdeps/riscv/rvv/strncpy.S

diff --git a/sysdeps/riscv/rvv/strcat.S b/sysdeps/riscv/rvv/strcat.S
new file mode 100644
index 0000000000..fb5858fa82
--- /dev/null
+++ b/sysdeps/riscv/rvv/strcat.S
@@ -0,0 +1,71 @@
+/* RVV versions strcat.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define dst a0
+#define src a1
+#define dst_ptr a2
+
+#define ivl a3
+#define cur_vl a4
+#define active_elem_pos a5
+
+#define ELEM_LMUL_SETTING m1
+#define vmask1 v0
+#define vmask2 v1
+#define vstr1 v8
+#define vstr2 v16
+
+ENTRY(strcat)
+
+    mv dst_ptr, dst
+
+    /* Perform `strlen(dst)`.  */
+L(strlen_loop):
+    vsetvli ivl, zero, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8ff.v vstr1, (dst_ptr)
+    vmseq.vx vmask1, vstr1, zero
+    csrr cur_vl, vl
+    vfirst.m active_elem_pos, vmask1
+    add dst_ptr, dst_ptr, cur_vl
+    bltz active_elem_pos, L(strlen_loop)
+
+    sub dst_ptr, dst_ptr, cur_vl
+    add dst_ptr, dst_ptr, active_elem_pos
+
+    /* Perform `strcpy(dst, src)`.  */
+L(strcpy_loop):
+    vsetvli ivl, zero, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8ff.v vstr1, (src)
+    vmseq.vx vmask2, vstr1, zero
+    csrr cur_vl, vl
+    vfirst.m active_elem_pos, vmask2
+    vmsif.m vmask1, vmask2
+    add src, src, cur_vl
+    vse8.v vstr1, (dst_ptr), vmask1.t
+    add dst_ptr, dst_ptr, cur_vl
+    bltz active_elem_pos, L(strcpy_loop)
+
+    ret
+
+END(strcat)
+libc_hidden_builtin_def (strcat)
diff --git a/sysdeps/riscv/rvv/strcmp.S b/sysdeps/riscv/rvv/strcmp.S
new file mode 100644
index 0000000000..2e60d76dc8
--- /dev/null
+++ b/sysdeps/riscv/rvv/strcmp.S
@@ -0,0 +1,88 @@
+/* RVV versions strcmp.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define result a0
+
+#define str1 a0
+#define str2 a1
+
+#define ivl a2
+#define temp1 a3
+#define temp2 a4
+
+#define vstr1 v0
+#define vstr2 v8
+#define vmask1 v16
+#define vmask2 v17
+
+ENTRY(strcmp)
+    /* lmul=1 */
+
+L(Loop):
+    vsetvli ivl, zero, e8, m1, ta, ma
+    vle8ff.v vstr1, (str1)
+    /* check if vstr1[i] == 0 */
+    vmseq.vx vmask1, vstr1, zero
+
+    vle8ff.v vstr2, (str2)
+    /* check if vstr1[i] != vstr2[i] */
+    vmsne.vv vmask2, vstr1, vstr2
+
+    /* find the index x for vstr1[x]==0 */
+    vfirst.m temp1, vmask1
+    /* find the index x for vstr1[x]!=vstr2[x] */
+    vfirst.m temp2, vmask2
+
+    bgez temp1, L(check1)
+    bgez temp2, L(check2)
+
+    /* get the current vl updated by vle8ff. */
+    csrr ivl, vl
+    add str1, str1, ivl
+    add str2, str2, ivl
+    j L(Loop)
+
+    /* temp1>=0 */
+L(check1):
+    bltz temp2, 1f
+    blt temp2, temp1, L(check2)
+1:
+    /* temp2<0 */
+    /* temp2>=0 && temp1<temp2 */
+    add str1, str1, temp1
+    add str2, str2, temp1
+    lbu temp1, 0(str1)
+    lbu temp2, 0(str2)
+    sub result, temp1, temp2
+    ret
+
+    /* temp1<0 */
+    /* temp2>=0 */
+L(check2):
+    add str1, str1, temp2
+    add str2, str2, temp2
+    lbu temp1, 0(str1)
+    lbu temp2, 0(str2)
+    sub result, temp1, temp2
+    ret
+
+END(strcmp)
+libc_hidden_builtin_def (strcmp)
diff --git a/sysdeps/riscv/rvv/strcpy.S b/sysdeps/riscv/rvv/strcpy.S
new file mode 100644
index 0000000000..1ad433f5f3
--- /dev/null
+++ b/sysdeps/riscv/rvv/strcpy.S
@@ -0,0 +1,55 @@
+/* RVV versions strcpy.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define dst a0
+#define src a1
+#define dst_ptr a2
+
+#define ivl a3
+#define cur_vl a4
+#define active_elem_pos a5
+
+#define ELEM_LMUL_SETTING m1
+#define vmask1 v0
+#define vmask2 v1
+#define vstr1 v8
+#define vstr2 v16
+
+ENTRY(strcpy)
+
+    mv dst_ptr, dst
+
+L(strcpy_loop):
+    vsetvli ivl, zero, e8, ELEM_LMUL_SETTING, ta, ma
+    vle8ff.v vstr1, (src)
+    vmseq.vx vmask2, vstr1, zero
+    csrr cur_vl, vl
+    vfirst.m active_elem_pos, vmask2
+    vmsif.m vmask1, vmask2
+    add src, src, cur_vl
+    vse8.v vstr1, (dst_ptr), vmask1.t
+    add dst_ptr, dst_ptr, cur_vl
+    bltz active_elem_pos, L(strcpy_loop)
+
+    ret
+
+END(strcpy)
+libc_hidden_builtin_def (strcpy)
diff --git a/sysdeps/riscv/rvv/strlen.S b/sysdeps/riscv/rvv/strlen.S
new file mode 100644
index 0000000000..cf3698f52a
--- /dev/null
+++ b/sysdeps/riscv/rvv/strlen.S
@@ -0,0 +1,53 @@
+/* RVV versions strlen.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define result a0
+#define str a0
+#define copy_str a1
+#define ivl a2
+#define cur_vl a2
+#define end_offset a3
+
+#define ELEM_LMUL_SETTING m2
+#define vstr v0
+#define vmask_end v2
+
+ENTRY(strlen)
+
+    mv copy_str, str
+L(loop):
+    vsetvli ivl, zero, e8, ELEM_LMUL_SETTING, ta, ma
+    vle8ff.v vstr, (copy_str)
+    csrr cur_vl, vl
+    vmseq.vi vmask_end, vstr, 0
+    vfirst.m end_offset, vmask_end
+    add copy_str, copy_str, cur_vl
+    bltz end_offset, L(loop)
+
+    add str, str, cur_vl
+    add copy_str, copy_str, end_offset
+    sub result, copy_str, result
+
+    ret
+
+END(strlen)
+
+libc_hidden_builtin_def (strlen)
diff --git a/sysdeps/riscv/rvv/strncat.S b/sysdeps/riscv/rvv/strncat.S
new file mode 100644
index 0000000000..d30a6533a3
--- /dev/null
+++ b/sysdeps/riscv/rvv/strncat.S
@@ -0,0 +1,82 @@
+/* RVV versions strncat.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define dst a0
+#define src a1
+#define length a2
+#define dst_ptr a3
+
+#define ivl a4
+#define cur_vl a5
+#define activate_elem_pos a6
+
+#define ELEM_LMUL_SETTING m1
+#define vmask1 v0
+#define vmask2 v1
+#define vstr1 v8
+#define vstr2 v16
+
+ENTRY(strncat)
+
+    mv dst_ptr, dst
+
+    /* the strlen of dst.  */
+L(strlen_loop):
+    vsetvli ivl, zero, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8ff.v vstr1, (dst_ptr)
+    /* find the '\0'.  */
+    vmseq.vx vmask1, vstr1, zero
+    csrr cur_vl, vl
+    vfirst.m activate_elem_pos, vmask1
+    add dst_ptr, dst_ptr, cur_vl
+    bltz activate_elem_pos, L(strlen_loop)
+
+    sub dst_ptr, dst_ptr, cur_vl
+    add dst_ptr, dst_ptr, activate_elem_pos
+
+    /* copy src to dst_ptr.  */
+L(strcpy_loop):
+    vsetvli zero, length, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8ff.v vstr1, (src)
+    vmseq.vx vmask2, vstr1, zero
+    csrr cur_vl, vl
+    vfirst.m activate_elem_pos, vmask2
+    vmsif.m vmask1, vmask2
+    add src, src, cur_vl
+    sub length, length, cur_vl
+    vse8.v vstr1, (dst_ptr), vmask1.t
+    add dst_ptr, dst_ptr, cur_vl
+    beqz length, L(fill_zero)
+    bltz activate_elem_pos, L(strcpy_loop)
+
+    ret
+
+L(fill_zero):
+    bgez activate_elem_pos, L(fill_zero_end)
+    sb zero, (dst_ptr)
+
+L(fill_zero_end):
+    ret
+
+END(strncat)
+libc_hidden_builtin_def (strncat)
diff --git a/sysdeps/riscv/rvv/strncmp.S b/sysdeps/riscv/rvv/strncmp.S
new file mode 100644
index 0000000000..2b6ab1f233
--- /dev/null
+++ b/sysdeps/riscv/rvv/strncmp.S
@@ -0,0 +1,84 @@
+/* RVV versions strncmp.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define result a0
+
+#define str1 a0
+#define str2 a1
+#define length a2
+
+#define ivl a3
+#define temp1 a4
+#define temp2 a5
+
+#define ELEM_LMUL_SETTING m1
+#define vstr1 v0
+#define vstr2 v4
+#define vmask1 v8
+#define vmask2 v9
+
+ENTRY(strncmp)
+
+    beqz length, L(zero_length)
+
+L(loop):
+    vsetvli zero, length, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8ff.v vstr1, (str1)
+    /* vstr1[i] == 0.  */
+    vmseq.vx vmask1, vstr1, zero
+
+    vle8ff.v vstr2, (str2)
+    /* vstr1[i] != vstr2[i].  */
+    vmsne.vv vmask2, vstr1, vstr2
+
+    csrr ivl, vl
+
+    /* r = mask1 | mask2
+       We could use vfirst.m to get the first zero char or the
+       first different char between str1 and str2.  */
+    vmor.mm vmask1, vmask1, vmask2
+
+    sub length, length, ivl
+
+    vfirst.m temp1, vmask1
+
+    bgez temp1, L(end_loop)
+
+    add str1, str1, ivl
+    add str2, str2, ivl
+    bnez length, L(loop)
+L(end_loop):
+
+    add str1, str1, temp1
+    add str2, str2, temp1
+    lbu temp1, 0(str1)
+    lbu temp2, 0(str2)
+
+    sub result, temp1, temp2
+    ret
+
+L(zero_length):
+    li result, 0
+    ret
+
+END(strncmp)
+libc_hidden_builtin_def (strncmp)
diff --git a/sysdeps/riscv/rvv/strncpy.S b/sysdeps/riscv/rvv/strncpy.S
new file mode 100644
index 0000000000..53fb8cdec7
--- /dev/null
+++ b/sysdeps/riscv/rvv/strncpy.S
@@ -0,0 +1,85 @@
+/* RVV versions strncpy.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define dst a0
+#define src a1
+#define length a2
+#define dst_ptr a3
+
+#define ivl a4
+#define cur_vl a5
+#define active_elem_pos a6
+#define temp a7
+
+#define ELEM_LMUL_SETTING m1
+#define vmask1 v0
+#define vmask2 v1
+#define ZERO_FILL_ELEM_LMUL_SETTING m8
+#define vstr1 v8
+#define vstr2 v16
+
+ENTRY(strncpy)
+
+    mv dst_ptr, dst
+
+    /* Copy src to dst_ptr.  */
+L(strcpy_loop):
+    vsetvli zero, length, e8, ELEM_LMUL_SETTING, ta, ma
+    vle8ff.v vstr1, (src)
+    vmseq.vx vmask2, vstr1, zero
+    csrr cur_vl, vl
+    vfirst.m active_elem_pos, vmask2
+    vmsif.m vmask1, vmask2
+    add src, src, cur_vl
+    sub length, length, cur_vl
+    vse8.v vstr1, (dst_ptr), vmask1.t
+    add dst_ptr, dst_ptr, cur_vl
+    bgez active_elem_pos, L(fill_zero)
+    bnez length, L(strcpy_loop)
+    ret
+
+    /* Fill the tail zero.  */
+L(fill_zero):
+    /* The `\0` has already been copied to dst.  `vfirst.m` returns
+       its zero-based index, so cur_vl - index - 1 bytes of this chunk
+       were not stored and still count toward the zero fill.  */
+    sub temp, cur_vl, active_elem_pos
+    addi temp, temp, -1
+    add length, length, temp
+    /* Return early for the `strlen(src) + 1 == count` case.  */
+    bnez length, 1f
+    ret
+1:
+    sub dst_ptr, dst_ptr, temp
+    vsetvli zero, length, e8, ZERO_FILL_ELEM_LMUL_SETTING, ta, ma
+    vmv.v.x vstr2, zero
+
+L(fill_zero_loop):
+    vsetvli ivl, length, e8, ZERO_FILL_ELEM_LMUL_SETTING, ta, ma
+    vse8.v vstr2, (dst_ptr)
+    sub length, length, ivl
+    add dst_ptr, dst_ptr, ivl
+    bnez length, L(fill_zero_loop)
+
+    ret
+
+END(strncpy)
+libc_hidden_builtin_def (strncpy)
-- 
2.38.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 4/5] riscv: vectorized strchr and strnlen functions
  2023-05-04  7:48 [PATCH v3 0/5] riscv: Vectorized mem*/str* function Hau Hsu
                   ` (2 preceding siblings ...)
  2023-05-04  7:48 ` [PATCH v3 3/5] riscv: vectorized str* functions Hau Hsu
@ 2023-05-04  7:48 ` Hau Hsu
  2023-05-04  7:48 ` [PATCH v3 5/5] riscv: vectorized __memcmpeq function Hau Hsu
  2023-05-08 14:06 ` [PATCH v3 0/5] riscv: Vectorized mem*/str* function Palmer Dabbelt
  5 siblings, 0 replies; 9+ messages in thread
From: Hau Hsu @ 2023-05-04  7:48 UTC (permalink / raw)
  To: libc-alpha
  Cc: hau.hsu, kito.cheng, nick.knight, jerry.shih, vincent.chen, hongrong.hsu

From: Nick Knight <nick.knight@sifive.com>

This patch proposes implementations of strchr and strnlen that leverage
the RISC-V V extension (RVV), version 1.0. These routines assume VLEN is
at least 32 bits, as is required by all currently defined vector
extensions, and they support arbitrarily large VLEN. All implementations
work for both RV32 and RV64 platforms, and make no assumptions about
page size.
---
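As a reading aid for the strchr loop in this patch, here is a scalar C
sketch; it is not the RVV code itself. The names first_index,
found_before_nul and model_strchr are hypothetical, `long` stands in for
an XLEN-wide register, and trimming each chunk at the terminator is an
assumption standing in for vle8ff.v. found_before_nul mirrors the
sltz/sltu/or sequence: vfirst.m yields -1 when no mask bit is set, and
the unsigned compare turns an end_offset of -1 into the largest possible
value, so a chunk without a terminator never hides a match.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar stand-in for vfirst.m on a byte-compare mask: index of the
   first byte equal to `value` in chunk[0..n), or -1 if there is none.  */
static long
first_index (const unsigned char *chunk, size_t n, unsigned char value)
{
  for (size_t i = 0; i < n; i++)
    if (chunk[i] == value)
      return (long) i;
  return -1;
}

/* Mirrors: sltz temp1, ch_offset; sltu temp2, end_offset, ch_offset;
   or temp1, temp1, temp2; beqz temp1, found.  */
static int
found_before_nul (long end_offset, long ch_offset)
{
  int no_match = ch_offset < 0;
  int nul_first = (unsigned long) end_offset < (unsigned long) ch_offset;
  return !(no_match | nul_first);
}

/* Chunked model of the loop; `vl` is a hypothetical chunk length.  */
static char *
model_strchr (const char *s, int c, size_t vl)
{
  assert (s != NULL && vl > 0);
  const unsigned char *p = (const unsigned char *) s;
  unsigned char ch = (unsigned char) c;
  for (;;)
    {
      /* The scan for 0 stops at the terminator, so nothing past it is
         ever read (assumption standing in for fault-only-first).  */
      long end_offset = first_index (p, vl, 0);
      size_t n = end_offset >= 0 ? (size_t) end_offset + 1 : vl;
      long ch_offset = first_index (p, n, ch);
      if (found_before_nul (end_offset, ch_offset))
        return (char *) (p + ch_offset);
      if (end_offset >= 0)
        return NULL;
      p += n;
    }
}
```

Searching for '\0' itself gives end_offset == ch_offset, which the
unsigned less-than treats as a hit, so the model returns a pointer to the
terminator as strchr requires.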
 sysdeps/riscv/rvv/strchr.S  | 62 +++++++++++++++++++++++++++++++++++++
 sysdeps/riscv/rvv/strnlen.S | 55 ++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)
 create mode 100644 sysdeps/riscv/rvv/strchr.S
 create mode 100644 sysdeps/riscv/rvv/strnlen.S

diff --git a/sysdeps/riscv/rvv/strchr.S b/sysdeps/riscv/rvv/strchr.S
new file mode 100644
index 0000000000..053923d3d7
--- /dev/null
+++ b/sysdeps/riscv/rvv/strchr.S
@@ -0,0 +1,62 @@
+/* RVV versions strchr.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define str a0
+#define ch a1
+#define end_offset a2
+#define ch_offset a3
+#define temp1 a4
+#define temp2 a5
+#define cur_vl a6
+#define ivl t0
+
+#define ELEM_LMUL_SETTING m1
+#define vstr v0
+#define vmask_end v8
+#define vmask_ch v9
+
+ENTRY(strchr)
+
+L(strchr_loop):
+    vsetvli ivl, zero, e8, ELEM_LMUL_SETTING, ta, ma
+    vle8ff.v vstr, (str)
+    vmseq.vi vmask_end, vstr, 0
+    vmseq.vx vmask_ch, vstr, ch
+    vfirst.m end_offset, vmask_end /* first occurrence of \0 */
+    vfirst.m ch_offset, vmask_ch /* first occurrence of ch */
+    sltz temp1, ch_offset
+    sltu temp2, end_offset, ch_offset
+    or temp1, temp1, temp2
+    beqz temp1, L(found_ch) /* Found ch, not preceded by \0? */
+    csrr cur_vl, vl
+    add str, str, cur_vl
+    bltz end_offset, L(strchr_loop) /* Didn't find \0? */
+    li str, 0
+    ret
+L(found_ch):
+    add str, str, ch_offset
+    ret
+
+END(strchr)
+weak_alias (strchr, index)
+libc_hidden_builtin_def (strchr)
+
diff --git a/sysdeps/riscv/rvv/strnlen.S b/sysdeps/riscv/rvv/strnlen.S
new file mode 100644
index 0000000000..b902ae0fd4
--- /dev/null
+++ b/sysdeps/riscv/rvv/strnlen.S
@@ -0,0 +1,55 @@
+/* RVV versions strnlen.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+#define str a0
+#define copy_str a2
+#define ret_value a0
+#define max_len a1
+#define cur_vl a3
+#define end_offset a4
+
+#define ELEM_LMUL_SETTING m1
+#define vstr v0
+#define vmask_end v8
+
+ENTRY(__strnlen)
+
+    mv copy_str, str
+    mv ret_value, max_len
+L(strnlen_loop):
+    beqz max_len, L(end_strnlen_loop)
+    vsetvli zero, max_len, e8, ELEM_LMUL_SETTING, ta, ma
+    vle8ff.v vstr, (copy_str)
+    vmseq.vi vmask_end, vstr, 0
+    vfirst.m end_offset, vmask_end /* first occurrence of \0 */
+    csrr cur_vl, vl
+    add copy_str, copy_str, cur_vl
+    sub max_len, max_len, cur_vl
+    bltz end_offset, L(strnlen_loop)
+    add max_len, max_len, cur_vl
+    sub ret_value, ret_value, max_len
+    add ret_value, ret_value, end_offset
+L(end_strnlen_loop):
+    ret
+END(__strnlen)
+weak_alias (__strnlen, strnlen)
+libc_hidden_builtin_def (strnlen)
+libc_hidden_builtin_def (__strnlen)
-- 
2.38.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 5/5] riscv: vectorized __memcmpeq function
  2023-05-04  7:48 [PATCH v3 0/5] riscv: Vectorized mem*/str* function Hau Hsu
                   ` (3 preceding siblings ...)
  2023-05-04  7:48 ` [PATCH v3 4/5] riscv: vectorized strchr and strnlen functions Hau Hsu
@ 2023-05-04  7:48 ` Hau Hsu
  2023-05-08 14:06 ` [PATCH v3 0/5] riscv: Vectorized mem*/str* function Palmer Dabbelt
  5 siblings, 0 replies; 9+ messages in thread
From: Hau Hsu @ 2023-05-04  7:48 UTC (permalink / raw)
  To: libc-alpha
  Cc: hau.hsu, kito.cheng, nick.knight, jerry.shih, vincent.chen,
	hongrong.hsu, Yun Hsiang

From: Yun Hsiang <yun.hsiang@sifive.com>

This patch proposes an implementation of __memcmpeq that leverages the
RISC-V V extension (RVV), version 1.0. The routine assumes VLEN is at
least 32 bits, as is required by all currently defined vector
extensions, and it supports arbitrarily large VLEN. The implementation
works for both RV32 and RV64 platforms, and makes no assumptions about
page size.
---
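For reference, the contract this routine relies on is that __memcmpeq
only has to return zero when the buffers are equal and some nonzero
value otherwise. The following scalar C sketch illustrates that contract
and the chunked loop shape; it is not the RVV code. The name
model_memcmpeq and the vlmax parameter are hypothetical, and
vl = min(n, vlmax) is a simplification of what vsetvli may choose.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 if the first n bytes of a and b are equal, and a nonzero
   value otherwise.  Like the assembly, the nonzero value is simply the
   length of the chunk in which the difference was found.  */
static int
model_memcmpeq (const void *a, const void *b, size_t n, size_t vlmax)
{
  assert (vlmax > 0);
  const unsigned char *p = a;
  const unsigned char *q = b;
  while (n != 0)
    {
      /* Simplified stand-in for: vsetvli ivl, num, e8, m1, ta, ma.  */
      size_t vl = n < vlmax ? n : vlmax;
      for (size_t i = 0; i < vl; i++)
        if (p[i] != q[i])
          return (int) vl;      /* Stand-in for: mv result, ivl.  */
      p += vl;
      q += vl;
      n -= vl;
    }
  return 0;
}
```

Callers may only test the result against zero; the sign and magnitude
carry no ordering information, unlike memcmp.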
 sysdeps/riscv/rvv/memcmp.S   |  4 ---
 sysdeps/riscv/rvv/memcmpeq.S | 67 ++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 4 deletions(-)
 create mode 100644 sysdeps/riscv/rvv/memcmpeq.S

diff --git a/sysdeps/riscv/rvv/memcmp.S b/sysdeps/riscv/rvv/memcmp.S
index fbf81acc2f..eeec2cae6a 100644
--- a/sysdeps/riscv/rvv/memcmp.S
+++ b/sysdeps/riscv/rvv/memcmp.S
@@ -68,7 +68,3 @@ L(found):
 
 END(memcmp)
 libc_hidden_builtin_def (memcmp)
-weak_alias (memcmp,bcmp)
-strong_alias (memcmp, __memcmpeq)
-libc_hidden_def (__memcmpeq)
-
diff --git a/sysdeps/riscv/rvv/memcmpeq.S b/sysdeps/riscv/rvv/memcmpeq.S
new file mode 100644
index 0000000000..5820af69d7
--- /dev/null
+++ b/sysdeps/riscv/rvv/memcmpeq.S
@@ -0,0 +1,67 @@
+/* RVV versions __memcmpeq.  RISC-V version.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+
+#define result a0
+
+#define src1 a0
+#define src2 a1
+#define num a2
+
+#define ivl a3
+#define temp a4
+
+#define ELEM_LMUL_SETTING m1
+#define vdata1 v0
+#define vdata2 v8
+#define vmask v16
+
+ENTRY(__memcmpeq)
+
+L(loop):
+    vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+
+    vle8.v vdata1, (src1)
+    vle8.v vdata2, (src2)
+
+    vmsne.vv vmask, vdata1, vdata2
+    sub num, num, ivl
+    vfirst.m temp, vmask
+
+    /* Exit the loop as soon as src1 and src2 differ.  */
+    bgez temp, L(found)
+
+    add src1, src1, ivl
+    add src2, src2, ivl
+
+    bnez num, L(loop)
+
+    li result, 0
+    ret
+
+L(found):
+    mv result, ivl
+    ret
+
+END(__memcmpeq)
+
+weak_alias (__memcmpeq, bcmp)
+libc_hidden_def (__memcmpeq)
-- 
2.38.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/5] riscv: Vectorized mem*/str* function
  2023-05-04  7:48 [PATCH v3 0/5] riscv: Vectorized mem*/str* function Hau Hsu
                   ` (4 preceding siblings ...)
  2023-05-04  7:48 ` [PATCH v3 5/5] riscv: vectorized __memcmpeq function Hau Hsu
@ 2023-05-08 14:06 ` Palmer Dabbelt
  2023-05-10  9:01   ` Hau Hsu
  5 siblings, 1 reply; 9+ messages in thread
From: Palmer Dabbelt @ 2023-05-08 14:06 UTC (permalink / raw)
  To: hau.hsu
  Cc: libc-alpha, hau.hsu, kito.cheng, nick.knight, jerry.shih,
	vincent.chen, hongrong.hsu

On Thu, 04 May 2023 00:48:46 PDT (-0700), hau.hsu@sifive.com wrote:
> This is v3 patchset of adding vectorized mem*/str* functions for
> RISC-V.
>
> This patch proposes implementations of memchr, memcmp, memcpy, memmove,
> memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp,
> strncpy and strnlen that leverage the RISC-V V extension (RVV), version
> 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These
> routines are from https://github.com/sifive/sifive-libc, which we agree
> to be contributed to the Free Software Foundation. With regards to
> IFUNC, some details concerning `hwcap` are still under discussion in the
> community. For the purposes of reviewing this patch, we have temporarily
> opted for RVV delegation at compile time. Once the `hwcap` mechanism is
> ready, we’ll rebase on it.

IMO it's fine to allow users to build a glibc that assumes the V 
extension, so we don't need to block this on having the dynamic probing 
working.

That said, we do need to get the Linux uABI sorted out as right now we 
can't even turn on V for userspace.

> These routines assume VLEN is at least 32 bits, as is required by all
> currently defined vector extensions, and they support arbitrarily large
> VLEN. All implementations work for both RV32 and RV64 platforms, and
> make no assumptions about page size.
>
> The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code
> size, while the `str*` (unknown-length) routines use LMUL=1 instead.
> Longer LMUL will still minimize dynamic code size for the latter
> routines, but it will also increase the cost of the remainder/tail loop:
> more data loaded and comparisons performed past the `\0`. This overhead
> will be particularly pronounced for smaller strings.
>
> Measured performance improvements of the vectorized ("rvv")
> implementations vs. the existing Glibc ("scalar") implementations are as

There's been a few of these posted so I forget exactly where the reviews 
ended up, but at least one of the asks was to compare these against 
vectorized versions of the standard glibc routines.

> follows:
> memchr: 85% time savings (i.e., if scalar is 100ms, then rvv is 15ms)
> memcmp: 55%
> memcpy: 88%
> memmove: 80%
> memset: 88%
> strcmp: 85%
> strlen: 70%
> strcat: 53%
> strchr: 85%
> strcpy: 70%
> strncmp: 90%
> strncat: 50%
> strncpy: 60%
> strnlen: 80%
> Above data are collected on SiFive X280 (FPGA simulation), across a wide
> range of problem sizes.
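
For anyone comparing these figures with speedup factors reported
elsewhere, a time saving of s percent corresponds to a speedup of
100 / (100 - s). The small C helper below is only an illustration of
that arithmetic; the function name is hypothetical.

```c
#include <assert.h>

/* Convert a "time savings" percentage into a speedup factor.
   Example from the list above: 85% savings means 100ms becomes 15ms,
   i.e. roughly a 6.7x speedup; 50% savings is exactly 2x.  */
static double
speedup_from_savings (double savings_percent)
{
  assert (savings_percent >= 0.0 && savings_percent < 100.0);
  return 100.0 / (100.0 - savings_percent);
}
```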

That's certainly more realistic of a system than the QEMU results, but 
the general consensus has been that FPGA-based development systems don't 
count as hardware -- not so much because of the FPGA, but because we're 
looking for production systems.  If there's real production systems 
running on FPGAs that's a different story, but it looks like these are 
just pre-silicon development systems.

> v1: https://sourceware.org/pipermail/libc-alpha/2023-March/145976.html
>   * add RISC-V vectorized mem*/str* functions
>
> v2: https://sourceware.org/pipermail/libc-alpha/2023-April/147519.html
>   * include the __memcmpeq function
>   * set lmul=1 for memcmp for generality
>
> v3:
>   * remove "Contributed by" comments
>   * fix license headers
>   * avoid using camelcase variables
>   * avoid using C99 one line comment
>
> Jerry Shih (2):
>   riscv: vectorized mem* functions
>   riscv: vectorized str* functions
>
> Nick Knight (1):
>   riscv: vectorized strchr and strnlen functions
>
> Vincent Chen (1):
>   riscv: Enabling vectorized mem*/str* functions in build time
>
> Yun Hsiang (1):
>   riscv: add vectorized __memcmpeq
>
>  scripts/build-many-glibcs.py   | 10 ++++
>  sysdeps/riscv/preconfigure     | 19 ++++++++
>  sysdeps/riscv/preconfigure.ac  | 18 +++++++
>  sysdeps/riscv/rv32/rvv/Implies |  2 +
>  sysdeps/riscv/rv64/rvv/Implies |  2 +
>  sysdeps/riscv/rvv/memchr.S     | 62 ++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memcmp.S     | 70 +++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memcmpeq.S   | 67 ++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memcpy.S     | 50 +++++++++++++++++++
>  sysdeps/riscv/rvv/memmove.S    | 71 +++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memset.S     | 49 +++++++++++++++++++
>  sysdeps/riscv/rvv/strcat.S     | 71 +++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strchr.S     | 62 ++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strcmp.S     | 88 ++++++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strcpy.S     | 55 +++++++++++++++++++++
>  sysdeps/riscv/rvv/strlen.S     | 53 ++++++++++++++++++++
>  sysdeps/riscv/rvv/strncat.S    | 82 +++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strncmp.S    | 84 ++++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strncpy.S    | 85 ++++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strnlen.S    | 55 +++++++++++++++++++++
>  20 files changed, 1055 insertions(+)
>  create mode 100644 sysdeps/riscv/rv32/rvv/Implies
>  create mode 100644 sysdeps/riscv/rv64/rvv/Implies
>  create mode 100644 sysdeps/riscv/rvv/memchr.S
>  create mode 100644 sysdeps/riscv/rvv/memcmp.S
>  create mode 100644 sysdeps/riscv/rvv/memcmpeq.S
>  create mode 100644 sysdeps/riscv/rvv/memcpy.S
>  create mode 100644 sysdeps/riscv/rvv/memmove.S
>  create mode 100644 sysdeps/riscv/rvv/memset.S
>  create mode 100644 sysdeps/riscv/rvv/strcat.S
>  create mode 100644 sysdeps/riscv/rvv/strchr.S
>  create mode 100644 sysdeps/riscv/rvv/strcmp.S
>  create mode 100644 sysdeps/riscv/rvv/strcpy.S
>  create mode 100644 sysdeps/riscv/rvv/strlen.S
>  create mode 100644 sysdeps/riscv/rvv/strncat.S
>  create mode 100644 sysdeps/riscv/rvv/strncmp.S
>  create mode 100644 sysdeps/riscv/rvv/strncpy.S
>  create mode 100644 sysdeps/riscv/rvv/strnlen.S

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/5] riscv: Vectorized mem*/str* function
  2023-05-08 14:06 ` [PATCH v3 0/5] riscv: Vectorized mem*/str* function Palmer Dabbelt
@ 2023-05-10  9:01   ` Hau Hsu
  2023-05-10 12:28     ` Sergei Lewis
  0 siblings, 1 reply; 9+ messages in thread
From: Hau Hsu @ 2023-05-10  9:01 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: libc-alpha, Kito Cheng, nick.knight, jerry.shih, vincent.chen,
	hongrong.hsu

[-- Attachment #1: Type: text/plain, Size: 7406 bytes --]



> On May 8, 2023, at 10:06 PM, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> 
> On Thu, 04 May 2023 00:48:46 PDT (-0700), hau.hsu@sifive.com wrote:
>> This is v3 patchset of adding vectorized mem*/str* functions for
>> RISC-V.
>> 
>> This patch proposes implementations of memchr, memcmp, memcpy, memmove,
>> memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp,
>> strncpy and strnlen that leverage the RISC-V V extension (RVV), version
>> 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These
>> routines are from https://github.com/sifive/sifive-libc, which we agree
>> to be contributed to the Free Software Foundation. With regards to
>> IFUNC, some details concerning `hwcap` are still under discussion in the
>> community. For the purposes of reviewing this patch, we have temporarily
>> opted for RVV delegation at compile time. Once the `hwcap` mechanism is
>> ready, we’ll rebase on it.
> 
> IMO it's fine to allow users to build a glibc that assumes the V extension, so we don't need to block this on having the dynamic probing working.
> 
> That said, we do need to get the Linux uABI sorted out as right now we can't even turn on V for userspace.

Does this mean that our current implementation that checks whether a user is building
glibc with RVV compile flags is acceptable, at least for now?

>> These routines assume VLEN is at least 32 bits, as is required by all
>> currently defined vector extensions, and they support arbitrarily large
>> VLEN. All implementations work for both RV32 and RV64 platforms, and
>> make no assumptions about page size.
>> 
>> The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code
>> size, while the `str*` (unknown-length) routines use LMUL=1 instead.
>> Longer LMUL will still minimize dynamic code size for the latter
>> routines, but it will also increase the cost of the remainder/tail loop:
>> more data loaded and comparisons performed past the `\0`. This overhead
>> will be particularly pronounced for smaller strings.
>> 
>> Measured performance improvements of the vectorized ("rvv")
>> implementations vs. the existing Glibc ("scalar") implementations are as
> 
> There's been a few of these posted so I forget exactly where the reviews ended up, but at least one of the asks was to compare these against vectorized versions of the standard glibc routines.

I guess you mean this thread?
https://sourceware.org/pipermail/libc-alpha/2023-April/147056.html

> 
>> follows:
>> memchr: 85% time savings (i.e., if scalar is 100ms, then rvv is 15ms)
>> memcmp: 55%
>> memcpy: 88%
>> memmove: 80%
>> memset: 88%
>> strcmp: 85%
>> strlen: 70%
>> strcat: 53%
>> strchr: 85%
>> strcpy: 70%
>> strncmp: 90%
>> strncat: 50%
>> strncpy: 60%
>> strnlen: 80%
>> Above data are collected on SiFive X280 (FPGA simulation), across a wide
>> range of problem sizes.
> 
> That's certainly more realistic of a system than the QEMU results, but the general consensus has been that FPGA-based development systems don't count as hardware -- not so much because of the FPGA, but because we're looking for production systems.  If there's real production systems running on FPGAs that's a different story, but it looks like these are just pre-silicon development systems.

Yes, the FPGA environment is not a production system. However, we currently
have neither RVV products in hand nor a comparable simulation platform, so
this is the best benchmarking environment available to us.

Yun Hsiang also ran benchmarks based on Sergei Lewis's commits in the same environment:
https://sourceware.org/pipermail/libc-alpha/2023-May/147821.html
Our implementations in this patchset have lower instruction and cycle counts in most cases.

When benchmarking Sergei Lewis's commits, Yun Hsiang encountered some errors.
He helped debug the source code and pointed out some issues:
https://sourceware.org/pipermail/libc-alpha/2023-May/147820.html

We know that different uarch variants might prefer different code, but our implementation is more generic.
It follows the RVV 1.0 spec and makes no other hardware assumptions.
The benchmarks also show good results compared with the default and other proposed implementations.


> 
>> v1: https://sourceware.org/pipermail/libc-alpha/2023-March/145976.html
>>  * add RISC-V vectorized mem*/str* functions
>> 
>> v2: https://sourceware.org/pipermail/libc-alpha/2023-April/147519.html
>>  * include the __memcmpeq function
>>  * set lmul=1 for memcmp for generality
>> 
>> v3:
>>  * remove "Contributed by" comments
>>  * fix license headers
>>  * avoid using camelcase variables
>>  * avoid using C99 one line comment
>> 
>> Jerry Shih (2):
>>  riscv: vectorized mem* functions
>>  riscv: vectorized str* functions
>> 
>> Nick Knight (1):
>>  riscv: vectorized strchr and strnlen functions
>> 
>> Vincent Chen (1):
>>  riscv: Enabling vectorized mem*/str* functions in build time
>> 
>> Yun Hsiang (1):
>>  riscv: add vectorized __memcmpeq
>> 
>> scripts/build-many-glibcs.py   | 10 ++++
>> sysdeps/riscv/preconfigure     | 19 ++++++++
>> sysdeps/riscv/preconfigure.ac  | 18 +++++++
>> sysdeps/riscv/rv32/rvv/Implies |  2 +
>> sysdeps/riscv/rv64/rvv/Implies |  2 +
>> sysdeps/riscv/rvv/memchr.S     | 62 ++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memcmp.S     | 70 +++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memcmpeq.S   | 67 ++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memcpy.S     | 50 +++++++++++++++++++
>> sysdeps/riscv/rvv/memmove.S    | 71 +++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memset.S     | 49 +++++++++++++++++++
>> sysdeps/riscv/rvv/strcat.S     | 71 +++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strchr.S     | 62 ++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strcmp.S     | 88 ++++++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strcpy.S     | 55 +++++++++++++++++++++
>> sysdeps/riscv/rvv/strlen.S     | 53 ++++++++++++++++++++
>> sysdeps/riscv/rvv/strncat.S    | 82 +++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strncmp.S    | 84 ++++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strncpy.S    | 85 ++++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strnlen.S    | 55 +++++++++++++++++++++
>> 20 files changed, 1055 insertions(+)
>> create mode 100644 sysdeps/riscv/rv32/rvv/Implies
>> create mode 100644 sysdeps/riscv/rv64/rvv/Implies
>> create mode 100644 sysdeps/riscv/rvv/memchr.S
>> create mode 100644 sysdeps/riscv/rvv/memcmp.S
>> create mode 100644 sysdeps/riscv/rvv/memcmpeq.S
>> create mode 100644 sysdeps/riscv/rvv/memcpy.S
>> create mode 100644 sysdeps/riscv/rvv/memmove.S
>> create mode 100644 sysdeps/riscv/rvv/memset.S
>> create mode 100644 sysdeps/riscv/rvv/strcat.S
>> create mode 100644 sysdeps/riscv/rvv/strchr.S
>> create mode 100644 sysdeps/riscv/rvv/strcmp.S
>> create mode 100644 sysdeps/riscv/rvv/strcpy.S
>> create mode 100644 sysdeps/riscv/rvv/strlen.S
>> create mode 100644 sysdeps/riscv/rvv/strncat.S
>> create mode 100644 sysdeps/riscv/rvv/strncmp.S
>> create mode 100644 sysdeps/riscv/rvv/strncpy.S
>> create mode 100644 sysdeps/riscv/rvv/strnlen.S


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/5] riscv: Vectorized mem*/str* function
  2023-05-10  9:01   ` Hau Hsu
@ 2023-05-10 12:28     ` Sergei Lewis
  0 siblings, 0 replies; 9+ messages in thread
From: Sergei Lewis @ 2023-05-10 12:28 UTC (permalink / raw)
  Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 8562 bytes --]

Yes, I've been in email conversation with Yun and have an updated version
of that patchset locally. A key design goal of this approach is to
explicitly align memory accesses, rather than use fault-only-first loads,
in order to take advantage of uarch-specific fast paths. My plan is
therefore to submit an ifunc-based version of the patch once suitable
hwcap support is available, gated behind tests that enable it only when
it is both safe and faster than more generic alternatives. There seems
little point sending an update to the list until it can be gated in this
manner, but I am certainly happy to do so if there is interest.
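
To illustrate the general idea of aligning accesses (this is a generic
sketch of the address arithmetic, not code from either patchset): a load
of a power-of-two block size that starts at a block-aligned address
cannot cross a page boundary, provided the page size is a multiple of the
block size. The helper names, the 16-byte block and the 4096-byte page
used below are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of block (block must be a power of 2).  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t block)
{
  assert (block != 0 && (block & (block - 1)) == 0);
  return addr & ~(block - 1);
}

/* Number of bytes between the aligned base and addr; these leading
   bytes of the first block lie before the string and must be ignored.  */
static uintptr_t
head_bytes (uintptr_t addr, uintptr_t block)
{
  return addr - align_down (addr, block);
}

/* Nonzero if the len bytes starting at addr span two pages.  */
static int
crosses_page (uintptr_t addr, uintptr_t len, uintptr_t page)
{
  assert (len > 0 && page > 0);
  return (addr / page) != ((addr + len - 1) / page);
}
```

For example, a 16-byte load at 0xff8 spans two 4096-byte pages, while the
same load issued from the aligned base 0xff0 stays within the first page.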

On Wed, May 10, 2023 at 10:02 AM Hau Hsu via Libc-alpha <
libc-alpha@sourceware.org> wrote:

>
>
> > On May 8, 2023, at 10:06 PM, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> >
> > On Thu, 04 May 2023 00:48:46 PDT (-0700), hau.hsu@sifive.com <mailto:
> hau.hsu@sifive.com> wrote:
> >> This is v3 patchset of adding vectorized mem*/str* functions for
> >> RISC-V.
> >>
> >> This patch proposes implementations of memchr, memcmp, memcpy, memmove,
> >> memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp,
> >> strncpy and strnlen that leverage the RISC-V V extension (RVV), version
> >> 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These
> >> routines are from https://github.com/sifive/sifive-libc, which we agree
> >> to be contributed to the Free Software Foundation. With regards to
> >> IFUNC, some details concerning `hwcap` are still under discussion in the
> >> community. For the purposes of reviewing this patch, we have temporarily
> >> opted for RVV delegation at compile time. Once the `hwcap` mechanism is
> >> ready, we’ll rebase on it.
> >
> > IMO it's fine to allow users to build a glibc that assumes the V
> > extension, so we don't need to block this on having the dynamic probing
> > working.
> >
> > That said, we do need to get the Linux uABI sorted out as right now we
> > can't even turn on V for userspace.
>
> Does this mean that our current implementation that checks whether a user
> is building glibc with RVV compile flags is acceptable, at least for now?
>
> >> These routines assume VLEN is at least 32 bits, as is required by all
> >> currently defined vector extensions, and they support arbitrarily large
> >> VLEN. All implementations work for both RV32 and RV64 platforms, and
> >> make no assumptions about page size.
> >>
> >> The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code
> >> size, while the `str*` (unknown-length) routines use LMUL=1 instead.
> >> Longer LMUL will still minimize dynamic code size for the latter
> >> routines, but it will also increase the cost of the remainder/tail loop:
> >> more data loaded and comparisons performed past the `\0`. This overhead
> >> will be particularly pronounced for smaller strings.
> >>
> >> Measured performance improvements of the vectorized ("rvv")
> >> implementations vs. the existing Glibc ("scalar") implementations are as
> >
> > There's been a few of these posted so I forget exactly where the reviews
> > ended up, but at least one of the asks was to compare these against
> > vectorized versions of the standard glibc routines.
>
> I guess you mean this thread?
> https://sourceware.org/pipermail/libc-alpha/2023-April/147056.html
>
> >
> >> follows:
> >> memchr: 85% time savings (i.e., if scalar is 100ms, then rvv is 15ms)
> >> memcmp: 55%
> >> memcpy: 88%
> >> memmove: 80%
> >> memset: 88%
> >> strcmp: 85%
> >> strlen: 70%
> >> strcat: 53%
> >> strchr: 85%
> >> strcpy: 70%
> >> strncmp: 90%
> >> strncat: 50%
> >> strncpy: 60%
> >> strnlen: 80%
> >> Above data are collected on SiFive X280 (FPGA simulation), across a wide
> >> range of problem sizes.
> >
> > That's certainly more realistic of a system than the QEMU results, but
> > the general consensus has been that FPGA-based development systems don't
> > count as hardware -- not so much because of the FPGA, but because we're
> > looking for production systems.  If there's real production systems running
> > on FPGAs that's a different story, but it looks like these are just
> > pre-silicon development systems.
>
> Yes, the FPGA environment is not a production system, but we currently
> have neither RVV products in hand nor similar simulation platforms, so
> this is the best benchmarking environment we have.
>
> Yun Hsiang also ran benchmarks based on Sergei Lewis's commits in the
> same environment:
> https://sourceware.org/pipermail/libc-alpha/2023-May/147821.html
> Our implementations have lower instruction/cycle counts in most cases.
>
> When benchmarking Sergei Lewis's commits, Yun Hsiang encountered some
> errors.
> He helped to debug the source code and pointed out some issues:
> https://sourceware.org/pipermail/libc-alpha/2023-May/147820.html
>
> We know that different uarch variants might prefer different code, but our
> implementation is more generic.
> It follows the RVV spec 1.0 and has no other hardware assumptions.
> The benchmarking also shows good results compared with the default and
> other proposed implementations.
>
>
> >
> >> v1: https://sourceware.org/pipermail/libc-alpha/2023-March/145976.html
> >>  * add RISC-V vectorized mem*/str* functions
> >>
> >> v2: https://sourceware.org/pipermail/libc-alpha/2023-April/147519.html
> >>  * include the __memcmpeq function
> >>  * set lmul=1 for memcmp for generality
> >>
> >> v3:
> >>  * remove "Contributed by" comments
> >>  * fix license headers
> >>  * avoid using camelcase variables
> >>  * avoid using C99 one-line comments
> >>
> >> Jerry Shih (2):
> >>  riscv: vectorized mem* functions
> >>  riscv: vectorized str* functions
> >>
> >> Nick Knight (1):
> >>  riscv: vectorized strchr and strnlen functions
> >>
> >> Vincent Chen (1):
> >>  riscv: Enabling vectorized mem*/str* functions in build time
> >>
> >> Yun Hsiang (1):
> >>  riscv: add vectorized __memcmpeq
> >>
> >> scripts/build-many-glibcs.py   | 10 ++++
> >> sysdeps/riscv/preconfigure     | 19 ++++++++
> >> sysdeps/riscv/preconfigure.ac  | 18 +++++++
> >> sysdeps/riscv/rv32/rvv/Implies |  2 +
> >> sysdeps/riscv/rv64/rvv/Implies |  2 +
> >> sysdeps/riscv/rvv/memchr.S     | 62 ++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/memcmp.S     | 70 +++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/memcmpeq.S   | 67 ++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/memcpy.S     | 50 +++++++++++++++++++
> >> sysdeps/riscv/rvv/memmove.S    | 71 +++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/memset.S     | 49 +++++++++++++++++++
> >> sysdeps/riscv/rvv/strcat.S     | 71 +++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/strchr.S     | 62 ++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/strcmp.S     | 88 ++++++++++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/strcpy.S     | 55 +++++++++++++++++++++
> >> sysdeps/riscv/rvv/strlen.S     | 53 ++++++++++++++++++++
> >> sysdeps/riscv/rvv/strncat.S    | 82 +++++++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/strncmp.S    | 84 ++++++++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/strncpy.S    | 85 ++++++++++++++++++++++++++++++++
> >> sysdeps/riscv/rvv/strnlen.S    | 55 +++++++++++++++++++++
> >> 20 files changed, 1055 insertions(+)
> >> create mode 100644 sysdeps/riscv/rv32/rvv/Implies
> >> create mode 100644 sysdeps/riscv/rv64/rvv/Implies
> >> create mode 100644 sysdeps/riscv/rvv/memchr.S
> >> create mode 100644 sysdeps/riscv/rvv/memcmp.S
> >> create mode 100644 sysdeps/riscv/rvv/memcmpeq.S
> >> create mode 100644 sysdeps/riscv/rvv/memcpy.S
> >> create mode 100644 sysdeps/riscv/rvv/memmove.S
> >> create mode 100644 sysdeps/riscv/rvv/memset.S
> >> create mode 100644 sysdeps/riscv/rvv/strcat.S
> >> create mode 100644 sysdeps/riscv/rvv/strchr.S
> >> create mode 100644 sysdeps/riscv/rvv/strcmp.S
> >> create mode 100644 sysdeps/riscv/rvv/strcpy.S
> >> create mode 100644 sysdeps/riscv/rvv/strlen.S
> >> create mode 100644 sysdeps/riscv/rvv/strncat.S
> >> create mode 100644 sysdeps/riscv/rvv/strncmp.S
> >> create mode 100644 sysdeps/riscv/rvv/strncpy.S
> >> create mode 100644 sysdeps/riscv/rvv/strnlen.S
>
>


Thread overview: 9+ messages
2023-05-04  7:48 [PATCH v3 0/5] riscv: Vectorized mem*/str* function Hau Hsu
2023-05-04  7:48 ` [PATCH v3 1/5] riscv: Enabling vectorized mem*/str* functions in build time Hau Hsu
2023-05-04  7:48 ` [PATCH v3 2/5] riscv: vectorized mem* functions Hau Hsu
2023-05-04  7:48 ` [PATCH v3 3/5] riscv: vectorized str* functions Hau Hsu
2023-05-04  7:48 ` [PATCH v3 4/5] riscv: vectorized strchr and strnlen functions Hau Hsu
2023-05-04  7:48 ` [PATCH v3 5/5] riscv: vectorized __memcmpeq function Hau Hsu
2023-05-08 14:06 ` [PATCH v3 0/5] riscv: Vectorized mem*/str* function Palmer Dabbelt
2023-05-10  9:01   ` Hau Hsu
2023-05-10 12:28     ` Sergei Lewis
