Re: [PATCH v3 0/5] riscv: Vectorized mem*/str* function

From: Hau Hsu <hau.hsu@sifive.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: libc-alpha@sourceware.org, Kito Cheng <kito.cheng@sifive.com>,
	nick.knight@sifive.com, jerry.shih@sifive.com,
	vincent.chen@sifive.com, hongrong.hsu@sifive.com
Subject: Re: [PATCH v3 0/5] riscv: Vectorized mem*/str* function
Date: Wed, 10 May 2023 17:01:16 +0800
Message-ID: <55DF5B20-485B-4A56-812E-47B4A3333427@sifive.com>
In-Reply-To: <mhng-8d9e92a4-c3f1-466e-86a6-ada72a60b8c8@palmer-ri-x1c9a>

> On May 8, 2023, at 10:06 PM, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> 
> On Thu, 04 May 2023 00:48:46 PDT (-0700), hau.hsu@sifive.com wrote:
>> This is v3 patchset of adding vectorized mem*/str* functions for
>> RISC-V.
>> 
>> This patch proposes implementations of memchr, memcmp, memcpy, memmove,
>> memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp,
>> strncpy and strnlen that leverage the RISC-V V extension (RVV), version
>> 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0). These
>> routines are from https://github.com/sifive/sifive-libc, which we agree
>> to be contributed to the Free Software Foundation. With regards to
>> IFUNC, some details concerning `hwcap` are still under discussion in the
>> community. For the purposes of reviewing this patch, we have temporarily
>> opted for RVV delegation at compile time. Once the `hwcap` mechanism is
>> ready, we’ll rebase on it.
> 
> IMO it's fine to allow users to build a glibc that assumes the V extension, so we don't need to block this on having the dynamic probing working.
> 
> That said, we do need to get the Linux uABI sorted out as right now we can't even turn on V for userspace.

Does this mean that our current approach, which checks whether glibc is being
built with RVV compile flags, is acceptable, at least for now?
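
For readers unfamiliar with compile-time selection, here is a minimal sketch in C.
It is an illustration only and not the check used in the patches. It assumes the
compiler predefines __riscv_vector when the V extension is enabled (as described in
the RISC-V C API); HAVE_RVV_AT_BUILD and selected_impl are hypothetical names.

```c
/* Sketch: choose an implementation at build time rather than at run time.
   Assumption: __riscv_vector is predefined by the compiler when the V
   extension is enabled, for example with -march=rv64gcv. */
#if defined(__riscv_vector)
# define HAVE_RVV_AT_BUILD 1
#else
# define HAVE_RVV_AT_BUILD 0
#endif

/* Hypothetical helper: reports which path this build selected. */
static const char *selected_impl(void)
{
    return HAVE_RVV_AT_BUILD ? "rvv" : "scalar";
}
```

The choice is fixed when the library is built, so a binary built this way with V
enabled will only run on hardware that implements V; dynamic probing via IFUNC
would remove that restriction.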

>> These routines assume VLEN is at least 32 bits, as is required by all
>> currently defined vector extensions, and they support arbitrarily large
>> VLEN. All implementations work for both RV32 and RV64 platforms, and
>> make no assumptions about page size.
>> 
>> The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code
>> size, while the `str*` (unknown-length) routines use LMUL=1 instead.
>> Longer LMUL will still minimize dynamic code size for the latter
>> routines, but it will also increase the cost of the remainder/tail loop:
>> more data loaded and comparisons performed past the `\0`. This overhead
>> will be particularly pronounced for smaller strings.
>> 
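
To illustrate the LMUL trade-off quoted above, here is a rough cost model in C.
It is only a sketch and not code from the patches: it assumes every iteration
examines a full register group of VLEN * LMUL / 8 bytes starting at the beginning
of the string, and it ignores alignment and any shortening of vl by
fault-only-first loads. The names chunk_bytes and bytes_past_nul are hypothetical.

```c
#include <stddef.h>

/* Bytes covered by one vector register group: VLEN bits * LMUL / 8. */
static size_t chunk_bytes(size_t vlen_bits, size_t lmul)
{
    return vlen_bits / 8 * lmul;
}

/* Bytes examined beyond the terminating NUL when a string of `len`
   characters (NUL at offset `len`) is scanned in whole chunks. */
static size_t bytes_past_nul(size_t len, size_t chunk)
{
    size_t needed = len + 1;                               /* include the NUL */
    size_t scanned = (needed + chunk - 1) / chunk * chunk; /* round up */
    return scanned - needed;
}
```

Under this model, with a hypothetical VLEN of 512 bits, a 10-character string
scans 53 bytes past the NUL at LMUL=1 and 501 bytes at LMUL=8, which is why the
overhead matters most for short strings.
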
>> Measured performance improvements of the vectorized ("rvv")
>> implementations vs. the existing Glibc ("scalar") implementations are as
> 
> There's been a few of these posted so I forget exactly where the reviews ended up, but at least one of the asks was to compare these against vectorized versions of the standard glibc routines.

I guess you mean this thread?
https://sourceware.org/pipermail/libc-alpha/2023-April/147056.html

> 
>> follows:
>> memchr: 85% time savings (i.e., if scalar is 100ms, then rvv is 15ms)
>> memcmp: 55%
>> memcpy: 88%
>> memmove: 80%
>> memset: 88%
>> strcmp: 85%
>> strlen: 70%
>> strcat: 53%
>> strchr: 85%
>> strcpy: 70%
>> strncmp: 90%
>> strncat: 50%
>> strncpy: 60%
>> strnlen: 80%
>> Above data are collected on SiFive X280 (FPGA simulation), across a wide
>> range of problem sizes.
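
For anyone who prefers speedup factors, the "time savings" figures above convert
as in the sketch below. This is plain arithmetic based on the definition given
for memchr (savings s means the rvv time is (1 - s) times the scalar time); the
function names rvv_time and speedup are hypothetical.

```c
/* Time taken by the vectorized routine, given the scalar time and the
   fractional time savings (0.85 for "85% time savings"). */
static double rvv_time(double scalar_time, double savings)
{
    return scalar_time * (1.0 - savings);
}

/* Equivalent speedup factor: scalar time divided by rvv time. */
static double speedup(double savings)
{
    return 1.0 / (1.0 - savings);
}
```

For example, 85% savings corresponds to roughly a 6.7x speedup, and 50% to 2x.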
> 
> That's certainly more realistic of a system than the QEMU results, but the general consensus has been that FPGA-based development systems don't count as hardware -- not so much because of the FPGA, but because we're looking for production systems.  If there's real production systems running on FPGAs that's a different story, but it looks like these are just pre-silicon development systems.

Yes, the FPGA environment is not a production system. However, we currently have
neither RVV products in hand nor comparable simulation platforms, so this is the
best benchmarking environment available to us.

Yun Hsiang also ran benchmarks based on Sergei Lewis's commits in the same environment:
https://sourceware.org/pipermail/libc-alpha/2023-May/147821.html
In those results, our implementations have lower instruction/cycle counts in most cases.

When benchmarking Sergei Lewis's commits, Yun Hsiang encountered some errors.
He helped to debug the source code and pointed out some issues:
https://sourceware.org/pipermail/libc-alpha/2023-May/147820.html

We know that different uarch variants might prefer different code, but our implementation is more generic.
It follows the RVV spec 1.0 and makes no other hardware assumptions.
The benchmarks also show good results compared with the default and other proposed implementations.


> 
>> v1: https://sourceware.org/pipermail/libc-alpha/2023-March/145976.html
>>  * add RISC-V vectorized mem*/str* functions
>> 
>> v2: https://sourceware.org/pipermail/libc-alpha/2023-April/147519.html
>>  * include the __memcmpeq function
>>  * set lmul=1 for memcmp for generality
>> 
>> v3:
>>  * remove "Contributed by" comments
>>  * fix license headers
>>  * avoid using camelcase variables
>>  * avoid using C99 one line comment
>> 
>> Jerry Shih (2):
>>  riscv: vectorized mem* functions
>>  riscv: vectorized str* functions
>> 
>> Nick Knight (1):
>>  riscv: vectorized strchr and strnlen functions
>> 
>> Vincent Chen (1):
>>  riscv: Enabling vectorized mem*/str* functions in build time
>> 
>> Yun Hsiang (1):
>>  riscv: add vectorized __memcmpeq
>> 
>> scripts/build-many-glibcs.py   | 10 ++++
>> sysdeps/riscv/preconfigure     | 19 ++++++++
>> sysdeps/riscv/preconfigure.ac  | 18 +++++++
>> sysdeps/riscv/rv32/rvv/Implies |  2 +
>> sysdeps/riscv/rv64/rvv/Implies |  2 +
>> sysdeps/riscv/rvv/memchr.S     | 62 ++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memcmp.S     | 70 +++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memcmpeq.S   | 67 ++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memcpy.S     | 50 +++++++++++++++++++
>> sysdeps/riscv/rvv/memmove.S    | 71 +++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/memset.S     | 49 +++++++++++++++++++
>> sysdeps/riscv/rvv/strcat.S     | 71 +++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strchr.S     | 62 ++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strcmp.S     | 88 ++++++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strcpy.S     | 55 +++++++++++++++++++++
>> sysdeps/riscv/rvv/strlen.S     | 53 ++++++++++++++++++++
>> sysdeps/riscv/rvv/strncat.S    | 82 +++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strncmp.S    | 84 ++++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strncpy.S    | 85 ++++++++++++++++++++++++++++++++
>> sysdeps/riscv/rvv/strnlen.S    | 55 +++++++++++++++++++++
>> 20 files changed, 1055 insertions(+)
>> create mode 100644 sysdeps/riscv/rv32/rvv/Implies
>> create mode 100644 sysdeps/riscv/rv64/rvv/Implies
>> create mode 100644 sysdeps/riscv/rvv/memchr.S
>> create mode 100644 sysdeps/riscv/rvv/memcmp.S
>> create mode 100644 sysdeps/riscv/rvv/memcmpeq.S
>> create mode 100644 sysdeps/riscv/rvv/memcpy.S
>> create mode 100644 sysdeps/riscv/rvv/memmove.S
>> create mode 100644 sysdeps/riscv/rvv/memset.S
>> create mode 100644 sysdeps/riscv/rvv/strcat.S
>> create mode 100644 sysdeps/riscv/rvv/strchr.S
>> create mode 100644 sysdeps/riscv/rvv/strcmp.S
>> create mode 100644 sysdeps/riscv/rvv/strcpy.S
>> create mode 100644 sysdeps/riscv/rvv/strlen.S
>> create mode 100644 sysdeps/riscv/rvv/strncat.S
>> create mode 100644 sysdeps/riscv/rvv/strncmp.S
>> create mode 100644 sysdeps/riscv/rvv/strncpy.S
>> create mode 100644 sysdeps/riscv/rvv/strnlen.S

Thread overview: 9+ messages
2023-05-04  7:48 Hau Hsu
2023-05-04  7:48 ` [PATCH v3 1/5] riscv: Enabling vectorized mem*/str* functions in build time Hau Hsu
2023-05-04  7:48 ` [PATCH v3 2/5] riscv: vectorized mem* functions Hau Hsu
2023-05-04  7:48 ` [PATCH v3 3/5] riscv: vectorized str* functions Hau Hsu
2023-05-04  7:48 ` [PATCH v3 4/5] riscv: vectorized strchr and strnlen functions Hau Hsu
2023-05-04  7:48 ` [PATCH v3 5/5] riscv: vectorized __memcmpeq function Hau Hsu
2023-05-08 14:06 ` [PATCH v3 0/5] riscv: Vectorized mem*/str* function Palmer Dabbelt
2023-05-10  9:01   ` Hau Hsu [this message]
2023-05-10 12:28     ` Sergei Lewis
