Date: Mon, 08 May 2023 07:06:48 -0700 (PDT)
Subject: Re: [PATCH v3 0/5] riscv: Vectorized mem*/str* function
In-Reply-To: <20230504074851.38763-1-hau.hsu@sifive.com>
CC: libc-alpha@sourceware.org, hau.hsu@sifive.com, kito.cheng@sifive.com, nick.knight@sifive.com, jerry.shih@sifive.com, vincent.chen@sifive.com, hongrong.hsu@sifive.com
From: Palmer Dabbelt
To: hau.hsu@sifive.com

On Thu, 04 May 2023 00:48:46 PDT (-0700), hau.hsu@sifive.com wrote:
> This is v3 patchset of adding vectorized mem*/str* functions for
> RISC-V.
>
> This patch proposes implementations of memchr, memcmp, memcpy, memmove,
> memset, strcat, strchr, strcmp, strcpy, strlen, strncat, strncmp,
> strncpy and strnlen that leverage the RISC-V V extension (RVV), version
> 1.0 (https://github.com/riscv/riscv-v-spec/releases/tag/v1.0).
> These routines are from https://github.com/sifive/sifive-libc, which we agree
> to be contributed to the Free Software Foundation. With regards to
> IFUNC, some details concerning `hwcap` are still under discussion in the
> community. For the purposes of reviewing this patch, we have temporarily
> opted for RVV delegation at compile time. Once the `hwcap` mechanism is
> ready, we’ll rebase on it.

IMO it's fine to allow users to build a glibc that assumes the V
extension, so we don't need to block this on having the dynamic probing
working. That said, we do need to get the Linux uABI sorted out, as right
now we can't even turn on V for userspace.

> These routines assume VLEN is at least 32 bits, as is required by all
> currently defined vector extensions, and they support arbitrarily large
> VLEN. All implementations work for both RV32 and RV64 platforms, and
> make no assumptions about page size.
>
> The `mem*` (known-length) routines use LMUL=8 to minimize dynamic code
> size, while the `str*` (unknown-length) routines use LMUL=1 instead.
> Longer LMUL will still minimize dynamic code size for the latter
> routines, but it will also increase the cost of the remainder/tail loop:
> more data loaded and comparisons performed past the `\0`. This overhead
> will be particularly pronounced for smaller strings.
>
> Measured performance improvements of the vectorized ("rvv")
> implementations vs. the existing Glibc ("scalar") implementations are as

There have been a few of these posted, so I forget exactly where the
reviews ended up, but at least one of the asks was to compare these
against vectorized versions of the standard glibc routines.
> follows:
> memchr: 85% time savings (i.e., if scalar is 100ms, then rvv is 15ms)
> memcmp: 55%
> memcpy: 88%
> memmove: 80%
> memset: 88%
> strcmp: 85%
> strlen: 70%
> strcat: 53%
> strchr: 85%
> strcpy: 70%
> strncmp: 90%
> strncat: 50%
> strncpy: 60%
> strnlen: 80%
> Above data are collected on SiFive X280 (FPGA simulation), across a wide
> range of problem sizes.

That's certainly a more realistic system than the QEMU results, but the
general consensus has been that FPGA-based development systems don't
count as hardware -- not so much because of the FPGA, but because we're
looking for production systems. If there are real production systems
running on FPGAs that's a different story, but it looks like these are
just pre-silicon development systems.

> v1: https://sourceware.org/pipermail/libc-alpha/2023-March/145976.html
> * add RISC-V vectorized mem*/str* functions
>
> v2: https://sourceware.org/pipermail/libc-alpha/2023-April/147519.html
> * include the __memcmpeq function
> * set lmul=1 for memcmp for generality
>
> v3:
> * remove "Contributed by" comments
> * fix license headers
> * avoid using camelcase variables
> * avoid using C99 one line comment
>
> Jerry Shih (2):
>   riscv: vectorized mem* functions
>   riscv: vectorized str* functions
>
> Nick Knight (1):
>   riscv: vectorized strchr and strnlen functions
>
> Vincent Chen (1):
>   riscv: Enabling vectorized mem*/str* functions in build time
>
> Yun Hsiang (1):
>   riscv: add vectorized __memcmpeq
>
>  scripts/build-many-glibcs.py   | 10 ++++
>  sysdeps/riscv/preconfigure     | 19 ++++++++
>  sysdeps/riscv/preconfigure.ac  | 18 +++++++
>  sysdeps/riscv/rv32/rvv/Implies |  2 +
>  sysdeps/riscv/rv64/rvv/Implies |  2 +
>  sysdeps/riscv/rvv/memchr.S     | 62 ++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memcmp.S     | 70 +++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memcmpeq.S   | 67 ++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memcpy.S     | 50 +++++++++++++++++++
>  sysdeps/riscv/rvv/memmove.S    | 71 +++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/memset.S     | 49 +++++++++++++++++++
>  sysdeps/riscv/rvv/strcat.S     | 71 +++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strchr.S     | 62 ++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strcmp.S     | 88 ++++++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strcpy.S     | 55 +++++++++++++++++++++
>  sysdeps/riscv/rvv/strlen.S     | 53 ++++++++++++++++++++
>  sysdeps/riscv/rvv/strncat.S    | 82 +++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strncmp.S    | 84 ++++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strncpy.S    | 85 ++++++++++++++++++++++++++++++++
>  sysdeps/riscv/rvv/strnlen.S    | 55 +++++++++++++++++++++
>  20 files changed, 1055 insertions(+)
>  create mode 100644 sysdeps/riscv/rv32/rvv/Implies
>  create mode 100644 sysdeps/riscv/rv64/rvv/Implies
>  create mode 100644 sysdeps/riscv/rvv/memchr.S
>  create mode 100644 sysdeps/riscv/rvv/memcmp.S
>  create mode 100644 sysdeps/riscv/rvv/memcmpeq.S
>  create mode 100644 sysdeps/riscv/rvv/memcpy.S
>  create mode 100644 sysdeps/riscv/rvv/memmove.S
>  create mode 100644 sysdeps/riscv/rvv/memset.S
>  create mode 100644 sysdeps/riscv/rvv/strcat.S
>  create mode 100644 sysdeps/riscv/rvv/strchr.S
>  create mode 100644 sysdeps/riscv/rvv/strcmp.S
>  create mode 100644 sysdeps/riscv/rvv/strcpy.S
>  create mode 100644 sysdeps/riscv/rvv/strlen.S
>  create mode 100644 sysdeps/riscv/rvv/strncat.S
>  create mode 100644 sysdeps/riscv/rvv/strncmp.S
>  create mode 100644 sysdeps/riscv/rvv/strncpy.S
>  create mode 100644 sysdeps/riscv/rvv/strnlen.S
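
As a reading aid for the "time savings" figures quoted above: a saving of
p% means the rvv run takes (100 - p)% of the scalar time, so the
equivalent speedup factor is 100 / (100 - p). The following is only a
minimal Python sketch of that conversion and is not part of the patch
set. The percentages are copied from the quoted cover letter; the names
TIME_SAVINGS_PCT, speedup and rvv_time_ms are made up for illustration.

```python
# Percent time saved (rvv vs. scalar), as reported in the quoted cover letter.
TIME_SAVINGS_PCT = {
    "memchr": 85, "memcmp": 55, "memcpy": 88, "memmove": 80, "memset": 88,
    "strcmp": 85, "strlen": 70, "strcat": 53, "strchr": 85, "strcpy": 70,
    "strncmp": 90, "strncat": 50, "strncpy": 60, "strnlen": 80,
}


def speedup(savings_pct):
    """Convert percent time saved into a speedup factor (scalar time / rvv time)."""
    if not 0 <= savings_pct < 100:
        raise ValueError("savings_pct must be in [0, 100)")
    return 100 / (100 - savings_pct)


def rvv_time_ms(scalar_ms, savings_pct):
    """Time the rvv routine would take, given the scalar time and percent saved."""
    return scalar_ms * (100 - savings_pct) / 100


if __name__ == "__main__":
    # Print one row per routine: percent saved and the equivalent speedup.
    for name, pct in TIME_SAVINGS_PCT.items():
        print(f"{name:8s} {pct:3d}% saved -> {speedup(pct):5.2f}x")
```

With the cover letter's own example, rvv_time_ms(100, 85) gives 15 ms for
memchr, and a 50% saving (strncat) corresponds to a 2x speedup.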