Message-ID: <61819bfe-c1cb-c898-11e6-795f1b2da0d0@linaro.org>
Date: Fri, 21 Apr 2023 09:12:03 -0300
Subject: Re: [PATCH v2 2/5] riscv: vectorized mem* functions
To: Hau Hsu, libc-alpha@sourceware.org, hongrong.hsu@sifive.com, jerry.shih@sifive.com, nick.knight@sifive.com, kito.cheng@sifive.com
Cc: greentime.hu@sifive.com, alice.chan@sifive.com, andrew@sifive.com, vincent.chen@sifive.com
References: <20230421075405.14892-1-hau.hsu@sifive.com> <20230421075405.14892-3-hau.hsu@sifive.com>
From: Adhemerval Zanella Netto
Organization: Linaro
In-Reply-To: <20230421075405.14892-3-hau.hsu@sifive.com>

On 21/04/23 04:54, Hau Hsu via Libc-alpha wrote:
> From: Jerry Shih
>
> This patch proposes implementations of memchr, memcmp, memcpy, memmove,
> and memset that leverage the RISC-V V extension (RVV), version 1.0.
> These routines assumes VLEN is at least 32 bits, as is required by all
> currently defined vector extensions, and they support arbitrarily large
> VLEN. All implementations work for both RV32 and RV64 platforms, and
> make no assumptions about page size.

This is not a full review, just some remarks from skimming through the patch.

> ---
> sysdeps/riscv/rvv/memchr.S | 63 +++++++++++++++++++++++++++++++
> sysdeps/riscv/rvv/memcmp.S | 75 +++++++++++++++++++++++++++++++++++++
> sysdeps/riscv/rvv/memcpy.S | 51 +++++++++++++++++++++++++
> sysdeps/riscv/rvv/memmove.S | 72 +++++++++++++++++++++++++++++++++++
> sysdeps/riscv/rvv/memset.S | 51 +++++++++++++++++++++++++
> 5 files changed, 312 insertions(+)
> create mode 100644 sysdeps/riscv/rvv/memchr.S
> create mode 100644 sysdeps/riscv/rvv/memcmp.S
> create mode 100644 sysdeps/riscv/rvv/memcpy.S
> create mode 100644 sysdeps/riscv/rvv/memmove.S
> create mode 100644 sysdeps/riscv/rvv/memset.S
>
> diff --git a/sysdeps/riscv/rvv/memchr.S b/sysdeps/riscv/rvv/memchr.S
> new file mode 100644
> index 0000000000..6981a9f8b0
> --- /dev/null
> +++ b/sysdeps/riscv/rvv/memchr.S
> @@ -0,0 +1,63 @@
> +/* RVV versions memchr. RISC-V version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Jerry Shih .

We don't use 'Contributed by' anymore.

> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +#include
> +
> +#define iResult a0
> +
> +#define pSrc a0
> +#define iValue a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define iTemp a4
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +#define vMask v8

We avoid camelCase names, even in assembly implementations.

> +
> +ENTRY(memchr)
> +
> +L(loop):
> +        vsetvli zero, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +
> +        vle8ff.v vData, (pSrc)
> +        /* Find the iValue inside the loaded data. */
> +        vmseq.vx vMask, vData, iValue
> +        vfirst.m iTemp, vMask
> +
> +        /* Skip the loop if we find the matched value. */
> +        bgez iTemp, L(found)
> +
> +        csrr iVL, vl
> +        sub iNum, iNum, iVL
> +        add pSrc, pSrc, iVL
> +
> +        bnez iNum, L(loop)
> +
> +        li iResult, 0
> +        ret
> +
> +L(found):
> +        add iResult, pSrc, iTemp
> +        ret
> +
> +END(memchr)
> +libc_hidden_builtin_def (memchr)
> diff --git a/sysdeps/riscv/rvv/memcmp.S b/sysdeps/riscv/rvv/memcmp.S
> new file mode 100644
> index 0000000000..b156ec524c
> --- /dev/null
> +++ b/sysdeps/riscv/rvv/memcmp.S
> @@ -0,0 +1,75 @@
> +/* RVV versions memcmp. RISC-V version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Jerry Shih .
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +#include
> +
> +#define iResult a0
> +
> +#define pSrc1 a0
> +#define pSrc2 a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define iTemp a4
> +#define iTemp1 a5
> +#define iTemp2 a6
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData1 v0
> +#define vData2 v8
> +#define vMask v16
> +
> +ENTRY(memcmp)
> +
> +L(loop):
> +        vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +
> +        vle8.v vData1, (pSrc1)
> +        vle8.v vData2, (pSrc2)
> +
> +        vmsne.vv vMask, vData1, vData2
> +        sub iNum, iNum, iVL
> +        vfirst.m iTemp, vMask
> +
> +        /* Skip the loop if we find the different value between pSrc1 and pSrc2. */
> +        bgez iTemp, L(found)
> +
> +        add pSrc1, pSrc1, iVL
> +        add pSrc2, pSrc2, iVL
> +
> +        bnez iNum, L(loop)
> +
> +        li iResult, 0
> +        ret
> +
> +L(found):
> +        add pSrc1, pSrc1, iTemp
> +        add pSrc2, pSrc2, iTemp
> +        lbu iTemp1, 0(pSrc1)
> +        lbu iTemp2, 0(pSrc2)
> +        sub iResult, iTemp1, iTemp2
> +        ret
> +
> +END(memcmp)
> +libc_hidden_builtin_def (memcmp)
> +weak_alias (memcmp,bcmp)
> +strong_alias (memcmp, __memcmpeq)
> +libc_hidden_def (__memcmpeq)
> +
> diff --git a/sysdeps/riscv/rvv/memcpy.S b/sysdeps/riscv/rvv/memcpy.S
> new file mode 100644
> index 0000000000..de790fbe51
> --- /dev/null
> +++ b/sysdeps/riscv/rvv/memcpy.S
> @@ -0,0 +1,51 @@
> +/* RVV versions memcpy. RISC-V version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Jerry Shih .
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +#include
> +
> +#define pDst a0
> +#define pSrc a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define pDstPtr a4
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +ENTRY(memcpy)
> +
> +        mv pDstPtr, pDst
> +
> +L(loop):
> +        vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +
> +        vle8.v vData, (pSrc)
> +        sub iNum, iNum, iVL
> +        add pSrc, pSrc, iVL
> +        vse8.v vData, (pDstPtr)
> +        add pDstPtr, pDstPtr, iVL
> +
> +        bnez iNum, L(loop)
> +
> +        ret
> +
> +END(memcpy)
> +libc_hidden_builtin_def (memcpy)
> diff --git a/sysdeps/riscv/rvv/memmove.S b/sysdeps/riscv/rvv/memmove.S
> new file mode 100644
> index 0000000000..ed12744064
> --- /dev/null
> +++ b/sysdeps/riscv/rvv/memmove.S
> @@ -0,0 +1,72 @@
> +/* RVV versions memmove. RISC-V version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Jerry Shih .
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +#include
> +
> +#define pDst a0
> +#define pSrc a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define pDstPtr a4
> +#define pSrcBackwardPtr a5
> +#define pDstBackwardPtr a6
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +ENTRY(memmove)
> +
> +        mv pDstPtr, pDst
> +
> +        /* If pSrc is equal or after pDst, all data in pSrc will be loaded before
> +           overwrited for the overlapping case. We could use faster `forward-copy`. */
> +        bgeu pSrc, pDst, L(forward_copy_loop)
> +        add pSrcBackwardPtr, pSrc, iNum
> +        add pDstBackwardPtr, pDst, iNum
> +        /* If pDst inside source data range, we need to use `backward_copy_loop` to
> +           handle the overlapping issue. */
> +        bltu pDst, pSrcBackwardPtr, L(backward_copy_loop)
> +
> +L(forward_copy_loop):
> +        vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +
> +        vle8.v vData, (pSrc)
> +        sub iNum, iNum, iVL
> +        add pSrc, pSrc, iVL
> +        vse8.v vData, (pDstPtr)
> +        add pDstPtr, pDstPtr, iVL
> +
> +        bnez iNum, L(forward_copy_loop)
> +        ret
> +
> +L(backward_copy_loop):
> +        vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +
> +        sub pSrcBackwardPtr, pSrcBackwardPtr, iVL
> +        vle8.v vData, (pSrcBackwardPtr)
> +        sub iNum, iNum, iVL
> +        sub pDstBackwardPtr, pDstBackwardPtr, iVL
> +        vse8.v vData, (pDstBackwardPtr)
> +        bnez iNum, L(backward_copy_loop)
> +        ret
> +
> +END(memmove)
> +libc_hidden_builtin_def (memmove)
> diff --git a/sysdeps/riscv/rvv/memset.S b/sysdeps/riscv/rvv/memset.S
> new file mode 100644
> index 0000000000..3a6c3d0afd
> --- /dev/null
> +++ b/sysdeps/riscv/rvv/memset.S
> @@ -0,0 +1,51 @@
> +/* RVV versions memset. RISC-V version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +   Contributed by Jerry Shih .
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +#include
> +
> +#define pDst a0
> +#define iValue a1
> +#define iNum a2
> +
> +#define iVL a3
> +#define iTemp a4
> +#define pDstPtr a5
> +
> +#define ELEM_LMUL_SETTING m8
> +#define vData v0
> +
> +ENTRY(memset)
> +
> +        mv pDstPtr, pDst
> +
> +        vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +        vmv.v.x vData, iValue
> +
> +L(loop):
> +        vse8.v vData, (pDstPtr)
> +        sub iNum, iNum, iVL
> +        add pDstPtr, pDstPtr, iVL
> +        vsetvli iVL, iNum, e8, ELEM_LMUL_SETTING, ta, ma
> +        bnez iNum, L(loop)
> +
> +        ret
> +
> +END(memset)
> +libc_hidden_builtin_def (memset)
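
For anyone checking the overlap handling in the memmove hunk above, the direction selection its comments describe can be modelled in scalar C. This is only a sketch under stated assumptions, not part of the patch: memmove_model and CHUNK_MAX are hypothetical names, CHUNK_MAX stands in for the vl that vsetvli would return on real hardware, and the temporary buffer stands in for the vector register group.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the vl returned by vsetvli; real hardware
   derives it from VLEN and LMUL.  */
#define CHUNK_MAX 16u

/* Scalar model of the quoted memmove: copy forward when src >= dst or
   when dst lies at or past the end of the source range, otherwise copy
   backward from the end.  Each chunk is fully loaded before it is
   stored, as vle8.v followed by vse8.v would do.  */
void *
memmove_model (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  unsigned char tmp[CHUNK_MAX];	/* Models the vector register group.  */

  if ((uintptr_t) s >= (uintptr_t) d
      || (uintptr_t) d >= (uintptr_t) s + n)
    {
      /* forward_copy_loop  */
      while (n != 0)
	{
	  size_t vl = n < CHUNK_MAX ? n : CHUNK_MAX;
	  memcpy (tmp, s, vl);	/* vle8.v  */
	  memcpy (d, tmp, vl);	/* vse8.v  */
	  s += vl;
	  d += vl;
	  n -= vl;
	}
    }
  else
    {
      /* backward_copy_loop: dst falls inside the source range.  */
      const unsigned char *s_end = s + n;
      unsigned char *d_end = d + n;
      while (n != 0)
	{
	  size_t vl = n < CHUNK_MAX ? n : CHUNK_MAX;
	  s_end -= vl;
	  d_end -= vl;
	  memcpy (tmp, s_end, vl);	/* vle8.v  */
	  memcpy (d_end, tmp, vl);	/* vse8.v  */
	  n -= vl;
	}
    }
  return dst;
}
```

Comparing this model against the C library memmove on overlapping buffers, in both directions and with lengths that are not a multiple of CHUNK_MAX, exercises the same two branches the assembly takes.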