From: "Stefan O'Rear"
To: "Evan Green", "Stefan O'Rear via Libc-alpha"
Cc: "Palmer Dabbelt", slewis@rivosinc.com, "Vineet Gupta", "Florian Weimer"
Subject: Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy
Date: Fri, 07 Jul 2023 22:16:43 -0400
In-Reply-To: <20230706192947.1566767-4-evan@rivosinc.com>
References: <20230706192947.1566767-1-evan@rivosinc.com> <20230706192947.1566767-4-evan@rivosinc.com>

On Thu, Jul 6, 2023, at 3:29 PM, Evan Green wrote:
> For CPU implementations that can perform unaligned accesses with little
> or no performance penalty, create a memcpy implementation that does not
> bother aligning buffers. It will use a block of integer registers, a
> single integer register, and fall back to bytewise copy for the
> remainder.
>
> Signed-off-by: Evan Green
> Reviewed-by: Palmer Dabbelt
>
> ---
>
> Changes in v4:
> - Fixed comment style (Florian)
>
> Changes in v3:
> - Word align dest for large memcpy()s.
> - Add tags
> - Remove spurious blank line from sysdeps/riscv/memcpy.c
>
> Changes in v2:
> - Used _MASK instead of _FAST value itself.
>
>
> ---
>  sysdeps/riscv/memcopy.h                       |  26 ++++
>  sysdeps/riscv/memcpy.c                        |  64 +++++++++
>  sysdeps/riscv/memcpy_noalignment.S            | 121 ++++++++++++++++++
>  sysdeps/unix/sysv/linux/riscv/Makefile        |   4 +
>  .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
>  5 files changed, 239 insertions(+)
>  create mode 100644 sysdeps/riscv/memcopy.h
>  create mode 100644 sysdeps/riscv/memcpy.c
>  create mode 100644 sysdeps/riscv/memcpy_noalignment.S
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
>
> diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h
> new file mode 100644
> index 0000000000..2b685c8aa0
> --- /dev/null
> +++ b/sysdeps/riscv/memcopy.h
> @@ -0,0 +1,26 @@
> +/* memcopy.h -- definitions for memory copy functions. RISC-V version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +
> +/* Redefine the generic memcpy implementation to __memcpy_generic, so
> +   the memcpy ifunc can select between generic and special versions.
> +   In rtld, don't bother with all the ifunciness. */
> +#if IS_IN (libc)
> +#define MEMCPY __memcpy_generic
> +#endif
> diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c
> new file mode 100644
> index 0000000000..fdb8dc3208
> --- /dev/null
> +++ b/sysdeps/riscv/memcpy.c
> @@ -0,0 +1,64 @@
> +/* Multiple versions of memcpy.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#if IS_IN (libc)
> +/* Redefine memcpy so that the compiler won't complain about the type
> +   mismatch with the IFUNC selector in strong_alias, below. */
> +# undef memcpy
> +# define memcpy __redirect_memcpy
> +# include
> +#include
> +#include
> +
> +#define INIT_ARCH()
> +
> +extern __typeof (__redirect_memcpy) __libc_memcpy;
> +
> +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
> +extern __typeof (__redirect_memcpy) __memcpy_noalignment
> attribute_hidden;
> +
> +static inline __typeof (__redirect_memcpy) *
> +select_memcpy_ifunc (void)
> +{
> +  INIT_ARCH ();
> +
> +  struct riscv_hwprobe pair;
> +
> +  pair.key = RISCV_HWPROBE_KEY_CPUPERF_0;
> +  if (__riscv_hwprobe(&pair, 1, 0, NULL, 0) != 0)
> +    return __memcpy_generic;
> +
> +  if ((pair.key > 0) &&
> +      (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) ==
> +       RISCV_HWPROBE_MISALIGNED_FAST)
> +    return __memcpy_noalignment;

It's unclear whether this is a semantically correct use of
__riscv_hwprobe. [1] describes the result of hwprobe as "what's
possible to enable", which leaves open the possibility that additional
system calls are needed to determine whether unaligned accesses are
supported right now in the current process. [2] adds an (inherited,
IIUC) prctl for unaligned access which doesn't affect the return value
of hwprobe and would break this code as written.

(Nothing in either the privileged spec or the SBI spec prohibits an
implementation which provides FAST unaligned access from also
supporting an optional strict alignment checking mode and making it
available through fw_feature.)
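
To make the concern concrete, here is a rough sketch of a selection
rule that also consults the per-process alignment mode. It is not a
proposal for the patch, only an illustration. It assumes the prctl in
[2] is the existing PR_SET_UNALIGN/PR_GET_UNALIGN interface, and that
a failing PR_GET_UNALIGN means no strict mode can have been requested.
The sketch_* names are hypothetical, and the constants are copied from
the Linux uapi headers so the snippet builds on any host. The system
call results are passed in as arguments so the decision logic can be
read (and tested) in isolation.

```c
#include <stdint.h>

/* Values as defined in the Linux uapi headers (asm/hwprobe.h and
   linux/prctl.h), repeated here under hypothetical SKETCH_ names so
   the sketch does not depend on RISC-V kernel headers.  */
#define SKETCH_HWPROBE_MISALIGNED_FAST (3 << 0)
#define SKETCH_HWPROBE_MISALIGNED_MASK (7 << 0)
#define SKETCH_PR_UNALIGN_SIGBUS 2

enum sketch_impl { SKETCH_GENERIC, SKETCH_NOALIGNMENT };

/* hwprobe_ret, key and value describe the outcome of the hwprobe
   call; prctl_ret and unalign_flags describe the outcome of a
   PR_GET_UNALIGN query made by the caller.  */
enum sketch_impl
sketch_select_memcpy (int hwprobe_ret, int64_t key, uint64_t value,
                      int prctl_ret, unsigned int unalign_flags)
{
  /* Same checks as the quoted selector: the probe must succeed and
     the key must have been recognized.  */
  if (hwprobe_ret != 0 || key <= 0)
    return SKETCH_GENERIC;

  if ((value & SKETCH_HWPROBE_MISALIGNED_MASK)
      != SKETCH_HWPROBE_MISALIGNED_FAST)
    return SKETCH_GENERIC;

  /* Assumption: if the query succeeded and the process asked for
     SIGBUS on unaligned access, the fast path is not usable even
     though the hardware reports FAST.  */
  if (prctl_ret == 0 && (unalign_flags & SKETCH_PR_UNALIGN_SIGBUS) != 0)
    return SKETCH_GENERIC;

  return SKETCH_NOALIGNMENT;
}
```

Even with such a check, the ifunc is resolved once, so a process that
changes its alignment mode after memcpy has been resolved would still
run the originally selected implementation.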

-s

[1]: https://lore.kernel.org/linux-riscv/mhng-97928779-5d76-4390-a84c-398fdc6a0a4f@palmer-ri-x1c9/
[2]: https://lore.kernel.org/linux-riscv/20230624122049.7886-6-cleger@rivosinc.com/

> +
> +  return __memcpy_generic;
> +}
> +
> +libc_ifunc (__libc_memcpy, select_memcpy_ifunc ());
> +
> +# undef memcpy
> +strong_alias (__libc_memcpy, memcpy);
> +# ifdef SHARED
> +__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy)
> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy);
> +# endif
> +
> +#endif
> diff --git a/sysdeps/riscv/memcpy_noalignment.S
> b/sysdeps/riscv/memcpy_noalignment.S
> new file mode 100644
> index 0000000000..80f5e09ebb
> --- /dev/null
> +++ b/sysdeps/riscv/memcpy_noalignment.S
> @@ -0,0 +1,121 @@
> +/* memcpy for RISC-V, ignoring buffer alignment
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library. If not, see
> +   . */
> +
> +#include
> +#include
> +
> +/* void *memcpy(void *, const void *, size_t) */
> +ENTRY (__memcpy_noalignment)
> +	move t6, a0 /* Preserve return value */
> +
> +	/* Round down to the nearest "page" size */
> +	andi a4, a2, ~((16*SZREG)-1)
> +	beqz a4, 2f
> +	add a3, a1, a4
> +
> +	/* Copy the first word to get dest word aligned */
> +	andi a5, t6, SZREG-1
> +	beqz a5, 1f
> +	REG_L a6, (a1)
> +	REG_S a6, (t6)
> +
> +	/* Align dst up to a word, move src and size as well. */
> +	addi t6, t6, SZREG-1
> +	andi t6, t6, ~(SZREG-1)
> +	sub a5, t6, a0
> +	add a1, a1, a5
> +	sub a2, a2, a5
> +
> +	/* Recompute page count */
> +	andi a4, a2, ~((16*SZREG)-1)
> +	beqz a4, 2f
> +
> +1:
> +	/* Copy "pages" (chunks of 16 registers) */
> +	REG_L a4, 0(a1)
> +	REG_L a5, SZREG(a1)
> +	REG_L a6, 2*SZREG(a1)
> +	REG_L a7, 3*SZREG(a1)
> +	REG_L t0, 4*SZREG(a1)
> +	REG_L t1, 5*SZREG(a1)
> +	REG_L t2, 6*SZREG(a1)
> +	REG_L t3, 7*SZREG(a1)
> +	REG_L t4, 8*SZREG(a1)
> +	REG_L t5, 9*SZREG(a1)
> +	REG_S a4, 0(t6)
> +	REG_S a5, SZREG(t6)
> +	REG_S a6, 2*SZREG(t6)
> +	REG_S a7, 3*SZREG(t6)
> +	REG_S t0, 4*SZREG(t6)
> +	REG_S t1, 5*SZREG(t6)
> +	REG_S t2, 6*SZREG(t6)
> +	REG_S t3, 7*SZREG(t6)
> +	REG_S t4, 8*SZREG(t6)
> +	REG_S t5, 9*SZREG(t6)
> +	REG_L a4, 10*SZREG(a1)
> +	REG_L a5, 11*SZREG(a1)
> +	REG_L a6, 12*SZREG(a1)
> +	REG_L a7, 13*SZREG(a1)
> +	REG_L t0, 14*SZREG(a1)
> +	REG_L t1, 15*SZREG(a1)
> +	addi a1, a1, 16*SZREG
> +	REG_S a4, 10*SZREG(t6)
> +	REG_S a5, 11*SZREG(t6)
> +	REG_S a6, 12*SZREG(t6)
> +	REG_S a7, 13*SZREG(t6)
> +	REG_S t0, 14*SZREG(t6)
> +	REG_S t1, 15*SZREG(t6)
> +	addi t6, t6, 16*SZREG
> +	bltu a1, a3, 1b
> +	andi a2, a2, (16*SZREG)-1 /* Update count */
> +
> +2:
> +	/* Remainder is smaller than a page, compute native word count */
> +	beqz a2, 6f
> +	andi a5, a2, ~(SZREG-1)
> +	andi a2, a2, (SZREG-1)
> +	add a3, a1, a5
> +	/* Jump directly to byte copy if no words. */
> +	beqz a5, 4f
> +
> +3:
> +	/* Use single native register copy */
> +	REG_L a4, 0(a1)
> +	addi a1, a1, SZREG
> +	REG_S a4, 0(t6)
> +	addi t6, t6, SZREG
> +	bltu a1, a3, 3b
> +
> +	/* Jump directly out if no more bytes */
> +	beqz a2, 6f
> +
> +4:
> +	/* Copy the last few individual bytes */
> +	add a3, a1, a2
> +5:
> +	lb a4, 0(a1)
> +	addi a1, a1, 1
> +	sb a4, 0(t6)
> +	addi t6, t6, 1
> +	bltu a1, a3, 5b
> +6:
> +	ret
> +
> +END (__memcpy_noalignment)
> +
> +hidden_def (__memcpy_noalignment)
> diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile
> b/sysdeps/unix/sysv/linux/riscv/Makefile
> index 45cc29e40d..aa9ea443d6 100644
> --- a/sysdeps/unix/sysv/linux/riscv/Makefile
> +++ b/sysdeps/unix/sysv/linux/riscv/Makefile
> @@ -7,6 +7,10 @@ ifeq ($(subdir),stdlib)
>  gen-as-const-headers += ucontext_i.sym
>  endif
>
> +ifeq ($(subdir),string)
> +sysdep_routines += memcpy memcpy-generic memcpy_noalignment
> +endif
> +
>  abi-variants := ilp32 ilp32d lp64 lp64d
>
>  ifeq (,$(filter $(default-abi),$(abi-variants)))
> diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
> b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
> new file mode 100644
> index 0000000000..0abe03f7f5
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
> @@ -0,0 +1,24 @@
> +/* Re-include the default memcpy implementation.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +
> +extern __typeof (memcpy) __memcpy_generic;
> +hidden_proto(__memcpy_generic)
> +
> +#include
> --
> 2.34.1
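
For readers who would rather not trace the assembly, the copy strategy
described in the commit message (a block of registers, then a single
register, then bytes) can be sketched in portable C roughly as below.
This is a loose model, not a translation of the .S file. The sketch_*
names are hypothetical, WORD stands in for SZREG, and the fixed-size
memcpy calls stand in for the unaligned REG_L/REG_S accesses so that
the C itself stays well defined on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_WORD  sizeof (uintptr_t)    /* stands in for SZREG */
#define SKETCH_BLOCK (16 * SKETCH_WORD)    /* the patch's "page" */

/* Copy one word regardless of alignment.  Compilers typically lower
   these fixed-size memcpy calls to a single load and store where the
   target allows unaligned access.  */
static void
sketch_copy_word (unsigned char *d, const unsigned char *s)
{
  uintptr_t w;
  memcpy (&w, s, SKETCH_WORD);
  memcpy (d, &w, SKETCH_WORD);
}

void *
sketch_memcpy_noalignment (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (n >= SKETCH_BLOCK)
    {
      /* For large copies, copy one possibly unaligned word, then
         advance so that the destination is word aligned.  */
      size_t skew = (uintptr_t) d & (SKETCH_WORD - 1);
      if (skew != 0)
        {
          size_t adv = SKETCH_WORD - skew;
          sketch_copy_word (d, s);
          d += adv;
          s += adv;
          n -= adv;
        }

      /* Copy whole blocks of 16 words.  */
      for (; n >= SKETCH_BLOCK; n -= SKETCH_BLOCK)
        for (size_t i = 0; i < 16; i++, d += SKETCH_WORD, s += SKETCH_WORD)
          sketch_copy_word (d, s);
    }

  /* Copy the remaining whole words one at a time.  */
  for (; n >= SKETCH_WORD; n -= SKETCH_WORD, d += SKETCH_WORD, s += SKETCH_WORD)
    sketch_copy_word (d, s);

  /* Copy the last few individual bytes.  */
  while (n-- > 0)
    *d++ = *s++;

  return dst;
}
```

Only the destination is aligned in the large-copy path; the source may
stay unaligned throughout, which is why this approach only pays off on
implementations that report FAST misaligned access.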