Message-ID: <5264520a-dfc9-4b92-85d9-f244179d5d87@linaro.org>
Date: Mon, 8 Jan 2024 15:56:35 -0300
Subject: Re: [PATCH v10 7/7] riscv: Add and use alignment-ignorant memcpy
To: Evan Green, libc-alpha@sourceware.org
Cc: vineetg@rivosinc.com, slewis@rivosinc.com, palmer@rivosinc.com, Florian Weimer
References: <20231213211142.1543025-1-evan@rivosinc.com> <20231213211142.1543025-8-evan@rivosinc.com>
From: Adhemerval Zanella Netto
Organization: Linaro
In-Reply-To: <20231213211142.1543025-8-evan@rivosinc.com>

On 13/12/23 18:11, Evan Green wrote:
> For CPU implementations that can perform unaligned accesses with little
> or no performance penalty, create a memcpy implementation that does not
> bother aligning buffers. It will use a block of integer registers, a
> single integer register, and fall back to bytewise copy for the
> remainder.
>
> Signed-off-by: Evan Green
> Reviewed-by: Palmer Dabbelt
>
> ---
>
> Changes in v10:
> - One line per function in Makefile for memcpy (Adhemerval)
> - Space before argument-like things (Adhemerval)
>
> Changes in v7:
> - Use new helper function in memcpy ifunc selector (Richard)
>
> Changes in v6:
> - Fix a couple regressions in the assembly from v5 :/
> - Use passed hwprobe pointer in memcpy ifunc selector.
>
> Changes in v5:
> - Do unaligned word access for final trailing bytes (Richard)
>
> Changes in v4:
> - Fixed comment style (Florian)
>
> Changes in v3:
> - Word align dest for large memcpy()s.
> - Add tags
> - Remove spurious blank line from sysdeps/riscv/memcpy.c
>
> Changes in v2:
> - Used _MASK instead of _FAST value itself.
>
>
> ---
>  sysdeps/riscv/memcopy.h                       |  26 ++++
>  sysdeps/riscv/memcpy.c                        |  63 ++++++++
>  sysdeps/riscv/memcpy_noalignment.S            | 138 ++++++++++++++++++
>  sysdeps/unix/sysv/linux/riscv/Makefile        |   9 ++
>  .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 +++
>  5 files changed, 260 insertions(+)
>  create mode 100644 sysdeps/riscv/memcopy.h
>  create mode 100644 sysdeps/riscv/memcpy.c
>  create mode 100644 sysdeps/riscv/memcpy_noalignment.S
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
>
> diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h
> new file mode 100644
> index 0000000000..2b685c8aa0
> --- /dev/null
> +++ b/sysdeps/riscv/memcopy.h
> @@ -0,0 +1,26 @@
> +/* memcopy.h -- definitions for memory copy functions. RISC-V version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.

The copyright year needs to be 2024 here and in all other new files.

> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +
> +/* Redefine the generic memcpy implementation to __memcpy_generic, so
> +   the memcpy ifunc can select between generic and special versions.
> +   In rtld, don't bother with all the ifunciness. */
> +#if IS_IN (libc)
> +#define MEMCPY __memcpy_generic
> +#endif
> diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c
> new file mode 100644
> index 0000000000..df6b7db1f4
> --- /dev/null
> +++ b/sysdeps/riscv/memcpy.c
> @@ -0,0 +1,63 @@
> +/* Multiple versions of memcpy.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#if IS_IN (libc)
> +/* Redefine memcpy so that the compiler won't complain about the type
> +   mismatch with the IFUNC selector in strong_alias, below. */
> +# undef memcpy
> +# define memcpy __redirect_memcpy
> +# include
> +# include
> +# include
> +# include
> +# include
> +
> +# define INIT_ARCH()
> +
> +extern __typeof (__redirect_memcpy) __libc_memcpy;
> +
> +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
> +extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden;
> +
> +static inline __typeof (__redirect_memcpy) *
> +select_memcpy_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
> +{
> +  unsigned long long int value;
> +
> +  INIT_ARCH ();
> +
> +  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value))

No implicit checks for functions that return 'int'; compare the result
against 0 explicitly.

> +    return __memcpy_generic;
> +
> +  if ((value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST)
> +    return __memcpy_noalignment;
> +
> +  return __memcpy_generic;
> +}
> +
> +riscv_libc_ifunc (__libc_memcpy, select_memcpy_ifunc);
> +
> +# undef memcpy
> +strong_alias (__libc_memcpy, memcpy);
> +# ifdef SHARED
> +__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy)
> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy);
> +# endif
> +
> +#endif
> diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S
> new file mode 100644
> index 0000000000..f3bf8e5867
> --- /dev/null
> +++ b/sysdeps/riscv/memcpy_noalignment.S
> @@ -0,0 +1,138 @@
> +/* memcpy for RISC-V, ignoring buffer alignment
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library. If not, see
> +   . */
> +
> +#include
> +#include
> +
> +/* void *memcpy(void *, const void *, size_t) */
> +ENTRY (__memcpy_noalignment)
> +	move t6, a0 /* Preserve return value */
> +
> +	/* Bail if 0 */
> +	beqz a2, 7f
> +
> +	/* Jump to byte copy if size < SZREG */
> +	li a4, SZREG
> +	bltu a2, a4, 5f
> +
> +	/* Round down to the nearest "page" size */
> +	andi a4, a2, ~((16*SZREG)-1)
> +	beqz a4, 2f
> +	add a3, a1, a4
> +
> +	/* Copy the first word to get dest word aligned */
> +	andi a5, t6, SZREG-1
> +	beqz a5, 1f
> +	REG_L a6, (a1)
> +	REG_S a6, (t6)
> +
> +	/* Align dst up to a word, move src and size as well. */
> +	addi t6, t6, SZREG-1
> +	andi t6, t6, ~(SZREG-1)
> +	sub a5, t6, a0
> +	add a1, a1, a5
> +	sub a2, a2, a5
> +
> +	/* Recompute page count */
> +	andi a4, a2, ~((16*SZREG)-1)
> +	beqz a4, 2f
> +
> +1:
> +	/* Copy "pages" (chunks of 16 registers) */
> +	REG_L a4, 0(a1)
> +	REG_L a5, SZREG(a1)
> +	REG_L a6, 2*SZREG(a1)
> +	REG_L a7, 3*SZREG(a1)
> +	REG_L t0, 4*SZREG(a1)
> +	REG_L t1, 5*SZREG(a1)
> +	REG_L t2, 6*SZREG(a1)
> +	REG_L t3, 7*SZREG(a1)
> +	REG_L t4, 8*SZREG(a1)
> +	REG_L t5, 9*SZREG(a1)
> +	REG_S a4, 0(t6)
> +	REG_S a5, SZREG(t6)
> +	REG_S a6, 2*SZREG(t6)
> +	REG_S a7, 3*SZREG(t6)
> +	REG_S t0, 4*SZREG(t6)
> +	REG_S t1, 5*SZREG(t6)
> +	REG_S t2, 6*SZREG(t6)
> +	REG_S t3, 7*SZREG(t6)
> +	REG_S t4, 8*SZREG(t6)
> +	REG_S t5, 9*SZREG(t6)
> +	REG_L a4, 10*SZREG(a1)
> +	REG_L a5, 11*SZREG(a1)
> +	REG_L a6, 12*SZREG(a1)
> +	REG_L a7, 13*SZREG(a1)
> +	REG_L t0, 14*SZREG(a1)
> +	REG_L t1, 15*SZREG(a1)
> +	addi a1, a1, 16*SZREG
> +	REG_S a4, 10*SZREG(t6)
> +	REG_S a5, 11*SZREG(t6)
> +	REG_S a6, 12*SZREG(t6)
> +	REG_S a7, 13*SZREG(t6)
> +	REG_S t0, 14*SZREG(t6)
> +	REG_S t1, 15*SZREG(t6)
> +	addi t6, t6, 16*SZREG
> +	bltu a1, a3, 1b
> +	andi a2, a2, (16*SZREG)-1 /* Update count */
> +
> +2:
> +	/* Remainder is smaller than a page, compute native word count */
> +	beqz a2, 7f
> +	andi a5, a2, ~(SZREG-1)
> +	andi a2, a2, (SZREG-1)
> +	add a3, a1, a5
> +	/* Jump directly to last word if no words. */
> +	beqz a5, 4f
> +
> +3:
> +	/* Use single native register copy */
> +	REG_L a4, 0(a1)
> +	addi a1, a1, SZREG
> +	REG_S a4, 0(t6)
> +	addi t6, t6, SZREG
> +	bltu a1, a3, 3b
> +
> +	/* Jump directly out if no more bytes */
> +	beqz a2, 7f
> +
> +4:
> +	/* Copy the last word unaligned */
> +	add a3, a1, a2
> +	add a4, t6, a2
> +	REG_L a5, -SZREG(a3)
> +	REG_S a5, -SZREG(a4)
> +	ret
> +
> +5:
> +	/* Copy bytes when the total copy is <SZREG */
> +	add a3, a1, a2
> +
> +6:
> +	lb a4, 0(a1)
> +	addi a1, a1, 1
> +	sb a4, 0(t6)
> +	addi t6, t6, 1
> +	bltu a1, a3, 6b
> +
> +7:
> +	ret
> +
> +END (__memcpy_noalignment)
> +
> +hidden_def (__memcpy_noalignment)

No need to define a hidden alias here.

> diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
> index 04abf226ad..398ff7418b 100644
> --- a/sysdeps/unix/sysv/linux/riscv/Makefile
> +++ b/sysdeps/unix/sysv/linux/riscv/Makefile
> @@ -15,6 +15,15 @@ ifeq ($(subdir),stdlib)
>  gen-as-const-headers += ucontext_i.sym
>  endif
>
> +ifeq ($(subdir),string)
> +sysdep_routines += \
> +  memcpy \
> +  memcpy-generic \
> +  memcpy_noalignment \
> +  # sysdep_routines
> +
> +endif
> +
>  abi-variants := ilp32 ilp32d lp64 lp64d
>
>  ifeq (,$(filter $(default-abi),$(abi-variants)))
> diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
> new file mode 100644
> index 0000000000..ec8ad93df2
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
> @@ -0,0 +1,24 @@
> +/* Re-include the default memcpy implementation.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +
> +extern __typeof (memcpy) __memcpy_generic;
> +hidden_proto (__memcpy_generic)
> +
> +#include
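
For anyone following the thread, the copy strategy described in the commit
message (a block of integer registers, then a single register, then the
tail) can be sketched in C roughly as below. This is only an illustration
under assumptions, not the code from the patch: the names
copy_noalign_sketch, copy_word and check_copy are made up here, word_t
stands in for one integer register, and fixed-size memcpy calls stand in
for the unaligned REG_L/REG_S accesses. The assembly also loads a batch of
registers before storing them, which the sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one integer register (SZREG bytes).  */
typedef unsigned long word_t;
#define WORD_SIZE (sizeof (word_t))
#define BLOCK_WORDS 16

/* One unaligned register load plus store (REG_L then REG_S), modelled
   with fixed-size memcpy so the C code has no alignment requirement.  */
static void
copy_word (unsigned char *d, const unsigned char *s)
{
  word_t w;
  memcpy (&w, s, WORD_SIZE);
  memcpy (d, &w, WORD_SIZE);
}

/* Illustrative only; assumes the buffers do not overlap.  */
void *
copy_noalign_sketch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* Copies shorter than one word go byte by byte.  */
  if (n < WORD_SIZE)
    {
      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
      return dst;
    }

  /* For large copies, copy one word and then advance so that the
     destination is word aligned; the source may stay misaligned.  */
  size_t skew = (uintptr_t) d & (WORD_SIZE - 1);
  if (n >= BLOCK_WORDS * WORD_SIZE && skew != 0)
    {
      size_t adj = WORD_SIZE - skew;
      copy_word (d, s);
      d += adj;
      s += adj;
      n -= adj;
    }

  /* Blocks of 16 words.  */
  while (n >= BLOCK_WORDS * WORD_SIZE)
    {
      for (size_t i = 0; i < BLOCK_WORDS; i++)
        copy_word (d + i * WORD_SIZE, s + i * WORD_SIZE);
      d += BLOCK_WORDS * WORD_SIZE;
      s += BLOCK_WORDS * WORD_SIZE;
      n -= BLOCK_WORDS * WORD_SIZE;
    }

  /* Remaining whole words, one at a time.  */
  while (n >= WORD_SIZE)
    {
      copy_word (d, s);
      d += WORD_SIZE;
      s += WORD_SIZE;
      n -= WORD_SIZE;
    }

  /* Tail: one word ending exactly at the last byte, overlapping bytes
     that were already copied.  */
  if (n > 0)
    copy_word (d + n - WORD_SIZE, s + n - WORD_SIZE);

  return dst;
}

/* Hypothetical self-check: returns 1 if N bytes were copied correctly
   and every byte outside the destination range was left untouched.  */
int
check_copy (size_t dst_off, size_t src_off, size_t n)
{
  unsigned char src[600], dst[600], want[600];
  assert (dst_off + n <= sizeof dst && src_off + n <= sizeof src);
  for (size_t i = 0; i < sizeof src; i++)
    src[i] = (unsigned char) (i * 7 + 1);
  memset (dst, 0xAA, sizeof dst);
  memset (want, 0xAA, sizeof want);
  for (size_t i = 0; i < n; i++)
    want[dst_off + i] = src[src_off + i];
  copy_noalign_sketch (dst + dst_off, src + src_off, n);
  return memcmp (dst, want, sizeof dst) == 0;
}
```

The last step corresponds to the v5 changelog entry: the trailing bytes
are finished with a single unaligned word access that overlaps data
already written, which is only safe because the total size is known to be
at least one word by that point.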