Subject: Re: [PATCH v3 3/3] Loongarch: Add ifunc support and add different versions of strlen
To: dengjianbo, libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, xuchenghua@loongson.cn, huangpei@loongson.cn
References: <20230804095359.2384557-1-dengjianbo@loongson.cn> <20230804095359.2384557-3-dengjianbo@loongson.cn>
From: caiyinyu
Message-ID: <29fda8ac-9f49-4060-8331-051209afbe10@loongson.cn>
Date: Mon, 7 Aug 2023 09:04:50 +0800
In-Reply-To: <20230804095359.2384557-3-dengjianbo@loongson.cn>

On 2023/8/4 at 5:53 PM, dengjianbo wrote:
> strlen-lasx is implemented with LASX SIMD instructions (256-bit)
> strlen-lsx is implemented with LSX SIMD instructions (128-bit)
> strlen-aligned is implemented with basic LA instructions and never uses unaligned memory access
> ---
>  sysdeps/loongarch/lp64/multiarch/Makefile     |   7 ++
>  .../lp64/multiarch/ifunc-impl-list.c          |  41 +++++++
>  .../loongarch/lp64/multiarch/ifunc-strlen.h   |  40 +++++++
>  .../loongarch/lp64/multiarch/strlen-aligned.S | 100 ++++++++++++++++++
>  .../loongarch/lp64/multiarch/strlen-lasx.S    |  63 +++++++++++
>  sysdeps/loongarch/lp64/multiarch/strlen-lsx.S |  71 +++++++++++++
>  sysdeps/loongarch/lp64/multiarch/strlen.c     |  37 +++++++
>  sysdeps/loongarch/sys/regdef.h                |  57 ++++++++++
>  .../unix/sysv/linux/loongarch/cpu-features.h  |   2 +
>  9 files changed, 418 insertions(+)
>  create mode 100644 sysdeps/loongarch/lp64/multiarch/Makefile
>  create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
>  create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strlen.h
>  create mode 100644 sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
>  create mode 100644 sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
>  create mode 100644 sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
>  create mode 100644 sysdeps/loongarch/lp64/multiarch/strlen.c
>
> diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
> new file mode 100644
> index 0000000000..76c506c966
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/Makefile
> @@ -0,0 +1,7 @@
> +ifeq ($(subdir),string)
> +sysdep_routines += \
> +  strlen-aligned \
> +  strlen-lsx \
> +  strlen-lasx \
> +# sysdep_routines
> +endif
> diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
> new file mode 100644
> index 0000000000..1a2a576fcd
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
> @@ -0,0 +1,41 @@
> +/* Enumerate available IFUNC implementations of a function LoongArch64 version.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +size_t
> +__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> +                        size_t max)
> +{
> +
> +  size_t i = max;
> +
> +  IFUNC_IMPL (i, name, strlen,
> +#if !defined __loongarch_soft_float
> +              IFUNC_IMPL_ADD (array, i, strlen, SUPPORT_LASX, __strlen_lasx)
> +              IFUNC_IMPL_ADD (array, i, strlen, SUPPORT_LSX, __strlen_lsx)
> +#endif
> +              IFUNC_IMPL_ADD (array, i, strlen, 1, __strlen_aligned)
> +              )
> +  return i;
> +}
> diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-strlen.h b/sysdeps/loongarch/lp64/multiarch/ifunc-strlen.h
> new file mode 100644
> index 0000000000..6258bb76c3
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/ifunc-strlen.h
> @@ -0,0 +1,40 @@
> +/* Common definition for strlen ifunc selections.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#include
> +#include
> +
> +#if !defined __loongarch_soft_float
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
> +#endif
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden;
> +
> +static inline void *
> +IFUNC_SELECTOR (void)
> +{
> +#if !defined __loongarch_soft_float
> +  if (SUPPORT_LASX)
> +    return OPTIMIZE (lasx);
> +  else if (SUPPORT_LSX)
> +    return OPTIMIZE (lsx);
> +  else
> +#endif
> +    return OPTIMIZE (aligned);
> +}
> diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S b/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
> new file mode 100644
> index 0000000000..e9e1d2fc04
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
> @@ -0,0 +1,100 @@
> +/* Optimized strlen implementation using basic Loongarch instructions.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   .  */
> +
> +#include
> +#include
> +#include
> +
> +#if IS_IN (libc)
> +# define STRLEN __strlen_aligned
> +#else
> +# define STRLEN strlen
> +#endif
> +
> +LEAF(STRLEN, 6)
> +    move        a1, a0
> +    bstrins.d   a0, zero, 2, 0
> +    lu12i.w     a2, 0x01010
> +    li.w        t0, -1
> +
> +    ld.d        t2, a0, 0
> +    andi        t1, a1, 0x7
> +    ori         a2, a2, 0x101
> +    slli.d      t1, t1, 3
> +
> +    bstrins.d   a2, a2, 63, 32
> +    sll.d       t1, t0, t1
> +    slli.d      t3, a2, 7
> +    nor         a3, zero, t3
> +
> +    orn         t2, t2, t1
> +    sub.d       t0, t2, a2
> +    nor         t1, t2, a3
> +    and         t0, t0, t1
> +
> +
> +    bnez        t0, L(count_pos)
> +    addi.d      a0, a0, 8
> +L(loop_16_7bit):
> +    ld.d        t2, a0, 0
> +    sub.d       t1, t2, a2
> +
> +    and         t0, t1, t3
> +    bnez        t0, L(more_check)
> +    ld.d        t2, a0, 8
> +    sub.d       t1, t2, a2
> +
> +    and         t0, t1, t3
> +    addi.d      a0, a0, 16
> +    beqz        t0, L(loop_16_7bit)
> +    addi.d      a0, a0, -8
> +
> +L(more_check):
> +    nor         t0, t2, a3
> +    and         t0, t1, t0
> +    bnez        t0, L(count_pos)
> +    addi.d      a0, a0, 8
> +
> +
> +L(loop_16_8bit):
> +    ld.d        t2, a0, 0
> +    sub.d       t1, t2, a2
> +    nor         t0, t2, a3
> +    and         t0, t0, t1
> +
> +    bnez        t0, L(count_pos)
> +    ld.d        t2, a0, 8
> +    addi.d      a0, a0, 16
> +    sub.d       t1, t2, a2
> +
> +    nor         t0, t2, a3
> +    and         t0, t0, t1
> +    beqz        t0, L(loop_16_8bit)
> +    addi.d      a0, a0, -8
> +
> +L(count_pos):
> +    ctz.d       t1, t0
> +    sub.d       a0, a0, a1
> +    srli.d      t1, t1, 3
> +    add.d       a0, a0, t1
> +
> +    jr          ra
> +END(STRLEN)
> +
> +libc_hidden_builtin_def (STRLEN)

The LASX/LSX code should not be compiled when building soft-float ABI libraries.

> diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S b/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
> new file mode 100644
> index 0000000000..61edeeb26d
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
> @@ -0,0 +1,63 @@
> +/* Optimized strlen implementation using loongarch LASX SIMD instructions.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   .  */
> +
> +#include
> +#include
> +#include
> +
> +#if IS_IN (libc)
> +
> +# define STRLEN __strlen_lasx
> +
> +LEAF(STRLEN, 6)
> +    move        a1, a0
> +    bstrins.d   a0, zero, 4, 0
> +    li.d        t1, -1
> +    xvld        xr0, a0, 0
> +
> +    xvmsknz.b   xr0, xr0
> +    xvpickve.w  xr1, xr0, 4
> +    vilvl.h     vr0, vr1, vr0
> +    movfr2gr.s  t0, fa0  # sign extend
> +
> +    sra.w       t0, t0, a1
> +    beq         t0, t1, L(loop)
> +    cto.w       a0, t0
> +    jr          ra
> +
> +L(loop):
> +    xvld        xr0, a0, 32
> +    addi.d      a0, a0, 32
> +    xvsetanyeqz.b fcc0, xr0
> +    bceqz       fcc0, L(loop)
> +
> +
> +    xvmsknz.b   xr0, xr0
> +    sub.d       a0, a0, a1
> +    xvpickve.w  xr1, xr0, 4
> +    vilvl.h     vr0, vr1, vr0
> +
> +    movfr2gr.s  t0, fa0
> +    cto.w       t0, t0
> +    add.d       a0, a0, t0
> +    jr          ra
> +END(STRLEN)
> +
> +libc_hidden_builtin_def (STRLEN)
> +#endif

The LASX/LSX code should not be compiled when building soft-float ABI libraries.

> diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S b/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
> new file mode 100644
> index 0000000000..34f995004c
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
> @@ -0,0 +1,71 @@
> +/* Optimized strlen implementation using Loongarch LSX SIMD instructions.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library.  If not, see
> +   .  */
> +
> +#include
> +#include
> +#include
> +
> +#if IS_IN (libc)
> +
> +# define STRLEN __strlen_lsx
> +
> +LEAF(STRLEN, 6)
> +    move        a1, a0
> +    bstrins.d   a0, zero, 4, 0
> +    vld         vr0, a0, 0
> +    vld         vr1, a0, 16
> +
> +    li.d        t1, -1
> +    vmsknz.b    vr0, vr0
> +    vmsknz.b    vr1, vr1
> +    vilvl.h     vr0, vr1, vr0
> +
> +    movfr2gr.s  t0, fa0
> +    sra.w       t0, t0, a1
> +    beq         t0, t1, L(loop)
> +    cto.w       a0, t0
> +
> +    jr          ra
> +    nop
> +    nop
> +    nop
> +
> +
> +L(loop):
> +    vld         vr0, a0, 32
> +    vld         vr1, a0, 48
> +    addi.d      a0, a0, 32
> +    vmin.bu     vr2, vr0, vr1
> +
> +    vsetanyeqz.b fcc0, vr2
> +    bceqz       fcc0, L(loop)
> +    vmsknz.b    vr0, vr0
> +    vmsknz.b    vr1, vr1
> +
> +    vilvl.h     vr0, vr1, vr0
> +    sub.d       a0, a0, a1
> +    movfr2gr.s  t0, fa0
> +    cto.w       t0, t0
> +
> +    add.d       a0, a0, t0
> +    jr          ra
> +END(STRLEN)
> +
> +libc_hidden_builtin_def (STRLEN)
> +#endif
> diff --git a/sysdeps/loongarch/lp64/multiarch/strlen.c b/sysdeps/loongarch/lp64/multiarch/strlen.c
> new file mode 100644
> index 0000000000..381c2daa86
> --- /dev/null
> +++ b/sysdeps/loongarch/lp64/multiarch/strlen.c
> @@ -0,0 +1,37 @@
> +/* Multiple versions of strlen.
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +
> +#if IS_IN (libc)
> +# define strlen __redirect_strlen
> +# include
> +# undef strlen
> +
> +# define SYMBOL_NAME strlen
> +# include "ifunc-strlen.h"
> +
> +libc_ifunc_redirected (__redirect_strlen, strlen, IFUNC_SELECTOR ());
> +
> +# ifdef SHARED
> +__hidden_ver1 (strlen, __GI_strlen, __redirect_strlen)
> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (strlen);
> +# endif
> +
> +#endif
> diff --git a/sysdeps/loongarch/sys/regdef.h b/sysdeps/loongarch/sys/regdef.h
> index 5100f36d24..524d2e3277 100644
> --- a/sysdeps/loongarch/sys/regdef.h
> +++ b/sysdeps/loongarch/sys/regdef.h
> @@ -89,6 +89,14 @@
>  #define fs5 $f29
>  #define fs6 $f30
>  #define fs7 $f31
> +#define fcc0 $fcc0
> +#define fcc1 $fcc1
> +#define fcc2 $fcc2
> +#define fcc3 $fcc3
> +#define fcc4 $fcc4
> +#define fcc5 $fcc5
> +#define fcc6 $fcc6
> +#define fcc7 $fcc7
>
>  #define vr0 $vr0
>  #define vr1 $vr1
> @@ -98,6 +106,30 @@
>  #define vr5 $vr5
>  #define vr6 $vr6
>  #define vr7 $vr7
> +#define vr8 $vr8
> +#define vr9 $vr9
> +#define vr10 $vr10
> +#define vr11 $vr11
> +#define vr12 $vr12
> +#define vr13 $vr13
> +#define vr14 $vr14
> +#define vr15 $vr15
> +#define vr16 $vr16
> +#define vr17 $vr17
> +#define vr18 $vr18
> +#define vr19 $vr19
> +#define vr20 $vr20
> +#define vr21 $vr21
> +#define vr22 $vr22
> +#define vr23 $vr23
> +#define vr24 $vr24
> +#define vr25 $vr25
> +#define vr26 $vr26
> +#define vr27 $vr27
> +#define vr28 $vr28
> +#define vr29 $vr29
> +#define vr30 $vr30
> +#define vr31 $vr31
>
>  #define xr0 $xr0
>  #define xr1 $xr1
> @@ -107,5 +139,30 @@
>  #define xr5 $xr5
>  #define xr6 $xr6
>  #define xr7 $xr7
> +#define xr7 $xr7
> +#define xr8 $xr8
> +#define xr9 $xr9
> +#define xr10 $xr10
> +#define xr11 $xr11
> +#define xr12 $xr12
> +#define xr13 $xr13
> +#define xr14 $xr14
> +#define xr15 $xr15
> +#define xr16 $xr16
> +#define xr17 $xr17
> +#define xr18 $xr18
> +#define xr19 $xr19
> +#define xr20 $xr20
> +#define xr21 $xr21
> +#define xr22 $xr22
> +#define xr23 $xr23
> +#define xr24 $xr24
> +#define xr25 $xr25
> +#define xr26 $xr26
> +#define xr27 $xr27
> +#define xr28 $xr28
> +#define xr29 $xr29
> +#define xr30 $xr30
> +#define xr31 $xr31
>
>  #endif /* _SYS_REGDEF_H */
> diff --git a/sysdeps/unix/sysv/linux/loongarch/cpu-features.h b/sysdeps/unix/sysv/linux/loongarch/cpu-features.h
> index e371e13b15..d1a280a5ee 100644
> --- a/sysdeps/unix/sysv/linux/loongarch/cpu-features.h
> +++ b/sysdeps/unix/sysv/linux/loongarch/cpu-features.h
> @@ -25,5 +25,7 @@
>  #define SUPPORT_LSX (GLRO (dl_hwcap) & HWCAP_LOONGARCH_LSX)
>  #define SUPPORT_LASX (GLRO (dl_hwcap) & HWCAP_LOONGARCH_LASX)
>
> +#define INIT_ARCH()
> +
>  #endif /* _CPU_FEATURES_LOONGARCH64_H */
>
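
To illustrate the soft-float remark on the LASX/LSX files, here is a small C model of the guard. It is only a sketch under stated assumptions, not glibc code: strlen_baseline_model, strlen_simd_model, select_strlen and check_selection are hypothetical names standing in for __strlen_aligned, the LSX/LASX versions and IFUNC_SELECTOR, and has_simd stands in for SUPPORT_LSX/SUPPORT_LASX. The SIMD body sits inside the same !defined __loongarch_soft_float condition that ifunc-strlen.h and ifunc-impl-list.c already use, so a soft-float build never compiles it.

```c
/* Sketch only: a plain C model of the requested guard.  All function
   names here are hypothetical and do not exist in glibc.  */
#include <assert.h>
#include <stddef.h>

/* Always built; stands in for __strlen_aligned.  */
static size_t
strlen_baseline_model (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
}

#if !defined __loongarch_soft_float
/* Stands in for __strlen_lsx/__strlen_lasx.  The whole definition is
   compiled out for soft-float builds, which is what the review asks
   for in the .S files.  */
static size_t
strlen_simd_model (const char *s)
{
  return strlen_baseline_model (s);
}
#endif

typedef size_t (*strlen_fn) (const char *);

/* has_simd is an assumed stand-in for SUPPORT_LSX/SUPPORT_LASX.  */
static strlen_fn
select_strlen (int has_simd)
{
#if !defined __loongarch_soft_float
  if (has_simd)
    return strlen_simd_model;
#else
  (void) has_simd;
#endif
  return strlen_baseline_model;
}

/* Usage check: whichever version is selected, the result is the same.  */
static void
check_selection (void)
{
  assert (select_strlen (0) ("abc") == 3);
  assert (select_strlen (1) ("abc") == 3);
}
```

In the .S files the same effect could presumably be had by extending the existing #if IS_IN (libc) line with && !defined __loongarch_soft_float. That is a suggestion, not something this patch already does.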