From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x2b.google.com (mail-oa1-x2b.google.com [IPv6:2001:4860:4864:20::2b]) by sourceware.org (Postfix) with ESMTPS id 623863858D28 for ; Wed, 6 Jul 2022 18:49:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 623863858D28 Received: by mail-oa1-x2b.google.com with SMTP id 586e51a60fabf-10c0119dd16so11513648fac.6 for ; Wed, 06 Jul 2022 11:49:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=VnkjUhueUCnVNrbmLLqDHfcfmviMYm9Fjs4MCjsgMXU=; b=NHtR51mAggahrihe3i9E3hbWn5SHlmz5zNXjxbJFsFTBimL3tqWWHdD2VDXH7TU30j TD606yMbTCaJwLbOMRVKx9Y/UZaHOF9ExrfTDPS7r+5hMGi3X/pJWNIOqiLqwJMN47+E xVVtpLS3tBebG1oSu7dflpqt6rfVOwZsOOa1siWuWrq6y33f8wHw69dbp+HH0pTEwiVz kVX8UsxeKir2qC7mOHKY7UP+0g3Tmo/x2/6QcamJZ7rbTk3Ewp1bLsfodHdzierwwjkm 2YtYYe0fOBOgKgrG7AmU0YrJO73wwpNlvmHtHf6OSzYPoHu0IutJzPyMM4p80I8zJ8lg Et0Q== X-Gm-Message-State: AJIora/L4vr/baJ8rR4V1G3z7YzE556au4rdCan5oId/71VOd1rv4D1d 0vp8SwIJ0029nKPFIBqpBhhpcvuieP9k+xGmZ0UDBvUEHjo= X-Google-Smtp-Source: AGRyM1uubH+QRTqQBGlyRN1+m5T7HtFo3dKjN88uYtxbEVHskshOoY9edXg0S9MRKN/7GvBriy+uHUNUNj0M/Wz2QL8= X-Received: by 2002:a05:6870:ea01:b0:f1:e631:ed6f with SMTP id g1-20020a056870ea0100b000f1e631ed6fmr17287oap.89.1657133378624; Wed, 06 Jul 2022 11:49:38 -0700 (PDT) MIME-Version: 1.0 References: <20220706000641.3657347-1-goldstein.w.n@gmail.com> In-Reply-To: From: Noah Goldstein Date: Wed, 6 Jul 2022 11:49:27 -0700 Message-ID: Subject: Re: [PATCH v1] x86: Remove generic strncat, strncpy, and stpncpy implementations To: Mayshao-oc Cc: "H.J. Lu" , GNU C Library , "Carlos O'Donell" , "Louis Qi(BJ-RD)" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, KAM_STOCKGEN, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2022 18:49:42 -0000 On Wed, Jul 6, 2022 at 12:43 AM Mayshao-oc wrote: > > On Wed, Jul 6, 2022 at 10:14 AM Noah Goldstein wrote: > > > > On Tue, Jul 5, 2022 at 7:05 PM Noah Goldstein wrote: > > > > > > On Tue, Jul 5, 2022 at 6:33 PM H.J. Lu wrote: > > > > > > > > On Tue, Jul 5, 2022 at 5:06 PM Noah Goldstein wrote: > > > > > > > > > > These functions all have optimized versions: > > > > > __strncat_sse2_unaligned, __strncpy_sse2_unaligned, and > > > > > stpncpy_sse2_unaligned which are faster than their respective generic > > > > > implementations. Since the sse2 versions can run on baseline x86_64, > > > > > > > > Is this true on all x86-64 processors? > > > > > > That they are faster or that sse2 can run on baseline x86_64? > > Both cases are true on Zhaoxin processors. > > > > > Assuming you mean faster: > > > > maybe not for large strings on a processor that heavily prefers aligned > > access. > > > > The generics are implemented as: > > > > strncat = strnlen + strlen + memcpy > > strncpy = strnlen + memset + memcpy > > stpncpy = strnlen + memset + memcpy > > > > For very large strings on processors with SSSE3 > > all of those functions can be implemented with only aligned > > access in the loops. > > > > Not sure if it would work in favor of the generic implementations though as > > they are essentially doing 2x + the number of memory accesses. > > > > mayshao do you have benchmark numbers on these functions? > > The results are attached. If there is any problem with the results, please > feel free to let me know. Thank you, Any chance you have performance numbers for: strcpy_sse2 vs strcpy_sse2_unaligned stpcpy_sse2 vs stpcpy_sse2_unaligned strcat_sse2 vs strcat_sse2_unaligned ? > > > Best Regards, > May Shao > > > > > > > > > > > > > > > > > > we should use these as the baseline implementation and can remove the > > > > > generic implementations. > > > > > > > > > > Geometric mean of N=20 runs of the entire benchmark suite on: > > > > > 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz (Tigerlake) > > > > > > > > > > __strncat_sse2_unaligned / __strncat_generic: .944 > > > > > __strncpy_sse2_unaligned / __strncpy_generic: .726 > > > > > __stpncpy_sse2_unaligned / __stpncpy_generic: .650 > > > > > > > > > > Tested build with and without multiarch and full check with multiarch. > > > > > --- > > > > > sysdeps/x86_64/multiarch/Makefile | 3 -- > > > > > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 9 ++-- > > > > > sysdeps/x86_64/multiarch/ifunc-strcpy.h | 8 +--- > > > > > sysdeps/x86_64/multiarch/ifunc-strncpy.h | 48 ++++++++++++++++++++++ > > > > > sysdeps/x86_64/multiarch/stpncpy-generic.c | 26 ------------ > > > > > sysdeps/x86_64/multiarch/stpncpy.c | 3 +- > > > > > sysdeps/x86_64/multiarch/strncat-generic.c | 21 ---------- > > > > > sysdeps/x86_64/multiarch/strncat.c | 3 +- > > > > > sysdeps/x86_64/multiarch/strncpy-generic.c | 24 ----------- > > > > > sysdeps/x86_64/multiarch/strncpy.c | 3 +- > > > > > 10 files changed, 56 insertions(+), 92 deletions(-) > > > > > create mode 100644 sysdeps/x86_64/multiarch/ifunc-strncpy.h > > > > > delete mode 100644 sysdeps/x86_64/multiarch/stpncpy-generic.c > > > > > delete mode 100644 sysdeps/x86_64/multiarch/strncat-generic.c > > > > > delete mode 100644 sysdeps/x86_64/multiarch/strncpy-generic.c > > > > > > > > > > diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile > > > > > index 18cea04423..04e238efa0 100644 > > > > > --- a/sysdeps/x86_64/multiarch/Makefile > > > > > +++ b/sysdeps/x86_64/multiarch/Makefile > > > > > @@ -45,7 +45,6 @@ sysdep_routines += \ > > > > > stpcpy-sse2-unaligned \ > > > > > stpncpy-avx2 \ > > > > > stpncpy-avx2-rtm \ > > > > > - stpncpy-generic \ > > > > > stpncpy-evex \ > > > > > stpncpy-sse2-unaligned \ > > > > > strcasecmp_l-avx2 \ > > > > > @@ -92,7 +91,6 @@ sysdep_routines += \ > > > > > strncase_l-sse4_2 \ > > > > > strncat-avx2 \ > > > > > strncat-avx2-rtm \ > > > > > - strncat-generic \ > > > > > strncat-evex \ > > > > > strncat-sse2-unaligned \ > > > > > strncmp-avx2 \ > > > > > @@ -102,7 +100,6 @@ sysdep_routines += \ > > > > > strncmp-sse4_2 \ > > > > > strncpy-avx2 \ > > > > > strncpy-avx2-rtm \ > > > > > - strncpy-generic \ > > > > > strncpy-evex \ > > > > > strncpy-sse2-unaligned \ > > > > > strnlen-avx2 \ > > > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c > > > > > index adf7d4bafd..2c96cb62d2 100644 > > > > > --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c > > > > > +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c > > > > > @@ -403,8 +403,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > > > > > && CPU_FEATURE_USABLE (AVX512BW)), > > > > > __stpncpy_evex) > > > > > IFUNC_IMPL_ADD (array, i, stpncpy, 1, > > > > > - __stpncpy_sse2_unaligned) > > > > > - IFUNC_IMPL_ADD (array, i, stpncpy, 1, __stpncpy_generic)) > > > > > + __stpncpy_sse2_unaligned)) > > > > > > > > > > /* Support sysdeps/x86_64/multiarch/stpcpy.c. */ > > > > > IFUNC_IMPL (i, name, stpcpy, > > > > > @@ -618,8 +617,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > > > > > && CPU_FEATURE_USABLE (AVX512BW)), > > > > > __strncat_evex) > > > > > IFUNC_IMPL_ADD (array, i, strncat, 1, > > > > > - __strncat_sse2_unaligned) > > > > > - IFUNC_IMPL_ADD (array, i, strncat, 1, __strncat_generic)) > > > > > + __strncat_sse2_unaligned)) > > > > > > > > > > /* Support sysdeps/x86_64/multiarch/strncpy.c. */ > > > > > IFUNC_IMPL (i, name, strncpy, > > > > > @@ -634,8 +632,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > > > > > && CPU_FEATURE_USABLE (AVX512BW)), > > > > > __strncpy_evex) > > > > > IFUNC_IMPL_ADD (array, i, strncpy, 1, > > > > > - __strncpy_sse2_unaligned) > > > > > - IFUNC_IMPL_ADD (array, i, strncpy, 1, __strncpy_generic)) > > > > > + __strncpy_sse2_unaligned)) > > > > > > > > > > /* Support sysdeps/x86_64/multiarch/strpbrk.c. */ > > > > > IFUNC_IMPL (i, name, strpbrk, > > > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-strcpy.h b/sysdeps/x86_64/multiarch/ifunc-strcpy.h > > > > > index 80529458d1..a15afa44e9 100644 > > > > > --- a/sysdeps/x86_64/multiarch/ifunc-strcpy.h > > > > > +++ b/sysdeps/x86_64/multiarch/ifunc-strcpy.h > > > > > @@ -20,11 +20,7 @@ > > > > > > > > > > #include > > > > > > > > > > -#ifndef GENERIC > > > > > -# define GENERIC sse2 > > > > > -#endif > > > > > - > > > > > -extern __typeof (REDIRECT_NAME) OPTIMIZE (GENERIC) attribute_hidden; > > > > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden; > > > > > extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned) > > > > > attribute_hidden; > > > > > extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden; > > > > > @@ -53,5 +49,5 @@ IFUNC_SELECTOR (void) > > > > > if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Load)) > > > > > return OPTIMIZE (sse2_unaligned); > > > > > > > > > > - return OPTIMIZE (GENERIC); > > > > > + return OPTIMIZE (sse2); > > > > > } > > > > > diff --git a/sysdeps/x86_64/multiarch/ifunc-strncpy.h b/sysdeps/x86_64/multiarch/ifunc-strncpy.h > > > > > new file mode 100644 > > > > > index 0000000000..323225af4d > > > > > --- /dev/null > > > > > +++ b/sysdeps/x86_64/multiarch/ifunc-strncpy.h > > > > > @@ -0,0 +1,48 @@ > > > > > +/* Common definition for ifunc st{r|p}n{cpy|cat} > > > > > + All versions must be listed in ifunc-impl-list.c. > > > > > + Copyright (C) 2022 Free Software Foundation, Inc. > > > > > + This file is part of the GNU C Library. > > > > > + > > > > > + The GNU C Library is free software; you can redistribute it and/or > > > > > + modify it under the terms of the GNU Lesser General Public > > > > > + License as published by the Free Software Foundation; either > > > > > + version 2.1 of the License, or (at your option) any later version. > > > > > + > > > > > + The GNU C Library is distributed in the hope that it will be useful, > > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > > > + Lesser General Public License for more details. > > > > > + > > > > > + You should have received a copy of the GNU Lesser General Public > > > > > + License along with the GNU C Library; if not, see > > > > > + . */ > > > > > + > > > > > +#include > > > > > + > > > > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned) > > > > > + attribute_hidden; > > > > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden; > > > > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden; > > > > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden; > > > > > + > > > > > +static inline void * > > > > > +IFUNC_SELECTOR (void) > > > > > +{ > > > > > + const struct cpu_features* cpu_features = __get_cpu_features (); > > > > > + > > > > > + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) > > > > > + && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load)) > > > > > + { > > > > > + if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL) > > > > > + && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)) > > > > > + return OPTIMIZE (evex); > > > > > + > > > > > + if (CPU_FEATURE_USABLE_P (cpu_features, RTM)) > > > > > + return OPTIMIZE (avx2_rtm); > > > > > + > > > > > + if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER)) > > > > > + return OPTIMIZE (avx2); > > > > > + } > > > > > + > > > > > + return OPTIMIZE (sse2_unaligned); > > > > > +} > > > > > diff --git a/sysdeps/x86_64/multiarch/stpncpy-generic.c b/sysdeps/x86_64/multiarch/stpncpy-generic.c > > > > > deleted file mode 100644 > > > > > index 87826845b0..0000000000 > > > > > --- a/sysdeps/x86_64/multiarch/stpncpy-generic.c > > > > > +++ /dev/null > > > > > @@ -1,26 +0,0 @@ > > > > > -/* stpncpy. > > > > > - Copyright (C) 2022 Free Software Foundation, Inc. > > > > > - This file is part of the GNU C Library. > > > > > - > > > > > - The GNU C Library is free software; you can redistribute it and/or > > > > > - modify it under the terms of the GNU Lesser General Public > > > > > - License as published by the Free Software Foundation; either > > > > > - version 2.1 of the License, or (at your option) any later version. > > > > > - > > > > > - The GNU C Library is distributed in the hope that it will be useful, > > > > > - but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > > > - Lesser General Public License for more details. > > > > > - > > > > > - You should have received a copy of the GNU Lesser General Public > > > > > - License along with the GNU C Library; if not, see > > > > > - . */ > > > > > - > > > > > - > > > > > -#define STPNCPY __stpncpy_generic > > > > > -#undef weak_alias > > > > > -#define weak_alias(ignored1, ignored2) > > > > > -#undef libc_hidden_def > > > > > -#define libc_hidden_def(stpncpy) > > > > > - > > > > > -#include > > > > > diff --git a/sysdeps/x86_64/multiarch/stpncpy.c b/sysdeps/x86_64/multiarch/stpncpy.c > > > > > index 879bc83f0b..a8d083ff0d 100644 > > > > > --- a/sysdeps/x86_64/multiarch/stpncpy.c > > > > > +++ b/sysdeps/x86_64/multiarch/stpncpy.c > > > > > @@ -25,9 +25,8 @@ > > > > > # undef stpncpy > > > > > # undef __stpncpy > > > > > > > > > > -# define GENERIC generic > > > > > # define SYMBOL_NAME stpncpy > > > > > -# include "ifunc-strcpy.h" > > > > > +# include "ifunc-strncpy.h" > > > > > > > > > > libc_ifunc_redirected (__redirect_stpncpy, __stpncpy, IFUNC_SELECTOR ()); > > > > > > > > > > diff --git a/sysdeps/x86_64/multiarch/strncat-generic.c b/sysdeps/x86_64/multiarch/strncat-generic.c > > > > > deleted file mode 100644 > > > > > index 0090669cd1..0000000000 > > > > > --- a/sysdeps/x86_64/multiarch/strncat-generic.c > > > > > +++ /dev/null > > > > > @@ -1,21 +0,0 @@ > > > > > -/* strncat. > > > > > - Copyright (C) 2022 Free Software Foundation, Inc. > > > > > - This file is part of the GNU C Library. > > > > > - > > > > > - The GNU C Library is free software; you can redistribute it and/or > > > > > - modify it under the terms of the GNU Lesser General Public > > > > > - License as published by the Free Software Foundation; either > > > > > - version 2.1 of the License, or (at your option) any later version. > > > > > - > > > > > - The GNU C Library is distributed in the hope that it will be useful, > > > > > - but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > > > - Lesser General Public License for more details. > > > > > - > > > > > - You should have received a copy of the GNU Lesser General Public > > > > > - License along with the GNU C Library; if not, see > > > > > - . */ > > > > > - > > > > > - > > > > > -#define STRNCAT __strncat_generic > > > > > -#include > > > > > diff --git a/sysdeps/x86_64/multiarch/strncat.c b/sysdeps/x86_64/multiarch/strncat.c > > > > > index 50fba8a41f..a590c25d51 100644 > > > > > --- a/sysdeps/x86_64/multiarch/strncat.c > > > > > +++ b/sysdeps/x86_64/multiarch/strncat.c > > > > > @@ -24,8 +24,7 @@ > > > > > # undef strncat > > > > > > > > > > # define SYMBOL_NAME strncat > > > > > -# define GENERIC generic > > > > > -# include "ifunc-strcpy.h" > > > > > +# include "ifunc-strncpy.h" > > > > > > > > > > libc_ifunc_redirected (__redirect_strncat, strncat, IFUNC_SELECTOR ()); > > > > > strong_alias (strncat, __strncat); > > > > > diff --git a/sysdeps/x86_64/multiarch/strncpy-generic.c b/sysdeps/x86_64/multiarch/strncpy-generic.c > > > > > deleted file mode 100644 > > > > > index 9916153dd5..0000000000 > > > > > --- a/sysdeps/x86_64/multiarch/strncpy-generic.c > > > > > +++ /dev/null > > > > > @@ -1,24 +0,0 @@ > > > > > -/* strncpy. > > > > > - Copyright (C) 2022 Free Software Foundation, Inc. > > > > > - This file is part of the GNU C Library. > > > > > - > > > > > - The GNU C Library is free software; you can redistribute it and/or > > > > > - modify it under the terms of the GNU Lesser General Public > > > > > - License as published by the Free Software Foundation; either > > > > > - version 2.1 of the License, or (at your option) any later version. > > > > > - > > > > > - The GNU C Library is distributed in the hope that it will be useful, > > > > > - but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > > > - Lesser General Public License for more details. > > > > > - > > > > > - You should have received a copy of the GNU Lesser General Public > > > > > - License along with the GNU C Library; if not, see > > > > > - . */ > > > > > - > > > > > - > > > > > -#define STRNCPY __strncpy_generic > > > > > -#undef libc_hidden_builtin_def > > > > > -#define libc_hidden_builtin_def(strncpy) > > > > > - > > > > > -#include > > > > > diff --git a/sysdeps/x86_64/multiarch/strncpy.c b/sysdeps/x86_64/multiarch/strncpy.c > > > > > index 7fc7d72ec5..c83440f0e3 100644 > > > > > --- a/sysdeps/x86_64/multiarch/strncpy.c > > > > > +++ b/sysdeps/x86_64/multiarch/strncpy.c > > > > > @@ -24,8 +24,7 @@ > > > > > # undef strncpy > > > > > > > > > > # define SYMBOL_NAME strncpy > > > > > -# define GENERIC generic > > > > > -# include "ifunc-strcpy.h" > > > > > +# include "ifunc-strncpy.h" > > > > > > > > > > libc_ifunc_redirected (__redirect_strncpy, strncpy, IFUNC_SELECTOR ()); > > > > > > > > > > -- > > > > > 2.34.1 > > > > > > > > > > > > > > > > > -- > > > > H.J. > >