From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x33.google.com (mail-oa1-x33.google.com [IPv6:2001:4860:4864:20::33]) by sourceware.org (Postfix) with ESMTPS id 53EBA3858412 for ; Wed, 6 Jul 2022 02:05:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 53EBA3858412 Received: by mail-oa1-x33.google.com with SMTP id 586e51a60fabf-10bd4812c29so13929442fac.11 for ; Tue, 05 Jul 2022 19:05:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=57OuEWGUB5pIEug/bXOIgSbnPEuPOTUviZm2yk9pXAc=; b=3bRSAviGTzXDseajcXaY1gSOkq2bzS9dyFrZUvx30s47CHve4UxkPKyTYeRFEFSFLm EOXH1aetrfCHmMENljMNUcz1fsAQdVGPzaTsDHS3xTtzrwlgqHfSt/g6iQGSN7ljwUqn AtOFU1S9q2N4yv1GlAtcPir6WXJ2vjgBseKkDecwpEeHm5z5J7As9eZ0/XJ3UX5GN0Oh VkViwbtrldCOC5Qfj8liaTQy3fwnPt3LeJ203FtCoICB3ANI5WraS7Ui5kNmFtSZK/42 d+KMaWGgga2ILhk+3oOoMyPmdWjwh8UPWD/OiIt4V17Svp7u8BI+0fDjk5mN62M19+Zu CFeA== X-Gm-Message-State: AJIora+9IJRxW7nNKq8Bwi2WqUh19aoN0fPHQreeoESxJcCcXUMCxNJC Ujuh3ZCJyU6HUYNkE0ib/hIvHEdB7BJ0FS6PwfxavTafBFo= X-Google-Smtp-Source: AGRyM1sv335Nfqho7LfHf13Q5Y6bLEcLXBwuDHpy7SnNWt/tWuueILi8vIbW6N0otMaLboeHdolXeZYqeZ3J1x+c2QI= X-Received: by 2002:a05:6870:41ca:b0:101:d588:6241 with SMTP id z10-20020a05687041ca00b00101d5886241mr23633938oac.175.1657073153493; Tue, 05 Jul 2022 19:05:53 -0700 (PDT) MIME-Version: 1.0 References: <20220706000641.3657347-1-goldstein.w.n@gmail.com> In-Reply-To: From: Noah Goldstein Date: Tue, 5 Jul 2022 19:05:42 -0700 Message-ID: Subject: Re: [PATCH v1] x86: Remove generic strncat, strncpy, and stpncpy implementations To: "H.J. Lu" Cc: Mayshao-oc , GNU C Library , "Carlos O'Donell" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, KAM_STOCKGEN, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2022 02:05:57 -0000 On Tue, Jul 5, 2022 at 6:33 PM H.J. Lu wrote: > > On Tue, Jul 5, 2022 at 5:06 PM Noah Goldstein wrote: > > > > These functions all have optimized versions: > > __strncat_sse2_unaligned, __strncpy_sse2_unaligned, and > > stpncpy_sse2_unaligned which are faster than their respective generic > > implementations. Since the sse2 versions can run on baseline x86_64, > > Is this true on all x86-64 processors? That they are faster or that sse2 can run on baseline x86_64? > > > we should use these as the baseline implementation and can remove the > > generic implementations. > > > > Geometric mean of N=20 runs of the entire benchmark suite on: > > 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz (Tigerlake) > > > > __strncat_sse2_unaligned / __strncat_generic: .944 > > __strncpy_sse2_unaligned / __strncpy_generic: .726 > > __stpncpy_sse2_unaligned / __stpncpy_generic: .650 > > > > Tested build with and without multiarch and full check with multiarch. > > --- > > sysdeps/x86_64/multiarch/Makefile | 3 -- > > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 9 ++-- > > sysdeps/x86_64/multiarch/ifunc-strcpy.h | 8 +--- > > sysdeps/x86_64/multiarch/ifunc-strncpy.h | 48 ++++++++++++++++++++++ > > sysdeps/x86_64/multiarch/stpncpy-generic.c | 26 ------------ > > sysdeps/x86_64/multiarch/stpncpy.c | 3 +- > > sysdeps/x86_64/multiarch/strncat-generic.c | 21 ---------- > > sysdeps/x86_64/multiarch/strncat.c | 3 +- > > sysdeps/x86_64/multiarch/strncpy-generic.c | 24 ----------- > > sysdeps/x86_64/multiarch/strncpy.c | 3 +- > > 10 files changed, 56 insertions(+), 92 deletions(-) > > create mode 100644 sysdeps/x86_64/multiarch/ifunc-strncpy.h > > delete mode 100644 sysdeps/x86_64/multiarch/stpncpy-generic.c > > delete mode 100644 sysdeps/x86_64/multiarch/strncat-generic.c > > delete mode 100644 sysdeps/x86_64/multiarch/strncpy-generic.c > > > > diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile > > index 18cea04423..04e238efa0 100644 > > --- a/sysdeps/x86_64/multiarch/Makefile > > +++ b/sysdeps/x86_64/multiarch/Makefile > > @@ -45,7 +45,6 @@ sysdep_routines += \ > > stpcpy-sse2-unaligned \ > > stpncpy-avx2 \ > > stpncpy-avx2-rtm \ > > - stpncpy-generic \ > > stpncpy-evex \ > > stpncpy-sse2-unaligned \ > > strcasecmp_l-avx2 \ > > @@ -92,7 +91,6 @@ sysdep_routines += \ > > strncase_l-sse4_2 \ > > strncat-avx2 \ > > strncat-avx2-rtm \ > > - strncat-generic \ > > strncat-evex \ > > strncat-sse2-unaligned \ > > strncmp-avx2 \ > > @@ -102,7 +100,6 @@ sysdep_routines += \ > > strncmp-sse4_2 \ > > strncpy-avx2 \ > > strncpy-avx2-rtm \ > > - strncpy-generic \ > > strncpy-evex \ > > strncpy-sse2-unaligned \ > > strnlen-avx2 \ > > diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c > > index adf7d4bafd..2c96cb62d2 100644 > > --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c > > +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c > > @@ -403,8 +403,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > > && CPU_FEATURE_USABLE (AVX512BW)), > > __stpncpy_evex) > > IFUNC_IMPL_ADD (array, i, stpncpy, 1, > > - __stpncpy_sse2_unaligned) > > - IFUNC_IMPL_ADD (array, i, stpncpy, 1, __stpncpy_generic)) > > + __stpncpy_sse2_unaligned)) > > > > /* Support sysdeps/x86_64/multiarch/stpcpy.c. */ > > IFUNC_IMPL (i, name, stpcpy, > > @@ -618,8 +617,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > > && CPU_FEATURE_USABLE (AVX512BW)), > > __strncat_evex) > > IFUNC_IMPL_ADD (array, i, strncat, 1, > > - __strncat_sse2_unaligned) > > - IFUNC_IMPL_ADD (array, i, strncat, 1, __strncat_generic)) > > + __strncat_sse2_unaligned)) > > > > /* Support sysdeps/x86_64/multiarch/strncpy.c. */ > > IFUNC_IMPL (i, name, strncpy, > > @@ -634,8 +632,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, > > && CPU_FEATURE_USABLE (AVX512BW)), > > __strncpy_evex) > > IFUNC_IMPL_ADD (array, i, strncpy, 1, > > - __strncpy_sse2_unaligned) > > - IFUNC_IMPL_ADD (array, i, strncpy, 1, __strncpy_generic)) > > + __strncpy_sse2_unaligned)) > > > > /* Support sysdeps/x86_64/multiarch/strpbrk.c. */ > > IFUNC_IMPL (i, name, strpbrk, > > diff --git a/sysdeps/x86_64/multiarch/ifunc-strcpy.h b/sysdeps/x86_64/multiarch/ifunc-strcpy.h > > index 80529458d1..a15afa44e9 100644 > > --- a/sysdeps/x86_64/multiarch/ifunc-strcpy.h > > +++ b/sysdeps/x86_64/multiarch/ifunc-strcpy.h > > @@ -20,11 +20,7 @@ > > > > #include > > > > -#ifndef GENERIC > > -# define GENERIC sse2 > > -#endif > > - > > -extern __typeof (REDIRECT_NAME) OPTIMIZE (GENERIC) attribute_hidden; > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden; > > extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned) > > attribute_hidden; > > extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden; > > @@ -53,5 +49,5 @@ IFUNC_SELECTOR (void) > > if (CPU_FEATURES_ARCH_P (cpu_features, Fast_Unaligned_Load)) > > return OPTIMIZE (sse2_unaligned); > > > > - return OPTIMIZE (GENERIC); > > + return OPTIMIZE (sse2); > > } > > diff --git a/sysdeps/x86_64/multiarch/ifunc-strncpy.h b/sysdeps/x86_64/multiarch/ifunc-strncpy.h > > new file mode 100644 > > index 0000000000..323225af4d > > --- /dev/null > > +++ b/sysdeps/x86_64/multiarch/ifunc-strncpy.h > > @@ -0,0 +1,48 @@ > > +/* Common definition for ifunc st{r|p}n{cpy|cat} > > + All versions must be listed in ifunc-impl-list.c. > > + Copyright (C) 2022 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful, > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library; if not, see > > + . */ > > + > > +#include > > + > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned) > > + attribute_hidden; > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden; > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden; > > +extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden; > > + > > +static inline void * > > +IFUNC_SELECTOR (void) > > +{ > > + const struct cpu_features* cpu_features = __get_cpu_features (); > > + > > + if (CPU_FEATURE_USABLE_P (cpu_features, AVX2) > > + && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load)) > > + { > > + if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL) > > + && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)) > > + return OPTIMIZE (evex); > > + > > + if (CPU_FEATURE_USABLE_P (cpu_features, RTM)) > > + return OPTIMIZE (avx2_rtm); > > + > > + if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER)) > > + return OPTIMIZE (avx2); > > + } > > + > > + return OPTIMIZE (sse2_unaligned); > > +} > > diff --git a/sysdeps/x86_64/multiarch/stpncpy-generic.c b/sysdeps/x86_64/multiarch/stpncpy-generic.c > > deleted file mode 100644 > > index 87826845b0..0000000000 > > --- a/sysdeps/x86_64/multiarch/stpncpy-generic.c > > +++ /dev/null > > @@ -1,26 +0,0 @@ > > -/* stpncpy. > > - Copyright (C) 2022 Free Software Foundation, Inc. > > - This file is part of the GNU C Library. > > - > > - The GNU C Library is free software; you can redistribute it and/or > > - modify it under the terms of the GNU Lesser General Public > > - License as published by the Free Software Foundation; either > > - version 2.1 of the License, or (at your option) any later version. > > - > > - The GNU C Library is distributed in the hope that it will be useful, > > - but WITHOUT ANY WARRANTY; without even the implied warranty of > > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > - Lesser General Public License for more details. > > - > > - You should have received a copy of the GNU Lesser General Public > > - License along with the GNU C Library; if not, see > > - . */ > > - > > - > > -#define STPNCPY __stpncpy_generic > > -#undef weak_alias > > -#define weak_alias(ignored1, ignored2) > > -#undef libc_hidden_def > > -#define libc_hidden_def(stpncpy) > > - > > -#include > > diff --git a/sysdeps/x86_64/multiarch/stpncpy.c b/sysdeps/x86_64/multiarch/stpncpy.c > > index 879bc83f0b..a8d083ff0d 100644 > > --- a/sysdeps/x86_64/multiarch/stpncpy.c > > +++ b/sysdeps/x86_64/multiarch/stpncpy.c > > @@ -25,9 +25,8 @@ > > # undef stpncpy > > # undef __stpncpy > > > > -# define GENERIC generic > > # define SYMBOL_NAME stpncpy > > -# include "ifunc-strcpy.h" > > +# include "ifunc-strncpy.h" > > > > libc_ifunc_redirected (__redirect_stpncpy, __stpncpy, IFUNC_SELECTOR ()); > > > > diff --git a/sysdeps/x86_64/multiarch/strncat-generic.c b/sysdeps/x86_64/multiarch/strncat-generic.c > > deleted file mode 100644 > > index 0090669cd1..0000000000 > > --- a/sysdeps/x86_64/multiarch/strncat-generic.c > > +++ /dev/null > > @@ -1,21 +0,0 @@ > > -/* strncat. > > - Copyright (C) 2022 Free Software Foundation, Inc. > > - This file is part of the GNU C Library. > > - > > - The GNU C Library is free software; you can redistribute it and/or > > - modify it under the terms of the GNU Lesser General Public > > - License as published by the Free Software Foundation; either > > - version 2.1 of the License, or (at your option) any later version. > > - > > - The GNU C Library is distributed in the hope that it will be useful, > > - but WITHOUT ANY WARRANTY; without even the implied warranty of > > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > - Lesser General Public License for more details. > > - > > - You should have received a copy of the GNU Lesser General Public > > - License along with the GNU C Library; if not, see > > - . */ > > - > > - > > -#define STRNCAT __strncat_generic > > -#include > > diff --git a/sysdeps/x86_64/multiarch/strncat.c b/sysdeps/x86_64/multiarch/strncat.c > > index 50fba8a41f..a590c25d51 100644 > > --- a/sysdeps/x86_64/multiarch/strncat.c > > +++ b/sysdeps/x86_64/multiarch/strncat.c > > @@ -24,8 +24,7 @@ > > # undef strncat > > > > # define SYMBOL_NAME strncat > > -# define GENERIC generic > > -# include "ifunc-strcpy.h" > > +# include "ifunc-strncpy.h" > > > > libc_ifunc_redirected (__redirect_strncat, strncat, IFUNC_SELECTOR ()); > > strong_alias (strncat, __strncat); > > diff --git a/sysdeps/x86_64/multiarch/strncpy-generic.c b/sysdeps/x86_64/multiarch/strncpy-generic.c > > deleted file mode 100644 > > index 9916153dd5..0000000000 > > --- a/sysdeps/x86_64/multiarch/strncpy-generic.c > > +++ /dev/null > > @@ -1,24 +0,0 @@ > > -/* strncpy. > > - Copyright (C) 2022 Free Software Foundation, Inc. > > - This file is part of the GNU C Library. > > - > > - The GNU C Library is free software; you can redistribute it and/or > > - modify it under the terms of the GNU Lesser General Public > > - License as published by the Free Software Foundation; either > > - version 2.1 of the License, or (at your option) any later version. > > - > > - The GNU C Library is distributed in the hope that it will be useful, > > - but WITHOUT ANY WARRANTY; without even the implied warranty of > > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > - Lesser General Public License for more details. > > - > > - You should have received a copy of the GNU Lesser General Public > > - License along with the GNU C Library; if not, see > > - . */ > > - > > - > > -#define STRNCPY __strncpy_generic > > -#undef libc_hidden_builtin_def > > -#define libc_hidden_builtin_def(strncpy) > > - > > -#include > > diff --git a/sysdeps/x86_64/multiarch/strncpy.c b/sysdeps/x86_64/multiarch/strncpy.c > > index 7fc7d72ec5..c83440f0e3 100644 > > --- a/sysdeps/x86_64/multiarch/strncpy.c > > +++ b/sysdeps/x86_64/multiarch/strncpy.c > > @@ -24,8 +24,7 @@ > > # undef strncpy > > > > # define SYMBOL_NAME strncpy > > -# define GENERIC generic > > -# include "ifunc-strcpy.h" > > +# include "ifunc-strncpy.h" > > > > libc_ifunc_redirected (__redirect_strncpy, strncpy, IFUNC_SELECTOR ()); > > > > -- > > 2.34.1 > > > > > -- > H.J.