From: Noah Goldstein
Date: Wed, 23 Jun 2021 02:34:28 -0400
Subject: Re: [PATCH] x86-64: Add wcslen optimize for sse4.1
To: GNU C Library
References: <20210623062821.1166822-1-goldstein.w.n@gmail.com>
In-Reply-To: <20210623062821.1166822-1-goldstein.w.n@gmail.com>
List-Id: Libc-alpha mailing list

On Wed, Jun 23, 2021 at 2:29 AM Noah Goldstein wrote:

> No bug. This commit adds the ifunc / build infrastructure
> necessary for wcslen to prefer the sse4.1 implementation
> in strlen-vec.S. test-wcslen.c is passing.
>
> Signed-off-by: Noah Goldstein
> ---
> Rebased on [PATCH] x86-64: Move strlen.S to multiarch/strlen-vec.S
>  sysdeps/x86_64/multiarch/Makefile          |  4 +-
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c |  3 ++
>  sysdeps/x86_64/multiarch/ifunc-wcslen.h    | 52 ++++++++++++++++++++++
>  sysdeps/x86_64/multiarch/wcslen-sse4_1.S   |  4 ++
>  sysdeps/x86_64/multiarch/wcslen.c          |  2 +-
>  sysdeps/x86_64/multiarch/wcsnlen.c         | 34 +-------------
>  6 files changed, 63 insertions(+), 36 deletions(-)
>  create mode 100644 sysdeps/x86_64/multiarch/ifunc-wcslen.h
>  create mode 100644 sysdeps/x86_64/multiarch/wcslen-sse4_1.S
>
> diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
> index 2c2aad3a7e..26be40959c 100644
> --- a/sysdeps/x86_64/multiarch/Makefile
> +++ b/sysdeps/x86_64/multiarch/Makefile
> @@ -95,8 +95,8 @@ sysdep_routines += wmemcmp-sse4 wmemcmp-ssse3 wmemcmp-c \
>                     wcscpy-ssse3 wcscpy-c \
>                     wcschr-sse2 wcschr-avx2 \
>                     wcsrchr-sse2 wcsrchr-avx2 \
> -                   wcsnlen-sse4_1 wcsnlen-c \
> -                   wcslen-sse2 wcslen-avx2 wcsnlen-avx2 \
> +                   wcslen-sse2 wcslen-sse4_1 wcslen-avx2 \
> +                   wcsnlen-c wcsnlen-sse4_1 wcsnlen-avx2 \
>                     wcschr-avx2-rtm \
>                     wcscmp-avx2-rtm \
>                     wcslen-avx2-rtm \
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 15eda47667..dbd1ebf298 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -684,6 +684,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                 && CPU_FEATURE_USABLE (AVX512BW)
>                                 && CPU_FEATURE_USABLE (BMI2)),
>                                __wcslen_evex)
> +              IFUNC_IMPL_ADD (array, i, wcslen,
> +                              CPU_FEATURE_USABLE (SSE4_1),
> +                              __wcslen_sse4_1)
>                IFUNC_IMPL_ADD (array, i, wcslen, 1, __wcslen_sse2))
>
>  /* Support sysdeps/x86_64/multiarch/wcsnlen.c.  */
> diff --git a/sysdeps/x86_64/multiarch/ifunc-wcslen.h b/sysdeps/x86_64/multiarch/ifunc-wcslen.h
> new file mode 100644
> index 0000000000..39e3347378
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/ifunc-wcslen.h
> @@ -0,0 +1,52 @@
> +/* Common definition for ifunc selections for wcslen and wcsnlen
> +   All versions must be listed in ifunc-impl-list.c.
> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <init-arch.h>
> +
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
> +extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> +
> +static inline void *
> +IFUNC_SELECTOR (void)
> +{
> +  const struct cpu_features* cpu_features = __get_cpu_features ();
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> +      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
> +    {
> +      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> +          && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> +        return OPTIMIZE (evex);
> +
> +      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> +        return OPTIMIZE (avx2_rtm);
> +
> +      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
> +        return OPTIMIZE (avx2);
> +    }
> +
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
> +    return OPTIMIZE (sse4_1);
> +
> +  return OPTIMIZE (sse2);
> +}
> diff --git a/sysdeps/x86_64/multiarch/wcslen-sse4_1.S b/sysdeps/x86_64/multiarch/wcslen-sse4_1.S
> new file mode 100644
> index 0000000000..7e62621afc
> --- /dev/null
> +++ b/sysdeps/x86_64/multiarch/wcslen-sse4_1.S
> @@ -0,0 +1,4 @@
> +#define AS_WCSLEN
> +#define strlen __wcslen_sse4_1
> +
> +#include "strlen-vec.S"
> diff --git a/sysdeps/x86_64/multiarch/wcslen.c b/sysdeps/x86_64/multiarch/wcslen.c
> index f89bed42a0..3032061d3b 100644
> --- a/sysdeps/x86_64/multiarch/wcslen.c
> +++ b/sysdeps/x86_64/multiarch/wcslen.c
> @@ -24,7 +24,7 @@
>  # undef __wcslen
>
>  # define SYMBOL_NAME wcslen
> -# include "ifunc-avx2.h"
> +# include "ifunc-wcslen.h"
>
>  libc_ifunc_redirected (__redirect_wcslen, __wcslen, IFUNC_SELECTOR ());
>  weak_alias (__wcslen, wcslen);
> diff --git a/sysdeps/x86_64/multiarch/wcsnlen.c b/sysdeps/x86_64/multiarch/wcsnlen.c
> index 4983f1b222..2963fbe059 100644
> --- a/sysdeps/x86_64/multiarch/wcsnlen.c
> +++ b/sysdeps/x86_64/multiarch/wcsnlen.c
> @@ -24,39 +24,7 @@
>  # undef __wcsnlen
>
>  # define SYMBOL_NAME wcsnlen
> -# include <init-arch.h>
> -
> -extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
> -extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
> -extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
> -extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
> -extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
> -
> -static inline void *
> -IFUNC_SELECTOR (void)
> -{
> -  const struct cpu_features* cpu_features = __get_cpu_features ();
> -
> -  if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> -      && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
> -    {
> -      if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> -          && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> -          && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> -        return OPTIMIZE (evex);
> -
> -      if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> -        return OPTIMIZE (avx2_rtm);
> -
> -      if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
> -        return OPTIMIZE (avx2);
> -    }
> -
> -  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
> -    return OPTIMIZE (sse4_1);
> -
> -  return OPTIMIZE (sse2);
> -}
> +# include "ifunc-wcslen.h"
>
>  libc_ifunc_redirected (__redirect_wcsnlen, __wcsnlen, IFUNC_SELECTOR ());
>  weak_alias (__wcsnlen, wcsnlen);
> --
> 2.25.1
>
>
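
For readers following the selection order in the quoted ifunc-wcslen.h, here is a minimal standalone sketch of the same priority chain. It is not glibc code: struct mock_cpu_features, enum wcslen_impl and select_wcslen_impl are hypothetical names invented for illustration, and plain booleans stand in for the CPU_FEATURE_USABLE_P / CPU_FEATURES_ARCH_P checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's struct cpu_features.  The fields
   mirror the feature bits tested in the patch, not the real layout.  */
struct mock_cpu_features
{
  bool avx2, bmi2, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm, prefer_no_vzeroupper, sse4_1;
};

/* Hypothetical labels for the five implementations named in the patch.  */
enum wcslen_impl
{
  IMPL_SSE2, IMPL_SSE4_1, IMPL_AVX2, IMPL_AVX2_RTM, IMPL_EVEX
};

/* Same decision order as IFUNC_SELECTOR in the quoted header:
   evex, then avx2_rtm, then avx2, then sse4_1, then sse2.  */
static enum wcslen_impl
select_wcslen_impl (const struct mock_cpu_features *f)
{
  assert (f != NULL);

  if (f->avx2 && f->bmi2 && f->avx_fast_unaligned_load)
    {
      if (f->avx512vl && f->avx512bw)
        return IMPL_EVEX;

      if (f->rtm)
        return IMPL_AVX2_RTM;

      if (!f->prefer_no_vzeroupper)
        return IMPL_AVX2;
    }

  /* Reached when the AVX2 group is unusable, or when it is usable
     but VZEROUPPER is to be avoided and neither EVEX nor RTM applies.  */
  if (f->sse4_1)
    return IMPL_SSE4_1;

  return IMPL_SSE2;
}
```

Under these assumptions, a CPU with AVX2, BMI2 and fast unaligned loads that also sets Prefer_No_VZEROUPPER (and has neither AVX512VL/BW nor RTM) falls through the AVX2 block and lands on sse4_1 when SSE4.1 is usable, otherwise on sse2.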