From: Sunil Pandey
Date: Mon, 3 Oct 2022 09:19:46 -0700
Subject: Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
To: Noah Goldstein
Cc: Aurelien Jarno, libc-alpha@sourceware.org, "H . J . Lu"
References: <20221002123424.3079805-1-aurelien@aurel32.net> <20221002123424.3079805-3-aurelien@aurel32.net>

Please separate this patch into 4 separate patches.

Patch1: sysdeps/x86_64/multiarch/strncmp.c
Patch2: sysdeps/x86_64/multiarch/strcmp.c
Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
Patch4: sysdeps/x86_64/multiarch/ifunc-impl-list.c

The rest looks OK to me.

On Sun, Oct 2, 2022 at 2:08 PM Noah Goldstein wrote:
>
> On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno wrote:
> >
> > The AVX2 str*cmp and wcs(n)cmp implementations use the 'bzhi'
> > instruction, which belongs to the BMI2 CPU feature.
> >
> > NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed
> > as BSF if the CPU doesn't support TZCNT, and produces the same result
> > for non-zero input.
> >
> > Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
> > Partially resolves: BZ #29611
> > ---
> >  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 47 +++++++++++++++------
> >  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
> >  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
> >  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
> >  4 files changed, 39 insertions(+), 17 deletions(-)
> >
> > diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > index a71444eccb..fec8790c11 100644
> > --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > @@ -448,13 +448,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strcasecmp,
> >                X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX512VL)
> > -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                      && CPU_FEATURE_USABLE (AVX512BW)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strcasecmp_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strcasecmp_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __strcasecmp_avx2_rtm)
> >                X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp,
> > @@ -470,13 +473,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strcasecmp_l,
> >                X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX512VL)
> > -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                      && CPU_FEATURE_USABLE (AVX512BW)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strcasecmp_l_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strcasecmp_l_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __strcasecmp_l_avx2_rtm)
> >                X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp_l,
> > @@ -585,10 +591,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >                                        && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strcmp_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strcmp_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __strcmp_avx2_rtm)
> >                X86_IFUNC_IMPL_ADD_V2 (array, i, strcmp,
> > @@ -638,13 +646,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strncasecmp,
> >                X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX512VL)
> > -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                      && CPU_FEATURE_USABLE (AVX512BW)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strncasecmp_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strncasecmp_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __strncasecmp_avx2_rtm)
> >                X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp,
> > @@ -660,13 +671,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strncasecmp_l,
> >                X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX512VL)
> > -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                      && CPU_FEATURE_USABLE (AVX512BW)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strncasecmp_l_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strncasecmp_l_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __strncasecmp_l_avx2_rtm)
> >                X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l,
> > @@ -796,10 +810,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >                                        && CPU_FEATURE_USABLE (BMI2)),
> >                                       __wcscmp_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __wcscmp_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __wcscmp_avx2_rtm)
> >                /* ISA V2 wrapper for SSE2 implementation because the SSE2
> > @@ -816,10 +832,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >                                        && CPU_FEATURE_USABLE (BMI2)),
> >                                       __wcsncmp_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __wcsncmp_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __wcsncmp_avx2_rtm)
> >                /* ISA V2 wrapper for GENERIC implementation because the
> > @@ -1162,13 +1180,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strncmp,
> >                X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp,
> >                                       (CPU_FEATURE_USABLE (AVX512VL)
> > -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                      && CPU_FEATURE_USABLE (AVX512BW)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strncmp_evex)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
> > -                                     CPU_FEATURE_USABLE (AVX2),
> > +                                     (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)),
> >                                       __strncmp_avx2)
> >                X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
> >                                       (CPU_FEATURE_USABLE (AVX2)
> > +                                      && CPU_FEATURE_USABLE (BMI2)
> >                                        && CPU_FEATURE_USABLE (RTM)),
> >                                       __strncmp_avx2_rtm)
> >                X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp,
> > diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > index 68646ef199..7622af259c 100644
> > --- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > +++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > @@ -34,6 +34,7 @@ IFUNC_SELECTOR (void)
> >    const struct cpu_features *cpu_features = __get_cpu_features ();
> >
> >    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> >        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
> >                                        AVX_Fast_Unaligned_Load, ))
> >      {
> > diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
> > index fdd5afe3af..9d6c9f66ba 100644
> > --- a/sysdeps/x86_64/multiarch/strcmp.c
> > +++ b/sysdeps/x86_64/multiarch/strcmp.c
> > @@ -45,12 +45,12 @@ IFUNC_SELECTOR (void)
> >    const struct cpu_features *cpu_features = __get_cpu_features ();
> >
> >    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> >        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
> >                                        AVX_Fast_Unaligned_Load, ))
> >      {
> >        if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> > -          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> > -          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> > +          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> >          return OPTIMIZE (evex);
> >
> >        if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> > diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c
> > index 4ebe4bde30..c4f8b6bbb5 100644
> > --- a/sysdeps/x86_64/multiarch/strncmp.c
> > +++ b/sysdeps/x86_64/multiarch/strncmp.c
> > @@ -41,12 +41,12 @@ IFUNC_SELECTOR (void)
> >    const struct cpu_features *cpu_features = __get_cpu_features ();
> >
> >    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> >        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
> >                                        AVX_Fast_Unaligned_Load, ))
> >      {
> >        if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> > -          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> > -          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> > +          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> >          return OPTIMIZE (evex);
> >
> >        if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> > --
> > 2.35.1
> >
>
> LGTM.
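
For readers following the selector changes in strcmp.c and strncmp.c, the
effect of moving the BMI2 test into the outer condition can be sketched as a
small standalone model. This is only an illustration, not glibc code: struct
cpu_caps, enum strcmp_impl and select_strcmp_impl are hypothetical names, and
the real selectors read glibc's internal cpu_features through the
X86_ISA_CPU_FEATURE_USABLE_P macros. The diff does not show what follows the
RTM check, so the sketch assumes the plain AVX2 variant is chosen there and
collapses every non-AVX2 path into a single fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's internal cpu_features data.  */
struct cpu_caps
{
  bool avx2;
  bool bmi2;
  bool avx_fast_unaligned_load;
  bool avx512vl;
  bool avx512bw;
  bool rtm;
};

/* Hypothetical labels for the variants named in the patch.  */
enum strcmp_impl { IMPL_FALLBACK, IMPL_AVX2, IMPL_AVX2_RTM, IMPL_EVEX };

/* Models the nesting of the patched selectors: BMI2 is tested once in the
   outer condition, so the EVEX, AVX2+RTM and AVX2 variants all inherit
   the requirement.  */
static enum strcmp_impl
select_strcmp_impl (const struct cpu_caps *caps)
{
  assert (caps != NULL);

  if (caps->avx2 && caps->bmi2 && caps->avx_fast_unaligned_load)
    {
      if (caps->avx512vl && caps->avx512bw)
        return IMPL_EVEX;

      if (caps->rtm)
        return IMPL_AVX2_RTM;

      /* Assumption: the code after the RTM check is not in the diff.  */
      return IMPL_AVX2;
    }

  /* Assumption: all SSE and generic variants are collapsed here.  */
  return IMPL_FALLBACK;
}
```

In this model a CPU that reports AVX2 but not BMI2 gets IMPL_FALLBACK
regardless of its AVX512 or RTM bits, which is the configuration the patch
is meant to cover.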