From: Noah Goldstein
Date: Sat, 1 Oct 2022 15:15:55 -0700
Subject: Re: [PATCH 2/4] x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp implementations
To: Aurelien Jarno
Cc: libc-alpha@sourceware.org, "H . J . Lu", Sunil K Pandey
References: <20221001190911.2994478-1-aurelien@aurel32.net> <20221001190911.2994478-3-aurelien@aurel32.net>
In-Reply-To: <20221001190911.2994478-3-aurelien@aurel32.net>

On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno wrote:
>
> The AVX2 strncmp, strncasecmp and wcsncmp implementations use the bzhil
> instructions, which belongs to the BMI2 CPU feature.
>
> Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 25 +++++++++++++++------
>  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
>  sysdeps/x86_64/multiarch/strncmp.c          |  4 ++--

The ifunc changes in strncmp.c and ifunc-strcasecmp.h need to be
backported to 2.33, 2.34, and 2.35.

Separate ifunc changes also need to be backported to strncmp.c in 2.32,
2.31, 2.30, 2.29, and 2.28 for a `tzcnt` usage that needs BMI1.

Finally, a corresponding fix is needed for strcmp.c as well (there is a
missing BMI2 check in the strcmp.c ifunc selection, as well as missing
checks in the impl list).
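
To make the selection rule concrete, here is a minimal sketch of gating an implementation on every feature it uses. It is not glibc's code: `struct cpu_caps`, `detect_caps`, `select_strncmp`, `strncmp_generic` and `strncmp_avx2_like` are hypothetical names, and detection goes through the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` rather than glibc's internal `cpu_features`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capability record; glibc keeps this in struct cpu_features.  */
struct cpu_caps
{
  int avx2;
  int bmi2;
};

typedef int (*strncmp_fn) (const char *, const char *, size_t);

/* Stand-ins for the real implementations.  Both simply call strncmp so
   that the sketch runs on any machine.  */
static int
strncmp_generic (const char *a, const char *b, size_t n)
{
  return strncmp (a, b, n);
}

static int
strncmp_avx2_like (const char *a, const char *b, size_t n)
{
  return strncmp (a, b, n);
}

/* Pick the "AVX2" variant only when every feature it relies on is
   usable.  Checking AVX2 alone is the bug class discussed above.  */
static strncmp_fn
select_strncmp (struct cpu_caps caps)
{
  if (caps.avx2 && caps.bmi2)
    return strncmp_avx2_like;
  return strncmp_generic;
}

/* Query the running CPU.  On non-x86 targets, or with other compilers,
   report no features so the generic fallback is always chosen.  */
static struct cpu_caps
detect_caps (void)
{
  struct cpu_caps caps = { 0, 0 };
#if (defined (__x86_64__) || defined (__i386__)) \
    && (defined (__GNUC__) || defined (__clang__))
  __builtin_cpu_init ();
  caps.avx2 = __builtin_cpu_supports ("avx2") != 0;
  caps.bmi2 = __builtin_cpu_supports ("bmi2") != 0;
#endif
  return caps;
}
```

Under this gate, a CPU (or VM) that reports AVX2 but not BMI2 gets the generic routine. Calling `select_strncmp (detect_caps ())` once at startup and caching the result is roughly what an ifunc resolver does.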
>  3 files changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index a71444eccb..ec1a8bff5e 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -638,13 +638,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncasecmp,
>                X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
>                                       (CPU_FEATURE_USABLE (AVX512VL)
> -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> +                                      && CPU_FEATURE_USABLE (AVX512BW)
> +                                      && CPU_FEATURE_USABLE (BMI2)),
>                                       __strncasecmp_evex)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> -                                     CPU_FEATURE_USABLE (AVX2),
> +                                     (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)),
>                                       __strncasecmp_avx2)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
>                                       (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)
>                                        && CPU_FEATURE_USABLE (RTM)),
>                                       __strncasecmp_avx2_rtm)
>                X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp,
> @@ -660,13 +663,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncasecmp_l,
>                X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
>                                       (CPU_FEATURE_USABLE (AVX512VL)
> -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> +                                      && CPU_FEATURE_USABLE (AVX512BW)
> +                                      && CPU_FEATURE_USABLE (BMI2)),
>                                       __strncasecmp_l_evex)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> -                                     CPU_FEATURE_USABLE (AVX2),
> +                                     (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)),
>                                       __strncasecmp_l_avx2)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
>                                       (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)
>                                        && CPU_FEATURE_USABLE (RTM)),
>                                       __strncasecmp_l_avx2_rtm)
>                X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l,
> @@ -816,10 +822,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                        && CPU_FEATURE_USABLE (BMI2)),
>                                       __wcsncmp_evex)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
> -                                     CPU_FEATURE_USABLE (AVX2),
> +                                     (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)),
>                                       __wcsncmp_avx2)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
>                                       (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)
>                                        && CPU_FEATURE_USABLE (RTM)),
>                                       __wcsncmp_avx2_rtm)
>                /* ISA V2 wrapper for GENERIC implementation because the
> @@ -1162,13 +1170,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncmp,
>                X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp,
>                                       (CPU_FEATURE_USABLE (AVX512VL)
> -                                      && CPU_FEATURE_USABLE (AVX512BW)),
> +                                      && CPU_FEATURE_USABLE (AVX512BW)
> +                                      && CPU_FEATURE_USABLE (BMI2)),
>                                       __strncmp_evex)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
> -                                     CPU_FEATURE_USABLE (AVX2),
> +                                     (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)),
>                                       __strncmp_avx2)
>                X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
>                                       (CPU_FEATURE_USABLE (AVX2)
> +                                      && CPU_FEATURE_USABLE (BMI2)
>                                        && CPU_FEATURE_USABLE (RTM)),
>                                       __strncmp_avx2_rtm)
>                X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp,
> diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> index 68646ef199..7622af259c 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> @@ -34,6 +34,7 @@ IFUNC_SELECTOR (void)
>    const struct cpu_features *cpu_features = __get_cpu_features ();
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                        AVX_Fast_Unaligned_Load, ))
>      {
> diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c
> index 4ebe4bde30..c4f8b6bbb5 100644
> --- a/sysdeps/x86_64/multiarch/strncmp.c
> +++ b/sysdeps/x86_64/multiarch/strncmp.c
> @@ -41,12 +41,12 @@ IFUNC_SELECTOR (void)
>    const struct cpu_features *cpu_features = __get_cpu_features ();
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                        AVX_Fast_Unaligned_Load, ))
>      {
>        if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> -          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> -          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> +          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
>          return OPTIMIZE (evex);
>
>        if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> --
> 2.35.1
>
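
For readers less familiar with the two instructions named in this thread, below is a portable C model of what `bzhi` (BMI2) and `tzcnt` (BMI1) compute on 32-bit operands. It is only a sketch of the documented instruction semantics: `bzhi32_model` and `tzcnt32_model` are hypothetical helper names, and nothing here reproduces how strcmp-avx2.S actually uses the instructions.

```c
#include <stdint.h>

/* Model of BZHI: keep the low N bits of SRC and zero the rest, where N
   is the low 8 bits of INDEX.  If N is at least the operand size, SRC
   is returned unchanged.  */
static uint32_t
bzhi32_model (uint32_t src, uint32_t index)
{
  uint32_t n = index & 0xff;
  if (n >= 32)
    return src;
  return src & ((UINT32_C (1) << n) - 1);
}

/* Model of TZCNT: count trailing zero bits.  Unlike BSF, the result
   for a zero input is well defined and equals the operand size.  */
static uint32_t
tzcnt32_model (uint32_t x)
{
  uint32_t count = 0;
  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      count++;
    }
  return count;
}
```

The zero-input case is why a `tzcnt` user needs the BMI1 check: as far as I know, a processor without BMI1 executes the same encoding as `bsf`, whose destination is undefined when the source is zero, so the two only agree for non-zero inputs.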