References: <20221003195944.3274548-1-aurelien@aurel32.net> <20221003195944.3274548-3-aurelien@aurel32.net>
In-Reply-To: <20221003195944.3274548-3-aurelien@aurel32.net>
From: Noah Goldstein
Date: Mon, 3 Oct 2022 14:11:27 -0700
Subject: Re: [PATCH v3 2/8] x86-64: Require BMI2 for AVX2 str(n)casecmp implementations
To: Aurelien Jarno
Cc: libc-alpha@sourceware.org, "H . J . Lu", Sunil K Pandey

On Mon, Oct 3, 2022 at 12:59 PM Aurelien Jarno wrote:
>
> The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which
> belongs to the BMI2 CPU feature.
>
> NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
> if the CPU doesn't support TZCNT, and produces the same result
> for non-zero input.
>
> Partially fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 28 +++++++++++++++------
>  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
>  2 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index a71444eccb..d208fae4bf 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -448,13 +448,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strcasecmp,
>        X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
>                               (CPU_FEATURE_USABLE (AVX512VL)
> -                              && CPU_FEATURE_USABLE (AVX512BW)),
> +                              && CPU_FEATURE_USABLE (AVX512BW)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strcasecmp_evex)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> -                             CPU_FEATURE_USABLE (AVX2),
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strcasecmp_avx2)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
>                               (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)
>                                && CPU_FEATURE_USABLE (RTM)),
>                               __strcasecmp_avx2_rtm)
>        X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp,
> @@ -470,13 +473,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strcasecmp_l,
>        X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
>                               (CPU_FEATURE_USABLE (AVX512VL)
> -                              && CPU_FEATURE_USABLE (AVX512BW)),
> +                              && CPU_FEATURE_USABLE (AVX512BW)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strcasecmp_l_evex)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> -                             CPU_FEATURE_USABLE (AVX2),
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strcasecmp_l_avx2)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
>                               (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)
>                                && CPU_FEATURE_USABLE (RTM)),
>                               __strcasecmp_l_avx2_rtm)
>        X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp_l,
> @@ -638,13 +644,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncasecmp,
>        X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
>                               (CPU_FEATURE_USABLE (AVX512VL)
> -                              && CPU_FEATURE_USABLE (AVX512BW)),
> +                              && CPU_FEATURE_USABLE (AVX512BW)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strncasecmp_evex)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> -                             CPU_FEATURE_USABLE (AVX2),
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strncasecmp_avx2)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
>                               (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)
>                                && CPU_FEATURE_USABLE (RTM)),
>                               __strncasecmp_avx2_rtm)
>        X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp,
> @@ -660,13 +669,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncasecmp_l,
>        X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
>                               (CPU_FEATURE_USABLE (AVX512VL)
> -                              && CPU_FEATURE_USABLE (AVX512BW)),
> +                              && CPU_FEATURE_USABLE (AVX512BW)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strncasecmp_l_evex)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> -                             CPU_FEATURE_USABLE (AVX2),
> +                             (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)),
>                               __strncasecmp_l_avx2)
>        X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
>                               (CPU_FEATURE_USABLE (AVX2)
> +                              && CPU_FEATURE_USABLE (BMI2)
>                                && CPU_FEATURE_USABLE (RTM)),
>                               __strncasecmp_l_avx2_rtm)
>        X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l,
> diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> index 68646ef199..7622af259c 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> @@ -34,6 +34,7 @@ IFUNC_SELECTOR (void)
>    const struct cpu_features *cpu_features = __get_cpu_features ();
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                        AVX_Fast_Unaligned_Load, ))
>      {
> --
> 2.35.1
>

LGTM.
Reviewed-by: Noah Goldstein