* [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
From: Aurelien Jarno @ 2022-10-01 19:09 UTC
To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

Some early Intel Haswell CPUs have AVX2 instructions but do not have BMI2
instructions. Some AVX2 string functions only check for AVX2, but use
BMI2 or LZCNT instructions. This patchset tries to fix that.

While most fixes only change ifunc-impl-list.c, and thus only concern
the testsuite, the strn(case)cmp issue is a real one: it affects early
Intel Haswell CPUs and has been reported to affect Debian Sid and Fedora
Rawhide.

On the other hand, the check for LZCNT in memrchr is purely for
correctness; I am not aware of any CPU implementing AVX2 without LZCNT.

This has been tested by replacing all BMI2 and LZCNT instructions in the
source code with the "ud2" instruction, disabling the BMI1 and BMI2
feature detection, and running the testsuite.

Resolves: BZ #29611

Aurelien Jarno (4):
  x86: include BMI1 and BMI2 in x86-64-v3 level
  x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp
    implementations
  x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
  x86-64: Require LZCNT for AVX2 memrchr implementation

 sysdeps/x86/get-isa-level.h                 |  2 +
 sysdeps/x86_64/multiarch/ifunc-avx2.h       |  1 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 44 +++++++++++++++------
 sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
 sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
 5 files changed, 38 insertions(+), 14 deletions(-)

-- 
2.35.1
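To make the selection problem concrete, here is a minimal C sketch of an ifunc-style selector before and after the fix. It is not glibc code: `struct features`, `enum impl` and both selector functions are hypothetical stand-ins for glibc's cpu_features machinery, and the non-AVX2 choices are collapsed into a single fallback.

```c
#include <stdbool.h>

/* Hypothetical CPU feature flags; glibc's struct cpu_features is richer. */
struct features
{
  bool avx2;
  bool bmi2;
  bool avx_fast_unaligned_load;
};

enum impl { IMPL_FALLBACK, IMPL_AVX2 };

/* Before the fix: AVX2 alone is treated as sufficient, although the
   AVX2 code also executes BMI2 instructions such as bzhi.  */
static enum impl
select_strncmp_old (const struct features *f)
{
  if (f->avx2 && f->avx_fast_unaligned_load)
    return IMPL_AVX2;
  return IMPL_FALLBACK;
}

/* After the fix: the AVX2 variant is chosen only if BMI2 is usable too.  */
static enum impl
select_strncmp_fixed (const struct features *f)
{
  if (f->avx2 && f->bmi2 && f->avx_fast_unaligned_load)
    return IMPL_AVX2;
  return IMPL_FALLBACK;
}
```

On a CPU reporting AVX2 but not BMI2, the old selector hands out the AVX2 variant, which then faults on its first BMI2 instruction; the fixed selector falls back instead.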
* [PATCH 1/4] x86: include BMI1 and BMI2 in x86-64-v3 level
From: Aurelien Jarno @ 2022-10-01 19:09 UTC
To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The "System V Application Binary Interface AMD64 Architecture Processor
Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3
level.
---
 sysdeps/x86/get-isa-level.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sysdeps/x86/get-isa-level.h b/sysdeps/x86/get-isa-level.h
index 1ade78ab73..5b4dd5f062 100644
--- a/sysdeps/x86/get-isa-level.h
+++ b/sysdeps/x86/get-isa-level.h
@@ -47,6 +47,8 @@ get_isa_level (const struct cpu_features *cpu_features)
           isa_level |= GNU_PROPERTY_X86_ISA_1_V2;
           if (CPU_FEATURE_USABLE_P (cpu_features, AVX)
               && CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+              && CPU_FEATURE_USABLE_P (cpu_features, BMI1)
+              && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
               && CPU_FEATURE_USABLE_P (cpu_features, F16C)
               && CPU_FEATURE_USABLE_P (cpu_features, FMA)
               && CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
-- 
2.35.1
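The psABI's x86-64-v3 level is a fixed feature set (AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE and OSXSAVE), so a level check reduces to "are all of these bits present". The sketch below models that with a bitmask. The bit assignments, `X86_64_V3_FEATURES` and `is_x86_64_v3` are hypothetical, and the sketch ignores the XCR0 state checks that glibc's notion of "usable" also covers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for this sketch only.  */
enum
{
  F_AVX = 1 << 0, F_AVX2 = 1 << 1, F_BMI1 = 1 << 2, F_BMI2 = 1 << 3,
  F_F16C = 1 << 4, F_FMA = 1 << 5, F_LZCNT = 1 << 6, F_MOVBE = 1 << 7,
  F_OSXSAVE = 1 << 8
};

/* Every feature the psABI lists for the x86-64-v3 level.  */
static const uint32_t X86_64_V3_FEATURES
  = F_AVX | F_AVX2 | F_BMI1 | F_BMI2 | F_F16C | F_FMA | F_LZCNT | F_MOVBE
    | F_OSXSAVE;

/* A CPU is at level v3 only if all of the required bits are set.  */
static bool
is_x86_64_v3 (uint32_t cpu)
{
  return (cpu & X86_64_V3_FEATURES) == X86_64_V3_FEATURES;
}
```

Without BMI1 and BMI2 in the mask, a CPU with AVX2 but no BMI2 would wrongly be classified as v3, which is what the patch above prevents.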
* [PATCH 2/4] x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp implementations
From: Aurelien Jarno @ 2022-10-01 19:09 UTC
To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 strncmp, strncasecmp and wcsncmp implementations use the bzhil
instruction, which belongs to the BMI2 CPU feature.

Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611
---
 sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 25 +++++++++++++++------
 sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
 sysdeps/x86_64/multiarch/strncmp.c          |  4 ++--
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index a71444eccb..ec1a8bff5e 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -638,13 +638,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strncasecmp,
         X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
             (CPU_FEATURE_USABLE (AVX512VL)
-             && CPU_FEATURE_USABLE (AVX512BW)),
+             && CPU_FEATURE_USABLE (AVX512BW)
+             && CPU_FEATURE_USABLE (BMI2)),
             __strncasecmp_evex)
         X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)),
             __strncasecmp_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)
              && CPU_FEATURE_USABLE (RTM)),
             __strncasecmp_avx2_rtm)
         X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp,
@@ -660,13 +663,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strncasecmp_l,
         X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
             (CPU_FEATURE_USABLE (AVX512VL)
-             && CPU_FEATURE_USABLE (AVX512BW)),
+             && CPU_FEATURE_USABLE (AVX512BW)
+             && CPU_FEATURE_USABLE (BMI2)),
             __strncasecmp_l_evex)
         X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)),
             __strncasecmp_l_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)
              && CPU_FEATURE_USABLE (RTM)),
             __strncasecmp_l_avx2_rtm)
         X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l,
@@ -816,10 +822,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
              && CPU_FEATURE_USABLE (BMI2)),
             __wcsncmp_evex)
         X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)),
             __wcsncmp_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)
              && CPU_FEATURE_USABLE (RTM)),
             __wcsncmp_avx2_rtm)
         /* ISA V2 wrapper for GENERIC implementation because the
@@ -1162,13 +1170,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strncmp,
         X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp,
             (CPU_FEATURE_USABLE (AVX512VL)
-             && CPU_FEATURE_USABLE (AVX512BW)),
+             && CPU_FEATURE_USABLE (AVX512BW)
+             && CPU_FEATURE_USABLE (BMI2)),
             __strncmp_evex)
         X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)),
             __strncmp_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)
              && CPU_FEATURE_USABLE (RTM)),
             __strncmp_avx2_rtm)
         X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp,
diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
index 68646ef199..7622af259c 100644
--- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
@@ -34,6 +34,7 @@ IFUNC_SELECTOR (void)
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                       AVX_Fast_Unaligned_Load, ))
     {
diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c
index 4ebe4bde30..c4f8b6bbb5 100644
--- a/sysdeps/x86_64/multiarch/strncmp.c
+++ b/sysdeps/x86_64/multiarch/strncmp.c
@@ -41,12 +41,12 @@ IFUNC_SELECTOR (void)
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                       AVX_Fast_Unaligned_Load, ))
     {
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
+          && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
         return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
-- 
2.35.1
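For readers unfamiliar with BZHI: it keeps the bits below a given index and clears the rest, which is a natural way to discard match bits that lie beyond the remaining length in a bounded comparison. Because it is VEX-encoded, it raises #UD (seen as SIGILL) on CPUs without BMI2, which is how the missing check shows up. Below is a portable C model of the 32-bit form; `bzhi32` is a hypothetical helper, not the glibc code or the compiler intrinsic.

```c
#include <stdint.h>

/* Portable model of 32-bit BZHI (BMI2): the index is taken from the low
   8 bits of the second operand; bits at positions >= index are cleared.
   An index of 32 or more returns the source unchanged.  */
static uint32_t
bzhi32 (uint32_t src, uint32_t index)
{
  uint32_t n = index & 0xff;
  if (n >= 32)
    return src;
  return src & ((UINT32_C (1) << n) - 1);
}
```

For example, with a 32-bit per-byte mismatch mask and 5 bytes left to compare, `bzhi32 (mask, 5)` keeps only the bits for those 5 bytes.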
* Re: [PATCH 2/4] x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp implementations
From: Noah Goldstein @ 2022-10-01 22:15 UTC
To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 strncmp, strncasecmp and wcsncmp implementations use the bzhil
> instructions, which belongs to the BMI2 CPU feature.
>
> Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 25 +++++++++++++++------
>  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
>  sysdeps/x86_64/multiarch/strncmp.c          |  4 ++--

The ifunc changes in strncmp.c and ifunc-strcasecmp.h need to be
backported to 2.33, 2.34 and 2.35.

Also, separate ifunc changes to strncmp.c need to be backported to
2.32, 2.31, 2.30, 2.29 and 2.28 for a `tzcnt` usage that needs BMI1.

Finally, a corresponding fix is needed for strcmp.c as well (there is a
missing BMI2 check in the strcmp.c ifunc selection, as well as missing
checks in the impl list).

> [...]
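The `tzcnt` case is subtler than the BMI2 ones: TZCNT is encoded as BSF with a REP prefix, so a CPU without BMI1 does not fault but silently executes BSF. The two agree for nonzero inputs and differ for zero, where TZCNT returns the operand width and BSF leaves the destination undefined. The model below is a sketch; `tzcnt32` and `bsf32` are hypothetical helpers, and modelling BSF's zero case as "destination unchanged" is an assumption (it matches AMD's documented behaviour; Intel documents it as undefined).

```c
#include <stdint.h>

/* Model of 32-bit TZCNT (BMI1): trailing zero count, 32 for a zero input. */
static unsigned
tzcnt32 (uint32_t x)
{
  unsigned n = 0;
  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Model of BSF, which the TZCNT encoding runs as on CPUs without BMI1.
   Assumption: for a zero input the old destination (OLD_DEST) is kept.  */
static unsigned
bsf32 (uint32_t x, unsigned old_dest)
{
  return x == 0 ? old_dest : tzcnt32 (x);
}
```

Code that relies on `tzcnt` returning 32 for an all-zero mask therefore gets a stale value instead on such CPUs, with no fault to signal the problem.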
* Re: [PATCH 2/4] x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp implementations
From: Noah Goldstein @ 2022-10-01 22:18 UTC
To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sat, Oct 1, 2022 at 3:15 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > [...]
>
> The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport
> to 2.33, 2.34, 2.35.
>
> Also separate changes for ifunc need to be backport to strncmp.c:
> 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs
> BMI1.
>
> Finally a corresponding fix is needed for strcmp.c as well (there is
> missing BMI2 check in strcmp.c ifunc selection as well as missing
> checks in the impl list).

Don't reply here. Reply (if needed) in the main [0/4] patch thread, just
to keep the conversation contained.

> [...]
* [PATCH 3/4] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
From: Aurelien Jarno @ 2022-10-01 19:09 UTC
To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 memchr, rawmemchr and wmemchr implementations use the bzhiq
and sarxl instructions, which belong to the BMI2 CPU feature.

Fixes: acfd088a1963 ("x86: Optimize memchr-avx2.S")
Partially resolves: BZ #29611
---
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index ec1a8bff5e..c628462d47 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -69,10 +69,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
              && CPU_FEATURE_USABLE (BMI2)),
             __memchr_evex_rtm)
         X86_IFUNC_IMPL_ADD_V3 (array, i, memchr,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)),
             __memchr_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, memchr,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)
              && CPU_FEATURE_USABLE (RTM)),
             __memchr_avx2_rtm)
         /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -335,10 +337,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
              && CPU_FEATURE_USABLE (BMI2)),
             __rawmemchr_evex_rtm)
         X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)),
             __rawmemchr_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)
              && CPU_FEATURE_USABLE (RTM)),
             __rawmemchr_avx2_rtm)
         /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -917,10 +921,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
              && CPU_FEATURE_USABLE (BMI2)),
             __wmemchr_evex_rtm)
         X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)),
             __wmemchr_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (BMI2)
              && CPU_FEATURE_USABLE (RTM)),
             __wmemchr_avx2_rtm)
         /* ISA V2 wrapper for SSE2 implementation because the SSE2
-- 
2.35.1
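SARX, also cited in this patch, is a flag-preserving arithmetic right shift whose count is masked to the operand width (modulo 32 for the 32-bit form). Like BZHI it is VEX-encoded, so it faults on CPUs without BMI2. A portable model follows; `sarx32` is a hypothetical helper name, and the explicit sign handling avoids relying on implementation-defined right shifts of negative values in C.

```c
#include <stdint.h>

/* Model of 32-bit SARX (BMI2): arithmetic right shift by COUNT mod 32.
   For negative inputs, shift the complement (which is non-negative)
   and complement back, so the sign bits are replicated portably.  */
static int32_t
sarx32 (int32_t src, uint32_t count)
{
  unsigned n = count & 31;
  if (src < 0)
    return ~(~src >> n);
  return src >> n;
}
```

Note that a count of 34 behaves like a count of 2, because only the low five bits of the count are used.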
* [PATCH 4/4] x86-64: Require LZCNT for AVX2 memrchr implementation
From: Aurelien Jarno @ 2022-10-01 19:09 UTC
To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 memrchr implementation uses the lzcntl and lzcntq instructions,
which belong to the LZCNT CPU feature.

Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S")
Partially resolves: BZ #29611
---
 sysdeps/x86_64/multiarch/ifunc-avx2.h      | 1 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h
index a57a9952f3..f1741083fd 100644
--- a/sysdeps/x86_64/multiarch/ifunc-avx2.h
+++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h
@@ -37,6 +37,7 @@ IFUNC_SELECTOR (void)
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
       && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                       AVX_Fast_Unaligned_Load, ))
     {
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index c628462d47..db5a2032d6 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -209,13 +209,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, memrchr,
         X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
             (CPU_FEATURE_USABLE (AVX512VL)
-             && CPU_FEATURE_USABLE (AVX512BW)),
+             && CPU_FEATURE_USABLE (AVX512BW)
+             && CPU_FEATURE_USABLE (LZCNT)),
             __memrchr_evex)
         X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
-            CPU_FEATURE_USABLE (AVX2),
+            (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (LZCNT)),
             __memrchr_avx2)
         X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
             (CPU_FEATURE_USABLE (AVX2)
+             && CPU_FEATURE_USABLE (LZCNT)
              && CPU_FEATURE_USABLE (RTM)),
             __memrchr_avx2_rtm)
         /* ISA V2 wrapper for SSE2 implementation because the SSE2
-- 
2.35.1
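Why a missing LZCNT check is "purely for correctness" rather than a crash: LZCNT is encoded as BSR with a REP prefix, so on a CPU without LZCNT the instruction does not fault; it executes as BSR and returns the index of the highest set bit instead of the leading-zero count. The sketch below models both; `lzcnt32` and `bsr32_nonzero` are hypothetical helpers, and BSR's zero case is left out because its result is undefined there.

```c
#include <stdint.h>

/* Model of 32-bit LZCNT: number of leading zero bits, 32 for zero.  */
static unsigned
lzcnt32 (uint32_t x)
{
  unsigned n = 0;
  if (x == 0)
    return 32;
  while ((x & UINT32_C (0x80000000)) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Model of BSR (index of the highest set bit), which is what the LZCNT
   encoding executes as without LZCNT support.  Only valid for X != 0.  */
static unsigned
bsr32_nonzero (uint32_t x)
{
  return 31 - lzcnt32 (x);
}
```

For x = 1 the intended result is 31 while the fallback yields 0, so memrchr would silently compute a wrong position rather than trap.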
* Re: [PATCH 4/4] x86-64: Require LZCNT for AVX2 memrchr implementation
From: Noah Goldstein @ 2022-10-01 22:06 UTC
To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 memrchr implementation uses the lzcntl and lzcntq instructions,
> which belongs to the LZCNT CPU feature.
> [...]
> @@ -209,13 +209,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, memrchr,
>          X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
>              (CPU_FEATURE_USABLE (AVX512VL)
> -             && CPU_FEATURE_USABLE (AVX512BW)),
> +             && CPU_FEATURE_USABLE (AVX512BW)
> +             && CPU_FEATURE_USABLE (LZCNT)),

Also needs BMI2 for the `shlx`. Likewise for avx2 versions.

> [...]
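Taking this review comment into account, each memrchr variant needs BMI2 as well as LZCNT. A small bitmask sketch of the resulting requirements follows; the names (`F_*`, `MEMRCHR_*`, `usable`) are hypothetical, and the feature sets follow this thread rather than a verified copy of the committed glibc code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for this sketch only.  */
enum
{
  F_AVX2 = 1 << 0, F_BMI2 = 1 << 1, F_LZCNT = 1 << 2, F_RTM = 1 << 3,
  F_AVX512VL = 1 << 4, F_AVX512BW = 1 << 5
};

/* Per-variant requirements once shlx/shrx (BMI2) and lzcnt are counted.  */
static const uint32_t MEMRCHR_EVEX = F_AVX512VL | F_AVX512BW | F_BMI2 | F_LZCNT;
static const uint32_t MEMRCHR_AVX2 = F_AVX2 | F_BMI2 | F_LZCNT;
static const uint32_t MEMRCHR_AVX2_RTM = F_AVX2 | F_BMI2 | F_LZCNT | F_RTM;

/* A variant is usable only if every required bit is present.  */
static bool
usable (uint32_t cpu, uint32_t required)
{
  return (cpu & required) == required;
}
```

Keeping each variant's requirements as one mask makes it harder to add a feature to the AVX2 entry and forget the RTM or EVEX entries next to it.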
* Re: [PATCH 4/4] x86-64: Require LZCNT for AVX2 memrchr implementation
From: Aurelien Jarno @ 2022-10-01 22:14 UTC
To: Noah Goldstein; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On 2022-10-01 15:06, Noah Goldstein wrote:
> On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > [...]
> >          X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
> >              (CPU_FEATURE_USABLE (AVX512VL)
> > -             && CPU_FEATURE_USABLE (AVX512BW)),
> > +             && CPU_FEATURE_USABLE (AVX512BW)
> > +             && CPU_FEATURE_USABLE (LZCNT)),
>
> Also needs BMI2 for the `shlx`. Likewise for avx2 versions.

Good catch. I hadn't looked for that one, so I hadn't encountered the
issue. Similarly, there is 'shrx'.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net
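SHLX and SHRX, like SARX, are BMI2 logical shifts that take the count modulo the operand width and leave the flags untouched, so they too raise #UD on CPUs without BMI2. A portable model with hypothetical helper names; masking the count also keeps the C shifts well defined for counts of 32 or more.

```c
#include <stdint.h>

/* Model of 32-bit SHLX (BMI2): logical left shift by COUNT mod 32.  */
static uint32_t
shlx32 (uint32_t src, uint32_t count)
{
  return src << (count & 31);
}

/* Model of 32-bit SHRX (BMI2): logical right shift by COUNT mod 32.  */
static uint32_t
shrx32 (uint32_t src, uint32_t count)
{
  return src >> (count & 31);
}
```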
* Re: [PATCH 4/4] x86-64: Require LZCNT for AVX2 memrchr implementation
From: Noah Goldstein @ 2022-10-01 22:30 UTC
To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 memrchr implementation uses the lzcntl and lzcntq instructions,
> which belongs to the LZCNT CPU feature.
>
> Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86_64/multiarch/ifunc-avx2.h      | 1 +
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 7 +++++--
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h
> index a57a9952f3..f1741083fd 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-avx2.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h
> @@ -37,6 +37,7 @@ IFUNC_SELECTOR (void)
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
>        && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT)

This causes a build failure. It needs a corresponding macro in
sysdeps/x86/isa-level.h, something like:

#define LZCNT_X86_ISA_LEVEL 3

after the BMI2 one.

>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                        AVX_Fast_Unaligned_Load, ))
>      {
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index c628462d47..db5a2032d6 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -209,13 +209,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, memrchr,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (LZCNT)),
>                                      __memrchr_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (LZCNT)),
>                                      __memrchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (LZCNT)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __memrchr_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> --
> 2.35.1
>
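Noah's build-failure remark suggests that `X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT)` is built by token pasting and presumably references `LZCNT_X86_ISA_LEVEL`, so that the check can be folded to a constant when glibc is built for a minimum ISA level that already guarantees the feature. The sketch below re-creates that idea with hypothetical names (`MIN_ISA_LEVEL`, `*_ISA_LEVEL`, `feature_usable_p`, `ISA_FEATURE_USABLE_P`). It is modeled on the behavior described in the thread, not copied from sysdeps/x86/isa-level.h.

```c
#include <stdbool.h>

/* Hypothetical minimum ISA level the library is compiled for
   (1 = baseline x86-64, 3 = x86-64-v3).  */
#ifndef MIN_ISA_LEVEL
# define MIN_ISA_LEVEL 1
#endif

/* Lowest ISA level that guarantees each feature.  Every feature name passed
   to ISA_FEATURE_USABLE_P needs an entry here; otherwise the pasted
   identifier is undeclared and compilation fails.  */
#define AVX2_ISA_LEVEL  3
#define BMI2_ISA_LEVEL  3
#define LZCNT_ISA_LEVEL 3

enum feature { AVX2, BMI2, LZCNT, NUM_FEATURES };

/* Hypothetical runtime detection result; here it models a CPU that has
   AVX2 and LZCNT but no BMI2.  */
static bool runtime_usable[NUM_FEATURES] = {
  [AVX2] = true, [BMI2] = false, [LZCNT] = true
};

/* Stand-in for the runtime query (CPU_FEATURE_USABLE_P in glibc).  */
static bool
feature_usable_p (enum feature f)
{
  return runtime_usable[f];
}

/* Constant-true when the build baseline already implies the feature,
   otherwise fall back to the runtime query.  */
#define ISA_FEATURE_USABLE_P(name) \
  ((name##_ISA_LEVEL) <= MIN_ISA_LEVEL || feature_usable_p (name))
```

With this scheme, adding a check for a feature that has no `*_ISA_LEVEL` entry (for example `ISA_FEATURE_USABLE_P (BMI1)` here) fails to compile, which matches the kind of error reported above.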
* Re: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
From: Noah Goldstein @ 2022-10-01 22:11 UTC
To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> [...]
>
> Aurelien Jarno (4):
>   x86: include BMI1 and BMI2 in x86-64-v3 level
>   x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp
>     implementations
>   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
>   x86-64: Require LZCNT for AVX2 memrchr implementation
>

We also need a BMI2 check in ifunc-impl-list for:
strcasecmp
strcmp
strcasecmp_l
strrchr
wcsrchr
wcscmp

If you want, you can make the patches; otherwise I can.
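As the cover letter notes, changes confined to ifunc-impl-list.c mainly affect the testsuite: it enumerates every implementation together with a usability predicate and exercises each one the predicate accepts. The following hedged sketch of that pattern uses hypothetical names and feature bits. It shows why a predicate that checks only AVX2 would let the tests call a variant that then executes `shlx` on a CPU without BMI2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits (not CPUID bit positions).  */
enum {
  F_AVX2 = 1u << 0, F_BMI2 = 1u << 1, F_LZCNT = 1u << 2,
  F_AVX512VL = 1u << 3, F_AVX512BW = 1u << 4
};

struct impl_entry {
  const char *name;   /* implementation symbol */
  uint32_t required;  /* every feature the code actually executes */
};

/* memrchr entries with the requirements discussed in this thread:
   LZCNT from the patch, BMI2 from the `shlx`/`shrx` review comments.  */
static const struct impl_entry memrchr_impls[] = {
  { "__memrchr_evex", F_AVX512VL | F_AVX512BW | F_BMI2 | F_LZCNT },
  { "__memrchr_avx2", F_AVX2 | F_BMI2 | F_LZCNT },
  { "__memrchr_sse2", 0 },
};

/* Return nonzero if implementation NAME may be called on a CPU that
   reports the features in CPU; unknown names are never usable.  */
static int
impl_usable (const char *name, uint32_t cpu)
{
  for (size_t i = 0; i < sizeof memrchr_impls / sizeof memrchr_impls[0]; i++)
    if (strcmp (memrchr_impls[i].name, name) == 0)
      return (memrchr_impls[i].required & cpu) == memrchr_impls[i].required;
  return 0;
}
```

A test driver would loop over the table and skip entries for which `impl_usable` returns zero. The bug class in this thread corresponds to a `required` mask that is narrower than the instructions the implementation really uses.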
* Re: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
From: Aurelien Jarno @ 2022-10-01 22:17 UTC
To: Noah Goldstein; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On 2022-10-01 15:11, Noah Goldstein wrote:
> On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
> >
> > [...]
>
> We also need BMI2 check in ifunc-impl-list for:
> strcasecmp
> strcmp
> strcasecmp_l

I didn't include those because 'bzhil' is only used in the 'n' case. That
said, given the 'shlx' you mentioned in the other email, we should indeed
include that one.

> strrchr
> wcsrchr
> wcscmp

Same for those; I missed 'shlx'. I'll fix that.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                     http://www.aurel32.net
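Aurelien's distinction rests on which BMI2 instructions each variant executes. BZHI clears every bit of its source from a given index upward, which plausibly is what a length-bounded ("n") comparison uses to discard mask bits past the length limit. SHLX and SHRX are shifts that leave the flags untouched and take the count modulo the operand width. The portable C models below follow the documented 32-bit semantics: BZHI uses only the low 8 bits of the index and returns the source unchanged when that index is 32 or more. They are illustrations, not glibc code, and a compiler is free to emit any instructions for them.

```c
#include <stdint.h>

/* Model of 32-bit BZHI: clear bits [31:index], index = low 8 bits of N.  */
static uint32_t
bzhi32 (uint32_t src, uint32_t n)
{
  uint32_t index = n & 0xff;
  if (index >= 32)
    return src;                         /* nothing to clear */
  return src & ((UINT32_C (1) << index) - 1);
}

/* Model of 32-bit SHLX: count masked to 5 bits; flags are not modeled
   because the real instruction does not modify them.  */
static uint32_t
shlx32 (uint32_t src, uint32_t count)
{
  return src << (count & 31);
}

/* Model of 32-bit SHRX (logical right shift), same count masking.  */
static uint32_t
shrx32 (uint32_t src, uint32_t count)
{
  return src >> (count & 31);
}
```

For example, if a 32-byte vector compare yields a per-byte mismatch mask and only 5 bytes remain within the length limit, `bzhi32 (mask, 5)` keeps just the bits for those 5 bytes.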
* Re: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
From: Noah Goldstein @ 2022-10-01 22:17 UTC
To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
> >
> > [...]
>
> We also need BMI2 check in ifunc-impl-list for:
> strcasecmp
> strcmp
> strcasecmp_l
> strrchr
> wcsrchr
> wcscmp
>
> If you want you can make patches, otherwise I can.

This duplicates a comment I left on the strn(case)cmp patchset, but I'm
leaving it here so the information is not scattered:

The ifunc changes in strncmp.c and ifunc-strcasecmp.h need to be
backported to 2.33, 2.34, and 2.35.

Separate ifunc changes to strncmp.c also need to be backported to
2.32, 2.31, 2.30, 2.29, and 2.28, for a `tzcnt` usage that needs
BMI1.

Finally, a corresponding fix is needed for strcmp.c as well (the
strcmp.c ifunc selection is missing a BMI2 check, and the impl list is
missing checks too).
* Re: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
From: Aurelien Jarno @ 2022-10-02  9:35 UTC
To: Noah Goldstein; +Cc: libc-alpha

On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote:
> On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > [...]
>
> This is a duplicate of a comment I left in the strn(case)cmp patchset,
> but leaving here so the information is not scattered:
>
> The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport
> to 2.33, 2.34, 2.35.
>
> Also separate changes for ifunc need to be backport to strncmp.c:
> 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs
> BMI1.

Is that really correct? According to the commit log, TZCNT is used in a
way that is compatible with BSF:

commit 1457016337072d1b6739f571846b619596990cb7
Author: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com>
Date:   Thu May 3 11:09:30 2018 -0500

    x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2

    Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector
    comparison as much as possible. Peak performance observed on a SkyLake
    machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
    respectively. The larger the comparison length, the more benefit using
    avx2 functions, except on the strcmp, where peak is observed at length
    == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
    is preferred and AVX unaligned load is fast.

    NB: It uses TZCNT instead of BSF since TZCNT produces the same result
    as BSF for non-zero input. TZCNT is faster than BSF and is executed
    as BSF if machine doesn't support TZCNT.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                     http://www.aurel32.net
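The commit message's argument can be checked with a small model. TZCNT is encoded as a REP (F3) prefix in front of BSF's opcode (0F BC), so a CPU without BMI1 ignores the prefix and executes BSF. The two agree on every non-zero input. They differ for zero: TZCNT returns the operand width, while Intel documents BSF's destination as undefined. Their flag outputs also differ (TZCNT sets CF for a zero source, BSF sets ZF). The C functions below are portable models of both instructions, not glibc code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 32-bit TZCNT: number of trailing zero bits, 32 for zero.  */
static unsigned
tzcnt32 (uint32_t x)
{
  unsigned n = 0;
  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Model of 32-bit BSF: index of the lowest set bit.  The hardware result
   is undefined for a zero input, so the model refuses that case.  */
static unsigned
bsf32 (uint32_t x)
{
  assert (x != 0);
  unsigned i = 0;
  while (((x >> i) & 1) == 0)
    i++;
  return i;
}
```

The substitution is therefore safe exactly when every TZCNT call site is guaranteed a non-zero operand and does not branch on the flags TZCNT sets. Confirming that for each site is what settles whether the older-branch backports are needed.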
* Re: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
From: Noah Goldstein @ 2022-10-02 16:19 UTC
To: Noah Goldstein, libc-alpha

On Sun, Oct 2, 2022 at 2:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote:
> > [...]
> >
> > Also separate changes for ifunc need to be backport to strncmp.c:
> > 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs
> > BMI1.
>
> Is that really correct? According the commit log TZCNT is used in a way
> that is compatible with BSF:
>
> [...]
>
>     NB: It uses TZCNT instead of BSF since TZCNT produces the same result
>     as BSF for non-zero input. TZCNT is faster than BSF and is executed
>     as BSF if machine doesn't support TZCNT.

I think you're right.
Thread overview: 15+ messages
2022-10-01 19:09 [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611) Aurelien Jarno
2022-10-01 19:09 ` [PATCH 1/4] x86: include BMI1 and BMI2 in x86-64-v3 level Aurelien Jarno
2022-10-01 19:09 ` [PATCH 2/4] x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp implementations Aurelien Jarno
2022-10-01 22:15 ` Noah Goldstein
2022-10-01 22:18 ` Noah Goldstein
2022-10-01 19:09 ` [PATCH 3/4] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations Aurelien Jarno
2022-10-01 19:09 ` [PATCH 4/4] x86-64: Require LZCNT for AVX2 memrchr implementation Aurelien Jarno
2022-10-01 22:06 ` Noah Goldstein
2022-10-01 22:14 ` Aurelien Jarno
2022-10-01 22:30 ` Noah Goldstein
2022-10-01 22:11 ` [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611) Noah Goldstein
2022-10-01 22:17 ` Aurelien Jarno
2022-10-01 22:17 ` Noah Goldstein
2022-10-02 9:35 ` Aurelien Jarno
2022-10-02 16:19 ` Noah Goldstein