public inbox for libc-alpha@sourceware.org
* [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611)
@ 2022-10-02 12:34 Aurelien Jarno
  2022-10-02 12:34 ` [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level Aurelien Jarno
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 12:34 UTC (permalink / raw)
  To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

Some early Intel Haswell CPUs have AVX2 instructions, but do not have
BMI1 and BMI2 instructions. Some AVX2 string functions only check for
AVX2, but use BMI1, BMI2 or LZCNT instructions. This patchset tries to
fix that.
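
For readers who want to see what "checking for more than AVX2" means at
the CPUID level, here is a minimal sketch. It is not glibc's
cpu_features code: `struct string_isa`, `decode_string_isa` and
`query_string_isa` are hypothetical names, and the sketch assumes the
GCC/Clang `<cpuid.h>` helpers. It only reads the raw feature bits and
ignores whether the OS has enabled the AVX register state, which glibc's
"usable" checks also take into account.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined (__x86_64__) || defined (__i386__)
# include <cpuid.h>	/* GCC/Clang wrappers around the CPUID instruction.  */
#endif

/* CPUID.(EAX=7,ECX=0):EBX feature bits.  */
#define LEAF7_EBX_BMI1 (UINT32_C (1) << 3)
#define LEAF7_EBX_AVX2 (UINT32_C (1) << 5)
#define LEAF7_EBX_BMI2 (UINT32_C (1) << 8)
/* CPUID.(EAX=0x80000001):ECX feature bit.  */
#define EXT1_ECX_LZCNT (UINT32_C (1) << 5)

/* Hypothetical container for this sketch, not a glibc type.  */
struct string_isa
{
  bool avx2, bmi1, bmi2, lzcnt;
};

/* Decode raw CPUID register values into individual flags.  */
static struct string_isa
decode_string_isa (uint32_t leaf7_ebx, uint32_t ext1_ecx)
{
  struct string_isa isa;
  isa.avx2 = (leaf7_ebx & LEAF7_EBX_AVX2) != 0;
  isa.bmi1 = (leaf7_ebx & LEAF7_EBX_BMI1) != 0;
  isa.bmi2 = (leaf7_ebx & LEAF7_EBX_BMI2) != 0;
  isa.lzcnt = (ext1_ecx & EXT1_ECX_LZCNT) != 0;
  return isa;
}

/* Read the registers from the running CPU.  On non-x86 builds every
   flag is reported as false.  */
static struct string_isa
query_string_isa (void)
{
  uint32_t leaf7_ebx = 0, ext1_ecx = 0;
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    leaf7_ebx = ebx;
  if (__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
    ext1_ecx = ecx;
#endif
  return decode_string_isa (leaf7_ebx, ext1_ecx);
}
```

A CPU of the kind described above would decode to avx2 set with bmi1
and bmi2 clear, which is exactly the combination an AVX2-only check
lets through.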

While most fixes only change ifunc-impl-list.c, and thus only concern
the testsuite, the strn(case)cmp one is a real issue affecting early
Intel Haswell CPUs, reported to affect Debian Sid and Fedora Rawhide.
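
The difference between a selector that only checks AVX2 and one that
also checks BMI2 can be sketched as below. This is a deliberately
simplified model, assuming a hypothetical `struct cpu_model` and two
made-up selector functions; the real selectors have more variants
(EVEX, RTM and several fallbacks) that are left out here.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the CPU for this sketch.  */
struct cpu_model
{
  bool avx2;
  bool bmi2;
  bool fast_unaligned_load;
};

enum impl
{
  IMPL_BASELINE,	/* Stands in for whatever non-AVX2 variant is chosen.  */
  IMPL_AVX2
};

/* Selection that only looks at AVX2: picks code using BMI2
   instructions even when BMI2 is absent.  */
static enum impl
select_avx2_only (const struct cpu_model *cpu)
{
  if (cpu->avx2 && cpu->fast_unaligned_load)
    return IMPL_AVX2;
  return IMPL_BASELINE;
}

/* Selection that additionally requires BMI2.  */
static enum impl
select_avx2_and_bmi2 (const struct cpu_model *cpu)
{
  if (cpu->avx2 && cpu->bmi2 && cpu->fast_unaligned_load)
    return IMPL_AVX2;
  return IMPL_BASELINE;
}
```

With `avx2` set and `bmi2` clear, the first function returns
`IMPL_AVX2` and the second returns `IMPL_BASELINE`; on CPUs that have
both features the two agree.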

On the other hand, the check for LZCNT in memrchr is purely for
correctness, I am not aware of a CPU implementing AVX2 without LZCNT.

This has been tested by replacing all BMI1 and BMI2 instructions in the
source code with the "ud2" instruction, disabling the BMI1 and BMI2
feature detection, and running the testsuite.

Resolves: BZ #29611

Change v1 -> v2:
- Better scan for BMI2 instructions (shlx and shrx) and BMI1
  instructions (blsmsk), following the feedback from Noah
  Goldstein

Aurelien Jarno (6):
  x86: include BMI1 and BMI2 in x86-64-v3 level
  x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
  x86-64: Require LZCNT for AVX2 memrchr implementation
  x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
  x86-64: Require BMI2 for AVX2 memrchr implementation

 sysdeps/x86/get-isa-level.h                 |  2 +
 sysdeps/x86/isa-level.h                     |  2 +
 sysdeps/x86_64/multiarch/ifunc-avx2.h       |  2 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 86 ++++++++++++++++-----
 sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
 sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
 sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
 7 files changed, 76 insertions(+), 25 deletions(-)

-- 
2.35.1



* [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level
  2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
@ 2022-10-02 12:34 ` Aurelien Jarno
  2022-10-02 21:07   ` Noah Goldstein
  2022-10-02 12:34 ` [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations Aurelien Jarno
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 12:34 UTC (permalink / raw)
  To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The "System V Application Binary Interface AMD64 Architecture Processor
Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3
level.
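
A minimal sketch of the rule, using hypothetical feature bits rather
than glibc's cpu_features macros: the level may only be reported when
every required feature is usable. The list below covers only the
features visible in the hunk of this patch; the real check contains
further conditions.

```c
#include <stdbool.h>

/* Hypothetical feature bits for this sketch only.  */
enum
{
  FEAT_AVX = 1 << 0,
  FEAT_AVX2 = 1 << 1,
  FEAT_BMI1 = 1 << 2,
  FEAT_BMI2 = 1 << 3,
  FEAT_F16C = 1 << 4,
  FEAT_FMA = 1 << 5,
  FEAT_LZCNT = 1 << 6
};

/* Subset of the x86-64-v3 requirements shown in the hunk.  */
#define V3_REQUIRED_SUBSET \
  (FEAT_AVX | FEAT_AVX2 | FEAT_BMI1 | FEAT_BMI2 \
   | FEAT_F16C | FEAT_FMA | FEAT_LZCNT)

/* True only if every required bit is set in USABLE.  */
static bool
may_report_v3 (unsigned int usable)
{
  return (usable & V3_REQUIRED_SUBSET) == V3_REQUIRED_SUBSET;
}
```

Clearing either `FEAT_BMI1` or `FEAT_BMI2` in the argument makes
`may_report_v3` return false even when AVX2 is present.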
---
 sysdeps/x86/get-isa-level.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sysdeps/x86/get-isa-level.h b/sysdeps/x86/get-isa-level.h
index 1ade78ab73..5b4dd5f062 100644
--- a/sysdeps/x86/get-isa-level.h
+++ b/sysdeps/x86/get-isa-level.h
@@ -47,6 +47,8 @@ get_isa_level (const struct cpu_features *cpu_features)
 	  isa_level |= GNU_PROPERTY_X86_ISA_1_V2;
 	  if (CPU_FEATURE_USABLE_P (cpu_features, AVX)
 	      && CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+	      && CPU_FEATURE_USABLE_P (cpu_features, BMI1)
+	      && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
 	      && CPU_FEATURE_USABLE_P (cpu_features, F16C)
 	      && CPU_FEATURE_USABLE_P (cpu_features, FMA)
 	      && CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
-- 
2.35.1



* [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
  2022-10-02 12:34 ` [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level Aurelien Jarno
@ 2022-10-02 12:34 ` Aurelien Jarno
  2022-10-02 21:08   ` Noah Goldstein
  2022-10-02 12:34 ` [PATCH v2 3/6] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations Aurelien Jarno
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 12:34 UTC (permalink / raw)
  To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 str*cmp and wcs(n)cmp implementations use the 'bzhi'
instruction, which belongs to the BMI2 CPU feature.

NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
if the CPU doesn't support TZCNT, and produces the same result for
non-zero input.
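
The two instructions mentioned above can be modelled in portable C to
make the argument concrete. This is only an illustrative sketch: the
`*_model` function names are made up, `__builtin_ctz` is a GCC/Clang
builtin, and the BSF model is valid for non-zero input only, because
the instruction leaves its destination undefined for zero.

```c
#include <stdint.h>

/* BZHI: clear all bits at index N and above.  Only the low 8 bits of
   the index are used; an index of 32 or more leaves SRC unchanged.  */
static uint32_t
bzhi32_model (uint32_t src, uint32_t n)
{
  n &= 0xff;
  if (n >= 32)
    return src;
  return src & ((UINT32_C (1) << n) - 1);
}

/* TZCNT: number of trailing zero bits, defined as 32 for zero input.  */
static uint32_t
tzcnt32_model (uint32_t x)
{
  return x == 0 ? 32 : (uint32_t) __builtin_ctz (x);
}

/* BSF: index of the lowest set bit.  Precondition: x != 0.  */
static uint32_t
bsf32_model (uint32_t x)
{
  return (uint32_t) __builtin_ctz (x);
}
```

For every non-zero `x`, `tzcnt32_model (x)` and `bsf32_model (x)` agree,
which is why the missing TZCNT support does not need a separate check
here. `bzhi32_model` has no such fallback: without BMI2 the instruction
simply does not exist.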

Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611
---
 sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 47 +++++++++++++++------
 sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
 sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
 sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
 4 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index a71444eccb..fec8790c11 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -448,13 +448,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strcasecmp,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
 				     (CPU_FEATURE_USABLE (AVX512VL)
-				      && CPU_FEATURE_USABLE (AVX512BW)),
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strcasecmp_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strcasecmp_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __strcasecmp_avx2_rtm)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp,
@@ -470,13 +473,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strcasecmp_l,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
 				     (CPU_FEATURE_USABLE (AVX512VL)
-				      && CPU_FEATURE_USABLE (AVX512BW)),
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strcasecmp_l_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strcasecmp_l_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __strcasecmp_l_avx2_rtm)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp_l,
@@ -585,10 +591,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strcmp_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strcmp_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __strcmp_avx2_rtm)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, strcmp,
@@ -638,13 +646,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strncasecmp,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
 				     (CPU_FEATURE_USABLE (AVX512VL)
-				      && CPU_FEATURE_USABLE (AVX512BW)),
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strncasecmp_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strncasecmp_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __strncasecmp_avx2_rtm)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp,
@@ -660,13 +671,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strncasecmp_l,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
 				     (CPU_FEATURE_USABLE (AVX512VL)
-				      && CPU_FEATURE_USABLE (AVX512BW)),
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strncasecmp_l_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strncasecmp_l_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __strncasecmp_l_avx2_rtm)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l,
@@ -796,10 +810,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wcscmp_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wcscmp_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __wcscmp_avx2_rtm)
 	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -816,10 +832,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wcsncmp_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wcsncmp_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __wcsncmp_avx2_rtm)
 	      /* ISA V2 wrapper for GENERIC implementation because the
@@ -1162,13 +1180,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strncmp,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp,
 				     (CPU_FEATURE_USABLE (AVX512VL)
-				      && CPU_FEATURE_USABLE (AVX512BW)),
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strncmp_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strncmp_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __strncmp_avx2_rtm)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp,
diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
index 68646ef199..7622af259c 100644
--- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
@@ -34,6 +34,7 @@ IFUNC_SELECTOR (void)
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
 				      AVX_Fast_Unaligned_Load, ))
     {
diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
index fdd5afe3af..9d6c9f66ba 100644
--- a/sysdeps/x86_64/multiarch/strcmp.c
+++ b/sysdeps/x86_64/multiarch/strcmp.c
@@ -45,12 +45,12 @@ IFUNC_SELECTOR (void)
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
 				      AVX_Fast_Unaligned_Load, ))
     {
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-	  && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-	  && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
+	  && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
 	return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c
index 4ebe4bde30..c4f8b6bbb5 100644
--- a/sysdeps/x86_64/multiarch/strncmp.c
+++ b/sysdeps/x86_64/multiarch/strncmp.c
@@ -41,12 +41,12 @@ IFUNC_SELECTOR (void)
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
 				      AVX_Fast_Unaligned_Load, ))
     {
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
-	  && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
-	  && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
+	  && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
 	return OPTIMIZE (evex);
 
       if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
-- 
2.35.1



* [PATCH v2 3/6] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
  2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
  2022-10-02 12:34 ` [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level Aurelien Jarno
  2022-10-02 12:34 ` [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations Aurelien Jarno
@ 2022-10-02 12:34 ` Aurelien Jarno
  2022-10-02 21:08   ` Noah Goldstein
  2022-10-02 12:34 ` [PATCH v2 4/6] x86-64: Require LZCNT for AVX2 memrchr implementation Aurelien Jarno
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 12:34 UTC (permalink / raw)
  To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 memchr, rawmemchr and wmemchr implementations use the 'bzhi'
and 'sarx' instructions, which belong to the BMI2 CPU feature.

Fixes: acfd088a1963 ("x86: Optimize memchr-avx2.S")
Partially resolves: BZ #29611
---
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index fec8790c11..7c84963d92 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -69,10 +69,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __memchr_evex_rtm)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, memchr,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __memchr_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, memchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __memchr_avx2_rtm)
 	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -335,10 +337,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __rawmemchr_evex_rtm)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __rawmemchr_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __rawmemchr_avx2_rtm)
 	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -927,10 +931,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wmemchr_evex_rtm)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wmemchr_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __wmemchr_avx2_rtm)
 	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
-- 
2.35.1



* [PATCH v2 4/6] x86-64: Require LZCNT for AVX2 memrchr implementation
  2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
                   ` (2 preceding siblings ...)
  2022-10-02 12:34 ` [PATCH v2 3/6] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations Aurelien Jarno
@ 2022-10-02 12:34 ` Aurelien Jarno
  2022-10-02 21:08   ` Noah Goldstein
  2022-10-02 12:34 ` [PATCH v2 5/6] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations Aurelien Jarno
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 12:34 UTC (permalink / raw)
  To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 memrchr implementation uses the 'lzcnt' instruction, which
belongs to the LZCNT CPU feature.
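
Unlike 'tzcnt', which degrades to an equivalent BSF for non-zero input,
'lzcnt' shares its encoding with a prefixed BSR, and BSR returns a
different value: the index of the highest set bit instead of the number
of leading zeros. The sketch below models both in portable C. The
`*_model` names are made up, `__builtin_clz` is a GCC/Clang builtin, and
the BSR model is valid for non-zero input only.

```c
#include <stdint.h>

/* LZCNT: number of leading zero bits, defined as 32 for zero input.  */
static uint32_t
lzcnt32_model (uint32_t x)
{
  return x == 0 ? 32 : (uint32_t) __builtin_clz (x);
}

/* BSR: index of the highest set bit.  Precondition: x != 0.  */
static uint32_t
bsr32_model (uint32_t x)
{
  return 31 - (uint32_t) __builtin_clz (x);
}
```

For non-zero `x` the two results always add up to 31, so they are equal
only in the impossible case of a half-integer. Code written for LZCNT
would therefore compute wrong offsets if it were ever run on a CPU
lacking the feature.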

Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S")
Partially resolves: BZ #29611
---
 sysdeps/x86/isa-level.h                    | 1 +
 sysdeps/x86_64/multiarch/ifunc-avx2.h      | 1 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 7 +++++--
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index 3c4480aba7..bbb90f5c5e 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -80,6 +80,7 @@
 #define AVX_X86_ISA_LEVEL 3
 #define AVX2_X86_ISA_LEVEL 3
 #define BMI2_X86_ISA_LEVEL 3
+#define LZCNT_X86_ISA_LEVEL 3
 #define MOVBE_X86_ISA_LEVEL 3
 
 /* ISA level >= 2 guaranteed includes.  */
diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h
index a57a9952f3..f1741083fd 100644
--- a/sysdeps/x86_64/multiarch/ifunc-avx2.h
+++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h
@@ -37,6 +37,7 @@ IFUNC_SELECTOR (void)
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
       && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
 				      AVX_Fast_Unaligned_Load, ))
     {
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 7c84963d92..4ee28c99bd 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -209,13 +209,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, memrchr,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
 				     (CPU_FEATURE_USABLE (AVX512VL)
-				      && CPU_FEATURE_USABLE (AVX512BW)),
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (LZCNT)),
 				     __memrchr_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (LZCNT)),
 				     __memrchr_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (LZCNT)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __memrchr_avx2_rtm)
 	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
-- 
2.35.1



* [PATCH v2 5/6] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
  2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
                   ` (3 preceding siblings ...)
  2022-10-02 12:34 ` [PATCH v2 4/6] x86-64: Require LZCNT for AVX2 memrchr implementation Aurelien Jarno
@ 2022-10-02 12:34 ` Aurelien Jarno
  2022-10-02 21:08   ` Noah Goldstein
  2022-10-02 12:34 ` [PATCH v2 6/6] x86-64: Require BMI2 for AVX2 memrchr implementation Aurelien Jarno
  2022-10-02 16:21 ` [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Noah Goldstein
  6 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 12:34 UTC (permalink / raw)
  To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 strrchr and wcsrchr implementations use the 'blsmsk'
instruction, which belongs to the BMI1 CPU feature, and the 'shrx'
instruction, which belongs to the BMI2 CPU feature.
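
As a rough illustration of what these two instructions compute (not of
how the assembly uses them), here are portable C models for the 32-bit
forms. The `*_model` names are hypothetical.

```c
#include <stdint.h>

/* BLSMSK: mask covering the lowest set bit and every bit below it.
   For zero input all bits are set.  */
static uint32_t
blsmsk32_model (uint32_t x)
{
  return x ^ (x - 1);
}

/* SHRX: logical right shift with the count taken from a register.
   The count is masked to 5 bits for 32-bit operands.  */
static uint32_t
shrx32_model (uint32_t x, uint32_t count)
{
  return x >> (count & 31);
}
```

For example, `blsmsk32_model (0x28)` yields `0x0F`, and
`shrx32_model (0xF0, 36)` shifts by 4 because the count is masked.
Neither instruction has an older encoding to fall back on, so both
feature bits have to be checked.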

Fixes: df7e295d18ff ("x86: Optimize {str|wcs}rchr-avx2")
Partially resolves: BZ #29611
---
 sysdeps/x86/isa-level.h                    |  1 +
 sysdeps/x86_64/multiarch/ifunc-avx2.h      |  1 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 17 ++++++++++++++---
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index bbb90f5c5e..06f6c9663e 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -79,6 +79,7 @@
 /* ISA level >= 3 guaranteed includes.  */
 #define AVX_X86_ISA_LEVEL 3
 #define AVX2_X86_ISA_LEVEL 3
+#define BMI1_X86_ISA_LEVEL 3
 #define BMI2_X86_ISA_LEVEL 3
 #define LZCNT_X86_ISA_LEVEL 3
 #define MOVBE_X86_ISA_LEVEL 3
diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h
index f1741083fd..f2f5e8a211 100644
--- a/sysdeps/x86_64/multiarch/ifunc-avx2.h
+++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h
@@ -36,6 +36,7 @@ IFUNC_SELECTOR (void)
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
+      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI1)
       && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
       && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 4ee28c99bd..1c8afa229f 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -575,13 +575,19 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   IFUNC_IMPL (i, name, strrchr,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, strrchr,
 				     (CPU_FEATURE_USABLE (AVX512VL)
-				      && CPU_FEATURE_USABLE (AVX512BW)),
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI1)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strrchr_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI1)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strrchr_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI1)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __strrchr_avx2_rtm)
 	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
@@ -794,13 +800,18 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, wcsrchr,
 				     (CPU_FEATURE_USABLE (AVX512VL)
 				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI1)
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wcsrchr_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr,
-				     CPU_FEATURE_USABLE (AVX2),
+				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI1)
+				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wcsrchr_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI1)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __wcsrchr_avx2_rtm)
 	      /* ISA V2 wrapper for SSE2 implementation because the SSE2
-- 
2.35.1



* [PATCH v2 6/6] x86-64: Require BMI2 for AVX2 memrchr implementation
  2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
                   ` (4 preceding siblings ...)
  2022-10-02 12:34 ` [PATCH v2 5/6] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations Aurelien Jarno
@ 2022-10-02 12:34 ` Aurelien Jarno
  2022-10-02 21:09   ` Noah Goldstein
  2022-10-02 16:21 ` [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Noah Goldstein
  6 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 12:34 UTC (permalink / raw)
  To: libc-alpha; +Cc: Noah Goldstein, H . J . Lu, Sunil K Pandey, Aurelien Jarno

The AVX2 memrchr implementation uses the 'shlxl' instruction, which
belongs to the BMI2 CPU feature.

Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S")
Partially resolves: BZ #29611
---
 sysdeps/x86_64/multiarch/ifunc-impl-list.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 1c8afa229f..00a91123d3 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -210,14 +210,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
 				     (CPU_FEATURE_USABLE (AVX512VL)
 				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (LZCNT)),
 				     __memrchr_evex)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (LZCNT)),
 				     __memrchr_avx2)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
 				     (CPU_FEATURE_USABLE (AVX2)
+				      && CPU_FEATURE_USABLE (BMI2)
 				      && CPU_FEATURE_USABLE (LZCNT)
 				      && CPU_FEATURE_USABLE (RTM)),
 				     __memrchr_avx2_rtm)
-- 
2.35.1



* Re: [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611)
  2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
                   ` (5 preceding siblings ...)
  2022-10-02 12:34 ` [PATCH v2 6/6] x86-64: Require BMI2 for AVX2 memrchr implementation Aurelien Jarno
@ 2022-10-02 16:21 ` Noah Goldstein
  2022-10-02 18:09   ` Aurelien Jarno
  6 siblings, 1 reply; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 16:21 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 5:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> Some early Intel Haswell CPUs have AVX2 instructions, but do not have
> BMI1 and BMI2 instructions. Some AVX2 string functions only check for
> AVX2, but use BMI1, BMI2 or LZCNT instructions. This patchset tries to
> fix that.
>
> While most fixes only change ifunc-impl-list.c, and thus only concern
> the testsuite, the strn(case)cmp one is a real issue affecting early Intel

str(case)cmp as well, correct?

> Haswell CPUs, reported to affect Debian Sid and Fedora Rawhide.
>
> On the other hand, the check for LZCNT in memrchr is purely for
> correctness, I am not aware of a CPU implementing AVX2 without LZCNT.
>
> This has been tested by replacing all BMI1 and BMI2 instructions in the
> source code with the "ud2" instruction, disabling the BMI1 and BMI2
> feature detection, and running the testsuite.
>
> Resolves: BZ #29611
>
> Change v1 -> v2:
> - Better scan for BMI2 instructions (shlx and shrx) and BMI1
>   instructions (blsmsk), following the feedback from Noah
>   Goldstein
>
> Aurelien Jarno (6):
>   x86: include BMI1 and BMI2 in x86-64-v3 level
>   x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
>   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
>   x86-64: Require LZCNT for AVX2 memrchr implementation
>   x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
>   x86-64: Require BMI2 for AVX2 memrchr implementation
>
>  sysdeps/x86/get-isa-level.h                 |  2 +
>  sysdeps/x86/isa-level.h                     |  2 +
>  sysdeps/x86_64/multiarch/ifunc-avx2.h       |  2 +
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 86 ++++++++++++++++-----
>  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
>  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
>  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
>  7 files changed, 76 insertions(+), 25 deletions(-)
>
> --
> 2.35.1
>


* Re: [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611)
  2022-10-02 16:21 ` [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Noah Goldstein
@ 2022-10-02 18:09   ` Aurelien Jarno
  2022-10-02 21:11     ` Noah Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-02 18:09 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On 2022-10-02 09:21, Noah Goldstein wrote:
> On Sun, Oct 2, 2022 at 5:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> >
> > Some early Intel Haswell CPUs have AVX2 instructions, but do not have
> > BMI1 and BMI2 instructions. Some AVX2 string functions only check for
> > AVX2, but use BMI1, BMI2 or LZCNT instructions. This patchset tries to
> > fix that.
> >
> > While most fixes only change ifunc-impl-list.c, and thus only concern
> > the testsuite, the strn(case)cmp one is a real issue affecting early Intel
> 
> str(case)cmp as well, correct?

Oops, yes forgot to update the cover letter on that aspect.

> > Haswell CPUs, reported to affect Debian Sid and Fedora Rawhide.
> >
> > On the other hand, the check for LZCNT in memrchr is purely for
> > correctness, I am not aware of a CPU implementing AVX2 without LZCNT.
> >
> > This has been tested by replacing all BMI1 and BMI2 instructions in the
> > source code with the "ud2" instruction, disabling the BMI1 and BMI2
> > feature detection, and running the testsuite.
> >
> > Resolves: BZ #29611
> >
> > Change v1 -> v2:
> > - Better scan for BMI2 instructions (shlx and shrx) and BMI1
> >   instructions (blsmsk), following the feedback from Noah
> >   Goldstein
> >
> > Aurelien Jarno (6):
> >   x86: include BMI1 and BMI2 in x86-64-v3 level
> >   x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
> >   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
> >   x86-64: Require LZCNT for AVX2 memrchr implementation
> >   x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
> >   x86-64: Require BMI2 for AVX2 memrchr implementation
> >
> >  sysdeps/x86/get-isa-level.h                 |  2 +
> >  sysdeps/x86/isa-level.h                     |  2 +
> >  sysdeps/x86_64/multiarch/ifunc-avx2.h       |  2 +
> >  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 86 ++++++++++++++++-----
> >  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
> >  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
> >  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
> >  7 files changed, 76 insertions(+), 25 deletions(-)
> >
> > --
> > 2.35.1
> >
> 

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level
  2022-10-02 12:34 ` [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level Aurelien Jarno
@ 2022-10-02 21:07   ` Noah Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 21:07 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The "System V Application Binary Interface AMD64 Architecture Processor
> Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3
> level.
> ---
>  sysdeps/x86/get-isa-level.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/sysdeps/x86/get-isa-level.h b/sysdeps/x86/get-isa-level.h
> index 1ade78ab73..5b4dd5f062 100644
> --- a/sysdeps/x86/get-isa-level.h
> +++ b/sysdeps/x86/get-isa-level.h
> @@ -47,6 +47,8 @@ get_isa_level (const struct cpu_features *cpu_features)
>           isa_level |= GNU_PROPERTY_X86_ISA_1_V2;
>           if (CPU_FEATURE_USABLE_P (cpu_features, AVX)
>               && CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +             && CPU_FEATURE_USABLE_P (cpu_features, BMI1)
> +             && CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>               && CPU_FEATURE_USABLE_P (cpu_features, F16C)
>               && CPU_FEATURE_USABLE_P (cpu_features, FMA)
>               && CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
> --
> 2.35.1
>

LGTM.
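
For readers following the thread, the effect of the hunk above can be
sketched as an "all required bits set" test. This is only a model: the
FEAT_* flags and meets_v3_subset() are hypothetical names, glibc's real
cpu_features layout differs, and only the features visible in the quoted
hunk are listed.

```c
#include <stdbool.h>

/* Hypothetical feature flags, for illustration only.  */
enum
{
  FEAT_AVX   = 1 << 0,
  FEAT_AVX2  = 1 << 1,
  FEAT_BMI1  = 1 << 2,
  FEAT_BMI2  = 1 << 3,
  FEAT_F16C  = 1 << 4,
  FEAT_FMA   = 1 << 5,
  FEAT_LZCNT = 1 << 6
};

/* Return true only if every feature listed in the hunk is usable.
   The real get_isa_level check continues past LZCNT; that part is not
   shown in the quoted context and is not modelled here.  */
static inline bool
meets_v3_subset (unsigned int usable)
{
  const unsigned int required = FEAT_AVX | FEAT_AVX2 | FEAT_BMI1
				| FEAT_BMI2 | FEAT_F16C | FEAT_FMA
				| FEAT_LZCNT;
  return (usable & required) == required;
}
```

With this model, a CPU that reports AVX2 but neither BMI1 nor BMI2 is no
longer classified as x86-64-v3.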

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-02 12:34 ` [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations Aurelien Jarno
@ 2022-10-02 21:08   ` Noah Goldstein
  2022-10-03 16:19     ` Sunil Pandey
  0 siblings, 1 reply; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 21:08 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 str*cmp and wcs(n)cmp implementations use the 'bzhi'
> instruction, which belongs to the BMI2 CPU feature.
>
> NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
> if the CPU doesn't support TZCNT, and produces the same result
> for non-zero input.
>
> Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 47 +++++++++++++++------
>  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
>  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
>  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
>  4 files changed, 39 insertions(+), 17 deletions(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index a71444eccb..fec8790c11 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -448,13 +448,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strcasecmp,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strcasecmp_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strcasecmp_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __strcasecmp_avx2_rtm)
>               X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp,
> @@ -470,13 +473,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strcasecmp_l,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strcasecmp_l_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strcasecmp_l_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __strcasecmp_l_avx2_rtm)
>               X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp_l,
> @@ -585,10 +591,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                       && CPU_FEATURE_USABLE (BMI2)),
>                                      __strcmp_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strcmp_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __strcmp_avx2_rtm)
>               X86_IFUNC_IMPL_ADD_V2 (array, i, strcmp,
> @@ -638,13 +646,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncasecmp,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strncasecmp_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strncasecmp_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __strncasecmp_avx2_rtm)
>               X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp,
> @@ -660,13 +671,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncasecmp_l,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strncasecmp_l_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strncasecmp_l_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __strncasecmp_l_avx2_rtm)
>               X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l,
> @@ -796,10 +810,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                       && CPU_FEATURE_USABLE (BMI2)),
>                                      __wcscmp_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __wcscmp_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __wcscmp_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> @@ -816,10 +832,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                       && CPU_FEATURE_USABLE (BMI2)),
>                                      __wcsncmp_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __wcsncmp_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __wcsncmp_avx2_rtm)
>               /* ISA V2 wrapper for GENERIC implementation because the
> @@ -1162,13 +1180,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strncmp,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strncmp_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strncmp_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __strncmp_avx2_rtm)
>               X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp,
> diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> index 68646ef199..7622af259c 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> @@ -34,6 +34,7 @@ IFUNC_SELECTOR (void)
>    const struct cpu_features *cpu_features = __get_cpu_features ();
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                       AVX_Fast_Unaligned_Load, ))
>      {
> diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
> index fdd5afe3af..9d6c9f66ba 100644
> --- a/sysdeps/x86_64/multiarch/strcmp.c
> +++ b/sysdeps/x86_64/multiarch/strcmp.c
> @@ -45,12 +45,12 @@ IFUNC_SELECTOR (void)
>    const struct cpu_features *cpu_features = __get_cpu_features ();
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                       AVX_Fast_Unaligned_Load, ))
>      {
>        if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> +         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
>         return OPTIMIZE (evex);
>
>        if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c
> index 4ebe4bde30..c4f8b6bbb5 100644
> --- a/sysdeps/x86_64/multiarch/strncmp.c
> +++ b/sysdeps/x86_64/multiarch/strncmp.c
> @@ -41,12 +41,12 @@ IFUNC_SELECTOR (void)
>    const struct cpu_features *cpu_features = __get_cpu_features ();
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                       AVX_Fast_Unaligned_Load, ))
>      {
>        if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> +         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
>         return OPTIMIZE (evex);
>
>        if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> --
> 2.35.1
>

LGTM.
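
As an illustration of the strcmp.c/strncmp.c hunks, here is a simplified
model of the selector after the change. The struct, enum and function
names are hypothetical, and everything the quoted context does not show
(further preference checks, the SSE fallbacks) is collapsed into a single
"baseline" result.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the features the hunk tests.  */
struct model_features
{
  bool avx2, bmi2, avx_fast_unaligned_load;
  bool avx512vl, avx512bw, rtm;
};

enum model_impl { IMPL_BASELINE, IMPL_AVX2, IMPL_AVX2_RTM, IMPL_EVEX };

/* BMI2 is now part of the outer condition, so it guards the EVEX,
   AVX2+RTM and plain AVX2 variants at once.  */
static inline enum model_impl
select_strcmp_model (const struct model_features *f)
{
  if (f->avx2 && f->bmi2 && f->avx_fast_unaligned_load)
    {
      if (f->avx512vl && f->avx512bw)
	return IMPL_EVEX;
      if (f->rtm)
	return IMPL_AVX2_RTM;
      return IMPL_AVX2;
    }
  /* Stand-in for the non-AVX2 fallbacks not shown in the hunk.  */
  return IMPL_BASELINE;
}
```

In this model a CPU with AVX2 but without BMI2 falls through to the
baseline, which is the behaviour the patch is after.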

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/6] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
  2022-10-02 12:34 ` [PATCH v2 3/6] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations Aurelien Jarno
@ 2022-10-02 21:08   ` Noah Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 21:08 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 memchr, rawmemchr and wmemchr implementations use the 'bzhi'
> and 'sarx' instructions, which belong to the BMI2 CPU feature.
>
> Fixes: acfd088a1963 ("x86: Optimize memchr-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index fec8790c11..7c84963d92 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -69,10 +69,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                       && CPU_FEATURE_USABLE (BMI2)),
>                                      __memchr_evex_rtm)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memchr,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __memchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __memchr_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> @@ -335,10 +337,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                       && CPU_FEATURE_USABLE (BMI2)),
>                                      __rawmemchr_evex_rtm)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __rawmemchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, rawmemchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __rawmemchr_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> @@ -927,10 +931,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>                                       && CPU_FEATURE_USABLE (BMI2)),
>                                      __wmemchr_evex_rtm)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __wmemchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wmemchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __wmemchr_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> --
> 2.35.1
>

LGTM.
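
For anyone unfamiliar with 'bzhi', a portable C model of its 32-bit form
may help show what the instruction computes. bzhi32_model() is a
hypothetical helper written for this note, not something taken from
glibc, and it ignores the flags the real instruction sets.

```c
#include <stdint.h>

/* Software model of 32-bit BZHI: keep the low N bits of SRC and clear
   the rest, where N is the low 8 bits of INDEX.  If N is 32 or more,
   SRC is returned unchanged.  */
static inline uint32_t
bzhi32_model (uint32_t src, uint32_t index)
{
  uint32_t n = index & 0xff;
  if (n >= 32)
    return src;
  return src & ((UINT32_C (1) << n) - 1);
}
```

There is no legacy encoding that this instruction falls back to, so an
implementation using it has to be gated on BMI2.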

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/6] x86-64: Require LZCNT for AVX2 memrchr implementation
  2022-10-02 12:34 ` [PATCH v2 4/6] x86-64: Require LZCNT for AVX2 memrchr implementation Aurelien Jarno
@ 2022-10-02 21:08   ` Noah Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 21:08 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 memrchr implementation uses the 'lzcnt' instruction, which
> belongs to the LZCNT CPU feature.
>
> Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86/isa-level.h                    | 1 +
>  sysdeps/x86_64/multiarch/ifunc-avx2.h      | 1 +
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 7 +++++--
>  3 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
> index 3c4480aba7..bbb90f5c5e 100644
> --- a/sysdeps/x86/isa-level.h
> +++ b/sysdeps/x86/isa-level.h
> @@ -80,6 +80,7 @@
>  #define AVX_X86_ISA_LEVEL 3
>  #define AVX2_X86_ISA_LEVEL 3
>  #define BMI2_X86_ISA_LEVEL 3
> +#define LZCNT_X86_ISA_LEVEL 3
>  #define MOVBE_X86_ISA_LEVEL 3
>
>  /* ISA level >= 2 guaranteed includes.  */
> diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h
> index a57a9952f3..f1741083fd 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-avx2.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h
> @@ -37,6 +37,7 @@ IFUNC_SELECTOR (void)
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
>        && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
>                                       AVX_Fast_Unaligned_Load, ))
>      {
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 7c84963d92..4ee28c99bd 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -209,13 +209,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, memrchr,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (LZCNT)),
>                                      __memrchr_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (LZCNT)),
>                                      __memrchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (LZCNT)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __memrchr_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> --
> 2.35.1
>

LGTM.
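
A note on why the LZCNT check matters for correctness: on processors
without LZCNT the same encoding is executed as BSR, which yields the
index of the highest set bit instead of the number of leading zeros. The
two helpers below are hypothetical software models written to show the
difference for 32-bit operands.

```c
#include <stdint.h>

/* Model of LZCNT: number of leading zero bits, 32 for a zero input.  */
static inline unsigned int
lzcnt32_model (uint32_t x)
{
  unsigned int n = 0;
  if (x == 0)
    return 32;
  while (!(x & UINT32_C (0x80000000)))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Model of BSR for a non-zero input: index of the highest set bit.
   (The real instruction leaves the destination undefined for zero.)  */
static inline unsigned int
bsr32_model (uint32_t x)
{
  unsigned int idx = 0;
  while (x >>= 1)
    idx++;
  return idx;
}
```

For non-zero x the two are related by lzcnt(x) == 31 - bsr(x), so the
results differ for almost every input.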

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/6] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
  2022-10-02 12:34 ` [PATCH v2 5/6] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations Aurelien Jarno
@ 2022-10-02 21:08   ` Noah Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 21:08 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 strrchr and wcsrchr implementations use the 'blsmsk'
> instruction, which belongs to the BMI1 CPU feature, and the 'shrx'
> instruction, which belongs to the BMI2 CPU feature.
>
> Fixes: df7e295d18ff ("x86: Optimize {str|wcs}rchr-avx2")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86/isa-level.h                    |  1 +
>  sysdeps/x86_64/multiarch/ifunc-avx2.h      |  1 +
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 17 ++++++++++++++---
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
> index bbb90f5c5e..06f6c9663e 100644
> --- a/sysdeps/x86/isa-level.h
> +++ b/sysdeps/x86/isa-level.h
> @@ -79,6 +79,7 @@
>  /* ISA level >= 3 guaranteed includes.  */
>  #define AVX_X86_ISA_LEVEL 3
>  #define AVX2_X86_ISA_LEVEL 3
> +#define BMI1_X86_ISA_LEVEL 3
>  #define BMI2_X86_ISA_LEVEL 3
>  #define LZCNT_X86_ISA_LEVEL 3
>  #define MOVBE_X86_ISA_LEVEL 3
> diff --git a/sysdeps/x86_64/multiarch/ifunc-avx2.h b/sysdeps/x86_64/multiarch/ifunc-avx2.h
> index f1741083fd..f2f5e8a211 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-avx2.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-avx2.h
> @@ -36,6 +36,7 @@ IFUNC_SELECTOR (void)
>    const struct cpu_features *cpu_features = __get_cpu_features ();
>
>    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI1)
>        && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
>        && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, LZCNT)
>        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 4ee28c99bd..1c8afa229f 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -575,13 +575,19 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>    IFUNC_IMPL (i, name, strrchr,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, strrchr,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
> -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> +                                     && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI1)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strrchr_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI1)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __strrchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI1)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __strrchr_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> @@ -794,13 +800,18 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, wcsrchr,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
>                                       && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI1)
>                                       && CPU_FEATURE_USABLE (BMI2)),
>                                      __wcsrchr_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr,
> -                                    CPU_FEATURE_USABLE (AVX2),
> +                                    (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI1)
> +                                     && CPU_FEATURE_USABLE (BMI2)),
>                                      __wcsrchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI1)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __wcsrchr_avx2_rtm)
>               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> --
> 2.35.1
>

LGTM.
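
To make the two instructions named in the commit message concrete, here
are portable C models of their 32-bit forms. Both function names are
hypothetical and the models ignore flag effects.

```c
#include <stdint.h>

/* Model of BLSMSK (BMI1): mask of all bits up to and including the
   lowest set bit of X.  A zero input gives an all-ones result.  */
static inline uint32_t
blsmsk32_model (uint32_t x)
{
  return x ^ (x - 1);
}

/* Model of SHRX (BMI2): logical right shift.  For the 32-bit form only
   the low 5 bits of the count are used.  */
static inline uint32_t
shrx32_model (uint32_t x, uint32_t count)
{
  return x >> (count & 31);
}
```

Neither instruction has a legacy fallback encoding, which is why the
AVX2 strrchr/wcsrchr entries need both BMI1 and BMI2.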

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 6/6] x86-64: Require BMI2 for AVX2 memrchr implementation
  2022-10-02 12:34 ` [PATCH v2 6/6] x86-64: Require BMI2 for AVX2 memrchr implementation Aurelien Jarno
@ 2022-10-02 21:09   ` Noah Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 21:09 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> The AVX2 memrchr implementation uses the 'shlxl' instruction, which
> belongs to the BMI2 CPU feature.
>
> Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S")
> Partially resolves: BZ #29611
> ---
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> index 1c8afa229f..00a91123d3 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> @@ -210,14 +210,17 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>               X86_IFUNC_IMPL_ADD_V4 (array, i, memrchr,
>                                      (CPU_FEATURE_USABLE (AVX512VL)
>                                       && CPU_FEATURE_USABLE (AVX512BW)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (LZCNT)),
>                                      __memrchr_evex)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (LZCNT)),
>                                      __memrchr_avx2)
>               X86_IFUNC_IMPL_ADD_V3 (array, i, memrchr,
>                                      (CPU_FEATURE_USABLE (AVX2)
> +                                     && CPU_FEATURE_USABLE (BMI2)
>                                       && CPU_FEATURE_USABLE (LZCNT)
>                                       && CPU_FEATURE_USABLE (RTM)),
>                                      __memrchr_avx2_rtm)
> --
> 2.35.1
>

LGTM.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611)
  2022-10-02 18:09   ` Aurelien Jarno
@ 2022-10-02 21:11     ` Noah Goldstein
  2022-10-03 17:36       ` Aurelien Jarno
  0 siblings, 1 reply; 25+ messages in thread
From: Noah Goldstein @ 2022-10-02 21:11 UTC (permalink / raw)
  To: Noah Goldstein, libc-alpha, H . J . Lu, Sunil K Pandey

On Sun, Oct 2, 2022 at 2:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2022-10-02 09:21, Noah Goldstein wrote:
> > On Sun, Oct 2, 2022 at 5:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > >
> > > Some early Intel Haswell CPUs have AVX2 instructions, but do not have
> > > BMI1 and BMI2 instructions. Some AVX2 string functions only check for
> > > AVX2, but use BMI1, BMI2 or LZCNT instructions. This patchset tries to
> > > fix that.
> > >
> > > While most fixes only change ifunc-impl-list.c, and thus only concern
> > > the testsuite, the strn(case)cmp is a real issue affecting early Intel
> >
> > str(case)cmp as well, correct?
>
> Oops, yes, I forgot to update the cover letter on that aspect.
>
> > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide.
> > >
> > > On the other hand, the check for LZCNT in memrchr is purely for
> > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT.
> > >
> > > This has been tested by replacing all BMI1 and BMI2 instructions in the
> > > source code with the "ud2" instruction, disabling the BMI1 and BMI2
> > > feature detection, and running the testsuite.
> > >
> > > Resolves: BZ #29611
> > >
> > > Change v1 -> v2:
> > > - Better scan for BMI2 instructions (shlx and shrx) and BMI1
> > >   instructions (blsmsk), following the feedback from Noah
> > >   Goldstein
> > >
> > > Aurelien Jarno (6):
> > >   x86: include BMI1 and BMI2 in x86-64-v3 level
> > >   x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
> > >   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
> > >   x86-64: Require LZCNT for AVX2 memrchr implementation
> > >   x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
> > >   x86-64: Require BMI2 for AVX2 memrchr implementation
> > >
> > >  sysdeps/x86/get-isa-level.h                 |  2 +
> > >  sysdeps/x86/isa-level.h                     |  2 +
> > >  sysdeps/x86_64/multiarch/ifunc-avx2.h       |  2 +
> > >  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 86 ++++++++++++++++-----
> > >  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
> > >  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
> > >  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
> > >  7 files changed, 76 insertions(+), 25 deletions(-)
> > >
> > > --
> > > 2.35.1
> > >
> >
>
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net

Patchset looks good.

Do you have commit permissions? If not I can push them for you.

Thanks for the bugfix!

NB:
the str*(case)cmp, wcs(n)cmp bug affects 2.36, 2.35, 2.34, 2.33.


* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-02 21:08   ` Noah Goldstein
@ 2022-10-03 16:19     ` Sunil Pandey
  2022-10-03 17:35       ` Aurelien Jarno
  0 siblings, 1 reply; 25+ messages in thread
From: Sunil Pandey @ 2022-10-03 16:19 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Aurelien Jarno, libc-alpha, H . J . Lu

Please separate this patch into 4 separate patches.

Patch1: sysdeps/x86_64/multiarch/strncmp.c
Patch2: sysdeps/x86_64/multiarch/strcmp.c
Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
Patch4: sysdeps/x86_64/multiarch/ifunc-impl-list.c

The rest of them look OK to me.

On Sun, Oct 2, 2022 at 2:08 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Sun, Oct 2, 2022 at 8:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> >
> > The AVX2 str*cmp and wcs(n)cmp implementations use the 'bzhi'
> > instruction, which belongs to the BMI2 CPU feature.
> >
> > NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
> > if the CPU doesn't support TZCNT, and produces the same result
> > for non-zero input.
> >
> > Fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S")
> > Partially resolves: BZ #29611
> > ---
> >  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 47 +++++++++++++++------
> >  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
> >  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
> >  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
> >  4 files changed, 39 insertions(+), 17 deletions(-)
> >
> > diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > index a71444eccb..fec8790c11 100644
> > --- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > +++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > @@ -448,13 +448,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strcasecmp,
> >               X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX512VL)
> > -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                     && CPU_FEATURE_USABLE (AVX512BW)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strcasecmp_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strcasecmp_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __strcasecmp_avx2_rtm)
> >               X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp,
> > @@ -470,13 +473,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strcasecmp_l,
> >               X86_IFUNC_IMPL_ADD_V4 (array, i, strcasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX512VL)
> > -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                     && CPU_FEATURE_USABLE (AVX512BW)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strcasecmp_l_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strcasecmp_l_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strcasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __strcasecmp_l_avx2_rtm)
> >               X86_IFUNC_IMPL_ADD_V2 (array, i, strcasecmp_l,
> > @@ -585,10 +591,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >                                       && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strcmp_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strcmp_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strcmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __strcmp_avx2_rtm)
> >               X86_IFUNC_IMPL_ADD_V2 (array, i, strcmp,
> > @@ -638,13 +646,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strncasecmp,
> >               X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX512VL)
> > -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                     && CPU_FEATURE_USABLE (AVX512BW)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strncasecmp_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strncasecmp_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __strncasecmp_avx2_rtm)
> >               X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp,
> > @@ -660,13 +671,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strncasecmp_l,
> >               X86_IFUNC_IMPL_ADD_V4 (array, i, strncasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX512VL)
> > -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                     && CPU_FEATURE_USABLE (AVX512BW)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strncasecmp_l_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strncasecmp_l_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strncasecmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __strncasecmp_l_avx2_rtm)
> >               X86_IFUNC_IMPL_ADD_V2 (array, i, strncasecmp_l,
> > @@ -796,10 +810,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >                                       && CPU_FEATURE_USABLE (BMI2)),
> >                                      __wcscmp_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __wcscmp_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, wcscmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __wcscmp_avx2_rtm)
> >               /* ISA V2 wrapper for SSE2 implementation because the SSE2
> > @@ -816,10 +832,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >                                       && CPU_FEATURE_USABLE (BMI2)),
> >                                      __wcsncmp_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __wcsncmp_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, wcsncmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __wcsncmp_avx2_rtm)
> >               /* ISA V2 wrapper for GENERIC implementation because the
> > @@ -1162,13 +1180,16 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> >    IFUNC_IMPL (i, name, strncmp,
> >               X86_IFUNC_IMPL_ADD_V4 (array, i, strncmp,
> >                                      (CPU_FEATURE_USABLE (AVX512VL)
> > -                                     && CPU_FEATURE_USABLE (AVX512BW)),
> > +                                     && CPU_FEATURE_USABLE (AVX512BW)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strncmp_evex)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
> > -                                    CPU_FEATURE_USABLE (AVX2),
> > +                                    (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)),
> >                                      __strncmp_avx2)
> >               X86_IFUNC_IMPL_ADD_V3 (array, i, strncmp,
> >                                      (CPU_FEATURE_USABLE (AVX2)
> > +                                     && CPU_FEATURE_USABLE (BMI2)
> >                                       && CPU_FEATURE_USABLE (RTM)),
> >                                      __strncmp_avx2_rtm)
> >               X86_IFUNC_IMPL_ADD_V2 (array, i, strncmp,
> > diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > index 68646ef199..7622af259c 100644
> > --- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > +++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > @@ -34,6 +34,7 @@ IFUNC_SELECTOR (void)
> >    const struct cpu_features *cpu_features = __get_cpu_features ();
> >
> >    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> >        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
> >                                       AVX_Fast_Unaligned_Load, ))
> >      {
> > diff --git a/sysdeps/x86_64/multiarch/strcmp.c b/sysdeps/x86_64/multiarch/strcmp.c
> > index fdd5afe3af..9d6c9f66ba 100644
> > --- a/sysdeps/x86_64/multiarch/strcmp.c
> > +++ b/sysdeps/x86_64/multiarch/strcmp.c
> > @@ -45,12 +45,12 @@ IFUNC_SELECTOR (void)
> >    const struct cpu_features *cpu_features = __get_cpu_features ();
> >
> >    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> >        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
> >                                       AVX_Fast_Unaligned_Load, ))
> >      {
> >        if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> > -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> > -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> > +         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> >         return OPTIMIZE (evex);
> >
> >        if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> > diff --git a/sysdeps/x86_64/multiarch/strncmp.c b/sysdeps/x86_64/multiarch/strncmp.c
> > index 4ebe4bde30..c4f8b6bbb5 100644
> > --- a/sysdeps/x86_64/multiarch/strncmp.c
> > +++ b/sysdeps/x86_64/multiarch/strncmp.c
> > @@ -41,12 +41,12 @@ IFUNC_SELECTOR (void)
> >    const struct cpu_features *cpu_features = __get_cpu_features ();
> >
> >    if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
> > +      && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2)
> >        && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
> >                                       AVX_Fast_Unaligned_Load, ))
> >      {
> >        if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
> > -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
> > -         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, BMI2))
> > +         && X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
> >         return OPTIMIZE (evex);
> >
> >        if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
> > --
> > 2.35.1
> >
>
> LGTM.
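
The commit message quoted above relies on two instruction-level facts:
'bzhi' (BMI2) has no legacy fallback, whereas 'tzcnt' is encoded as a
REP-prefixed BSF and so still executes on CPUs without BMI1. A minimal
portable C model of those semantics is sketched below. The function names
are hypothetical and are not part of glibc. The BSF model takes the old
destination value as a parameter because the architecture leaves the
destination undefined for a zero source; returning it unchanged is only an
assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of BMI2 'bzhi' on 32-bit operands: clear every bit of SRC at
   position >= INDEX.  Only the low 8 bits of the index are used; if
   that value is >= 32, SRC is returned unchanged.  */
uint32_t
bzhi32_model (uint32_t src, uint32_t index)
{
  uint32_t n = index & 0xff;
  if (n >= 32)
    return src;
  return src & ((UINT32_C (1) << n) - 1);
}

/* Model of 'tzcnt': number of trailing zero bits, and the operand
   size (32) when the input is zero.  */
unsigned int
tzcnt32_model (uint32_t x)
{
  unsigned int n = 0;
  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Model of 'bsf': index of the lowest set bit for a non-zero input.
   For a zero input the real destination is undefined; this sketch
   assumes it keeps OLD_DEST.  */
unsigned int
bsf32_model (uint32_t x, unsigned int old_dest)
{
  if (x == 0)
    return old_dest;
  return tzcnt32_model (x);
}

/* The property the commit message depends on: for any non-zero
   input the two instructions produce the same value.  */
void
check_tzcnt_bsf_agree (uint32_t x)
{
  assert (x != 0);
  assert (tzcnt32_model (x) == bsf32_model (x, 0));
}
```

The models differ only for a zero input, which is why the 'tzcnt' use
needs no extra feature check as long as the input is known to be non-zero,
while the 'bzhi' use does need one.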


* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-03 16:19     ` Sunil Pandey
@ 2022-10-03 17:35       ` Aurelien Jarno
  2022-10-03 17:50         ` Noah Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-03 17:35 UTC (permalink / raw)
  To: Sunil Pandey; +Cc: Noah Goldstein, libc-alpha

On 2022-10-03 09:19, Sunil Pandey via Libc-alpha wrote:
> Please separate this patch into 4 separate patches.
> 
> Patch1: sysdeps/x86_64/multiarch/strncmp.c
> Patch2: sysdeps/x86_64/multiarch/strcmp.c
> Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> Patch4:  sysdeps/x86_64/multiarch/ifunc-impl-list.c
> 
> Rest of them looks OK to me.

I don't fully see the point of doing that, but I'll do it in the next
version.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611)
  2022-10-02 21:11     ` Noah Goldstein
@ 2022-10-03 17:36       ` Aurelien Jarno
  2022-10-03 17:51         ` Noah Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-03 17:36 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha, H . J . Lu, Sunil K Pandey

On 2022-10-02 17:11, Noah Goldstein via Libc-alpha wrote:
> On Sun, Oct 2, 2022 at 2:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
> >
> > On 2022-10-02 09:21, Noah Goldstein wrote:
> > > On Sun, Oct 2, 2022 at 5:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > > >
> > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have
> > > > BMI1 and BMI2 instructions. Some AVX2 string functions only check for
> > > > AVX2, but use BMI1, BMI2 or LZCNT instructions. This patchset tries to
> > > > fix that.
> > > >
> > > > While most fixes only change ifunc-impl-list.c, and thus only concerns
> > > > the testsuite, the strn(case)cmp is a real issue affecting early Intel
> > >
> > > str(case)cmp as well, correct?
> >
> > Oops, yes forgot to update the cover letter on that aspect.
> >
> > > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide.
> > > >
> > > > On the other hand, the check for LZCNT in memrchr is purely for
> > > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT.
> > > >
> > > > This has been tested by replacing all BMI1 and BMI2 instructions in the
> > > > source code by the "ud2" instruction and disabling the BMI1, BMI2
> > > > feature detection, and running the testsuite.
> > > >
> > > > Resolves: BZ #29611
> > > >
> > > > Change v1 -> v2:
> > > > - Better scan for BMI2 instructions (shlx and shrx) and BMI1
> > > >   instructions (blsmsk), following the feedback from Noah
> > > >   Goldstein
> > > >
> > > > Aurelien Jarno (6):
> > > >   x86: include BMI1 and BMI2 in x86-64-v3 level
> > > >   x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
> > > >   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
> > > >   x86-64: Require LZCNT for AVX2 memrchr implementation
> > > >   x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
> > > >   x86-64: Require BMI2 for AVX2 memrchr implementation
> > > >
> > > >  sysdeps/x86/get-isa-level.h                 |  2 +
> > > >  sysdeps/x86/isa-level.h                     |  2 +
> > > >  sysdeps/x86_64/multiarch/ifunc-avx2.h       |  2 +
> > > >  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 86 ++++++++++++++++-----
> > > >  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
> > > >  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
> > > >  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
> > > >  7 files changed, 76 insertions(+), 25 deletions(-)
> > > >
> > > > --
> > > > 2.35.1
> > > >
> > >
> >
> > --
> > Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> > aurelien@aurel32.net                 http://www.aurel32.net
> 
> Patchset looks good.
> 
> Do you have commit permissions? If not I can push them for you.

Yes, I can commit them, though I'll wait for v3 to be reviewed.

> Thanks for the bugfix!
> 
> NB:
> the str*(case)cmp, wcs(n)cmp bug affects 2.36, 2.35, 2.34, 2.33.

Yep, I have already prepared a backport of the whole patchset to those
branches.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-03 17:35       ` Aurelien Jarno
@ 2022-10-03 17:50         ` Noah Goldstein
  2022-10-03 18:43           ` Sunil Pandey
  0 siblings, 1 reply; 25+ messages in thread
From: Noah Goldstein @ 2022-10-03 17:50 UTC (permalink / raw)
  To: Sunil Pandey, Noah Goldstein, libc-alpha

On Mon, Oct 3, 2022 at 10:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2022-10-03 09:19, Sunil Pandey via Libc-alpha wrote:
> > Please separate this patch into 4 separate patches.
> >
> > Patch1: sysdeps/x86_64/multiarch/strncmp.c
> > Patch2: sysdeps/x86_64/multiarch/strcmp.c
> > Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > Patch4:  sysdeps/x86_64/multiarch/ifunc-impl-list.c
> >
> > Rest of them looks OK to me.
>
> I don't fully see the point of doing that, but i'll do it in the next
> version.

I think it is to make backporting easier.
>
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net


* Re: [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611)
  2022-10-03 17:36       ` Aurelien Jarno
@ 2022-10-03 17:51         ` Noah Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Noah Goldstein @ 2022-10-03 17:51 UTC (permalink / raw)
  To: Noah Goldstein, libc-alpha, H . J . Lu, Sunil K Pandey

On Mon, Oct 3, 2022 at 10:36 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2022-10-02 17:11, Noah Goldstein via Libc-alpha wrote:
> > On Sun, Oct 2, 2022 at 2:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > >
> > > On 2022-10-02 09:21, Noah Goldstein wrote:
> > > > On Sun, Oct 2, 2022 at 5:34 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > > > >
> > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have
> > > > > BMI1 and BMI2 instructions. Some AVX2 string functions only check for
> > > > > AVX2, but use BMI1, BMI2 or LZCNT instructions. This patchset tries to
> > > > > fix that.
> > > > >
> > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns
> > > > > the testsuite, the strn(case)cmp is a real issue affecting early Intel
> > > >
> > > > str(case)cmp as well, correct?
> > >
> > > Oops, yes forgot to update the cover letter on that aspect.
> > >
> > > > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide.
> > > > >
> > > > > On the other hand, the check for LZCNT in memrchr is purely for
> > > > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT.
> > > > >
> > > > > This has been tested by replacing all BMI1 and BMI2 instructions in the
> > > > > source code by the "ud2" instruction and disabling the BMI1, BMI2
> > > > > feature detection, and running the testsuite.
> > > > >
> > > > > Resolves: BZ #29611
> > > > >
> > > > > Change v1 -> v2:
> > > > > - Better scan for BMI2 instructions (shlx and shrx) and BMI1
> > > > >   instructions (blsmsk), following the feedback from Noah
> > > > >   Goldstein
> > > > >
> > > > > Aurelien Jarno (6):
> > > > >   x86: include BMI1 and BMI2 in x86-64-v3 level
> > > > >   x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
> > > > >   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
> > > > >   x86-64: Require LZCNT for AVX2 memrchr implementation
> > > > >   x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
> > > > >   x86-64: Require BMI2 for AVX2 memrchr implementation
> > > > >
> > > > >  sysdeps/x86/get-isa-level.h                 |  2 +
> > > > >  sysdeps/x86/isa-level.h                     |  2 +
> > > > >  sysdeps/x86_64/multiarch/ifunc-avx2.h       |  2 +
> > > > >  sysdeps/x86_64/multiarch/ifunc-impl-list.c  | 86 ++++++++++++++++-----
> > > > >  sysdeps/x86_64/multiarch/ifunc-strcasecmp.h |  1 +
> > > > >  sysdeps/x86_64/multiarch/strcmp.c           |  4 +-
> > > > >  sysdeps/x86_64/multiarch/strncmp.c          |  4 +-
> > > > >  7 files changed, 76 insertions(+), 25 deletions(-)
> > > > >
> > > > > --
> > > > > 2.35.1
> > > > >
> > > >
> > >
> > > --
> > > Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> > > aurelien@aurel32.net                 http://www.aurel32.net
> >
> > Patchset looks good.
> >
> > Do you have commit permissions? If not I can push them for you.
>
> Yes, I can commit them, though I'll wait for v3 to be reviewed.
>
> > Thanks for the bugfix!
> >
> > NB:
> > the str*(case)cmp, wcs(n)cmp bug affects 2.36, 2.35, 2.34, 2.33.
>
> Yep, I have already prepared a backport of the whole patchset to those
> branches.
>

If that's the case, you may not need v3, because AFAIK that's the only
reason to split the strcmp patches up.

Sunil, can you comment?
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net


* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-03 17:50         ` Noah Goldstein
@ 2022-10-03 18:43           ` Sunil Pandey
  2022-10-03 19:21             ` Aurelien Jarno
  0 siblings, 1 reply; 25+ messages in thread
From: Sunil Pandey @ 2022-10-03 18:43 UTC (permalink / raw)
  To: Noah Goldstein, Aurelien Jarno, Hongjiu Lu; +Cc: libc-alpha

On Mon, Oct 3, 2022 at 10:50 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Mon, Oct 3, 2022 at 10:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> >
> > On 2022-10-03 09:19, Sunil Pandey via Libc-alpha wrote:
> > > Please separate this patch into 4 separate patches.
> > >
> > > Patch1: sysdeps/x86_64/multiarch/strncmp.c
> > > Patch2: sysdeps/x86_64/multiarch/strcmp.c
> > > Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > > Patch4:  sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > >
> > > Rest of them looks OK to me.
> >
> > I don't fully see the point of doing that, but i'll do it in the next
> > version.
>
> I think to make backporting easier.

Exactly.

If you look closely, this patch combines 4 independent ifunc
functionalities into a single patch.

As per the latest backporting guidelines, backport patches must apply
cleanly without any change. Having small, independent patches
makes backporting a little easier.

> >
> > --
> > Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> > aurelien@aurel32.net                 http://www.aurel32.net


* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-03 18:43           ` Sunil Pandey
@ 2022-10-03 19:21             ` Aurelien Jarno
  2022-10-03 19:59               ` Aurelien Jarno
  0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-03 19:21 UTC (permalink / raw)
  To: Sunil Pandey; +Cc: Noah Goldstein, Hongjiu Lu, libc-alpha

On 2022-10-03 11:43, Sunil Pandey wrote:
> On Mon, Oct 3, 2022 at 10:50 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> >
> > On Mon, Oct 3, 2022 at 10:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > >
> > > On 2022-10-03 09:19, Sunil Pandey via Libc-alpha wrote:
> > > > Please separate this patch into 4 separate patches.
> > > >
> > > > Patch1: sysdeps/x86_64/multiarch/strncmp.c
> > > > Patch2: sysdeps/x86_64/multiarch/strcmp.c
> > > > Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > > > Patch4:  sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > > >
> > > > Rest of them looks OK to me.
> > >
> > > I don't fully see the point of doing that, but i'll do it in the next
> > > version.
> >
> > I think to make backporting easier.
> 
> Exactly.
> 
> If you look closely, this patch combines 4 independent ifunc
> functionality in one single patch.

It's partially true. I actually count 3 different functionalities. The
changes in ifunc-impl-list are all related to the other 3.

> As per latest backporting guideline, backport patches must apply
> cleanly without any change. Having small independent patches
> makes backporting a little easier.
> 

I agree with that when we are talking about backporting improvements. In
this case, however, we have to backport the changes to fix the bug, even
if they don't apply cleanly.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-03 19:21             ` Aurelien Jarno
@ 2022-10-03 19:59               ` Aurelien Jarno
  2022-10-03 20:51                 ` Sunil Pandey
  0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2022-10-03 19:59 UTC (permalink / raw)
  To: Sunil Pandey, Noah Goldstein, Hongjiu Lu, libc-alpha

On 2022-10-03 21:21, Aurelien Jarno wrote:
> On 2022-10-03 11:43, Sunil Pandey wrote:
> > On Mon, Oct 3, 2022 at 10:50 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > >
> > > On Mon, Oct 3, 2022 at 10:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > > >
> > > > On 2022-10-03 09:19, Sunil Pandey via Libc-alpha wrote:
> > > > > Please separate this patch into 4 separate patches.
> > > > >
> > > > > Patch1: sysdeps/x86_64/multiarch/strncmp.c
> > > > > Patch2: sysdeps/x86_64/multiarch/strcmp.c
> > > > > Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > > > > Patch4:  sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > > > >
> > > > > Rest of them looks OK to me.
> > > >
> > > > I don't fully see the point of doing that, but i'll do it in the next
> > > > version.
> > >
> > > I think to make backporting easier.
> > 
> > Exactly.
> > 
> > If you look closely, this patch combines 4 independent ifunc
> > functionality in one single patch.
> 
> It's partially true. I actually count 3 different functionalities. The
> changes in ifunc-impl-list are all related to the other 3.

I have just sent a v3 that splits that patch into smaller pieces:
- str(n)casecmp
- strcmp
- strncmp
- wcs(n)cmp

I do not see the point in separating the changes from ifunc-impl-list.c,
as we should actually ensure that they are in sync with the ifunc
selector so that the testing is done properly.
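
To illustrate why the two have to stay in sync, here is a minimal,
self-contained sketch in which the selector and the implementation list
share one predicate. Everything in it is hypothetical: struct features
stands in for glibc's struct cpu_features, and cmp_avx2 is a placeholder
rather than a real AVX2 routine. It is not how glibc structures this code,
only a small model of the invariant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for glibc's struct cpu_features.  */
struct features
{
  bool avx2;
  bool bmi2;
};

typedef int (*cmp_fn) (const char *, const char *);

/* Baseline implementation: plain byte-by-byte comparison.  */
int
cmp_generic (const char *a, const char *b)
{
  while (*a != '\0' && *a == *b)
    {
      a++;
      b++;
    }
  return (unsigned char) *a - (unsigned char) *b;
}

/* Placeholder for the variant that would use AVX2 and 'bzhi'.  */
int
cmp_avx2 (const char *a, const char *b)
{
  return strcmp (a, b);
}

/* The one place that states what the AVX2 variant requires.  */
bool
avx2_variant_usable (const struct features *f)
{
  return f->avx2 && f->bmi2;
}

/* Selector: the implementation a program gets at run time.  */
cmp_fn
select_cmp (const struct features *f)
{
  return avx2_variant_usable (f) ? cmp_avx2 : cmp_generic;
}

/* Implementation list: what a test harness iterates over.  Returns
   the number of entries written to OUT (at most MAX).  */
size_t
list_cmp_impls (const struct features *f, cmp_fn *out, size_t max)
{
  size_t n = 0;
  assert (out != NULL);
  if (n < max && avx2_variant_usable (f))
    out[n++] = cmp_avx2;
  if (n < max)
    out[n++] = cmp_generic;
  return n;
}
```

If the list used a weaker condition than the selector (for example AVX2
alone), a test harness on a CPU with AVX2 but without BMI2 would run the
AVX2 variant and hit an illegal instruction, even though the selector
would never pick it there.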

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations
  2022-10-03 19:59               ` Aurelien Jarno
@ 2022-10-03 20:51                 ` Sunil Pandey
  0 siblings, 0 replies; 25+ messages in thread
From: Sunil Pandey @ 2022-10-03 20:51 UTC (permalink / raw)
  To: Sunil Pandey, Noah Goldstein, Hongjiu Lu, libc-alpha

On Mon, Oct 3, 2022 at 12:59 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2022-10-03 21:21, Aurelien Jarno wrote:
> > On 2022-10-03 11:43, Sunil Pandey wrote:
> > > On Mon, Oct 3, 2022 at 10:50 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > On Mon, Oct 3, 2022 at 10:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > > > >
> > > > > On 2022-10-03 09:19, Sunil Pandey via Libc-alpha wrote:
> > > > > > Please separate this patch into 4 separate patches.
> > > > > >
> > > > > > Patch1: sysdeps/x86_64/multiarch/strncmp.c
> > > > > > Patch2: sysdeps/x86_64/multiarch/strcmp.c
> > > > > > Patch3: sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
> > > > > > Patch4:  sysdeps/x86_64/multiarch/ifunc-impl-list.c
> > > > > >
> > > > > > Rest of them looks OK to me.
> > > > >
> > > > > I don't fully see the point of doing that, but i'll do it in the next
> > > > > version.
> > > >
> > > > I think to make backporting easier.
> > >
> > > Exactly.
> > >
> > > If you look closely, this patch combines 4 independent ifunc
> > > functionality in one single patch.
> >
> > It's partially true. I actually count 3 different functionalities. The
> > changes in ifunc-impl-list are all related to the other 3.
>
> I have just sent a v3 that split that patch in smaller pieces:
> - str(n)casecmp
> - strcmp
> - strncmp
> - wcs(n)cmp
>
> I do not see the point in separating the changes from ifunc-impl-list.c,
> as we should actually ensure that they are in sync with the ifunc
> selector so that the testing is done properly.

My feedback applied specifically to the patch in your previous arrangement.
After your rearrangement in v3, it looks OK.

>
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net


end of thread, other threads:[~2022-10-03 20:52 UTC | newest]

Thread overview: 25+ messages
2022-10-02 12:34 [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Aurelien Jarno
2022-10-02 12:34 ` [PATCH v2 1/6] x86: include BMI1 and BMI2 in x86-64-v3 level Aurelien Jarno
2022-10-02 21:07   ` Noah Goldstein
2022-10-02 12:34 ` [PATCH v2 2/6] x86-64: Require BMI2 for AVX2 str*cmp and wcs(n)cmp implementations Aurelien Jarno
2022-10-02 21:08   ` Noah Goldstein
2022-10-03 16:19     ` Sunil Pandey
2022-10-03 17:35       ` Aurelien Jarno
2022-10-03 17:50         ` Noah Goldstein
2022-10-03 18:43           ` Sunil Pandey
2022-10-03 19:21             ` Aurelien Jarno
2022-10-03 19:59               ` Aurelien Jarno
2022-10-03 20:51                 ` Sunil Pandey
2022-10-02 12:34 ` [PATCH v2 3/6] x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations Aurelien Jarno
2022-10-02 21:08   ` Noah Goldstein
2022-10-02 12:34 ` [PATCH v2 4/6] x86-64: Require LZCNT for AVX2 memrchr implementation Aurelien Jarno
2022-10-02 21:08   ` Noah Goldstein
2022-10-02 12:34 ` [PATCH v2 5/6] x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations Aurelien Jarno
2022-10-02 21:08   ` Noah Goldstein
2022-10-02 12:34 ` [PATCH v2 6/6] x86-64: Require BMI2 for AVX2 memrchr implementation Aurelien Jarno
2022-10-02 21:09   ` Noah Goldstein
2022-10-02 16:21 ` [PATCH v2 0/6] x86: Fix AVX2 string functions requiring BMI1, BMI2 or LZCNT (BZ #29611) Noah Goldstein
2022-10-02 18:09   ` Aurelien Jarno
2022-10-02 21:11     ` Noah Goldstein
2022-10-03 17:36       ` Aurelien Jarno
2022-10-03 17:51         ` Noah Goldstein
