From: Noah Goldstein <goldstein.w.n@gmail.com>
To: libc-alpha@sourceware.org
Subject: [PATCH v1 04/23] x86: Code cleanup in strchr-evex and comment justifying branch
Date: Wed, 23 Mar 2022 16:57:18 -0500
Message-ID: <20220323215734.3927131-4-goldstein.w.n@gmail.com>
In-Reply-To: <20220323215734.3927131-1-goldstein.w.n@gmail.com>
Small code cleanup for size: -81 bytes.
Add a comment justifying the use of a branch for the NULL/non-NULL return.
All string/memory tests pass and no regressions in benchtests.
geometric_mean(N=20) of all benchmarks New / Original: .985
---
Geometric Mean N=20 runs; All functions page aligned
length, alignment, pos, rand, seek_char/branch, max_char/perc-zero, New Time / Old Time
2048, 0, 32, 0, 23, 127, 0.878
2048, 1, 32, 0, 23, 127, 0.88
2048, 0, 64, 0, 23, 127, 0.997
2048, 2, 64, 0, 23, 127, 1.001
2048, 0, 128, 0, 23, 127, 0.973
2048, 3, 128, 0, 23, 127, 0.971
2048, 0, 256, 0, 23, 127, 0.976
2048, 4, 256, 0, 23, 127, 0.973
2048, 0, 512, 0, 23, 127, 1.001
2048, 5, 512, 0, 23, 127, 1.004
2048, 0, 1024, 0, 23, 127, 1.005
2048, 6, 1024, 0, 23, 127, 1.007
2048, 0, 2048, 0, 23, 127, 1.035
2048, 7, 2048, 0, 23, 127, 1.03
4096, 0, 32, 0, 23, 127, 0.889
4096, 1, 32, 0, 23, 127, 0.891
4096, 0, 64, 0, 23, 127, 1.012
4096, 2, 64, 0, 23, 127, 1.017
4096, 0, 128, 0, 23, 127, 0.975
4096, 3, 128, 0, 23, 127, 0.974
4096, 0, 256, 0, 23, 127, 0.974
4096, 4, 256, 0, 23, 127, 0.972
4096, 0, 512, 0, 23, 127, 1.002
4096, 5, 512, 0, 23, 127, 1.016
4096, 0, 1024, 0, 23, 127, 1.009
4096, 6, 1024, 0, 23, 127, 1.008
4096, 0, 2048, 0, 23, 127, 1.003
4096, 7, 2048, 0, 23, 127, 1.004
256, 1, 64, 0, 23, 127, 0.993
256, 2, 64, 0, 23, 127, 0.999
256, 3, 64, 0, 23, 127, 0.992
256, 4, 64, 0, 23, 127, 0.99
256, 5, 64, 0, 23, 127, 0.99
256, 6, 64, 0, 23, 127, 0.994
256, 7, 64, 0, 23, 127, 0.991
512, 0, 256, 0, 23, 127, 0.971
512, 16, 256, 0, 23, 127, 0.971
512, 32, 256, 0, 23, 127, 1.005
512, 48, 256, 0, 23, 127, 0.998
512, 64, 256, 0, 23, 127, 1.001
512, 80, 256, 0, 23, 127, 1.002
512, 96, 256, 0, 23, 127, 1.005
512, 112, 256, 0, 23, 127, 1.012
1, 0, 0, 0, 23, 127, 1.024
2, 0, 1, 0, 23, 127, 0.991
3, 0, 2, 0, 23, 127, 0.997
4, 0, 3, 0, 23, 127, 0.984
5, 0, 4, 0, 23, 127, 0.993
6, 0, 5, 0, 23, 127, 0.985
7, 0, 6, 0, 23, 127, 0.979
8, 0, 7, 0, 23, 127, 0.975
9, 0, 8, 0, 23, 127, 0.965
10, 0, 9, 0, 23, 127, 0.957
11, 0, 10, 0, 23, 127, 0.979
12, 0, 11, 0, 23, 127, 0.987
13, 0, 12, 0, 23, 127, 1.023
14, 0, 13, 0, 23, 127, 0.997
15, 0, 14, 0, 23, 127, 0.983
16, 0, 15, 0, 23, 127, 0.987
17, 0, 16, 0, 23, 127, 0.993
18, 0, 17, 0, 23, 127, 0.985
19, 0, 18, 0, 23, 127, 0.999
20, 0, 19, 0, 23, 127, 0.998
21, 0, 20, 0, 23, 127, 0.983
22, 0, 21, 0, 23, 127, 0.983
23, 0, 22, 0, 23, 127, 1.002
24, 0, 23, 0, 23, 127, 1.0
25, 0, 24, 0, 23, 127, 1.002
26, 0, 25, 0, 23, 127, 0.984
27, 0, 26, 0, 23, 127, 0.994
28, 0, 27, 0, 23, 127, 0.995
29, 0, 28, 0, 23, 127, 1.017
30, 0, 29, 0, 23, 127, 1.009
31, 0, 30, 0, 23, 127, 1.001
32, 0, 31, 0, 23, 127, 1.021
2048, 0, 32, 0, 0, 127, 0.899
2048, 1, 32, 0, 0, 127, 0.93
2048, 0, 64, 0, 0, 127, 1.009
2048, 2, 64, 0, 0, 127, 1.023
2048, 0, 128, 0, 0, 127, 0.973
2048, 3, 128, 0, 0, 127, 0.975
2048, 0, 256, 0, 0, 127, 0.974
2048, 4, 256, 0, 0, 127, 0.97
2048, 0, 512, 0, 0, 127, 0.999
2048, 5, 512, 0, 0, 127, 1.004
2048, 0, 1024, 0, 0, 127, 1.008
2048, 6, 1024, 0, 0, 127, 1.008
2048, 0, 2048, 0, 0, 127, 0.996
2048, 7, 2048, 0, 0, 127, 1.002
4096, 0, 32, 0, 0, 127, 0.872
4096, 1, 32, 0, 0, 127, 0.881
4096, 0, 64, 0, 0, 127, 1.006
4096, 2, 64, 0, 0, 127, 1.005
4096, 0, 128, 0, 0, 127, 0.973
4096, 3, 128, 0, 0, 127, 0.974
4096, 0, 256, 0, 0, 127, 0.969
4096, 4, 256, 0, 0, 127, 0.971
4096, 0, 512, 0, 0, 127, 1.0
4096, 5, 512, 0, 0, 127, 1.005
4096, 0, 1024, 0, 0, 127, 1.007
4096, 6, 1024, 0, 0, 127, 1.009
4096, 0, 2048, 0, 0, 127, 1.005
4096, 7, 2048, 0, 0, 127, 1.007
256, 1, 64, 0, 0, 127, 0.994
256, 2, 64, 0, 0, 127, 1.008
256, 3, 64, 0, 0, 127, 1.019
256, 4, 64, 0, 0, 127, 0.991
256, 5, 64, 0, 0, 127, 0.992
256, 6, 64, 0, 0, 127, 0.991
256, 7, 64, 0, 0, 127, 0.988
512, 0, 256, 0, 0, 127, 0.971
512, 16, 256, 0, 0, 127, 0.967
512, 32, 256, 0, 0, 127, 1.005
512, 48, 256, 0, 0, 127, 1.001
512, 64, 256, 0, 0, 127, 1.009
512, 80, 256, 0, 0, 127, 1.008
512, 96, 256, 0, 0, 127, 1.009
512, 112, 256, 0, 0, 127, 1.016
1, 0, 0, 0, 0, 127, 1.038
2, 0, 1, 0, 0, 127, 1.009
3, 0, 2, 0, 0, 127, 0.992
4, 0, 3, 0, 0, 127, 1.004
5, 0, 4, 0, 0, 127, 0.966
6, 0, 5, 0, 0, 127, 0.968
7, 0, 6, 0, 0, 127, 1.004
8, 0, 7, 0, 0, 127, 0.99
9, 0, 8, 0, 0, 127, 0.958
10, 0, 9, 0, 0, 127, 0.96
11, 0, 10, 0, 0, 127, 0.948
12, 0, 11, 0, 0, 127, 0.984
13, 0, 12, 0, 0, 127, 0.967
14, 0, 13, 0, 0, 127, 0.993
15, 0, 14, 0, 0, 127, 0.991
16, 0, 15, 0, 0, 127, 1.0
17, 0, 16, 0, 0, 127, 0.982
18, 0, 17, 0, 0, 127, 0.977
19, 0, 18, 0, 0, 127, 0.987
20, 0, 19, 0, 0, 127, 0.978
21, 0, 20, 0, 0, 127, 1.0
22, 0, 21, 0, 0, 127, 0.99
23, 0, 22, 0, 0, 127, 0.988
24, 0, 23, 0, 0, 127, 0.997
25, 0, 24, 0, 0, 127, 1.003
26, 0, 25, 0, 0, 127, 1.004
27, 0, 26, 0, 0, 127, 0.982
28, 0, 27, 0, 0, 127, 0.972
29, 0, 28, 0, 0, 127, 0.978
30, 0, 29, 0, 0, 127, 0.992
31, 0, 30, 0, 0, 127, 0.986
32, 0, 31, 0, 0, 127, 1.0
16, 0, 15, 1, 1, 0, 0.997
16, 0, 15, 1, 0, 0, 1.001
16, 0, 15, 1, 1, 0.1, 0.984
16, 0, 15, 1, 0, 0.1, 0.999
16, 0, 15, 1, 1, 0.25, 0.929
16, 0, 15, 1, 0, 0.25, 1.001
16, 0, 15, 1, 1, 0.33, 0.892
16, 0, 15, 1, 0, 0.33, 0.996
16, 0, 15, 1, 1, 0.5, 0.897
16, 0, 15, 1, 0, 0.5, 1.009
16, 0, 15, 1, 1, 0.66, 0.882
16, 0, 15, 1, 0, 0.66, 0.967
16, 0, 15, 1, 1, 0.75, 0.919
16, 0, 15, 1, 0, 0.75, 1.027
16, 0, 15, 1, 1, 0.9, 0.949
16, 0, 15, 1, 0, 0.9, 1.021
16, 0, 15, 1, 1, 1, 0.998
16, 0, 15, 1, 0, 1, 0.999
sysdeps/x86_64/multiarch/strchr-evex.S | 146 ++++++++++++++-----------
1 file changed, 80 insertions(+), 66 deletions(-)
diff --git a/sysdeps/x86_64/multiarch/strchr-evex.S b/sysdeps/x86_64/multiarch/strchr-evex.S
index f62cd9d144..ec739fb8f9 100644
--- a/sysdeps/x86_64/multiarch/strchr-evex.S
+++ b/sysdeps/x86_64/multiarch/strchr-evex.S
@@ -30,6 +30,7 @@
# ifdef USE_AS_WCSCHR
# define VPBROADCAST vpbroadcastd
# define VPCMP vpcmpd
+# define VPTESTN vptestnmd
# define VPMINU vpminud
# define CHAR_REG esi
# define SHIFT_REG ecx
@@ -37,6 +38,7 @@
# else
# define VPBROADCAST vpbroadcastb
# define VPCMP vpcmpb
+# define VPTESTN vptestnmb
# define VPMINU vpminub
# define CHAR_REG sil
# define SHIFT_REG edx
@@ -61,13 +63,11 @@
# define CHAR_PER_VEC (VEC_SIZE / CHAR_SIZE)
.section .text.evex,"ax",@progbits
-ENTRY (STRCHR)
+ENTRY_P2ALIGN (STRCHR, 5)
/* Broadcast CHAR to YMM0. */
VPBROADCAST %esi, %YMM0
movl %edi, %eax
andl $(PAGE_SIZE - 1), %eax
- vpxorq %XMMZERO, %XMMZERO, %XMMZERO
-
/* Check if we cross page boundary with one vector load.
Otherwise it is safe to use an unaligned load. */
cmpl $(PAGE_SIZE - VEC_SIZE), %eax
@@ -81,49 +81,35 @@ ENTRY (STRCHR)
vpxorq %YMM1, %YMM0, %YMM2
VPMINU %YMM2, %YMM1, %YMM2
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
- VPCMP $0, %YMMZERO, %YMM2, %k0
+ VPTESTN %YMM2, %YMM2, %k0
kmovd %k0, %eax
testl %eax, %eax
jz L(aligned_more)
tzcntl %eax, %eax
+# ifndef USE_AS_STRCHRNUL
+ /* Found CHAR or the null byte. */
+ cmp (%rdi, %rax, CHAR_SIZE), %CHAR_REG
+	/* NB: Use a branch instead of cmovcc here. The expectation is
+	   that with strchr the user will branch based on the input
+	   being null. Since this branch will be 100% predictive of the
+	   user branch, a branch miss here should save what would
+	   otherwise be a branch miss in the user code. Otherwise, using
+	   a branch 1) saves code size and 2) is faster in highly
+	   predictable environments. */
+ jne L(zero)
+# endif
# ifdef USE_AS_WCSCHR
/* NB: Multiply wchar_t count by 4 to get the number of bytes.
*/
leaq (%rdi, %rax, CHAR_SIZE), %rax
# else
addq %rdi, %rax
-# endif
-# ifndef USE_AS_STRCHRNUL
- /* Found CHAR or the null byte. */
- cmp (%rax), %CHAR_REG
- jne L(zero)
# endif
ret
- /* .p2align 5 helps keep performance more consistent if ENTRY()
- alignment % 32 was either 16 or 0. As well this makes the
- alignment % 32 of the loop_4x_vec fixed which makes tuning it
- easier. */
- .p2align 5
-L(first_vec_x3):
- tzcntl %eax, %eax
-# ifndef USE_AS_STRCHRNUL
- /* Found CHAR or the null byte. */
- cmp (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
- jne L(zero)
-# endif
- /* NB: Multiply sizeof char type (1 or 4) to get the number of
- bytes. */
- leaq (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %rax
- ret
-# ifndef USE_AS_STRCHRNUL
-L(zero):
- xorl %eax, %eax
- ret
-# endif
- .p2align 4
+ .p2align 4,, 10
L(first_vec_x4):
# ifndef USE_AS_STRCHRNUL
/* Check to see if first match was CHAR (k0) or null (k1). */
@@ -144,9 +130,18 @@ L(first_vec_x4):
leaq (VEC_SIZE * 4)(%rdi, %rax, CHAR_SIZE), %rax
ret
+# ifndef USE_AS_STRCHRNUL
+L(zero):
+ xorl %eax, %eax
+ ret
+# endif
+
+
.p2align 4
L(first_vec_x1):
- tzcntl %eax, %eax
+	/* Use bsf here to save 1-byte, keeping the block in 1x
+	   fetch block. eax guaranteed non-zero. */
+ bsfl %eax, %eax
# ifndef USE_AS_STRCHRNUL
/* Found CHAR or the null byte. */
cmp (VEC_SIZE)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
@@ -158,7 +153,7 @@ L(first_vec_x1):
leaq (VEC_SIZE)(%rdi, %rax, CHAR_SIZE), %rax
ret
- .p2align 4
+ .p2align 4,, 10
L(first_vec_x2):
# ifndef USE_AS_STRCHRNUL
/* Check to see if first match was CHAR (k0) or null (k1). */
@@ -179,6 +174,21 @@ L(first_vec_x2):
leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
ret
+ .p2align 4,, 10
+L(first_vec_x3):
+	/* Use bsf here to save 1-byte, keeping the block in 1x
+	   fetch block. eax guaranteed non-zero. */
+ bsfl %eax, %eax
+# ifndef USE_AS_STRCHRNUL
+ /* Found CHAR or the null byte. */
+ cmp (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
+ jne L(zero)
+# endif
+ /* NB: Multiply sizeof char type (1 or 4) to get the number of
+ bytes. */
+ leaq (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %rax
+ ret
+
.p2align 4
L(aligned_more):
/* Align data to VEC_SIZE. */
@@ -195,7 +205,7 @@ L(cross_page_continue):
vpxorq %YMM1, %YMM0, %YMM2
VPMINU %YMM2, %YMM1, %YMM2
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
- VPCMP $0, %YMMZERO, %YMM2, %k0
+ VPTESTN %YMM2, %YMM2, %k0
kmovd %k0, %eax
testl %eax, %eax
jnz L(first_vec_x1)
@@ -206,7 +216,7 @@ L(cross_page_continue):
/* Each bit in K0 represents a CHAR in YMM1. */
VPCMP $0, %YMM1, %YMM0, %k0
/* Each bit in K1 represents a CHAR in YMM1. */
- VPCMP $0, %YMM1, %YMMZERO, %k1
+ VPTESTN %YMM1, %YMM1, %k1
kortestd %k0, %k1
jnz L(first_vec_x2)
@@ -215,7 +225,7 @@ L(cross_page_continue):
vpxorq %YMM1, %YMM0, %YMM2
VPMINU %YMM2, %YMM1, %YMM2
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
- VPCMP $0, %YMMZERO, %YMM2, %k0
+ VPTESTN %YMM2, %YMM2, %k0
kmovd %k0, %eax
testl %eax, %eax
jnz L(first_vec_x3)
@@ -224,7 +234,7 @@ L(cross_page_continue):
/* Each bit in K0 represents a CHAR in YMM1. */
VPCMP $0, %YMM1, %YMM0, %k0
/* Each bit in K1 represents a CHAR in YMM1. */
- VPCMP $0, %YMM1, %YMMZERO, %k1
+ VPTESTN %YMM1, %YMM1, %k1
kortestd %k0, %k1
jnz L(first_vec_x4)
@@ -265,33 +275,33 @@ L(loop_4x_vec):
VPMINU %YMM3, %YMM4, %YMM4
VPMINU %YMM2, %YMM4, %YMM4{%k4}{z}
- VPCMP $0, %YMMZERO, %YMM4, %k1
+ VPTESTN %YMM4, %YMM4, %k1
kmovd %k1, %ecx
subq $-(VEC_SIZE * 4), %rdi
testl %ecx, %ecx
jz L(loop_4x_vec)
- VPCMP $0, %YMMZERO, %YMM1, %k0
+ VPTESTN %YMM1, %YMM1, %k0
kmovd %k0, %eax
testl %eax, %eax
jnz L(last_vec_x1)
- VPCMP $0, %YMMZERO, %YMM2, %k0
+ VPTESTN %YMM2, %YMM2, %k0
kmovd %k0, %eax
testl %eax, %eax
jnz L(last_vec_x2)
- VPCMP $0, %YMMZERO, %YMM3, %k0
+ VPTESTN %YMM3, %YMM3, %k0
kmovd %k0, %eax
/* Combine YMM3 matches (eax) with YMM4 matches (ecx). */
# ifdef USE_AS_WCSCHR
sall $8, %ecx
orl %ecx, %eax
- tzcntl %eax, %eax
+ bsfl %eax, %eax
# else
salq $32, %rcx
orq %rcx, %rax
- tzcntq %rax, %rax
+ bsfq %rax, %rax
# endif
# ifndef USE_AS_STRCHRNUL
/* Check if match was CHAR or null. */
@@ -303,28 +313,28 @@ L(loop_4x_vec):
leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
ret
-# ifndef USE_AS_STRCHRNUL
-L(zero_end):
- xorl %eax, %eax
- ret
+ .p2align 4,, 8
+L(last_vec_x1):
+ bsfl %eax, %eax
+# ifdef USE_AS_WCSCHR
+ /* NB: Multiply wchar_t count by 4 to get the number of bytes.
+ */
+ leaq (%rdi, %rax, CHAR_SIZE), %rax
+# else
+ addq %rdi, %rax
# endif
- .p2align 4
-L(last_vec_x1):
- tzcntl %eax, %eax
# ifndef USE_AS_STRCHRNUL
/* Check if match was null. */
- cmp (%rdi, %rax, CHAR_SIZE), %CHAR_REG
+ cmp (%rax), %CHAR_REG
jne L(zero_end)
# endif
- /* NB: Multiply sizeof char type (1 or 4) to get the number of
- bytes. */
- leaq (%rdi, %rax, CHAR_SIZE), %rax
+
ret
- .p2align 4
+ .p2align 4,, 8
L(last_vec_x2):
- tzcntl %eax, %eax
+ bsfl %eax, %eax
# ifndef USE_AS_STRCHRNUL
/* Check if match was null. */
cmp (VEC_SIZE)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
@@ -336,7 +346,7 @@ L(last_vec_x2):
ret
/* Cold case for crossing page with first load. */
- .p2align 4
+ .p2align 4,, 8
L(cross_page_boundary):
movq %rdi, %rdx
/* Align rdi. */
@@ -346,9 +356,9 @@ L(cross_page_boundary):
vpxorq %YMM1, %YMM0, %YMM2
VPMINU %YMM2, %YMM1, %YMM2
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
- VPCMP $0, %YMMZERO, %YMM2, %k0
+ VPTESTN %YMM2, %YMM2, %k0
kmovd %k0, %eax
- /* Remove the leading bits. */
+ /* Remove the leading bits. */
# ifdef USE_AS_WCSCHR
movl %edx, %SHIFT_REG
/* NB: Divide shift count by 4 since each bit in K1 represent 4
@@ -360,20 +370,24 @@ L(cross_page_boundary):
/* If eax is zero continue. */
testl %eax, %eax
jz L(cross_page_continue)
- tzcntl %eax, %eax
-# ifndef USE_AS_STRCHRNUL
- /* Check to see if match was CHAR or null. */
- cmp (%rdx, %rax, CHAR_SIZE), %CHAR_REG
- jne L(zero_end)
-# endif
+ bsfl %eax, %eax
+
# ifdef USE_AS_WCSCHR
/* NB: Multiply wchar_t count by 4 to get the number of
bytes. */
leaq (%rdx, %rax, CHAR_SIZE), %rax
# else
addq %rdx, %rax
+# endif
+# ifndef USE_AS_STRCHRNUL
+ /* Check to see if match was CHAR or null. */
+ cmp (%rax), %CHAR_REG
+ je L(cross_page_ret)
+L(zero_end):
+ xorl %eax, %eax
+L(cross_page_ret):
# endif
ret
END (STRCHR)
-# endif
+#endif
--
2.25.1