commit b58bace7cede42c31c8047907b1e7ad318d24c32
Author: Wilco Dijkstra
Date:   Wed Jun 12 11:42:34 2019 +0100

    Improve performance of memmem

    This patch significantly improves performance of memmem using a novel
    modified Horspool algorithm. Needles up to size 256 use a bad-character
    table indexed by hashed pairs of characters to quickly skip past
    mismatches. Long needles use a self-adapting filtering step to avoid
    comparing the whole needle repeatedly.

    By limiting the needle length to 256, the shift table only requires 8
    bits per entry, lowering preprocessing overhead and minimizing cache
    effects. This limit also implies worst-case performance is linear.

    Small needles up to size 2 use a dedicated linear search. Very long
    needles use the Two-Way algorithm (to avoid increasing stack size or
    slowing down the common case, inlining is disabled).

    The performance gain is 6.6 times on English text on AArch64 using
    random needles with average size 8.

    Tested against GLIBC testsuite and randomized tests.

    Reviewed-by: Szabolcs Nagy

        * string/memmem.c (__memmem): Rewrite to improve performance.

    (cherry picked from commit 680942b0167715e123d934b609060cd382f8e39f)

commit 354f52e984598f336f963aad7c5cbccf986e72d7
Author: Wilco Dijkstra
Date:   Wed Jun 12 11:38:52 2019 +0100

    Improve performance of strstr

    This patch significantly improves performance of strstr using a novel
    modified Horspool algorithm. Needles up to size 256 use a bad-character
    table indexed by hashed pairs of characters to quickly skip past
    mismatches. Long needles use a self-adapting filtering step to avoid
    comparing the whole needle repeatedly.

    By limiting the needle length to 256, the shift table only requires 8
    bits per entry, lowering preprocessing overhead and minimizing cache
    effects. This limit also implies worst-case performance is linear.

    Small needles up to size 3 use a dedicated linear search. Very long
    needles use the Two-Way algorithm.
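The hashed-pair bad-character table described in the two messages above can be illustrated with a minimal sketch. This is not the code from string/memmem.c or string/strstr.c: the names `pair_hash` and `sketch_memmem` and the hash formula are assumptions made for the example, the self-adapting filter step is left out, and a plain scan stands in for the Two-Way algorithm on needles longer than 256 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hash mapping a byte pair to a table index in 0..255. */
static size_t pair_hash(unsigned char first, unsigned char second)
{
    return ((size_t)second - ((size_t)first << 3)) % 256;
}

/* Sketch only: Horspool-style search whose skip table is indexed by
   hashed byte pairs.  Hash collisions can only shorten a skip, so they
   cost speed but never cause a match to be missed. */
void *sketch_memmem(const void *haystack, size_t hs_len,
                    const void *needle, size_t ne_len)
{
    const unsigned char *hs = haystack;
    const unsigned char *ne = needle;

    if (ne_len == 0)
        return (void *)hs;
    if (hs_len < ne_len)
        return NULL;
    if (ne_len == 1)
        return memchr(hs, ne[0], hs_len);

    size_t last = hs_len - ne_len;   /* last valid start offset */

    if (ne_len > 256) {
        /* Assumption: a plain scan stands in for Two-Way here. */
        for (size_t pos = 0; pos <= last; pos++)
            if (memcmp(hs + pos, ne, ne_len) == 0)
                return (void *)(hs + pos);
        return NULL;
    }

    /* ne_len <= 256, so every skip distance fits in 8 bits. */
    uint8_t skip[256];
    size_t m1 = ne_len - 1;

    /* Default: the window's last pair does not occur in the needle. */
    memset(skip, (int)m1, sizeof skip);

    /* Record, for each pair ending at index i (final pair excluded),
       its distance to the needle end; later (rightmost) pairs win. */
    for (size_t i = 1; i < m1; i++)
        skip[pair_hash(ne[i - 1], ne[i])] = (uint8_t)(m1 - i);

    for (size_t pos = 0; pos <= last; ) {
        if (memcmp(hs + pos, ne, ne_len) == 0)
            return (void *)(hs + pos);
        /* Skip according to the last two bytes of the current window. */
        pos += skip[pair_hash(hs[pos + m1 - 1], hs[pos + m1])];
    }
    return NULL;
}
```

Calling `sketch_memmem(text, strlen(text), "lazy", 4)` returns a pointer to the first occurrence of "lazy" in `text`, or NULL if there is none. Keeping every table entry in a `uint8_t` is what the 256-byte needle limit buys: the whole table is 256 bytes.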

    The performance gain using the improved bench-strstr on Cortex-A72 is
    5.8 times basic_strstr and 3.7 times twoway_strstr.

    Tested against GLIBC testsuite, randomized tests and the GNULIB strstr
    test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

    Reviewed-by: Szabolcs Nagy

        * string/str-two-way.h (two_way_short_needle): Add inline to avoid
        warning.
        (two_way_long_needle): Block inlining.
        * string/strstr.c (strstr2): Add new function.
        (strstr3): Likewise.
        (STRSTR): Completely rewrite strstr to improve performance.

    (cherry picked from commit 5e0a7ecb6629461b28adc1a5aabcc0ede122f201)

commit d6ccf2f45c5e09d3a36321ba695325589e8940fe
Author: Rajalakshmi Srinivasaraghavan
Date:   Tue Aug 28 12:42:19 2018 +0530

    Speedup first memmem match

    As done in commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5, memcmp can
    be used after memchr to avoid the initialization overhead of the
    two-way algorithm for the first match. This has shown improvement >40%
    for first match.

    (cherry picked from commit c8dd67e7c958de04c3783cbea7c384431707b5f8)

commit 1def6a34aee55e26a0755a3e36bf2e184a4cdf5e
Author: Wilco Dijkstra
Date:   Fri Aug 3 17:24:12 2018 +0100

    Simplify and speedup strstr/strcasestr first match

    Looking at the benchtests, both strstr and strcasestr spend a lot of
    time in a slow initialization loop handling one character per
    iteration. This can be simplified and use the much faster
    strlen/strnlen/strchr/memcmp. Read ahead a few cachelines to reduce
    the number of strnlen calls, which improves performance by ~3-4%.
    This patch improves the time taken for the full strstr benchtest by
    >40%.

        * string/strcasestr.c (STRCASESTR): Simplify and speedup first match.
        * string/strstr.c (AVAILABLE): Likewise.

    (cherry picked from commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5)
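
The first-match shortcut described in the last two messages (a cheap memchr followed by memcmp before any heavier search is set up) might look roughly like the sketch below. It is an illustration under stated assumptions, not the glibc code: `sketch_first_match` and `slow_search` are hypothetical names, and `slow_search` is a plain scan standing in for the Two-Way search and its initialization.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a plain scan standing in for the Two-Way algorithm. */
static void *slow_search(const unsigned char *hs, size_t hs_len,
                         const unsigned char *ne, size_t ne_len)
{
    for (size_t i = 0; i + ne_len <= hs_len; i++)
        if (memcmp(hs + i, ne, ne_len) == 0)
            return (void *)(hs + i);
    return NULL;
}

/* Sketch only: try the first candidate with memchr + memcmp and fall
   back to the heavier search only when that candidate fails. */
void *sketch_first_match(const void *haystack, size_t hs_len,
                         const void *needle, size_t ne_len)
{
    const unsigned char *hs = haystack;
    const unsigned char *ne = needle;

    if (ne_len == 0)
        return (void *)hs;
    if (hs_len < ne_len)
        return NULL;

    /* Only offsets where the whole needle still fits can start a match. */
    const unsigned char *p = memchr(hs, ne[0], hs_len - ne_len + 1);
    if (p == NULL)
        return NULL;              /* first byte never occurs in range */
    if (memcmp(p, ne, ne_len) == 0)
        return (void *)p;         /* first candidate is a full match */

    /* No match can start at or before p, so resume just after it. */
    size_t skipped = (size_t)(p - hs) + 1;
    return slow_search(p + 1, hs_len - skipped, ne, ne_len);
}
```

When the needle occurs at the first position where its leading byte appears, the function returns without ever reaching `slow_search`, which is the case the >40% first-match figures above refer to.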