commit 5b4f7382af46b4187a958e40fb3123ac3ce16810
Author: Wilco Dijkstra
Date:   Fri Sep 13 16:35:12 2019 +0100

    Add undef to fix test failure.

commit 9456483fb2bb47be63cfcac462d0b2366fc4562a
Author: Wilco Dijkstra
Date:   Wed Jun 12 11:42:34 2019 +0100

    Improve performance of memmem

    This patch significantly improves performance of memmem using a novel
    modified Horspool algorithm. Needles up to size 256 use a bad-character
    table indexed by hashed pairs of characters to quickly skip past
    mismatches. Long needles use a self-adapting filtering step to avoid
    comparing the whole needle repeatedly.

    By limiting the needle length to 256, the shift table only requires
    8 bits per entry, lowering preprocessing overhead and minimizing cache
    effects. This limit also implies worst-case performance is linear.

    Small needles up to size 2 use a dedicated linear search. Very long
    needles use the Two-Way algorithm (to avoid increasing stack size or
    slowing down the common case, inlining is disabled).

    The performance gain is 6.6 times on English text on AArch64 using
    random needles with average size 8.

    Tested against GLIBC testsuite and randomized tests.

    Reviewed-by: Szabolcs Nagy

        * string/memmem.c (__memmem): Rewrite to improve performance.

    (cherry picked from commit 680942b0167715e123d934b609060cd382f8e39f)

commit 373f8b06a3ea763efd36e8b08af7ed7e726de6f8
Author: Wilco Dijkstra
Date:   Wed Jun 12 11:38:52 2019 +0100

    Improve performance of strstr

    This patch significantly improves performance of strstr using a novel
    modified Horspool algorithm. Needles up to size 256 use a bad-character
    table indexed by hashed pairs of characters to quickly skip past
    mismatches. Long needles use a self-adapting filtering step to avoid
    comparing the whole needle repeatedly.

    By limiting the needle length to 256, the shift table only requires
    8 bits per entry, lowering preprocessing overhead and minimizing cache
    effects. This limit also implies worst-case performance is linear.

    Small needles up to size 3 use a dedicated linear search. Very long
    needles use the Two-Way algorithm.

    The performance gain using the improved bench-strstr on Cortex-A72 is
    5.8 times basic_strstr and 3.7 times twoway_strstr.

    Tested against GLIBC testsuite, randomized tests and the GNULIB strstr
    test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

    Reviewed-by: Szabolcs Nagy

        * string/str-two-way.h (two_way_short_needle): Add inline to avoid
        warning.
        (two_way_long_needle): Block inlining.
        * string/strstr.c (strstr2): Add new function.
        (strstr3): Likewise.
        (STRSTR): Completely rewrite strstr to improve performance.

    (cherry picked from commit 5e0a7ecb6629461b28adc1a5aabcc0ede122f201)

commit 4ec1b9e91382e66fb9a89361c93f9f560abc67f7
Author: Wilco Dijkstra
Date:   Wed Sep 19 16:50:18 2018 +0100

    Fix strstr bug with huge needles (bug 23637)

    The generic strstr in GLIBC 2.28 fails to match huge needles. The
    optimized AVAILABLE macro reads ahead a large fixed amount to reduce
    the overhead of repeatedly checking for the end of the string.
    However, if the needle length is larger than this, two_way_long_needle
    may mistake this for the end of the string and return NULL. This is
    fixed by adding the needle length to the amount to read ahead.

        [BZ #23637]
        * string/test-strstr.c (pr23637): New function.
        (test_main): Add tests with longer needles.
        * string/strcasestr.c (AVAILABLE): Fix readahead distance.
        * string/strstr.c (AVAILABLE): Likewise.

    (cherry picked from commit 83a552b0bb9fc2a5e80a0ab3723c0a80ce1db9f2)
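The "Improve performance of memmem" and "Improve performance of strstr" commits
above describe a Horspool-style search whose bad-character table is indexed by a
hash of two adjacent bytes, with 8-bit shift entries for needles of at most 256
bytes. The standalone sketch below illustrates that idea only; the hash function,
the 64-entry table size and the name memmem_sketch are assumptions chosen for
illustration, not the actual glibc code.

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Hash a pair of adjacent bytes into a small table index.
   The exact hash and table size here are illustrative assumptions.  */
#define HASH2(p) ((((size_t) (p)[0]) ^ (((size_t) (p)[1]) << 3)) & 63)

/* Horspool-style substring search with a pair-hashed shift table.
   Only handles 2 <= ne_len <= 256 so every shift fits in 8 bits.  */
static void *
memmem_sketch (const void *hs_v, size_t hs_len,
               const void *ne_v, size_t ne_len)
{
  const unsigned char *hs = hs_v;
  const unsigned char *ne = ne_v;

  if (ne_len < 2 || ne_len > 256 || hs_len < ne_len)
    return NULL;        /* other needle sizes are out of scope here */

  /* shift[h] = distance from the rightmost needle pair hashing to h
     to the end of the needle; unseen pairs keep the maximum shift.  */
  uint8_t shift[64];
  memset (shift, ne_len - 1, sizeof shift);
  for (size_t k = 0; k + 1 < ne_len; k++)
    shift[HASH2 (ne + k)] = (uint8_t) (ne_len - 2 - k);

  size_t last = hs_len - ne_len;
  size_t i = 0;
  while (i <= last)
    {
      /* Look at the pair at the end of the current window.  */
      size_t skip = shift[HASH2 (hs + i + ne_len - 2)];
      if (skip == 0)
        {
          /* The window may end like the needle: verify it fully.  */
          if (memcmp (hs + i, ne, ne_len) == 0)
            return (void *) (hs + i);
          i++;
        }
      else
        i += skip;      /* pair absent or far from the needle's end */
    }
  return NULL;
}

int
main (void)
{
  const char hs[] = "hello, horspool world";
  const char *p = memmem_sketch (hs, sizeof hs - 1, "world", 5);
  puts (p != NULL ? p : "not found");
  return 0;
}

A real implementation would add the dedicated small-needle paths and fall back to
the Two-Way algorithm for needles longer than 256 bytes, as the commit messages
describe.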
commit ecd6271ed80e71b3b1f286e2977f534d54c33af4
Author: Rajalakshmi Srinivasaraghavan
Date:   Tue Aug 28 12:42:19 2018 +0530

    Speedup first memmem match

    As done in commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5, memcmp can
    be used after memchr to avoid the initialization overhead of the
    two-way algorithm for the first match. This has shown an improvement
    of >40% for the first match.

    (cherry picked from commit c8dd67e7c958de04c3783cbea7c384431707b5f8)

commit bba6b9288f5192b67947e478ef9033920d22216a
Author: Wilco Dijkstra
Date:   Fri Aug 3 17:24:12 2018 +0100

    Simplify and speedup strstr/strcasestr first match

    Looking at the benchtests, both strstr and strcasestr spend a lot of
    time in a slow initialization loop handling one character per
    iteration. This can be simplified to use the much faster
    strlen/strnlen/strchr/memcmp. Read ahead a few cachelines to reduce
    the number of strnlen calls, which improves performance by ~3-4%.
    This patch improves the time taken for the full strstr benchtest
    by >40%.

        * string/strcasestr.c (STRCASESTR): Simplify and speedup first match.
        * string/strstr.c (AVAILABLE): Likewise.

    (cherry picked from commit 284f42bc778e487dfd5dff5c01959f93b9e0c4f5)

commit 7a4da6ef7abd8491fa52e8a58a484cfe268575a7
Author: Wilco Dijkstra
Date:   Mon Jul 16 17:50:09 2018 +0100

    Improve strstr performance

    strstr tends to be slow because it uses many calls to memchr and a
    slow byte loop to scan for the next match. Performance is
    significantly improved by using strnlen on larger blocks and using
    strchr to search for the next matching character. strcasestr can also
    use strnlen to scan ahead, and memmem can use memchr to check for the
    next match.

    On the GLIBC bench tests the performance gains on Cortex-A72 are:
    strstr: +25%
    strcasestr: +4.3%
    memmem: +18%

    On a 256KB dataset strstr performance improves by 67%, strcasestr
    by 47%.

    Reviewed-by: Adhemerval Zanella

    (cherry picked from commit 3ae725dfb6d7f61447d27d00ed83e573bd5454f4)
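The "Speedup first memmem match" and "Simplify and speedup strstr/strcasestr
first match" commits rely on the observation that many searches succeed at the
very first candidate position, so a cheap memchr plus memcmp check avoids paying
the setup cost of the full algorithm. Below is a minimal sketch of that idea,
assuming a hypothetical memmem_first_match wrapper; the naive loop stands in for
the real fallback search and none of this is the actual glibc code.

#include <stddef.h>
#include <string.h>
#include <stdio.h>

/* Fast-path first match: memchr finds the first byte that could start
   a match and memcmp verifies it, so the preprocessing cost of the full
   search algorithm is only paid when this quick check fails.  The naive
   loop below is a placeholder for the real fallback (Two-Way or the
   Horspool variant).  */
static void *
memmem_first_match (const void *hs_v, size_t hs_len,
                    const void *ne_v, size_t ne_len)
{
  const unsigned char *hs = hs_v;
  const unsigned char *ne = ne_v;

  if (ne_len == 0)
    return (void *) hs;
  if (hs_len < ne_len)
    return NULL;

  /* Skip directly to the first position where a match could start.  */
  const unsigned char *p = memchr (hs, ne[0], hs_len - ne_len + 1);
  if (p == NULL)
    return NULL;

  /* Cheap check: many real-world searches succeed right here.  */
  if (memcmp (p, ne, ne_len) == 0)
    return (void *) p;

  /* Placeholder fallback for the remaining haystack.  */
  const unsigned char *end = hs + hs_len - ne_len;
  for (p++; p <= end; p++)
    if (p[0] == ne[0] && memcmp (p, ne, ne_len) == 0)
      return (void *) p;
  return NULL;
}

int
main (void)
{
  const char hs[] = "needle in a haystack full of needles";
  const char *p = memmem_first_match (hs, sizeof hs - 1, "hay", 3);
  puts (p != NULL ? p : "not found");
  return 0;
}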