public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Improve strstr performance of short needles
@ 2018-09-05 13:38 Wilco Dijkstra
  2018-09-28 21:30 ` Gabriel F. T. Gomes
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Dijkstra @ 2018-09-05 13:38 UTC (permalink / raw)
  To: libc-alpha; +Cc: nd

Improve strstr performance for the common case of short needles.  For 2-4 
characters a small loop is typically fastest.  On large strings the speedup
with a needle size of 4 is about 65%.

Passes GLIBC regression tests. OK for commit?

ChangeLog:
2018-09-05  Wilco Dijkstra  <wdijkstr@arm.com>

	* string/strstr.c (strstr2): Add new function.
	(strstr3): Likewise.
	(strstr3): Likewise.
	(STRSTR): Add special cases for short needles.

--

diff --git a/string/strstr.c b/string/strstr.c
index 33acdc5442d3e9031298a56e4f52a19e2ffa7d84..dd8663494ef391f34f7ed0a961d6c0e399bfdda9 100644
--- a/string/strstr.c
+++ b/string/strstr.c
@@ -46,6 +46,50 @@
 #define STRSTR strstr
 #endif
 
+
+static inline char *
+strstr2 (const char *hs, const char *ne)
+{
+  uint32_t h1 = (ne[0] << 16) | ne[1];
+  uint32_t h2 = 0;
+  int c = hs[0];
+  while (h1 != h2 && c != 0)
+    {
+      h2 = (h2 << 16) | c;
+      c = *++hs;
+    }
+  return h1 == h2 ? (char *)hs - 2 : NULL;
+}
+
+static inline char *
+strstr3 (const char *hs, const char *ne)
+{
+  uint32_t h1 = (ne[0] << 24) | (ne[1] << 16) | (ne[2] << 8);
+  uint32_t h2 = 0;
+  int c = hs[0];
+  while (h1 != h2 && c != 0)
+    {
+      h2 = (h2 | c) << 8;
+      c = *++hs;
+    }
+  return h1 == h2 ? (char *)hs - 3 : NULL;
+}
+
+static inline char *
+strstr4 (const char *hs, const char *ne)
+{
+  uint32_t h1 = (ne[0] << 24) | (ne[1] << 16) | (ne[2] << 8) | ne[3];
+  uint32_t h2 = 0;
+  int c = hs[0];
+  while (h1 != h2 && c != 0)
+    {
+      h2 = (h2 << 8) | c;
+      c = *++hs;
+    }
+  return h1 == h2 ? (char *)hs - 4 : NULL;
+}
+
+
 /* Return the first occurrence of NEEDLE in HAYSTACK.  Return HAYSTACK
    if NEEDLE is empty, otherwise NULL if NEEDLE is not found in
    HAYSTACK.  */
@@ -64,6 +108,13 @@ STRSTR (const char *haystack, const char *needle)
   if (haystack == NULL || needle[1] == '\0')
     return (char *) haystack;
 
+  if (needle[2] == '\0')
+    return strstr2 (haystack, needle);
+  if (needle[3] == '\0')
+    return strstr3 (haystack, needle);
+  if (needle[4] == '\0')
+    return strstr4 (haystack, needle);
+
   /* Ensure HAYSTACK length is at least as long as NEEDLE length.
      Since a match may occur early on in a huge HAYSTACK, use strnlen
      and read ahead a few cachelines for improved performance.  */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Improve strstr performance of short needles
  2018-09-05 13:38 [PATCH] Improve strstr performance of short needles Wilco Dijkstra
@ 2018-09-28 21:30 ` Gabriel F. T. Gomes
  2018-09-29  0:21   ` Wilco Dijkstra
  0 siblings, 1 reply; 7+ messages in thread
From: Gabriel F. T. Gomes @ 2018-09-28 21:30 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha, nd

On Wed, 05 Sep 2018, Wilco Dijkstra wrote:

>Improve strstr performance for the common case of short needles.  For 2-4 
>characters a small loop is typically fastest.  On large strings the speedup
>with a needle size of 4 is about 65%.

I ran some tests on a powerpc64le machine and I noticed that, as the
haystack becomes larger, the search for a needle that is *not* in the
haystack, became slower with this change.

For instance, when the haystack is 32 chars long, searching for a 4 chars
long needle that is not in the haystack [1] took ~11% to ~22% longer to
finish, when compared to the pristine code.  When the needle *is* in the
haystack, that did not happen and the performance was usually 30% better.

In my tests, this is also true for needles 2 and 3 chars long.


Have you observed similar results in your tests?


[1] Tests with the summary similar to this:
      Length   32/4, alignment 0/ 0, fail:

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Improve strstr performance of short needles
  2018-09-28 21:30 ` Gabriel F. T. Gomes
@ 2018-09-29  0:21   ` Wilco Dijkstra
  2018-09-29  1:01     ` Siddhesh Poyarekar
  2018-10-01 17:58     ` Wilco Dijkstra
  0 siblings, 2 replies; 7+ messages in thread
From: Wilco Dijkstra @ 2018-09-29  0:21 UTC (permalink / raw)
  To: Gabriel F. T. Gomes; +Cc: libc-alpha, nd

Gabriel F. T. Gomes wrote:
> On Wed, 05 Sep 2018, Wilco Dijkstra wrote:
>
>>Improve strstr performance for the common case of short needles.  For 2-4 
>>characters a small loop is typically fastest.  On large strings the speedup
>>with a needle size of 4 is about 65%.
>
> I ran some tests on a powerpc64le machine and I noticed that, as the
> haystack becomes larger, the search for a needle that is *not* in the
> haystack, became slower with this change.

That seems odd, particularly since the Two-Way algorithm is pretty slow
for small needles (large preprocessing overhead and small skips).

> For instance, when the haystack is 32 chars long, searching for a 4 chars
> long needle that is not in the haystack [1] took ~11% to ~22% longer to
> finish, when compared to the pristine code.  When the needle *is* in the
> haystack, that did not happen and the performance was usually 30% better.
>
> Have you observed similar results in your tests?

I haven't looked in detail at the GLIBC strstr benchmark results. I'm running
tests with natural language texts from 64 bytes to 256KB. Generally speaking
the short needle Two-Way algorithm is terribly slow on both short and long
haystacks. Short needles make it worse since it can't skip much, so there isn't
any point in keeping it (I posted a much faster replacement in newlib).

Wilco

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Improve strstr performance of short needles
  2018-09-29  0:21   ` Wilco Dijkstra
@ 2018-09-29  1:01     ` Siddhesh Poyarekar
  2018-10-01 17:58     ` Wilco Dijkstra
  1 sibling, 0 replies; 7+ messages in thread
From: Siddhesh Poyarekar @ 2018-09-29  1:01 UTC (permalink / raw)
  To: Wilco Dijkstra, Gabriel F. T. Gomes; +Cc: libc-alpha, nd

On 29/09/18 5:51 AM, Wilco Dijkstra wrote:
> I haven't looked in detail at the GLIBC strstr benchmark results. I'm running
> tests with natural language texts from 64 bytes to 256KB. Generally speaking

Would you be able to add a test that emulates that workload to glibc 
benchtests?

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Improve strstr performance of short needles
  2018-09-29  0:21   ` Wilco Dijkstra
  2018-09-29  1:01     ` Siddhesh Poyarekar
@ 2018-10-01 17:58     ` Wilco Dijkstra
  2018-10-01 22:08       ` Gabriel F. T. Gomes
  2018-10-02 10:39       ` Siddhesh Poyarekar
  1 sibling, 2 replies; 7+ messages in thread
From: Wilco Dijkstra @ 2018-10-01 17:58 UTC (permalink / raw)
  To: Gabriel F. T. Gomes; +Cc: libc-alpha, nd

Hi Gabriel,

>> I ran some tests on a powerpc64le machine and I noticed that, as the
>> haystack becomes larger, the search for a needle that is *not* in the
>> haystack, became slower with this change.
>
> That seems odd, particularly since the Two-Way algorithm is pretty slow
> for small needles (large preprocessing overhead and small skips).

I looked at bench-strstr again. There are cases where Two-Way wins by
10-15% when the string is found but for short strings the linear loop wins.
On average I can't see any performance difference.

However when iterating through 16K needles randomly chosen from natural
text I get the following speedups for different haystack and needle sizes:

    H64  H256 H4096
N2: 64%  73%  46%
N3: 74%  66%  37%
N4: 94%  92%  66%

As you can see the speedup is larger for small needles since the preprocessing
overhead of the Two-Way algorithm is significant. This is completely hidden by
the GLIBC test due to using just 1 needle.

Wilco

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Improve strstr performance of short needles
  2018-10-01 17:58     ` Wilco Dijkstra
@ 2018-10-01 22:08       ` Gabriel F. T. Gomes
  2018-10-02 10:39       ` Siddhesh Poyarekar
  1 sibling, 0 replies; 7+ messages in thread
From: Gabriel F. T. Gomes @ 2018-10-01 22:08 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: libc-alpha, nd

On Mon, 01 Oct 2018, Wilco Dijkstra wrote:

>>> I ran some tests on a powerpc64le machine and I noticed that, as the
>>> haystack becomes larger, the search for a needle that is *not* in the
>>> haystack, became slower with this change.  
>>
>> That seems odd, particularly since the Two-Way algorithm is pretty slow
>> for small needles (large preprocessing overhead and small skips).  
>
>I looked at bench-strstr again. There are cases where Two-Way wins by
>10-15% when the string is found but for short strings the linear loop wins.
>On average I can't see any performance difference.

I ran bench tests with your patch on an x86 machine, and I noticed the
same behaviour that I described previously, however on fewer occasions
(never when needle is 2-chars long, and less frequently for 3- and 4-chars
long needles).

Both on powerpc64le and x86, the overall performance gain for bench tests
looks positive, so I don't actually see a compelling reason to not commit
this patch.  The intention of my question was to know if you have noticed
something similar in your tests, and I'm glad to hear that you have used a
different benchmark.

>However when iterating through 16K needles randomly chosen from natural
>text I get the following speedups for different haystack and needle sizes:
>
>    H64  H256 H4096
>N2: 64%  73%  46%
>N3: 74%  66%  37%
>N4: 94%  92%  66%

Could you add this information (data and explanation) to the commit
message?  It would be good to be able to read it some years in the future
and know that a different benchmark (not bench tests) was used to produce
these numbers. 

>As you can see the speedup is larger for small needles since the preprocessing
>overhead of the Two-Way algorithm is significant. This is completely hidden by
>the GLIBC test due to using just 1 needle.

Agreed.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Improve strstr performance of short needles
  2018-10-01 17:58     ` Wilco Dijkstra
  2018-10-01 22:08       ` Gabriel F. T. Gomes
@ 2018-10-02 10:39       ` Siddhesh Poyarekar
  1 sibling, 0 replies; 7+ messages in thread
From: Siddhesh Poyarekar @ 2018-10-02 10:39 UTC (permalink / raw)
  To: Wilco Dijkstra, Gabriel F. T. Gomes; +Cc: libc-alpha, nd

On 01/10/18 10:35 PM, Wilco Dijkstra wrote:
> However when iterating through 16K needles randomly chosen from natural
> text I get the following speedups for different haystack and needle sizes:
> 
>      H64  H256 H4096
> N2: 64%  73%  46%
> N3: 74%  66%  37%
> N4: 94%  92%  66%
> 
> As you can see the speedup is larger for small needles since the preprocessing
> overhead of the Two-Way algorithm is significant. This is completely hidden by
> the GLIBC test due to using just 1 needle.

Can you please add this test to benchtests?

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-10-02  3:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-05 13:38 [PATCH] Improve strstr performance of short needles Wilco Dijkstra
2018-09-28 21:30 ` Gabriel F. T. Gomes
2018-09-29  0:21   ` Wilco Dijkstra
2018-09-29  1:01     ` Siddhesh Poyarekar
2018-10-01 17:58     ` Wilco Dijkstra
2018-10-01 22:08       ` Gabriel F. T. Gomes
2018-10-02 10:39       ` Siddhesh Poyarekar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).