public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Improve memmem.
@ 2015-05-13  9:48 Ondřej Bílka
  2015-05-13 16:31 ` Paul Eggert
  2015-05-14  0:34 ` [PATCH] " Carlos O'Donell
  0 siblings, 2 replies; 8+ messages in thread
From: Ondřej Bílka @ 2015-05-13  9:48 UTC (permalink / raw)
  To: libc-alpha

Hi.

This is same optimization for memmem as for strstr.
On x64 memmem performance sucks as its order of magnitude slower than strstr.
This equalizes performance somewhat and improves it in general case.

OK to commit this?


diff --git a/string/memmem.c b/string/memmem.c
index 8a81f65..ed21ee4 100644
--- a/string/memmem.c
+++ b/string/memmem.c
@@ -46,6 +46,10 @@ __memmem (const void *haystack_start, size_t haystack_len,
      not an array of 'char' values.  See ISO C 99 section 6.2.6.1.  */
   const unsigned char *haystack = (const unsigned char *) haystack_start;
   const unsigned char *needle = (const unsigned char *) needle_start;
+  const unsigned char *haystack_end = (const unsigned char *) 
+				      haystack_start + haystack_len 
+				      - needle_len + 1;
+
 
    if (needle_len == 0)
      /* The first occurrence of the empty string is deemed to occur at
@@ -57,6 +61,29 @@ __memmem (const void *haystack_start, size_t haystack_len,
   if (__glibc_unlikely (haystack_len < needle_len))
     return NULL;
 
+  if (needle_len == 1)
+    return memchr (haystack, needle[0], haystack_end - haystack);
+
+  while ((haystack = memchr (haystack, needle[0], haystack_end - haystack)))
+    {
+      if (haystack[1] == needle[1] 
+          && (needle_len == 2 || haystack[2] == needle[2]))
+        {
+          if (needle_len == 2)
+            return haystack;
+
+          if (!memcmp (haystack + 2, needle + 2, needle_len - 2))
+            return haystack;
+          else  
+            goto slow_path;
+        }
+      haystack++;
+    }
+
+  return NULL;
+
+  slow_path:;
+
   /* Use optimizations in memchr when possible, to reduce the search
      size of haystack using a linear algorithm with a smaller
      coefficient.  However, avoid memchr for long needles, since we

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Improve memmem.
  2015-05-13  9:48 [PATCH] Improve memmem Ondřej Bílka
@ 2015-05-13 16:31 ` Paul Eggert
  2015-05-13 18:51   ` [PATCH v2] " Ondřej Bílka
  2015-05-14  0:34 ` [PATCH] " Carlos O'Donell
  1 sibling, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2015-05-13 16:31 UTC (permalink / raw)
  To: Ondřej Bílka, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 816 bytes --]

Ondřej Bílka wrote:
> +  if (needle_len == 1)
> +    return memchr (haystack, needle[0], haystack_end - haystack);
> +
> +  while ((haystack = memchr (haystack, needle[0], haystack_end - haystack)))
> +    {
> +      if (haystack[1] == needle[1]
> +          && (needle_len == 2 || haystack[2] == needle[2]))
> +        {
> +          if (needle_len == 2)
> +            return haystack;
> +
> +          if (!memcmp (haystack + 2, needle + 2, needle_len - 2))
> +            return haystack;
> +          else
> +            goto slow_path;
> +        }
> +      haystack++;
> +    }
> +
> +  return NULL;
> +
> +  slow_path:;

First, that "haystack[1] == needle[1]" could access past the end of the 
haystack.  Second, can't this be rewritten to avoid that ugly goto?  Something 
like the attached, perhaps.

[-- Attachment #2: memmem-fragment.c --]
[-- Type: text/x-csrc, Size: 412 bytes --]

  if (needle_len == 1)
    return memchr (haystack, needle[0], haystack_end - haystack);

  for (; ; haystack++)
    {
      haystack = memchr (haystack, needle[0],
			 haystack_end - needle_len + 1 - haystack);
      if (!haystack)
	return NULL;
      if (haystack[1] == needle[1])
	{
	  if (needle_len == 2
	      || !memcmp (haystack + 2, needle + 2, needle_len - 2))
	    return haystack;
	  break;
	}
    }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] Improve memmem.
  2015-05-13 16:31 ` Paul Eggert
@ 2015-05-13 18:51   ` Ondřej Bílka
  2015-05-14  1:07     ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Ondřej Bílka @ 2015-05-13 18:51 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

On Wed, May 13, 2015 at 09:31:06AM -0700, Paul Eggert wrote:
> Ondřej Bílka wrote:
> >+  if (needle_len == 1)
> >+    return memchr (haystack, needle[0], haystack_end - haystack);
> >+
> >+  while ((haystack = memchr (haystack, needle[0], haystack_end - haystack)))
> >+    {
> >+      if (haystack[1] == needle[1]
> >+          && (needle_len == 2 || haystack[2] == needle[2]))
> >+        {
> >+          if (needle_len == 2)
> >+            return haystack;
> >+
> >+          if (!memcmp (haystack + 2, needle + 2, needle_len - 2))
> >+            return haystack;
> >+          else
> >+            goto slow_path;
> >+        }
> >+      haystack++;
> >+    }
> >+
> >+  return NULL;
> >+
> >+  slow_path:;
> 
> First, that "haystack[1] == needle[1]" could access past the end of
> the haystack.  Second, can't this be rewritten to avoid that ugly
> goto?  Something like the attached, perhaps.

You wont get illegal acces. Unless I messed up haystack_end calculation.

Here is equivalent without goto. It uses implementation behaviour that
memcmp(x,y,0) doesn't touch anything.

Is this better?

	* string/memmem.c (__memmem): Optimize hot path.

diff --git a/string/memmem.c b/string/memmem.c
index 8a81f65..088a2f1 100644
--- a/string/memmem.c
+++ b/string/memmem.c
@@ -46,6 +46,10 @@ __memmem (const void *haystack_start, size_t haystack_len,
      not an array of 'char' values.  See ISO C 99 section 6.2.6.1.  */
   const unsigned char *haystack = (const unsigned char *) haystack_start;
   const unsigned char *needle = (const unsigned char *) needle_start;
+  const unsigned char *haystack_end = (const unsigned char *)
+                                      haystack_start + haystack_len
+                                      - needle_len + 1;
+
 
   if (needle_len == 0)
     /* The first occurrence of the empty string is deemed to occur at
@@ -57,6 +61,25 @@ __memmem (const void *haystack_start, size_t haystack_len,
   if (__glibc_unlikely (haystack_len < needle_len))
     return NULL;
 
+   if (needle_len == 1)
+     return memchr (haystack, needle[0], haystack_end - haystack);
+
+   for (; ;haystack++) 
+     {
+       haystack = memchr (haystack, needle[0], haystack_end - haystack);
+
+       if (!haystack)
+         return NULL;
+
+       if (haystack[1] == needle[1]
+           && (needle_len == 2 || haystack[2] == needle[2]))
+         {
+           if (!memcmp (haystack + 2, needle + 2, needle_len - 2))
+             return (void *) haystack;
+           break;
+         }
+     }
+ 
   /* Use optimizations in memchr when possible, to reduce the search
      size of haystack using a linear algorithm with a smaller
      coefficient.  However, avoid memchr for long needles, since we

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Improve memmem.
  2015-05-13  9:48 [PATCH] Improve memmem Ondřej Bílka
  2015-05-13 16:31 ` Paul Eggert
@ 2015-05-14  0:34 ` Carlos O'Donell
  1 sibling, 0 replies; 8+ messages in thread
From: Carlos O'Donell @ 2015-05-14  0:34 UTC (permalink / raw)
  To: Ondřej Bílka, libc-alpha

On 05/12/2015 08:03 PM, Ondřej Bílka wrote:
> Hi.
> 
> This is same optimization for memmem as for strstr.
> On x64 memmem performance sucks as its order of magnitude slower than strstr.
> This equalizes performance somewhat and improves it in general case.
> 
> OK to commit this?

What did you use to test the performance?

c.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Improve memmem.
  2015-05-13 18:51   ` [PATCH v2] " Ondřej Bílka
@ 2015-05-14  1:07     ` Paul Eggert
  2015-05-14  9:29       ` Ondřej Bílka
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2015-05-14  1:07 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 597 bytes --]

Ondřej Bílka wrote:

> You wont get illegal acces.

I don't see why not.  If memchr returns haystack_end - 1, haystack[1] will be 
equivalent to *haystack_end, i.e., one past the end of the haystack.

> Here is equivalent without goto. It uses implementation behaviour that
> memcmp(x,y,0) doesn't touch anything.

Better, but I think it still has the problem.  Also, come to think of it, it's 
not clear that we should bother with the 'needle_len == 2' test (are needles of 
length 2 really that common?), and things are simpler and smaller without it, so 
how about the attached instead?


[-- Attachment #2: memmem-fragment-2.c --]
[-- Type: text/x-csrc, Size: 438 bytes --]

  if (needle_len == 1)
    return memchr (haystack, needle[0], haystack_end - haystack);

  for (; ;haystack++)
    {
      haystack = memchr (haystack, needle[0],
			 haystack_end - needle_len + 1 - haystack);

      if (!haystack)
        return NULL;

      if (haystack[1] == needle[1])
        {
          if (!memcmp (haystack + 2, needle + 2, needle_len - 2))
            return (void *) haystack;
          break;
        }
    }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Improve memmem.
  2015-05-14  1:07     ` Paul Eggert
@ 2015-05-14  9:29       ` Ondřej Bílka
  2015-05-15  6:08         ` Paul Eggert
  0 siblings, 1 reply; 8+ messages in thread
From: Ondřej Bílka @ 2015-05-14  9:29 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

On Wed, May 13, 2015 at 06:06:58PM -0700, Paul Eggert wrote:
> Ondřej Bílka wrote:
> 
> >You wont get illegal acces.
> 
> I don't see why not.  If memchr returns haystack_end - 1,
> haystack[1] will be equivalent to *haystack_end, i.e., one past the
> end of the haystack.
>
I am using different end here

+  const unsigned char *haystack_end = (const unsigned char *)
+                                      haystack_start + haystack_len
+                                      - needle_len + 1;

 
> >Here is equivalent without goto. It uses implementation behaviour that
> >memcmp(x,y,0) doesn't touch anything.
> 
> Better, but I think it still has the problem.  Also, come to think
> of it, it's not clear that we should bother with the 'needle_len ==
> 2' test (are needles of length 2 really that common?), and things
> are simpler and smaller without it, so how about the attached
> instead?
>
Reason is performance. Idea is to use this algorithm and switch to slow
two-way algorithm only for pathological inputs.

This uses simple heuristic to look only for first k characters and once
it matches character beyond k'th switch to two-way algorithm.

With single memchr as now its heuristic with k=1. What are you
suggesting is k=2. My algorithm uses k=3.

Main motivations is that pairs are still too common and in longer switch
you will quickly switch to slow two-way algorithm.

I could use more sophisticated accounting of runtime cost but it would
likely cost more than benefit as most inputs are quite small.

Finally there is buy-or-rent problem. Computing critical factorization
is very costy, often several times more expensive than using naive
algorithm. A solution to limit loss is buy-or-rent approach. You run a
naive algorithm even if its slower until you reach time diffierence
equal of doing factorization, then you switch to two-way algorithm and
you are at most twice slower than if you guessed correctly what to use.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Improve memmem.
  2015-05-14  9:29       ` Ondřej Bílka
@ 2015-05-15  6:08         ` Paul Eggert
  2015-05-24 21:39           ` Ondřej Bílka
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Eggert @ 2015-05-15  6:08 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: libc-alpha

Ondřej Bílka wrote:
> I am using different end here
>
> +  const unsigned char *haystack_end = (const unsigned char *)
> +                                      haystack_start + haystack_len
> +                                      - needle_len + 1;

Ah, sorry, didn't see that.  But in that case the name 'haystack_end' is 
misleading -- that's not the haystack's end, but is something else.  So a 
renaming would appear to be in order.

> Main motivations is that pairs are still too common

Too common where?  Do we have traces of actual programs?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Improve memmem.
  2015-05-15  6:08         ` Paul Eggert
@ 2015-05-24 21:39           ` Ondřej Bílka
  0 siblings, 0 replies; 8+ messages in thread
From: Ondřej Bílka @ 2015-05-24 21:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

On Thu, May 14, 2015 at 11:08:42PM -0700, Paul Eggert wrote:
> Ondřej Bílka wrote:
> >I am using different end here
> >
> >+  const unsigned char *haystack_end = (const unsigned char *)
> >+                                      haystack_start + haystack_len
> >+                                      - needle_len + 1;
> 
> Ah, sorry, didn't see that.  But in that case the name
> 'haystack_end' is misleading -- that's not the haystack's end, but
> is something else.  So a renaming would appear to be in order.
> 
Do you have better suggestion?

> >Main motivations is that pairs are still too common
> 
> Too common where?  Do we have traces of actual programs?

I actually have applications that I use have most haystacks less than 64
bytes so it doesn't make difference.

However its better to be prepared in case programmer uses kb length
haystacks where it would happen. An english digraph th frequency is
around 1% so you will likely switch in first 1/10 of input. For triplets
there could be same problem but I decided to keep it simple,
alternatively could add quadruple check I am open what to use.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-24 14:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-13  9:48 [PATCH] Improve memmem Ondřej Bílka
2015-05-13 16:31 ` Paul Eggert
2015-05-13 18:51   ` [PATCH v2] " Ondřej Bílka
2015-05-14  1:07     ` Paul Eggert
2015-05-14  9:29       ` Ondřej Bílka
2015-05-15  6:08         ` Paul Eggert
2015-05-24 21:39           ` Ondřej Bílka
2015-05-14  0:34 ` [PATCH] " Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).