public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Amit Choudhary <amitchoudhary0523@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: Fastest String Search Algorithm.
Date: Wed, 9 Jun 2021 17:12:09 +0530	[thread overview]
Message-ID: <CAFf+5zi5uM9Tuc-ZuQqf4=CRXF9piKsjA7xof0h8wnpnM9DQ_w@mail.gmail.com> (raw)
In-Reply-To: <VE1PR08MB5599EEAB160CB57B1476587F83369@VE1PR08MB5599.eurprd08.prod.outlook.com>

On Wed, Jun 9, 2021 at 4:02 PM Wilco Dijkstra <Wilco.Dijkstra@arm.com>
wrote:

> Hi Amit,
>
> > If my algorithm is beating strstr() for long search patterns then I think
> > that my algorithm should be included in glibc for long search patterns.
>
> I tried it in bench-strstr.c, strstr is often 10 times faster on many
> inputs.
> I only noticed a case with a 32KB needle and 64KB text where it beats
> strstr by
> a factor 3. On complex needles it behaves like basic_strstr (ie. quadratic
> time)


I included my algorithm in bench-strstr.c and ran the tests myself. I could
not find strstr-inputs file so I printed the haystack and needle.

I found out many things which I think are not correct:

1. In pass cases (positive cases where needle is part of haystack), the
needle is either located at the beginning of the haystack or at the end of
the haystack. To mimic real world scenario, the needle should be at the
middle of the haystack.

2. The large text is mostly repetition of two small paragraphs. This again
is not a real world scenario.

3. Some tests for 256 byte needles have haystack and needle as a
combination of characters 'a' and 'b'. No real world text has been used.

4. Tests for 1 KB / 4 KB / 8 KB / 28 KB needles also have  haystack and
needle as a combination of characters 'a' and 'b'. No real world text has
been used.

So, in my opinion strstr() has never been tested extensively for real world
text. I think this is a very big shortcoming.

Regards,
Amit

  reply	other threads:[~2021-06-09 11:42 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 10:31 Wilco Dijkstra
2021-06-09 11:42 ` Amit Choudhary [this message]
2021-06-09 12:05   ` Phil Blundell
2021-06-09 12:11     ` Adhemerval Zanella
2021-06-09 12:22       ` Amit Choudhary
2021-06-09 12:55         ` Adhemerval Zanella
2021-06-09 13:18           ` Amit Choudhary
2021-06-09 13:32             ` Amit Choudhary
2021-06-09 13:40             ` Adhemerval Zanella
2021-06-09 14:08               ` Amit Choudhary
2021-06-09 14:25                 ` Phil Blundell
2021-06-09 12:18     ` Amit Choudhary
2021-06-09 13:48       ` Phil Blundell
2021-06-09 21:44   ` Wilco Dijkstra
2021-06-10  2:42     ` Amit Choudhary
2021-06-10  3:23       ` Dan Raymond
2021-06-10  4:19         ` Amit Choudhary
2021-06-10  9:44       ` Wilco Dijkstra
2021-06-10 11:08         ` Amit Choudhary
2021-06-10 16:53           ` Amit Choudhary
2021-06-10 20:04             ` Amit Choudhary
2021-06-11  8:28               ` Amit Choudhary
2021-06-11 11:45           ` Wilco Dijkstra
2021-06-11 12:42             ` Amit Choudhary
2021-06-11 13:39               ` Amit Choudhary
2021-06-11 13:46               ` Wilco Dijkstra
2021-06-11 15:52                 ` Amit Choudhary
2021-06-11 19:14                   ` Amit Choudhary
2021-06-11 19:26                     ` Paul Eggert
2021-06-11 19:41                       ` Amit Choudhary
2021-06-11 20:00                         ` Dan Raymond
2021-06-12  4:00                           ` Amit Choudhary
2021-06-12  4:18                             ` Amit Choudhary
2021-06-12  7:16                               ` [Final testing with benchmark tests] " Amit Choudhary
2021-06-12  8:01                                 ` Paul Zimmermann
2021-06-12  8:41                                   ` Amit Choudhary
2021-06-12 13:29                                     ` Amit Choudhary
2021-06-12 10:56                                   ` Siddhesh Poyarekar
2021-06-12 13:47                                     ` Amit Choudhary
2021-06-13  3:32                                       ` Siddhesh Poyarekar
2021-06-13  4:06                                         ` Amit Choudhary
2021-06-11 13:57               ` Paul Zimmermann
2021-06-11 14:04                 ` Amit Choudhary
2021-06-11 14:16                   ` Paul Zimmermann
2021-06-11 14:27                     ` Amit Choudhary
  -- strict thread matches above, loose matches on Subject: below --
2021-06-07 16:17 Wilco Dijkstra
2021-06-07 17:33 ` Florian Weimer
2021-06-08  4:52   ` Siddhesh Poyarekar
2021-06-07 14:46 Wilco Dijkstra
2021-06-07 15:21 ` Amit Choudhary
2021-06-07 16:02   ` Amit Choudhary
2021-06-07 16:04   ` Wilco Dijkstra
2021-06-07 10:53 Wilco Dijkstra
2021-06-07 12:33 ` Amit Choudhary
2021-06-08  6:33 ` Amit Choudhary
2021-06-08  7:25   ` Amit Choudhary
2021-06-06 14:15 Amit Choudhary
2021-06-06 14:30 ` H.J. Lu
2021-06-06 18:38   ` Amit Choudhary
2021-06-09  5:33   ` Guo, Wangyang
2021-06-09  6:18     ` Amit Choudhary
2021-06-09  6:50       ` Siddhesh Poyarekar
2021-06-09  7:38       ` Phil Blundell
2021-06-09  7:54         ` Amit Choudhary
2021-06-09  9:26           ` Phil Blundell
2021-06-06 17:58 ` Paul Eggert
2021-06-06 18:26   ` Amit Choudhary
2021-06-07 11:48     ` Amit Choudhary
2021-06-07 20:12       ` Paul Eggert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFf+5zi5uM9Tuc-ZuQqf4=CRXF9piKsjA7xof0h8wnpnM9DQ_w@mail.gmail.com' \
    --to=amitchoudhary0523@gmail.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).