public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
@ 2014-08-31  6:48 jmichae3 at yahoo dot com
  2014-08-31  6:50 ` [Bug libstdc++/62318] " jmichae3 at yahoo dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: jmichae3 at yahoo dot com @ 2014-08-31  6:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62318

            Bug ID: 62318
           Summary: optimization of string searches using Nigel Horspool's
                    Boyer-Moore-Horpool algorithm
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jmichae3 at yahoo dot com

optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool
algorithm is what I would like to see. right now it's just a plain old, slow
O(n) loop. in just about any OS, at least half of the funcrtions are string
functions.

the C library should be modified as well, so should the same in objc and objc++
and whatever compiler in GNU compiler set you can put it in.

in C libraries I typically see simple while loops. this can be done much better
with only a little more code, the algorithm is very short.

this could be sped WAY up using this algorithm. google Nigel Horspool and see
if you can get permission to use his algorithm.

this could apply to <algorithm>'s std::find(), std::string.find()
and you would not need to change the function signature.

there are probably other uses for this algorithm I am not thinking of right
now, but if you put your thinking cap on to see where this could fit, I am sure
you could find some choice places for it.

"1980 Practical Fast Searching in Strings" under Publications
http://webhome.cs.uvic.ca/~nigelh/

if you are compiling an OS or OS kernel with this, I would suspect you would
see at least a 2-3x+ general performance improvement, maybe a lot more (though
I have done no testing) due to the fact that most every OS API call has some
sort of string parameter in it. 

Donald Knuth's The Art Of Computer Programming volumes 1-5 has quite an array
of algorithms, and it's (as far as I know) a well known set that's been around
for years.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/62318] optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
  2014-08-31  6:48 [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm jmichae3 at yahoo dot com
@ 2014-08-31  6:50 ` jmichae3 at yahoo dot com
  2014-08-31  7:37 ` schwab@linux-m68k.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: jmichae3 at yahoo dot com @ 2014-08-31  6:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62318

Jim Michaels <jmichae3 at yahoo dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://webhome.cs.uvic.ca/~
                   |                            |nigelh/pubs.html

--- Comment #1 from Jim Michaels <jmichae3 at yahoo dot com> ---
added URL for his publications which include the string searching algorithm.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/62318] optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
  2014-08-31  6:48 [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm jmichae3 at yahoo dot com
  2014-08-31  6:50 ` [Bug libstdc++/62318] " jmichae3 at yahoo dot com
@ 2014-08-31  7:37 ` schwab@linux-m68k.org
  2014-08-31 12:32 ` manu at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: schwab@linux-m68k.org @ 2014-08-31  7:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62318

--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> ---
*** Bug 62317 has been marked as a duplicate of this bug. ***


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/62318] optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
  2014-08-31  6:48 [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm jmichae3 at yahoo dot com
  2014-08-31  6:50 ` [Bug libstdc++/62318] " jmichae3 at yahoo dot com
  2014-08-31  7:37 ` schwab@linux-m68k.org
@ 2014-08-31 12:32 ` manu at gcc dot gnu.org
  2014-09-01 15:34 ` joseph at codesourcery dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: manu at gcc dot gnu.org @ 2014-08-31 12:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62318

Manuel López-Ibáñez <manu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |manu at gcc dot gnu.org

--- Comment #3 from Manuel López-Ibáñez <manu at gcc dot gnu.org> ---
(In reply to Jim Michaels from comment #0)
> optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool
> algorithm is what I would like to see. right now it's just a plain old, slow
> O(n) loop. in just about any OS, at least half of the funcrtions are string
> functions.

Thanks for the advice. But the best chances of seeing this ever implemented is
to take the challenge yourself. Join the mailing lists, submit patches and
benchmark results and prove your point. See https://gcc.gnu.org/contribute.html

> the C library should be modified as well, so should the same in objc and
> objc++ and whatever compiler in GNU compiler set you can put it in.

The C library is a separate project. http://www.gnu.org/software/libc/libc.html

Perhaps it would be easier to start contributing there. But the same thing
applies: Ideas abound, what is needed is people willing and able to put them in
practice.

> this could apply to <algorithm>'s std::find(), std::string.find()
> and you would not need to change the function signature.

Again, this will probably only ever happen if you implement it and show with
numbers that you are right.
>From gcc-bugs-return-459566-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Sun Aug 31 13:56:44 2014
Return-Path: <gcc-bugs-return-459566-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 4288 invoked by alias); 31 Aug 2014 13:56:44 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 4255 invoked by uid 48); 31 Aug 2014 13:56:40 -0000
From: "simanashrestha at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug boehm-gc/62328] New: HD* watch Guardians of the Galaxy full movie online
Date: Sun, 31 Aug 2014 13:56:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: boehm-gc
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: simanashrestha at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter attachments.created
Message-ID: <bug-62328-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-08/txt/msg02063.txt.bz2
Content-length: 2853

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62328

            Bug ID: 62328
           Summary: HD* watch Guardians of the Galaxy full movie online
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: boehm-gc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: simanashrestha at gmail dot com

Created attachment 33422
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33422&action=edit
guardians of the galaxy

Click here>>>  http://tinyurl.com/pqhy2ob <<< To watch guardians of the galaxy
full movie online

Click here>>>  http://tinyurl.com/pqhy2ob <<< To watch guardians of the galaxy
full movie online

Click here>>>  http://tinyurl.com/pqhy2ob <<< To watch guardians of the galaxy
full movie online

Watch Guardians of the galaxy online full movie
Watch Marvels Guardians of the galaxy full movie online
Watch Guardians of the galaxy 2014 free full online movie
Watch Guardians of the galaxy 2014 full free online movie in hd
Watch Guardians of the galaxy 2014 full movie online in putlocker
Watch Guardians of the galaxy 2014 full movie online in Viooz
Watch Guardians of the galaxy 2014 full movie online in Megasharevideo
Watch Guardians of the galaxy 2014 full movie online in megavideo
Watch Guardians of the galaxy 2014 online full movie with subtitle
Watch Guardians of the galaxy 2014 full movie in youtube
Watch Guardians of the galaxy 2014 official trailer
Watch Guardians of the galaxy 2014 in 1 channel
Watch Guardians of the galaxy 2014 full online movie
Watch Guardians of the galaxy 2014 full online streaming movie
Watch Guardians of the galaxy 2014 full online streaming free movie
Watch Guardians of the galaxy 2014 free full movie online
guardians of the galaxy movie trailer


After long time of your waiting, now you can watch Guardians of the galaxy full
movie online. You can easily access this site where you can easily and
instantly watch Guardians of the galaxy 2014 full online movie. If you are busy
in your personal affairs and works and don’t have enough time to go and watch
guardians of the galaxy 2014 full movie online at that situation remember us to
watch guardians of the galaxy 2014 movie in high quality.  You will able to
acquire 100% fun and entertainment after you watch Guardians of the galaxy 2014
free full online movie here.

Guardians of the galaxy 2014 is directed by James Gunn and written by James
Gunn and Nicole Perlman. This movie which is based on action, adventure,
science fiction and fantasy genre is released on August 1, 2014 in theatres. It
is running time of 2 hours and 1 minute. The cast of this movie are Chris
Pratt, Zoe Saldana, Dave Bautista, Vin Diesal, Bradley Cooper, Lee Pace,
Michael Rooker, Karen Gillan etc…
>From gcc-bugs-return-459565-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Sun Aug 31 13:56:30 2014
Return-Path: <gcc-bugs-return-459565-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 3581 invoked by alias); 31 Aug 2014 13:56:29 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 3538 invoked by uid 48); 31 Aug 2014 13:56:24 -0000
From: "redi at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/62318] optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
Date: Sun, 31 Aug 2014 13:56:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: 4.9.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: redi at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-62318-4-PtDFAmlBzK@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-62318-4@http.gcc.gnu.org/bugzilla/>
References: <bug-62318-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-08/txt/msg02062.txt.bz2
Content-length: 444

https://gcc.gnu.org/bugzilla/show_bug.cgi?idb318

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3905.html which
proposes adding it to the C++ standard library as a separate API. The new
components are part of the Library Fundamentals TS:
http://cplusplus.github.io/fundamentals-ts/main.html#func

That part of the TS has not yet been implemented in libstdc++.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/62318] optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
  2014-08-31  6:48 [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm jmichae3 at yahoo dot com
                   ` (2 preceding siblings ...)
  2014-08-31 12:32 ` manu at gcc dot gnu.org
@ 2014-09-01 15:34 ` joseph at codesourcery dot com
  2014-09-21  8:47 ` jmichae3 at yahoo dot com
  2014-10-29 18:33 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: joseph at codesourcery dot com @ 2014-09-01 15:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62318

--- Comment #5 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
glibc's strstr already uses an asymptotically fast (i.e. O(m+n) instead of 
O(mn)) algorithm.  See string/str-two-way.h.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/62318] optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
  2014-08-31  6:48 [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm jmichae3 at yahoo dot com
                   ` (3 preceding siblings ...)
  2014-09-01 15:34 ` joseph at codesourcery dot com
@ 2014-09-21  8:47 ` jmichae3 at yahoo dot com
  2014-10-29 18:33 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jmichae3 at yahoo dot com @ 2014-09-21  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62318

--- Comment #6 from Jim Michaels <jmichae3 at yahoo dot com> ---
ummm. I could get personal permission to use the algorithm. but that does not
give gnu permission to use the algorithm. that's why I posted this here.  that
does not necessarily mean I could post the algorithm to GNU, because I am not
associated with GNU (I don't intend to bristle someone), I am just a user of
gcc trying to make a difference in speed of the compiler. I have an i7-3970x
64GB RAM on windows and I am doing file processing that should *not* take 5
minutes for a 1ast timing, bare or MT19937 random number writes I can do
86.437894MB/s on a hard drive. but my programs that do std::string processing
are many many times slower. 

other compilers out there are just as bad. I was hoping to make a dent in this
performance sluggishness... I am just saying this cpu I have should blaze a
trail. most of that program is a fair amount of string processing, lookups in
vectors, vector structures, etc. I avoided using any of my own classes.

most of this stuff is processed in RAM, so it's got to be the (I got the O()
notation wrong) algorithm. Boyer-Moore is a very good general string search
algorithm (but only for 8 and 16-bit characters) that's been around for at
least 20 years. should not be hard at all to convert. I might do it to my
personal compiler myself if the ACM article and permission doesn't cost too
much. I am using mingw-w64 windows x64 and x32 target, windows x64 host port of
gcc. 
http://www.cs.utexas.edu/users/moore/publications/fstrpos.pdf
http://www.cs.utexas.edu/users/moore

my only concern is that if I were to post the algorithm, me not being in the
GNU organization, GNU would still have to get separate permission, first. I
could say "OK, I've got it, get permission, then I will post the patch." but I
write commercial code. so then there's the whole FSF thing and "tainted code"
(I intend to distribute my source code and some libraries if I am allowed).

someone said Horspool involved startup cost building lookup tables and therein
lies the difficulty, so I got that part wrong I just found out. it is used in
std::search. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3905.html

I already have code which converts a stream of bytes to UTF-8 and back based on
http://doc.cat-v.org/bell_labs/utf-8_history

but that's beside the point.

I would be very happy to just jump in and render a patch. I just wanted to get
the permissions thing done proper. it's a VERY SIMPLE algorithm, all of about
10-20 lines. looked like a piece of cake. I am just concerned about my
"tainting" of your code (?) and permissions, etc.

is this something that I, not associated with GNU, can do? yes, no? my guess is
no. that is why I made this request. I know *you* can do it without causing
problems.

http://www.boost.org/doc/libs/1_50_0/libs/algorithm/doc/html/algorithm/Searching.html

however, boyer-moore would not be recommendable for u32string, since it would
allocate an array that has 2^32 32-bit (16GB) elements as an "alphabet" (part
of the startup costs). 65536 16-bit (128KB) elements for u16string, 256 bytes
for 8-bit char_t. you get the idea. 32-bit targets would obviously fail to
compile, and significant memory loss on 64-bit systems, and they would fail to
run on average joe's box.

so I guess instead of horspool's algorithm for strings, your O(m+n) algorithm
in strstr() would be a more effective replacement. please.

who at the FSF deals with permissions?

not sure if this applies: it's for "small alphabets" (as in DNA):
http://www.cs.utexas.edu/users/moore/publications/sustik-moore.pdf

the US Department of Commerce regulates hashes I think such as Rabin-Karp above
a certain number of bits. when you start getting to char16_t and char32_t that
hash number of bits due to multiplies has already reached 32 bits. sigh. this
is why computers are slow.  I don't know if strstr is in this category. 

maybe std::string.find() should be left the way it is.

maybe some unit-testing on that small-alphabet thing would be interesting, but
I am unsure if it would have the same memory-use problem for char32_t as
Boyer-Moore.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/62318] optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm
  2014-08-31  6:48 [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm jmichae3 at yahoo dot com
                   ` (4 preceding siblings ...)
  2014-09-21  8:47 ` jmichae3 at yahoo dot com
@ 2014-10-29 18:33 ` redi at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2014-10-29 18:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62318

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |5.0

--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #4)
> See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3905.html which
> proposes adding it to the C++ standard library as a separate API. The new
> components are part of the Library Fundamentals TS:
> http://cplusplus.github.io/fundamentals-ts/main.html#func
> 
> That part of the TS has not yet been implemented in libstdc++.

This is now implemented on trunk.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-10-29 18:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-31  6:48 [Bug libstdc++/62318] New: optimization of string searches using Nigel Horspool's Boyer-Moore-Horpool algorithm jmichae3 at yahoo dot com
2014-08-31  6:50 ` [Bug libstdc++/62318] " jmichae3 at yahoo dot com
2014-08-31  7:37 ` schwab@linux-m68k.org
2014-08-31 12:32 ` manu at gcc dot gnu.org
2014-09-01 15:34 ` joseph at codesourcery dot com
2014-09-21  8:47 ` jmichae3 at yahoo dot com
2014-10-29 18:33 ` redi at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).