public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/78276] regex_search is slow
From: jklowden at schemamania dot org @ 2023-10-17 1:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78276
James K. Lowden <jklowden at schemamania dot org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jklowden at schemamania dot org
--- Comment #2 from James K. Lowden <jklowden at schemamania dot org> ---
Created attachment 56124
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56124&action=edit
test programs and input
* [Bug libstdc++/78276] regex_search is slow
From: jklowden at schemamania dot org @ 2023-10-17 1:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78276
--- Comment #3 from James K. Lowden <jklowden at schemamania dot org> ---
Here is a nonpathological example taken from a real-world problem where
std::regex_search fails.
This pattern is part of the COBOL COPY text-manipulation directive:
([[:space:]]+(LEADING|TRAILING))?[[:space:]]+("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==)[[:space:]]+BY[[:space:]]+(("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==)([[:space:]]*[.])?
That pattern has 21 captures. Ignoring the optional LEADING/TRAILING clause,
it accepts 1 of 3 operands on either side of the BY keyword:
1. a quoted string using the " double-quote
2. a quoted string using the ' single-quote
3. an identifier consisting of alphanumerics with hyphens or underscores
Quoted strings in this syntax may include embedded quotes by doubling them.
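To illustrate the doubled-quote convention on its own, here is a minimal sketch
that applies only the double-quoted-string sub-pattern from the expression
above to a short input. The helper name first_quoted_body is hypothetical and
is not part of the attached test programs. The input is deliberately tiny, so
this does not reproduce the slowdown; it only shows what the sub-pattern
accepts.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the attachment): finds the first
// double-quoted string in `text` and stores its body, with doubled quotes
// left as written, in `body`. Returns false if no quoted string is found.
bool first_quoted_body(const std::string& text, std::string& body) {
    // Same sub-pattern as the double-quoted operand in the full expression:
    // an opening quote, then any run of "" pairs or non-quote characters,
    // then a closing quote.
    static const std::regex quoted(R"re("((["]{2}|[^"])*)")re");
    std::smatch m;
    if (!std::regex_search(text, m, quoted)) {
        return false;
    }
    body = m[1].str();  // capture 1 is everything between the outer quotes
    return true;
}
```

For the input REPLACING "say ""hi""" BY "x", the helper stores say ""hi"" in
body, because each "" pair is consumed as an embedded quote rather than as the
end of the string.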
By "fails", I mean that it does not terminate in a reasonable time. Using gdb,
I have seen over 1900 stack frames inside std::regex_search. This is with
gcc 11 on Linux.
I have recast the program using awk and regex(3) from the C standard library,
both of which succeed instantly. I attach a tarball that includes all three
files, the input, and a Makefile to demonstrate them.
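For readers without the tarball, the regex(3) approach can be sketched roughly
as follows. This is not the attached program: the helper posix_search, the
constant kCopyReplacing, and the sample input are assumptions made for
illustration. The pattern itself is copied verbatim from above and compiled as
a POSIX extended regular expression.

```cpp
#include <regex.h>  // POSIX regcomp, regexec, regfree
#include <string>

// The COPY ... REPLACING pattern from the report, copied verbatim.
static const char kCopyReplacing[] =
    R"re(([[:space:]]+(LEADING|TRAILING))?[[:space:]]+("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==)[[:space:]]+BY[[:space:]]+(("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==)([[:space:]]*[.])?)re";

// Hypothetical helper (not from the attachment): searches `text` for
// `pattern` using regex(3). On success, stores the whole matched substring
// in `matched` and returns true. Returns false on no match or compile error.
bool posix_search(const char* pattern, const std::string& text,
                  std::string& matched) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return false;  // pattern did not compile
    }
    regmatch_t whole[1];  // only the overall match is requested
    const int rc = regexec(&re, text.c_str(), 1, whole, 0);
    regfree(&re);
    if (rc != 0) {
        return false;  // REG_NOMATCH or an error
    }
    matched = text.substr(whole[0].rm_so, whole[0].rm_eo - whole[0].rm_so);
    return true;
}
```

Calling posix_search(kCopyReplacing, text, matched) on a line such as
COPY X REPLACING "a""b" BY 'c'. should report the clause beginning at the
whitespace before the first operand. Only the overall match is requested here;
a caller that needs the operands would pass a larger regmatch_t array.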