public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/78276] regex_search is slow
From: jklowden at schemamania dot org @ 2023-10-17  1:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78276

James K. Lowden <jklowden at schemamania dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jklowden at schemamania dot org

--- Comment #2 from James K. Lowden <jklowden at schemamania dot org> ---
Created attachment 56124
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56124&action=edit
test programs and input


* [Bug libstdc++/78276] regex_search is slow
From: jklowden at schemamania dot org @ 2023-10-17  1:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78276

--- Comment #3 from James K. Lowden <jklowden at schemamania dot org> ---
Here is a nonpathological example taken from a real-world problem where
std::regex_search fails.

This pattern is part of the COBOL COPY text-manipulation directive: 

([[:space:]]+(LEADING|TRAILING))?[[:space:]]+("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==)[[:space:]]+BY[[:space:]]+(("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==)([[:space:]]*[.])?

That pattern has 21 captures.  Ignoring the optional LEADING/TRAILING clause,
it accepts 1 of 4 operand forms on either side of the BY keyword:

1.  a quoted string using the " double-quote
2.  a quoted string using the ' single-quote
3.  an identifier consisting of alphanumerics with hyphens or underscores
4.  text delimited by == on both sides

Quoted strings in this syntax may include embedded quotes by doubling them. 
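
For readers who want to check the capture count and see which groups carry
the operands, a minimal sketch follows.  It is not the attached test program.
It assumes the default ECMAScript grammar of std::regex (the attachment may
use other flags), and the names kCopyReplacing and find_replacing_operands are
hypothetical.  Group 3 is the whole operand before BY and group 12 the whole
operand after it.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The pattern from this comment, split across adjacent raw string literals
// only for readability; the pieces concatenate to the original text.
static const char* const kCopyReplacing =
    R"re(([[:space:]]+(LEADING|TRAILING))?[[:space:]]+)re"
    R"re(("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==))re"
    R"re([[:space:]]+BY[[:space:]]+)re"
    R"re((("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==))re"
    R"re(([[:space:]]*[.])?)re";

// Hypothetical helper: on a match, stores the operands before and after BY
// exactly as written in the input (including any quotes) and returns true.
bool find_replacing_operands(const std::string& text,
                             std::string& left, std::string& right) {
    // Assumption: default ECMAScript grammar.
    static const std::regex re(kCopyReplacing);
    assert(re.mark_count() == 21u);  // the 21 captures mentioned above

    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return false;
    }
    left = m[3].str();    // group 3: whole operand before BY
    right = m[12].str();  // group 12: whole operand after BY
    return true;
}
```

Calling find_replacing_operands on a short line such as
COPY X REPLACING "A" BY "B". is enough to exercise it.  The input in the
attachment is the one that shows the slowdown, so a short line says nothing
about the performance problem itself.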

By "fails", I mean it does not terminate in a reasonable time.  Using gdb, I
have seen over 1900 stack frames inside std::regex_search.  This is with gcc 11
on Linux.

I have recast the program using awk and the POSIX regex(3) functions from the
C library, both of which succeed instantly.  I attach a tarball that includes
all three files, the input, and a Makefile to demonstrate them.
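
As a rough illustration of the regex(3) route, here is a sketch that compiles
the same pattern with regcomp and REG_EXTENDED.  It is not the program in the
tarball.  It assumes a POSIX system that provides <regex.h>, and the names
kCopyReplacing and posix_find_replacing_operands are hypothetical.

```cpp
#include <cassert>
#include <regex.h>  // POSIX regcomp/regexec/regfree; not part of ISO C or C++
#include <string>

// Same pattern as in this comment, split only for readability.
static const char* const kCopyReplacing =
    R"re(([[:space:]]+(LEADING|TRAILING))?[[:space:]]+)re"
    R"re(("((["]{2}|[^"])*)"|'(([']{2}|[^'])*)[']|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)+)==))re"
    R"re([[:space:]]+BY[[:space:]]+)re"
    R"re((("(["]{2}|[^"])*")|('([']{2}|[^'])*')|([[:alnum:]]+([_-]+[[:alnum:]]+)*)|==((=?[^=]+)*)==))re"
    R"re(([[:space:]]*[.])?)re";

// Hypothetical helper: same contract as a std::regex version would have,
// but built on regcomp/regexec with POSIX extended syntax.
bool posix_find_replacing_operands(const std::string& text,
                                   std::string& left, std::string& right) {
    regex_t re;
    if (regcomp(&re, kCopyReplacing, REG_EXTENDED) != 0) {
        return false;  // the pattern did not compile
    }
    assert(re.re_nsub == 21u);  // number of parenthesized subexpressions

    regmatch_t m[22];  // slot 0 is the whole match, slots 1..21 the captures
    const bool found = regexec(&re, text.c_str(), 22, m, 0) == 0;
    if (found) {
        // Groups 3 and 12 take part in every successful match.
        left = text.substr(m[3].rm_so, m[3].rm_eo - m[3].rm_so);
        right = text.substr(m[12].rm_so, m[12].rm_eo - m[12].rm_so);
    }
    regfree(&re);
    return found;
}
```

The sketch recompiles the pattern on every call to keep ownership of the
regex_t simple; a real program would compile it once and reuse it.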
