public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch, PR 61061] Add state limit for regex NFA
@ 2014-06-27  3:56 Tim Shen
  2014-06-27  7:41 ` Paolo Carlini
  0 siblings, 1 reply; 7+ messages in thread
From: Tim Shen @ 2014-06-27  3:56 UTC (permalink / raw)
  To: libstdc++, gcc-patches

The limit can be customized by defining a macro
_GLIBCXX_REGEX_STATE_LIMIT. The default value is 100000.

The testcase can be handled if we optimize consecutive quantifiers
(collapse them to one). But cases like "(a{100}b){100}" can't be
handled still.

We implement range quantifier "(foo){n}" by copying state sequence
(foo) n-1 times. That consumes more space. We may reimplement it (by
adding a new _S_op*) someday.

Bootstrapped and tested.

Thanks!

-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch, PR 61061] Add state limit for regex NFA
  2014-06-27  3:56 [Patch, PR 61061] Add state limit for regex NFA Tim Shen
@ 2014-06-27  7:41 ` Paolo Carlini
  2014-06-27 16:53   ` Tim Shen
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Carlini @ 2014-06-27  7:41 UTC (permalink / raw)
  To: Tim Shen, libstdc++, gcc-patches

Hi,

On 06/27/2014 05:56 AM, Tim Shen wrote:
> The limit can be customized by defining a macro
> _GLIBCXX_REGEX_STATE_LIMIT. The default value is 100000.
>
> The testcase can be handled if we optimize consecutive quantifiers
> (collapse them to one). But cases like "(a{100}b){100}" can't be
> handled still.
>
> We implement range quantifier "(foo){n}" by copying state sequence
> (foo) n-1 times. That consumes more space. We may reimplement it (by
> adding a new _S_op*) someday.
>
> Bootstrapped and tested.
The actual patch is missing.. ;)

Paolo.

PS: sorry for being distracted by other issues: what happened to the 
other regex issue? I think we are simply going to apply, when ready, 
your more complete fix, right?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch, PR 61061] Add state limit for regex NFA
  2014-06-27  7:41 ` Paolo Carlini
@ 2014-06-27 16:53   ` Tim Shen
  2014-06-27 17:21     ` Paolo Carlini
  2014-06-28  9:48     ` Jonathan Wakely
  0 siblings, 2 replies; 7+ messages in thread
From: Tim Shen @ 2014-06-27 16:53 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 656 bytes --]

On Fri, Jun 27, 2014 at 12:37 AM, Paolo Carlini
<paolo.carlini@oracle.com> wrote:
> The actual patch is missing.. ;)

Sigh...every time. Sorry.

> PS: sorry for being distracted by other issues: what happened to the other
> regex issue? I think we are simply going to apply, when ready, your more
> complete fix, right?

The problem given in the other PR (PR 61582) is also solved by this
patch (but I forgot to mention that); They are all examples like
nested range quantifiers.

PR 61424's patch is ready; I was waiting for some "Ok go ahead", but
now I realize that I need to send the complete patch first :) I'll
send it later.


-- 
Regards,
Tim Shen

[-- Attachment #2: a.diff --]
[-- Type: text/plain, Size: 5420 bytes --]

commit 78a8e40d53f3083f5217b30908fda9442e0a92c5
Author: Tim Shen <timshen@google.com>
Date:   Thu Jun 26 20:35:33 2014 -0700

    	PR libstdc++/61061
    	PR libstdc++/61582
    	* include/bits/regex_automaton.h (_NFA<>::_M_insert_state): Add
    	a NFA state limit. If it's exceeded, regex_constants::error_space
    	will be throwed.
    	* include/bits/regex_automaton.tcc (_StateSeq<>::_M_clone): Use
    	map (which is sparse) instead of vector. This reduce n times clones'
    	cost from O(n^2) to O(n).
    	* include/std/regex: Add map dependency.
    	* testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc: New
    	testcase.

diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
index 1b0da14..27ec671 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -28,6 +28,11 @@
  *  Do not attempt to use it directly. @headername{regex}
  */
 
+// This macro defines the maximal state number a NFA can have.
+#ifndef _GLIBCXX_REGEX_STATE_LIMIT
+#define _GLIBCXX_REGEX_STATE_LIMIT 100000
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 namespace __detail
@@ -254,6 +259,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_insert_state(_StateT __s)
       {
 	this->push_back(std::move(__s));
+	if (this->size() > _GLIBCXX_REGEX_STATE_LIMIT)
+	  __throw_regex_error(regex_constants::error_space);
 	return this->size()-1;
       }
 
diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc
index 1c6cfdd..08d271e 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -189,7 +189,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _StateSeq<_TraitsT>
     _StateSeq<_TraitsT>::_M_clone()
     {
-      std::vector<_StateIdT> __m(_M_nfa.size(), -1);
+      std::map<_StateIdT, _StateIdT> __m;
       std::stack<_StateIdT> __stack;
       __stack.push(_M_start);
       while (!__stack.empty())
@@ -203,21 +203,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  if (__dup._M_opcode == _S_opcode_alternative
 	      || __dup._M_opcode == _S_opcode_repeat
 	      || __dup._M_opcode == _S_opcode_subexpr_lookahead)
-	    if (__dup._M_alt != _S_invalid_state_id && __m[__dup._M_alt] == -1)
+	    if (__dup._M_alt != _S_invalid_state_id
+		&& __m.count(__dup._M_alt) == 0)
 	      __stack.push(__dup._M_alt);
 	  if (__u == _M_end)
 	    continue;
-	  if (__dup._M_next != _S_invalid_state_id && __m[__dup._M_next] == -1)
+	  if (__dup._M_next != _S_invalid_state_id
+	      && __m.count(__dup._M_next) == 0)
 	    __stack.push(__dup._M_next);
 	}
-      for (auto __v : __m)
+      for (auto __it : __m)
 	{
-	  if (__v == -1)
-	    continue;
+	  auto __v = __it.second;
 	  auto& __ref = _M_nfa[__v];
 	  if (__ref._M_next != _S_invalid_state_id)
 	    {
-	      _GLIBCXX_DEBUG_ASSERT(__m[__ref._M_next] != -1);
+	      _GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_next) > 0);
 	      __ref._M_next = __m[__ref._M_next];
 	    }
 	  if (__ref._M_opcode == _S_opcode_alternative
@@ -225,7 +226,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      || __ref._M_opcode == _S_opcode_subexpr_lookahead)
 	    if (__ref._M_alt != _S_invalid_state_id)
 	      {
-		_GLIBCXX_DEBUG_ASSERT(__m[__ref._M_alt] != -1);
+		_GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_alt) > 0);
 		__ref._M_alt = __m[__ref._M_alt];
 	      }
 	}
diff --git a/libstdc++-v3/include/std/regex b/libstdc++-v3/include/std/regex
index 9161f48..470772af 100644
--- a/libstdc++-v3/include/std/regex
+++ b/libstdc++-v3/include/std/regex
@@ -50,6 +50,7 @@
 #include <string>
 #include <utility>
 #include <vector>
+#include <map>
 #include <cstring>
 
 #include <bits/regex_constants.h>
diff --git a/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc
new file mode 100644
index 0000000..88efab5
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/61601.cc
@@ -0,0 +1,48 @@
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2013-2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// 28.11.2 regex_match
+
+#include <regex>
+#include <testsuite_hooks.h>
+#include <testsuite_regex.h>
+
+using namespace __gnu_test;
+using namespace std;
+
+// libstdc++/61601
+void
+test01()
+{
+  try
+  {
+    regex r("((.*)$1{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100}{100})");
+  }
+  catch (const regex_error& e)
+  {
+    VERIFY(e.code() == regex_constants::error_space);
+  }
+}
+
+int
+main()
+{
+  test01();
+  return 0;
+}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch, PR 61061] Add state limit for regex NFA
  2014-06-27 16:53   ` Tim Shen
@ 2014-06-27 17:21     ` Paolo Carlini
  2014-06-28  9:48     ` Jonathan Wakely
  1 sibling, 0 replies; 7+ messages in thread
From: Paolo Carlini @ 2014-06-27 17:21 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

Hi,

On 06/27/2014 06:53 PM, Tim Shen wrote:
>> PS: sorry for being distracted by other issues: what happened to the other
>> regex issue? I think we are simply going to apply, when ready, your more
>> complete fix, right?
> The problem given in the other PR (PR 61582) is also solved by this
> patch (but I forgot to mention that); They are all examples like
> nested range quantifiers.
Good, thanks. I'll let Jon properly reviewing the patch, he followed 
your recent work more closely than me.
> PR 61424's patch is ready; I was waiting for some "Ok go ahead", but
> now I realize that I need to send the complete patch first :) I'll
> send it later.
Yes, I meant that one, was too lazy to look it up ;) Thanks in advance, 
anyway!

Paolo.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch, PR 61061] Add state limit for regex NFA
  2014-06-27 16:53   ` Tim Shen
  2014-06-27 17:21     ` Paolo Carlini
@ 2014-06-28  9:48     ` Jonathan Wakely
  2014-06-28 16:56       ` Tim Shen
  2014-07-01 16:43       ` Tim Shen
  1 sibling, 2 replies; 7+ messages in thread
From: Jonathan Wakely @ 2014-06-28  9:48 UTC (permalink / raw)
  To: Tim Shen; +Cc: Paolo Carlini, libstdc++, gcc-patches

On 27/06/14 09:53 -0700, Tim Shen wrote:
>On Fri, Jun 27, 2014 at 12:37 AM, Paolo Carlini
><paolo.carlini@oracle.com> wrote:
>> The actual patch is missing.. ;)
>
>Sigh...every time. Sorry.
>
>> PS: sorry for being distracted by other issues: what happened to the other
>> regex issue? I think we are simply going to apply, when ready, your more
>> complete fix, right?
>
>The problem given in the other PR (PR 61582) is also solved by this
>patch (but I forgot to mention that); They are all examples like
>nested range quantifiers.

OK for trunk, thanks.

I wonder if it would be better to use a sorted
vector<pair<_StateIdT,_StateIdT>> instead of a map, for improved
memory footprint and runtime speed, but that can be changed later.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch, PR 61061] Add state limit for regex NFA
  2014-06-28  9:48     ` Jonathan Wakely
@ 2014-06-28 16:56       ` Tim Shen
  2014-07-01 16:43       ` Tim Shen
  1 sibling, 0 replies; 7+ messages in thread
From: Tim Shen @ 2014-06-28 16:56 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Paolo Carlini, libstdc++, gcc-patches

On Sat, Jun 28, 2014 at 2:48 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> I wonder if it would be better to use a sorted
> vector<pair<_StateIdT,_StateIdT>> instead of a map, for improved
> memory footprint and runtime speed, but that can be changed later.

In this case, we keep inserting (__m[__u] = __id) and looking up
(__m.count()) the container in a crossover order, which required a map
to minimize both's complexity.


-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Patch, PR 61061] Add state limit for regex NFA
  2014-06-28  9:48     ` Jonathan Wakely
  2014-06-28 16:56       ` Tim Shen
@ 2014-07-01 16:43       ` Tim Shen
  1 sibling, 0 replies; 7+ messages in thread
From: Tim Shen @ 2014-07-01 16:43 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Paolo Carlini, libstdc++, gcc-patches

On Sat, Jun 28, 2014 at 2:48 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> OK for trunk, thanks.

Committed.

Thanks!


-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-07-01 16:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-27  3:56 [Patch, PR 61061] Add state limit for regex NFA Tim Shen
2014-06-27  7:41 ` Paolo Carlini
2014-06-27 16:53   ` Tim Shen
2014-06-27 17:21     ` Paolo Carlini
2014-06-28  9:48     ` Jonathan Wakely
2014-06-28 16:56       ` Tim Shen
2014-07-01 16:43       ` Tim Shen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).