public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
@ 2014-11-18 19:25 Tim Shen
  2014-11-18 19:29 ` Paolo Carlini
  2014-11-24 11:50 ` Jonathan Wakely
  0 siblings, 2 replies; 10+ messages in thread
From: Tim Shen @ 2014-11-18 19:25 UTC (permalink / raw)
  To: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 58 bytes --]

Bootstrapped and tested.

Thanks!


-- 
Regards,
Tim Shen

[-- Attachment #2: a.diff --]
[-- Type: text/plain, Size: 2815 bytes --]

commit f17155183b9ae1283d04f3bbdb61d05d9279ebe4
Author: timshen <timshen@google.com>
Date:   Tue Nov 18 00:07:28 2014 -0800

    	PR libstdc++/63920
    	* include/bits/regex_executor.h: Make _M_begin non const.
    	* include/bits/regex_executor.tcc (_Executor<>::_M_search): Increase
    	_M_begin in search algorithm, so that _M_begin is treated as
    	"current start position" for each search iteration.
    	* testsuite/28_regex/algorithms/regex_search/ecma/flags.cc: New
    	testcase.

diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index b26992c..2d7796f 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -205,7 +205,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     public:
       _ResultsVec                                           _M_cur_results;
       _BiIter                                               _M_current;
-      const _BiIter                                         _M_begin;
+      _BiIter                                               _M_begin;
       const _BiIter                                         _M_end;
       const _RegexT&                                        _M_re;
       const _NFAT&                                          _M_nfa;
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 38d4781..a973667 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -39,17 +39,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
     _M_search()
     {
+      if (_M_search_from_first())
+	return true;
       if (_M_flags & regex_constants::match_continuous)
-	return _M_search_from_first();
-      auto __cur = _M_begin;
-      do
+	return false;
+      _M_flags |= regex_constants::match_prev_avail;
+      while (_M_begin != _M_end)
 	{
-	  _M_current = __cur;
-	  if (_M_main(_Match_mode::_Prefix))
+	  ++_M_begin;
+	  if (_M_search_from_first())
 	    return true;
 	}
-      // Continue when __cur == _M_end
-      while (__cur++ != _M_end);
       return false;
     }
 
diff --git a/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc b/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc
index 45ad0f6..6b038ee 100644
--- a/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_search/ecma/flags.cc
@@ -65,6 +65,8 @@ test01()
 			     regex_constants::match_prev_avail));
   VERIFY( regex_search_debug("ba"+1, regex("\\Ba"),
 			     regex_constants::match_prev_avail));
+  // PR libstdc++/63920
+  VERIFY(!regex_search_debug("a", regex("b*"), regex_constants::match_not_null));
 }
 
 int

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-18 19:25 [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior Tim Shen
@ 2014-11-18 19:29 ` Paolo Carlini
  2014-11-19  8:54   ` Tim Shen
  2014-11-24 11:50 ` Jonathan Wakely
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Carlini @ 2014-11-18 19:29 UTC (permalink / raw)
  To: Tim Shen, libstdc++, gcc-patches

Hi,

On 11/18/2014 08:12 PM, Tim Shen wrote:
> Bootstrapped and tested.
Jonathan lately is following your work much better than me, but naively 
seems weird that _M_begin is non-const and _M_end is const, a different 
type anyway.

Paolo.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-18 19:29 ` Paolo Carlini
@ 2014-11-19  8:54   ` Tim Shen
  2014-11-19 16:25     ` Paolo Carlini
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Shen @ 2014-11-19  8:54 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: libstdc++, gcc-patches

On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini
<paolo.carlini@oracle.com> wrote:
> Jonathan lately is following your work much better than me, but naively
> seems weird that _M_begin is non-const and _M_end is const, a different type
> anyway.

Hmm. The current regex_search algorithm is implemented as try match
starting from _M_begin; if it doesn't match, start over from
_M_begin+1, ...

As we can tell, _M_begin is never changed, and it's const.

The problem is when the executer reaches the accept state (which
indicates a match) we use _M_current == _M_begin to verify if it's an
empty match. It is possible that, when we are not in the first
iteration, say, in the second iteration actually, _M_current is
initialized with _M_begin+1. It turns out even _M_current has never
been increased (no chars are eaten, aka empty match), _M_current !=
_M_begin is still true.

This fix is making each regex_search iteration more thorough, with
increased _M_begin, as if it's a new regex _M_search_from_first.

I've carefully (admittedly, after sending this patch) inspect
everywhere when _M_begin is used. It turns out _M_begin is under
well-defined (the initial position of _M_current when current
iteration starts) invariants (see _Executor<>::_M_at_begin), indicated
by the use of regex_constants::match_prev_avail. This flag actually
implies that __begin iterator passed into regex_search is not always
"the physical boundary" who matches "^". Boost (and we) conforms this
behavior:

    std::regex_search("asdf", std::regex("^asdf"),
std::regex_constants::match_prev_avail)

returns false.

It's more elegant to move _Executor::_M_search out of its class and
make _M_begin still const, but _Executor costs too much to initialize.


-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-19  8:54   ` Tim Shen
@ 2014-11-19 16:25     ` Paolo Carlini
  2014-11-19 18:54       ` Tim Shen
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Carlini @ 2014-11-19 16:25 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

Hi Tim,

On 11/19/2014 08:27 AM, Tim Shen wrote:
> On Tue, Nov 18, 2014 at 11:19 AM, Paolo Carlini
> <paolo.carlini@oracle.com> wrote:
>> Jonathan lately is following your work much better than me, but naively
>> seems weird that _M_begin is non-const and _M_end is const, a different type
>> anyway.
> Hmm. The current regex_search algorithm is implemented as try match
> starting from _M_begin; if it doesn't match, start over from
> _M_begin+1, ...
>
> As we can tell, _M_begin is never changed, and it's const.
>
> The problem is when the executer reaches the accept state (which
> indicates a match) we use _M_current == _M_begin to verify if it's an
> empty match. It is possible that, when we are not in the first
> iteration, say, in the second iteration actually, _M_current is
> initialized with _M_begin+1. It turns out even _M_current has never
> been increased (no chars are eaten, aka empty match), _M_current !=
> _M_begin is still true.
>
> This fix is making each regex_search iteration more thorough, with
> increased _M_begin, as if it's a new regex _M_search_from_first.
>
> I've carefully (admittedly, after sending this patch) inspect
> everywhere when _M_begin is used. It turns out _M_begin is under
> well-defined (the initial position of _M_current when current
> iteration starts) invariants (see _Executor<>::_M_at_begin), indicated
> by the use of regex_constants::match_prev_avail. This flag actually
> implies that __begin iterator passed into regex_search is not always
> "the physical boundary" who matches "^". Boost (and we) conforms this
> behavior:
>
>      std::regex_search("asdf", std::regex("^asdf"),
> std::regex_constants::match_prev_avail)
>
> returns false.
>
> It's more elegant to move _Executor::_M_search out of its class and
> make _M_begin still const, but _Executor costs too much to initialize.
Good. To be clear, not having carefully analyzed whatsoever, my point 
was more about changing _M_end too, to non-const, than about not 
touching _M_begin. Would that make sense?

Thanks,
Paolo.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-19 16:25     ` Paolo Carlini
@ 2014-11-19 18:54       ` Tim Shen
  2014-11-19 19:03         ` Daniel Krügler
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Shen @ 2014-11-19 18:54 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: libstdc++, gcc-patches

On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
> Good. To be clear, not having carefully analyzed whatsoever, my point was
> more about changing _M_end too, to non-const, than about not touching
> _M_begin. Would that make sense?

Currently we never mutate _M_end. I *believe* that we still won't in
the future, since _M_end is not as volatile as _M_begin.


-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-19 18:54       ` Tim Shen
@ 2014-11-19 19:03         ` Daniel Krügler
  2014-11-19 19:05           ` Paolo Carlini
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Krügler @ 2014-11-19 19:03 UTC (permalink / raw)
  To: Tim Shen; +Cc: Paolo Carlini, libstdc++, gcc-patches

2014-11-19 19:42 GMT+01:00 Tim Shen <timshen@google.com>:
> On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
>> Good. To be clear, not having carefully analyzed whatsoever, my point was
>> more about changing _M_end too, to non-const, than about not touching
>> _M_begin. Would that make sense?
>
> Currently we never mutate _M_end. I *believe* that we still won't in
> the future, since _M_end is not as volatile as _M_begin.

I agree with Tim here, why shouldn't the const member and the
non-const member coexist?

- Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-19 19:03         ` Daniel Krügler
@ 2014-11-19 19:05           ` Paolo Carlini
  2014-11-19 19:52             ` Tim Shen
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Carlini @ 2014-11-19 19:05 UTC (permalink / raw)
  To: Daniel Krügler, Tim Shen; +Cc: libstdc++, gcc-patches

Hi,

On 11/19/2014 07:43 PM, Daniel Krügler wrote:
> 2014-11-19 19:42 GMT+01:00 Tim Shen <timshen@google.com>:
>> On Wed, Nov 19, 2014 at 8:16 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
>>> Good. To be clear, not having carefully analyzed whatsoever, my point was
>>> more about changing _M_end too, to non-const, than about not touching
>>> _M_begin. Would that make sense?
>> Currently we never mutate _M_end. I *believe* that we still won't in
>> the future, since _M_end is not as volatile as _M_begin.
> I agree with Tim here, why shouldn't the const member and the
> non-const member coexist?
I was just aiming for consistency, from a very, very, general point of 
view. Jon will review the substance of the patch, anyway.

Thanks!
Paolo.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-19 19:05           ` Paolo Carlini
@ 2014-11-19 19:52             ` Tim Shen
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Shen @ 2014-11-19 19:52 UTC (permalink / raw)
  To: Paolo Carlini; +Cc: Daniel Krügler, libstdc++, gcc-patches

On Wed, Nov 19, 2014 at 10:54 AM, Paolo Carlini
<paolo.carlini@oracle.com> wrote:
> I was just aiming for consistency, from a very, very, general point of view.

Not a problem. Thank you actually, since it's obviously good for us :)


-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-18 19:25 [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior Tim Shen
  2014-11-18 19:29 ` Paolo Carlini
@ 2014-11-24 11:50 ` Jonathan Wakely
  2014-11-25  8:19   ` Tim Shen
  1 sibling, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2014-11-24 11:50 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

On 18/11/14 11:12 -0800, Tim Shen wrote:
>Bootstrapped and tested.
>
>Thanks!
>
>
>-- 
>Regards,
>Tim Shen

>commit f17155183b9ae1283d04f3bbdb61d05d9279ebe4
>Author: timshen <timshen@google.com>
>Date:   Tue Nov 18 00:07:28 2014 -0800
>
>    	PR libstdc++/63920
>    	* include/bits/regex_executor.h: Make _M_begin non const.
>    	* include/bits/regex_executor.tcc (_Executor<>::_M_search): Increase
>    	_M_begin in search algorithm, so that _M_begin is treated as
>    	"current start position" for each search iteration.
>    	* testsuite/28_regex/algorithms/regex_search/ecma/flags.cc: New
>    	testcase.

OK for trunk - thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior
  2014-11-24 11:50 ` Jonathan Wakely
@ 2014-11-25  8:19   ` Tim Shen
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Shen @ 2014-11-25  8:19 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On Mon, Nov 24, 2014 at 3:17 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> OK for trunk - thanks.

Committed. :)

Thanks!


-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-11-25  8:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-18 19:25 [Patch, libstdc++/63920] Fix regex_constants::match_not_null behavior Tim Shen
2014-11-18 19:29 ` Paolo Carlini
2014-11-19  8:54   ` Tim Shen
2014-11-19 16:25     ` Paolo Carlini
2014-11-19 18:54       ` Tim Shen
2014-11-19 19:03         ` Daniel Krügler
2014-11-19 19:05           ` Paolo Carlini
2014-11-19 19:52             ` Tim Shen
2014-11-24 11:50 ` Jonathan Wakely
2014-11-25  8:19   ` Tim Shen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).