public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [Patch] Small refactor on <regex> _State<>
       [not found] <CAG4ZjNnYLoCYnjNpkJYHa2Q-ux0KnKmVXBDUOa33xk29MzeTKA@mail.gmail.com>
@ 2015-07-25  8:05 ` Tim Shen
       [not found] ` <20150725153121.GW21787@redhat.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Tim Shen @ 2015-07-25  8:05 UTC (permalink / raw)
  To: libstdc++, gcc-patches

On Sat, Jul 25, 2015 at 12:11 AM, Tim Shen <timshen@google.com> wrote:
> It's not a very necessary refactoring, but simply can't resist. :)
>
> I'm not sure of the ::memcpy calls. It looks not very idiomatic, but
> std::copy on char* looks even more weird? :/
>
> Bootstrapped and tested.
>
> Thanks!
>
>
> --
> Regards,
> Tim Shen



-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch] Small refactor on <regex> _State<>
       [not found]   ` <CAG4ZjNnvm=QAwihOUvMLtOezYprtasaJOY3i-MhJV0Tf4z5mpg@mail.gmail.com>
@ 2015-07-28 15:42     ` Jonathan Wakely
  2015-07-29  8:43       ` Jonathan Wakely
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Wakely @ 2015-07-28 15:42 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

On 26/07/15 13:38 -0700, Tim Shen wrote:
>On Sat, Jul 25, 2015 at 8:31 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
>> On 25/07/15 00:11 -0700, Tim Shen wrote:
>>>
>>> It's not a very necessary refactoring, but simply can't resist. :)
>>>
>>> I'm not sure of the ::memcpy calls. It looks not very idiomatic, but
>>> std::copy on char* looks even more weird? :/
>>
>>
>> The ::memcpy(this, &rhs, sizeof(rhs))...) makes me quite
>> uncomfortable, because _State is not trivially-copyable, although its
>> _State_base base class _is_ trivially-copyable, and is at the same
>> address and has the same size ... so I think it's safe.
>>
>> But couldn't you replace that memcpy with an assignment?
>>
>>    _State_base::operator=(__rhs);
>
>Done.
>
>> The implicitly defined assignment operator should do the same thing as
>> a memcpy.
>>
>> _State should have a deleted copy assignment operator though (or a
>> user-provided one that correctly handles the _S_opcode_match case, but
>> since it's not needed it should just be deleted).
>
>Actually it's needed in _StateSeq::_M_clone, but I defined a normal
>member function _State::_M_clone to avoid unexpected copying.

But that's a copy construction, I'm talking about assignment.

The copy constructor is fine, you've defined it and it does the right
thing, so I don't think making it private and defining _M_clone() is
useful. Just copying is safe and correct. If there's an unwanted copy
we just get a slight performance hit, but not a disaster.

What I'm concerned about is assignment. You haven't defined an
assignment operator. If there's an unwanted assignment we could get
undefined behaviour. Please delete the assignment operator if it's not
needed.

The private default constructor doesn't seem to be used, so that can
go, and I would get rid of _M_clone() and make the copy constructor
public again, although I'm not going to insist on that if you really
prefer to add _M_clone.


Also, what are the alignment requirements on the _Matcher<> objects?
Is it definitely safe to store them in the _M_matcher_storage buffer?

You could use alignas(_Matcher<char>) to be sure (using char there
should be OK because std::function<bool(char)> has the same alignment
as std::function<bool(wchar_t)> or any other _Matcher<C> type).


>-- 
>Regards,
>Tim Shen

>diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
>index fc0eb41..c9f7bb3 100644
>--- a/libstdc++-v3/include/bits/regex_automaton.h
>+++ b/libstdc++-v3/include/bits/regex_automaton.h
>@@ -70,51 +70,115 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       _S_opcode_accept,
>   };
>
>-  struct _State_base
>-  {
>-    _Opcode      _M_opcode;           // type of outgoing transition
>-    _StateIdT    _M_next;             // outgoing transition
>-    union // Since they are mutually exclusive.
>+  template<size_t _Matcher_size>
>+    struct _State_base
>     {
>-      size_t _M_subexpr;        // for _S_opcode_subexpr_*
>-      size_t _M_backref_index;  // for _S_opcode_backref
>-      struct
>+    protected:
>+      _Opcode      _M_opcode;           // type of outgoing transition
>+
>+    public:
>+      _StateIdT    _M_next;             // outgoing transition
>+      union // Since they are mutually exclusive.
>       {
>-	// for _S_opcode_alternative, _S_opcode_repeat and
>-	// _S_opcode_subexpr_lookahead
>-	_StateIdT  _M_alt;
>-	// for _S_opcode_word_boundary or _S_opcode_subexpr_lookahead or
>-	// quantifiers (ungreedy if set true)
>-	bool       _M_neg;
>+	size_t _M_subexpr;        // for _S_opcode_subexpr_*
>+	size_t _M_backref_index;  // for _S_opcode_backref
>+	struct
>+	{
>+	  // for _S_opcode_alternative, _S_opcode_repeat and
>+	  // _S_opcode_subexpr_lookahead
>+	  _StateIdT  _M_alt;
>+	  // for _S_opcode_word_boundary or _S_opcode_subexpr_lookahead or
>+	  // quantifiers (ungreedy if set true)
>+	  bool       _M_neg;
>+	};
>+	char _M_matcher_storage[_Matcher_size];        // for _S_opcode_match
>       };
>-    };
>
>-    explicit _State_base(_Opcode __opcode)
>-    : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
>-    { }
>+    protected:
>+      _State_base() : _M_opcode(_S_opcode_unknown) { }
>
>-  protected:
>-    ~_State_base() = default;
>+      explicit _State_base(_Opcode __opcode)
>+      : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
>+      { }
>+
>+    public:
>+      bool
>+      _M_has_alt()
>+      {
>+	return _M_opcode == _S_opcode_alternative
>+	  || _M_opcode == _S_opcode_repeat
>+	  || _M_opcode == _S_opcode_subexpr_lookahead;
>+      }
>
>-  public:
> #ifdef _GLIBCXX_DEBUG
>-    std::ostream&
>-    _M_print(std::ostream& ostr) const;
>+      std::ostream&
>+      _M_print(std::ostream& ostr) const;
>
>-    // Prints graphviz dot commands for state.
>-    std::ostream&
>-    _M_dot(std::ostream& __ostr, _StateIdT __id) const;
>+      // Prints graphviz dot commands for state.
>+      std::ostream&
>+      _M_dot(std::ostream& __ostr, _StateIdT __id) const;
> #endif
>-  };
>+    };
>
>-  template<typename _TraitsT>
>-    struct _State : _State_base
>+  template<typename _Char_type>
>+    struct _State : _State_base<sizeof(_Matcher<_Char_type>)>
>     {
>-      typedef _Matcher<typename _TraitsT::char_type> _MatcherT;
>+      typedef _Matcher<_Char_type> _MatcherT;
>+      typedef _State_base<sizeof(_MatcherT)> _Base_type;
>
>-      _MatcherT      _M_matches;        // for _S_opcode_match
>+      explicit
>+      _State(_Opcode __opcode) : _Base_type(__opcode)
>+      {
>+	if (_M_opcode() == _S_opcode_match)
>+	  new (this->_M_matcher_storage) _MatcherT();
>+      }
>+
>+      _State(_State&& __rhs)
>+      {
>+	_Base_type::operator=(__rhs);
>+	if (__rhs._M_opcode() == _S_opcode_match)
>+	  new (this->_M_matcher_storage)
>+	    _MatcherT(std::move(__rhs._M_get_matcher()));
>+      }
>+
>+      ~_State()
>+      {
>+	if (_M_opcode() == _S_opcode_match)
>+	  _M_get_matcher().~_MatcherT();
>+      }
>
>-      explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
>+      // Since correct ctor and dtor rely on _M_opcode, it's better not to
>+      // change it over time.
>+      _Opcode
>+      _M_opcode() const
>+      { return _Base_type::_M_opcode; }
>+
>+      bool
>+      _M_matches(_Char_type __char) const
>+      { return _M_get_matcher()(__char); }
>+
>+      _MatcherT&
>+      _M_get_matcher()
>+      { return *reinterpret_cast<_MatcherT*>(this->_M_matcher_storage); }
>+
>+      const _MatcherT&
>+      _M_get_matcher() const
>+      { return *reinterpret_cast<const _MatcherT*>(this->_M_matcher_storage); }
>+
>+      _State
>+      _M_clone() const
>+      { return *this; }
>+
>+    private:
>+      _State() = default;
>+
>+      _State(const _State& __rhs)
>+      {
>+	_Base_type::operator=(__rhs);
>+	if (__rhs._M_opcode() == _S_opcode_match)
>+	  new (this->_M_matcher_storage)
>+	    _MatcherT(__rhs._M_get_matcher());
>+      }
>     };
>
>   struct _NFA_base
>@@ -155,10 +219,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>   template<typename _TraitsT>
>     struct _NFA
>-    : _NFA_base, std::vector<_State<_TraitsT>>
>+    : _NFA_base, std::vector<_State<typename _TraitsT::char_type>>
>     {
>-      typedef _State<_TraitsT>				_StateT;
>-      typedef _Matcher<typename _TraitsT::char_type>	_MatcherT;
>+      typedef typename _TraitsT::char_type	_Char_type;
>+      typedef _State<_Char_type>		_StateT;
>+      typedef _Matcher<_Char_type>		_MatcherT;
>
>       _NFA(const typename _TraitsT::locale_type& __loc, _FlagT __flags)
>       : _NFA_base(__flags)
>@@ -202,7 +267,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       _M_insert_matcher(_MatcherT __m)
>       {
> 	_StateT __tmp(_S_opcode_match);
>-	__tmp._M_matches = std::move(__m);
>+	__tmp._M_get_matcher() = std::move(__m);
> 	return _M_insert_state(std::move(__tmp));
>       }
>
>diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc
>index 72fe978..0f5187c 100644
>--- a/libstdc++-v3/include/bits/regex_automaton.tcc
>+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
>@@ -35,101 +35,103 @@ namespace __detail
> _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
> #ifdef _GLIBCXX_DEBUG
>-  inline std::ostream&
>-  _State_base::_M_print(std::ostream& ostr) const
>-  {
>-    switch (_M_opcode)
>+  template<size_t _Matcher_size>
>+    std::ostream&
>+    _State_base<_Matcher_size>::_M_print(std::ostream& ostr) const
>     {
>-      case _S_opcode_alternative:
>-      case _S_opcode_repeat:
>-	ostr << "alt next=" << _M_next << " alt=" << _M_alt;
>-	break;
>-      case _S_opcode_subexpr_begin:
>-	ostr << "subexpr begin next=" << _M_next << " index=" << _M_subexpr;
>-	break;
>-      case _S_opcode_subexpr_end:
>-	ostr << "subexpr end next=" << _M_next << " index=" << _M_subexpr;
>-	break;
>-      case _S_opcode_backref:
>-	ostr << "backref next=" << _M_next << " index=" << _M_backref_index;
>-	break;
>-      case _S_opcode_match:
>-	ostr << "match next=" << _M_next;
>-	break;
>-      case _S_opcode_accept:
>-	ostr << "accept next=" << _M_next;
>-	break;
>-      default:
>-	ostr << "unknown next=" << _M_next;
>-	break;
>+      switch (_M_opcode)
>+      {
>+	case _S_opcode_alternative:
>+	case _S_opcode_repeat:
>+	  ostr << "alt next=" << _M_next << " alt=" << _M_alt;
>+	  break;
>+	case _S_opcode_subexpr_begin:
>+	  ostr << "subexpr begin next=" << _M_next << " index=" << _M_subexpr;
>+	  break;
>+	case _S_opcode_subexpr_end:
>+	  ostr << "subexpr end next=" << _M_next << " index=" << _M_subexpr;
>+	  break;
>+	case _S_opcode_backref:
>+	  ostr << "backref next=" << _M_next << " index=" << _M_backref_index;
>+	  break;
>+	case _S_opcode_match:
>+	  ostr << "match next=" << _M_next;
>+	  break;
>+	case _S_opcode_accept:
>+	  ostr << "accept next=" << _M_next;
>+	  break;
>+	default:
>+	  ostr << "unknown next=" << _M_next;
>+	  break;
>+      }
>+      return ostr;
>     }
>-    return ostr;
>-  }
>
>   // Prints graphviz dot commands for state.
>-  inline std::ostream&
>-  _State_base::_M_dot(std::ostream& __ostr, _StateIdT __id) const
>-  {
>-    switch (_M_opcode)
>+  template<size_t _Matcher_size>
>+    std::ostream&
>+    _State_base<_Matcher_size>::_M_dot(std::ostream& __ostr, _StateIdT __id) const
>     {
>-      case _S_opcode_alternative:
>-      case _S_opcode_repeat:
>-	__ostr << __id << " [label=\"" << __id << "\\nALT\"];\n"
>-	       << __id << " -> " << _M_next
>-	       << " [label=\"next\", tailport=\"s\"];\n"
>-	       << __id << " -> " << _M_alt
>-	       << " [label=\"alt\", tailport=\"n\"];\n";
>-	break;
>-      case _S_opcode_backref:
>-	__ostr << __id << " [label=\"" << __id << "\\nBACKREF "
>-	       << _M_subexpr << "\"];\n"
>-	       << __id << " -> " << _M_next << " [label=\"<match>\"];\n";
>-	break;
>-      case _S_opcode_line_begin_assertion:
>-	__ostr << __id << " [label=\"" << __id << "\\nLINE_BEGIN \"];\n"
>-	       << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>-	break;
>-      case _S_opcode_line_end_assertion:
>-	__ostr << __id << " [label=\"" << __id << "\\nLINE_END \"];\n"
>-	       << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>-	break;
>-      case _S_opcode_word_boundary:
>-	__ostr << __id << " [label=\"" << __id << "\\nWORD_BOUNDRY "
>-	       << _M_neg << "\"];\n"
>-	       << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>-	break;
>-      case _S_opcode_subexpr_lookahead:
>-	__ostr << __id << " [label=\"" << __id << "\\nLOOK_AHEAD\"];\n"
>-	       << __id << " -> " << _M_next
>-	       << " [label=\"epsilon\", tailport=\"s\"];\n"
>-	       << __id << " -> " << _M_alt
>-	       << " [label=\"<assert>\", tailport=\"n\"];\n";
>-	break;
>-      case _S_opcode_subexpr_begin:
>-	__ostr << __id << " [label=\"" << __id << "\\nSBEGIN "
>-	       << _M_subexpr << "\"];\n"
>-	       << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>-	break;
>-      case _S_opcode_subexpr_end:
>-	__ostr << __id << " [label=\"" << __id << "\\nSEND "
>-	       << _M_subexpr << "\"];\n"
>-	       << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>-	break;
>-      case _S_opcode_dummy:
>-	break;
>-      case _S_opcode_match:
>-	__ostr << __id << " [label=\"" << __id << "\\nMATCH\"];\n"
>-	       << __id << " -> " << _M_next << " [label=\"<match>\"];\n";
>-	break;
>-      case _S_opcode_accept:
>-	__ostr << __id << " [label=\"" << __id << "\\nACC\"];\n" ;
>-	break;
>-      default:
>-	_GLIBCXX_DEBUG_ASSERT(false);
>-	break;
>+      switch (_M_opcode)
>+      {
>+	case _S_opcode_alternative:
>+	case _S_opcode_repeat:
>+	  __ostr << __id << " [label=\"" << __id << "\\nALT\"];\n"
>+		 << __id << " -> " << _M_next
>+		 << " [label=\"next\", tailport=\"s\"];\n"
>+		 << __id << " -> " << _M_alt
>+		 << " [label=\"alt\", tailport=\"n\"];\n";
>+	  break;
>+	case _S_opcode_backref:
>+	  __ostr << __id << " [label=\"" << __id << "\\nBACKREF "
>+		 << _M_subexpr << "\"];\n"
>+		 << __id << " -> " << _M_next << " [label=\"<match>\"];\n";
>+	  break;
>+	case _S_opcode_line_begin_assertion:
>+	  __ostr << __id << " [label=\"" << __id << "\\nLINE_BEGIN \"];\n"
>+		 << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>+	  break;
>+	case _S_opcode_line_end_assertion:
>+	  __ostr << __id << " [label=\"" << __id << "\\nLINE_END \"];\n"
>+		 << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>+	  break;
>+	case _S_opcode_word_boundary:
>+	  __ostr << __id << " [label=\"" << __id << "\\nWORD_BOUNDRY "
>+		 << _M_neg << "\"];\n"
>+		 << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>+	  break;
>+	case _S_opcode_subexpr_lookahead:
>+	  __ostr << __id << " [label=\"" << __id << "\\nLOOK_AHEAD\"];\n"
>+		 << __id << " -> " << _M_next
>+		 << " [label=\"epsilon\", tailport=\"s\"];\n"
>+		 << __id << " -> " << _M_alt
>+		 << " [label=\"<assert>\", tailport=\"n\"];\n";
>+	  break;
>+	case _S_opcode_subexpr_begin:
>+	  __ostr << __id << " [label=\"" << __id << "\\nSBEGIN "
>+		 << _M_subexpr << "\"];\n"
>+		 << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>+	  break;
>+	case _S_opcode_subexpr_end:
>+	  __ostr << __id << " [label=\"" << __id << "\\nSEND "
>+		 << _M_subexpr << "\"];\n"
>+		 << __id << " -> " << _M_next << " [label=\"epsilon\"];\n";
>+	  break;
>+	case _S_opcode_dummy:
>+	  break;
>+	case _S_opcode_match:
>+	  __ostr << __id << " [label=\"" << __id << "\\nMATCH\"];\n"
>+		 << __id << " -> " << _M_next << " [label=\"<match>\"];\n";
>+	  break;
>+	case _S_opcode_accept:
>+	  __ostr << __id << " [label=\"" << __id << "\\nACC\"];\n" ;
>+	  break;
>+	default:
>+	  _GLIBCXX_DEBUG_ASSERT(false);
>+	  break;
>+      }
>+      return __ostr;
>     }
>-    return __ostr;
>-  }
>
>   template<typename _TraitsT>
>     std::ostream&
>@@ -174,13 +176,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     {
>       for (auto& __it : *this)
> 	{
>-	  while (__it._M_next >= 0 && (*this)[__it._M_next]._M_opcode
>+	  while (__it._M_next >= 0 && (*this)[__it._M_next]._M_opcode()
> 		 == _S_opcode_dummy)
> 	    __it._M_next = (*this)[__it._M_next]._M_next;
>-	  if (__it._M_opcode == _S_opcode_alternative
>-	      || __it._M_opcode == _S_opcode_repeat
>-	      || __it._M_opcode == _S_opcode_subexpr_lookahead)
>-	    while (__it._M_alt >= 0 && (*this)[__it._M_alt]._M_opcode
>+	  if (__it._M_has_alt())
>+	    while (__it._M_alt >= 0 && (*this)[__it._M_alt]._M_opcode()
> 		   == _S_opcode_dummy)
> 	      __it._M_alt = (*this)[__it._M_alt]._M_next;
> 	}
>@@ -198,13 +198,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	{
> 	  auto __u = __stack.top();
> 	  __stack.pop();
>-	  auto __dup = _M_nfa[__u];
>+	  auto __dup = _M_nfa[__u]._M_clone();
> 	  // _M_insert_state() never return -1
>-	  auto __id = _M_nfa._M_insert_state(__dup);
>+	  auto __id = _M_nfa._M_insert_state(std::move(__dup));
> 	  __m[__u] = __id;
>-	  if (__dup._M_opcode == _S_opcode_alternative
>-	      || __dup._M_opcode == _S_opcode_repeat
>-	      || __dup._M_opcode == _S_opcode_subexpr_lookahead)
>+	  if (__dup._M_has_alt())
> 	    if (__dup._M_alt != _S_invalid_state_id
> 		&& __m.count(__dup._M_alt) == 0)
> 	      __stack.push(__dup._M_alt);
>@@ -223,9 +221,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	      _GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_next) > 0);
> 	      __ref._M_next = __m[__ref._M_next];
> 	    }
>-	  if (__ref._M_opcode == _S_opcode_alternative
>-	      || __ref._M_opcode == _S_opcode_repeat
>-	      || __ref._M_opcode == _S_opcode_subexpr_lookahead)
>+	  if (__ref._M_has_alt())
> 	    if (__ref._M_alt != _S_invalid_state_id)
> 	      {
> 		_GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_alt) > 0);
>diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
>index 404f30b..c97e056 100644
>--- a/libstdc++-v3/include/bits/regex_executor.h
>+++ b/libstdc++-v3/include/bits/regex_executor.h
>@@ -148,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       _M_word_boundary() const;
>
>       bool
>-      _M_lookahead(_State<_TraitsT> __state);
>+      _M_lookahead(const _State<_CharT>& __state);
>
>        // Holds additional information used in BFS-mode.
>       template<typename _SearchMode, typename _ResultsVec>
>diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
>index 9b5c1c6..1bc7739 100644
>--- a/libstdc++-v3/include/bits/regex_executor.tcc
>+++ b/libstdc++-v3/include/bits/regex_executor.tcc
>@@ -145,7 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   template<typename _BiIter, typename _Alloc, typename _TraitsT,
> 	   bool __dfs_mode>
>     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
>-    _M_lookahead(_State<_TraitsT> __state)
>+    _M_lookahead(const _State<_CharT>& __state)
>     {
>       _ResultsVec __what(_M_cur_results.size());
>       _Executor __sub(_M_current, _M_end, __what, _M_re, _M_flags);
>@@ -203,7 +203,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       const auto& __state = _M_nfa[__i];
>       // Every change on _M_cur_results and _M_current will be rolled back after
>       // finishing the recursion step.
>-      switch (__state._M_opcode)
>+      switch (__state._M_opcode())
> 	{
> 	// _M_alt branch is "match once more", while _M_next is "get me out
> 	// of this quantifier". Executing _M_next first or _M_alt first don't

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch] Small refactor on <regex> _State<>
  2015-07-28 15:42     ` Jonathan Wakely
@ 2015-07-29  8:43       ` Jonathan Wakely
  2015-07-29  9:08         ` Tim Shen
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Wakely @ 2015-07-29  8:43 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

On 28/07/15 16:19 +0100, Jonathan Wakely wrote:
>What I'm concerned about is assignment. You haven't defined an
>assignment operator. If there's an unwanted assignment we could get
>undefined behaviour. Please delete the assignment operator if it's not
>needed.

Apologies, you have a user-declared move constructor, so assignment is
already deleted. It wouldn't hurt to make that explicit though:

  _State& operator=(const _State&) = delete;

So it's just the alignment issue that I'm concerned about now.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch] Small refactor on <regex> _State<>
  2015-07-29  8:43       ` Jonathan Wakely
@ 2015-07-29  9:08         ` Tim Shen
  2015-07-29  9:34           ` Jonathan Wakely
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Shen @ 2015-07-29  9:08 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On Wed, Jul 29, 2015 at 1:32 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> Apologies, you have a user-declared move constructor, so assignment is
> already deleted. It wouldn't hurt to make that explicit though:

I'm just bad at memorizing when they are implicitly
declared/defined/deleted and fully unware of the deleted copy
assignment :)

It's better to make it explicit, since I don't want to be clever on
pretending knowing all ctor/assigment rules :)

>  _State& operator=(const _State&) = delete;
>
> So it's just the alignment issue that I'm concerned about now.
>

Will alignof(std::function<bool(C)>) be all the same across
instantiations, including user-defined C, now and in the future?

I created template parameter _Matcher_size (which should be
__matcher_size) because I don't want to make this assumption on
sizeof(). If they (both alignment and size) are expected to be the
same, we can remove that template parameter (and all indentation
changes!); otherwise, is alignment more unlikely to change than size
the reason we always stick on alignof(std::function<bool(char)>)?


-- 
Regards,
Tim Shen

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch] Small refactor on <regex> _State<>
  2015-07-29  9:08         ` Tim Shen
@ 2015-07-29  9:34           ` Jonathan Wakely
  2015-07-30  5:29             ` Tim Shen
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Wakely @ 2015-07-29  9:34 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

On 29/07/15 01:43 -0700, Tim Shen wrote:
>On Wed, Jul 29, 2015 at 1:32 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
>> Apologies, you have a user-declared move constructor, so assignment is
>> already deleted. It wouldn't hurt to make that explicit though:
>
>I'm just bad at memorizing when they are implicitly
>declared/defined/deleted and fully unware of the deleted copy
>assignment :)
>
>It's better to make it explicit, since I don't want to be clever on
>pretending knowing all ctor/assigment rules :)

Yes, I think it's better to make it explicit.

>>  _State& operator=(const _State&) = delete;
>>
>> So it's just the alignment issue that I'm concerned about now.
>>
>
>Will alignof(std::function<bool(C)>) be all the same across
>instantiations, including user-defined C, now and in the future?

All member variables of std::function<> are in the _Function_base base
class, which is not a template, so it has the same size and alignment
for all instantiations.

>I created template parameter _Matcher_size (which should be
>__matcher_size) because I don't want to make this assumption on
>sizeof(). If they (both alignment and size) are expected to be the
>same, we can remove that template parameter (and all indentation
>changes!);

Yes, that makes sense. See the code in <ext/aligned_buffer.h> for how
to set the alignment of the buffer appropriately. You can use the size
and alignment of std::function<bool(char)> even though it will
sometimes be a different std::function specialization.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch] Small refactor on <regex> _State<>
  2015-07-29  9:34           ` Jonathan Wakely
@ 2015-07-30  5:29             ` Tim Shen
  2015-07-30  6:20               ` Tim Shen
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Shen @ 2015-07-30  5:29 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 474 bytes --]

On Wed, Jul 29, 2015 at 2:15 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
> Yes, that makes sense. See the code in <ext/aligned_buffer.h> for how
> to set the alignment of the buffer appropriately. You can use the size
> and alignment of std::function<bool(char)> even though it will
> sometimes be a different std::function specialization.

Done.

Also change _Executor::_M_lookahead(_State<>) to
_Executor::_M_lookahead(_StateIdT __next_state).


-- 
Regards,
Tim Shen

[-- Attachment #2: c.diff --]
[-- Type: text/plain, Size: 9599 bytes --]

commit 52c70e70bdbef15c787f81e83722bfc119543ff0
Author: Tim Shen <timshen@google.com>
Date:   Wed Jul 29 21:08:43 2015 -0700

    	* include/bits/regex_automaton.h (_State_base, _State<>):
    	Remove _TraitsT dependency from _State<>; Make matcher member
    	into the union to reduce struct size.
    	* include/bits/regex_automaton.tcc (_State_base<>::_M_print,
    	_State_base<>::_M_dot, _StateSeq<>::_M_clone):
    	Adjust to fit the interface. Factor out common parts in
    	_M_clone as _State<>::_M_has_alt.
    	* include/bits/regex_executor.h (_Executer<>::_M_lookahead):
    	Only pass state id instead of the whole state.
    	* include/bits/regex_executor.tcc (_Executer<>::_M_dfs,
    	_Executer<>::_M_lookahead): Adjust to fit the interface.
    	* include/std/regex: Include <ext/aligned_buffer.h>

diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
index fc0eb41..0ef3896 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -72,7 +72,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   struct _State_base
   {
+  protected:
     _Opcode      _M_opcode;           // type of outgoing transition
+
+  public:
     _StateIdT    _M_next;             // outgoing transition
     union // Since they are mutually exclusive.
     {
@@ -87,16 +90,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// quantifiers (ungreedy if set true)
 	bool       _M_neg;
       };
+      // For _S_opcode_match
+      __gnu_cxx::__aligned_membuf<_Matcher<bool>> _M_matcher_storage;
     };
 
+  protected:
     explicit _State_base(_Opcode __opcode)
     : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
     { }
 
-  protected:
-    ~_State_base() = default;
-
   public:
+    bool
+    _M_has_alt()
+    {
+      return _M_opcode == _S_opcode_alternative
+	|| _M_opcode == _S_opcode_repeat
+	|| _M_opcode == _S_opcode_subexpr_lookahead;
+    }
+
 #ifdef _GLIBCXX_DEBUG
     std::ostream&
     _M_print(std::ostream& ostr) const;
@@ -107,14 +118,64 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
-  template<typename _TraitsT>
+  template<typename _Char_type>
     struct _State : _State_base
     {
-      typedef _Matcher<typename _TraitsT::char_type> _MatcherT;
+      typedef _Matcher<_Char_type> _MatcherT;
+      static_assert(sizeof(_MatcherT) == sizeof(_Matcher<char>),
+		    "The aussmption std::function<bool(char)> has "
+		    "the same size as std::function<bool(T)> is violated");
+      static_assert(alignof(_MatcherT) == alignof(_Matcher<char>),
+		    "The aussmption std::function<bool(char)> has "
+		    "the same alignment as std::function<bool(T)> is violated");
+
+      explicit
+      _State(_Opcode __opcode) : _State_base(__opcode)
+      {
+	if (_M_opcode() == _S_opcode_match)
+	  new (this->_M_matcher_storage._M_addr()) _MatcherT();
+      }
+
+      _State(const _State& __rhs) : _State_base(__rhs)
+      {
+	if (__rhs._M_opcode() == _S_opcode_match)
+	  new (this->_M_matcher_storage._M_addr())
+	    _MatcherT(__rhs._M_get_matcher());
+      }
+
+      _State(_State&& __rhs) : _State_base(__rhs)
+      {
+	if (__rhs._M_opcode() == _S_opcode_match)
+	  new (this->_M_matcher_storage._M_addr())
+	    _MatcherT(std::move(__rhs._M_get_matcher()));
+      }
+
+      _State&
+      operator=(const _State&) = delete;
+
+      ~_State()
+      {
+	if (_M_opcode() == _S_opcode_match)
+	  _M_get_matcher().~_MatcherT();
+      }
+
+      // Since correct ctor and dtor rely on _M_opcode, it's better not to
+      // change it over time.
+      _Opcode
+      _M_opcode() const
+      { return _State_base::_M_opcode; }
+
+      bool
+      _M_matches(_Char_type __char) const
+      { return _M_get_matcher()(__char); }
 
-      _MatcherT      _M_matches;        // for _S_opcode_match
+      _MatcherT&
+      _M_get_matcher()
+      { return *reinterpret_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
 
-      explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
+      const _MatcherT&
+      _M_get_matcher() const
+      { return *reinterpret_cast<const _MatcherT*>(this->_M_matcher_storage._M_addr()); }
     };
 
   struct _NFA_base
@@ -155,10 +216,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _TraitsT>
     struct _NFA
-    : _NFA_base, std::vector<_State<_TraitsT>>
+    : _NFA_base, std::vector<_State<typename _TraitsT::char_type>>
     {
-      typedef _State<_TraitsT>				_StateT;
-      typedef _Matcher<typename _TraitsT::char_type>	_MatcherT;
+      typedef typename _TraitsT::char_type	_Char_type;
+      typedef _State<_Char_type>		_StateT;
+      typedef _Matcher<_Char_type>		_MatcherT;
 
       _NFA(const typename _TraitsT::locale_type& __loc, _FlagT __flags)
       : _NFA_base(__flags)
@@ -202,7 +264,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_insert_matcher(_MatcherT __m)
       {
 	_StateT __tmp(_S_opcode_match);
-	__tmp._M_matches = std::move(__m);
+	__tmp._M_get_matcher() = std::move(__m);
 	return _M_insert_state(std::move(__tmp));
       }
 
diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc
index 72fe978..cecc407 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -174,13 +174,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       for (auto& __it : *this)
 	{
-	  while (__it._M_next >= 0 && (*this)[__it._M_next]._M_opcode
+	  while (__it._M_next >= 0 && (*this)[__it._M_next]._M_opcode()
 		 == _S_opcode_dummy)
 	    __it._M_next = (*this)[__it._M_next]._M_next;
-	  if (__it._M_opcode == _S_opcode_alternative
-	      || __it._M_opcode == _S_opcode_repeat
-	      || __it._M_opcode == _S_opcode_subexpr_lookahead)
-	    while (__it._M_alt >= 0 && (*this)[__it._M_alt]._M_opcode
+	  if (__it._M_has_alt())
+	    while (__it._M_alt >= 0 && (*this)[__it._M_alt]._M_opcode()
 		   == _S_opcode_dummy)
 	      __it._M_alt = (*this)[__it._M_alt]._M_next;
 	}
@@ -200,11 +198,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  __stack.pop();
 	  auto __dup = _M_nfa[__u];
 	  // _M_insert_state() never return -1
-	  auto __id = _M_nfa._M_insert_state(__dup);
+	  auto __id = _M_nfa._M_insert_state(std::move(__dup));
 	  __m[__u] = __id;
-	  if (__dup._M_opcode == _S_opcode_alternative
-	      || __dup._M_opcode == _S_opcode_repeat
-	      || __dup._M_opcode == _S_opcode_subexpr_lookahead)
+	  if (__dup._M_has_alt())
 	    if (__dup._M_alt != _S_invalid_state_id
 		&& __m.count(__dup._M_alt) == 0)
 	      __stack.push(__dup._M_alt);
@@ -223,9 +219,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      _GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_next) > 0);
 	      __ref._M_next = __m[__ref._M_next];
 	    }
-	  if (__ref._M_opcode == _S_opcode_alternative
-	      || __ref._M_opcode == _S_opcode_repeat
-	      || __ref._M_opcode == _S_opcode_subexpr_lookahead)
+	  if (__ref._M_has_alt())
 	    if (__ref._M_alt != _S_invalid_state_id)
 	      {
 		_GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_alt) > 0);
diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index 404f30b..f3f8876 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -148,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_word_boundary() const;
 
       bool
-      _M_lookahead(_State<_TraitsT> __state);
+      _M_lookahead(_StateIdT __next);
 
        // Holds additional information used in BFS-mode.
       template<typename _SearchMode, typename _ResultsVec>
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 9b5c1c6..3fd17f6 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -145,11 +145,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
 	   bool __dfs_mode>
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-    _M_lookahead(_State<_TraitsT> __state)
+    _M_lookahead(_StateIdT __next)
     {
       _ResultsVec __what(_M_cur_results.size());
       _Executor __sub(_M_current, _M_end, __what, _M_re, _M_flags);
-      __sub._M_states._M_start = __state._M_alt;
+      __sub._M_states._M_start = __next;
       if (__sub._M_search_from_first())
 	{
 	  for (size_t __i = 0; __i < __what.size(); __i++)
@@ -203,7 +203,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       const auto& __state = _M_nfa[__i];
       // Every change on _M_cur_results and _M_current will be rolled back after
       // finishing the recursion step.
-      switch (__state._M_opcode)
+      switch (__state._M_opcode())
 	{
 	// _M_alt branch is "match once more", while _M_next is "get me out
 	// of this quantifier". Executing _M_next first or _M_alt first don't
@@ -280,7 +280,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// Here __state._M_alt offers a single start node for a sub-NFA.
 	// We recursively invoke our algorithm to match the sub-NFA.
 	case _S_opcode_subexpr_lookahead:
-	  if (_M_lookahead(__state) == !__state._M_neg)
+	  if (_M_lookahead(__state._M_alt) == !__state._M_neg)
 	    _M_dfs(__match_mode, __state._M_next);
 	  break;
 	case _S_opcode_match:
diff --git a/libstdc++-v3/include/std/regex b/libstdc++-v3/include/std/regex
index 3dff372..b6fe4c7 100644
--- a/libstdc++-v3/include/std/regex
+++ b/libstdc++-v3/include/std/regex
@@ -53,6 +53,7 @@
 #include <map>
 #include <cstring>
 
+#include <ext/aligned_buffer.h>
 #include <bits/regex_constants.h>
 #include <bits/regex_error.h>
 #include <bits/regex_automaton.h>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch] Small refactor on <regex> _State<>
  2015-07-30  5:29             ` Tim Shen
@ 2015-07-30  6:20               ` Tim Shen
  2015-07-30 10:30                 ` Jonathan Wakely
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Shen @ 2015-07-30  6:20 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 511 bytes --]

On Wed, Jul 29, 2015 at 9:21 PM, Tim Shen <timshen@google.com> wrote:
> On Wed, Jul 29, 2015 at 2:15 AM, Jonathan Wakely <jwakely@redhat.com> wrote:
>> Yes, that makes sense. See the code in <ext/aligned_buffer.h> for how
>> to set the alignment of the buffer appropriately. You can use the size
>> and alignment of std::function<bool(char)> even though it will
>> sometimes be a different std::function specialization.
>
> Done.

Oops, fixed one typo: s/_Matcher<bool>/_Matcher<char>/.


-- 
Regards,
Tim Shen

[-- Attachment #2: d.diff --]
[-- Type: text/plain, Size: 9599 bytes --]

commit d9fb4e3ec5eb9fcaf08f757c2a9ddcf57289684f
Author: Tim Shen <timshen@google.com>
Date:   Wed Jul 29 21:08:43 2015 -0700

    	* include/bits/regex_automaton.h (_State_base, _State<>):
    	Remove _TraitsT dependency from _State<>; Make matcher member
    	into the union to reduce struct size.
    	* include/bits/regex_automaton.tcc (_State_base<>::_M_print,
    	_State_base<>::_M_dot, _StateSeq<>::_M_clone):
    	Adjust to fit the interface. Factor out common parts in
    	_M_clone as _State<>::_M_has_alt.
    	* include/bits/regex_executor.h (_Executer<>::_M_lookahead):
    	Only pass state id instead of the whole state.
    	* include/bits/regex_executor.tcc (_Executer<>::_M_dfs,
    	_Executer<>::_M_lookahead): Adjust to fit the interface.
    	* include/std/regex: Include <ext/aligned_buffer.h>

diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h
index fc0eb41..e153d42 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -72,7 +72,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   struct _State_base
   {
+  protected:
     _Opcode      _M_opcode;           // type of outgoing transition
+
+  public:
     _StateIdT    _M_next;             // outgoing transition
     union // Since they are mutually exclusive.
     {
@@ -87,16 +90,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// quantifiers (ungreedy if set true)
 	bool       _M_neg;
       };
+      // For _S_opcode_match
+      __gnu_cxx::__aligned_membuf<_Matcher<char>> _M_matcher_storage;
     };
 
+  protected:
     explicit _State_base(_Opcode __opcode)
     : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
     { }
 
-  protected:
-    ~_State_base() = default;
-
   public:
+    bool
+    _M_has_alt()
+    {
+      return _M_opcode == _S_opcode_alternative
+	|| _M_opcode == _S_opcode_repeat
+	|| _M_opcode == _S_opcode_subexpr_lookahead;
+    }
+
 #ifdef _GLIBCXX_DEBUG
     std::ostream&
     _M_print(std::ostream& ostr) const;
@@ -107,14 +118,64 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
-  template<typename _TraitsT>
+  template<typename _Char_type>
     struct _State : _State_base
     {
-      typedef _Matcher<typename _TraitsT::char_type> _MatcherT;
+      typedef _Matcher<_Char_type> _MatcherT;
+      static_assert(sizeof(_MatcherT) == sizeof(_Matcher<char>),
+		    "The aussmption std::function<bool(char)> has "
+		    "the same size as std::function<bool(T)> is violated");
+      static_assert(alignof(_MatcherT) == alignof(_Matcher<char>),
+		    "The aussmption std::function<bool(char)> has "
+		    "the same alignment as std::function<bool(T)> is violated");
+
+      explicit
+      _State(_Opcode __opcode) : _State_base(__opcode)
+      {
+	if (_M_opcode() == _S_opcode_match)
+	  new (this->_M_matcher_storage._M_addr()) _MatcherT();
+      }
+
+      _State(const _State& __rhs) : _State_base(__rhs)
+      {
+	if (__rhs._M_opcode() == _S_opcode_match)
+	  new (this->_M_matcher_storage._M_addr())
+	    _MatcherT(__rhs._M_get_matcher());
+      }
+
+      _State(_State&& __rhs) : _State_base(__rhs)
+      {
+	if (__rhs._M_opcode() == _S_opcode_match)
+	  new (this->_M_matcher_storage._M_addr())
+	    _MatcherT(std::move(__rhs._M_get_matcher()));
+      }
+
+      _State&
+      operator=(const _State&) = delete;
+
+      ~_State()
+      {
+	if (_M_opcode() == _S_opcode_match)
+	  _M_get_matcher().~_MatcherT();
+      }
+
+      // Since correct ctor and dtor rely on _M_opcode, it's better not to
+      // change it over time.
+      _Opcode
+      _M_opcode() const
+      { return _State_base::_M_opcode; }
+
+      bool
+      _M_matches(_Char_type __char) const
+      { return _M_get_matcher()(__char); }
 
-      _MatcherT      _M_matches;        // for _S_opcode_match
+      _MatcherT&
+      _M_get_matcher()
+      { return *reinterpret_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
 
-      explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
+      const _MatcherT&
+      _M_get_matcher() const
+      { return *reinterpret_cast<const _MatcherT*>(this->_M_matcher_storage._M_addr()); }
     };
 
   struct _NFA_base
@@ -155,10 +216,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _TraitsT>
     struct _NFA
-    : _NFA_base, std::vector<_State<_TraitsT>>
+    : _NFA_base, std::vector<_State<typename _TraitsT::char_type>>
     {
-      typedef _State<_TraitsT>				_StateT;
-      typedef _Matcher<typename _TraitsT::char_type>	_MatcherT;
+      typedef typename _TraitsT::char_type	_Char_type;
+      typedef _State<_Char_type>		_StateT;
+      typedef _Matcher<_Char_type>		_MatcherT;
 
       _NFA(const typename _TraitsT::locale_type& __loc, _FlagT __flags)
       : _NFA_base(__flags)
@@ -202,7 +264,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_insert_matcher(_MatcherT __m)
       {
 	_StateT __tmp(_S_opcode_match);
-	__tmp._M_matches = std::move(__m);
+	__tmp._M_get_matcher() = std::move(__m);
 	return _M_insert_state(std::move(__tmp));
       }
 
diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc
index 72fe978..cecc407 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -174,13 +174,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       for (auto& __it : *this)
 	{
-	  while (__it._M_next >= 0 && (*this)[__it._M_next]._M_opcode
+	  while (__it._M_next >= 0 && (*this)[__it._M_next]._M_opcode()
 		 == _S_opcode_dummy)
 	    __it._M_next = (*this)[__it._M_next]._M_next;
-	  if (__it._M_opcode == _S_opcode_alternative
-	      || __it._M_opcode == _S_opcode_repeat
-	      || __it._M_opcode == _S_opcode_subexpr_lookahead)
-	    while (__it._M_alt >= 0 && (*this)[__it._M_alt]._M_opcode
+	  if (__it._M_has_alt())
+	    while (__it._M_alt >= 0 && (*this)[__it._M_alt]._M_opcode()
 		   == _S_opcode_dummy)
 	      __it._M_alt = (*this)[__it._M_alt]._M_next;
 	}
@@ -200,11 +198,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  __stack.pop();
 	  auto __dup = _M_nfa[__u];
 	  // _M_insert_state() never return -1
-	  auto __id = _M_nfa._M_insert_state(__dup);
+	  auto __id = _M_nfa._M_insert_state(std::move(__dup));
 	  __m[__u] = __id;
-	  if (__dup._M_opcode == _S_opcode_alternative
-	      || __dup._M_opcode == _S_opcode_repeat
-	      || __dup._M_opcode == _S_opcode_subexpr_lookahead)
+	  if (__dup._M_has_alt())
 	    if (__dup._M_alt != _S_invalid_state_id
 		&& __m.count(__dup._M_alt) == 0)
 	      __stack.push(__dup._M_alt);
@@ -223,9 +219,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      _GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_next) > 0);
 	      __ref._M_next = __m[__ref._M_next];
 	    }
-	  if (__ref._M_opcode == _S_opcode_alternative
-	      || __ref._M_opcode == _S_opcode_repeat
-	      || __ref._M_opcode == _S_opcode_subexpr_lookahead)
+	  if (__ref._M_has_alt())
 	    if (__ref._M_alt != _S_invalid_state_id)
 	      {
 		_GLIBCXX_DEBUG_ASSERT(__m.count(__ref._M_alt) > 0);
diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h
index 404f30b..f3f8876 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -148,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_word_boundary() const;
 
       bool
-      _M_lookahead(_State<_TraitsT> __state);
+      _M_lookahead(_StateIdT __next);
 
        // Holds additional information used in BFS-mode.
       template<typename _SearchMode, typename _ResultsVec>
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 9b5c1c6..3fd17f6 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -145,11 +145,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _BiIter, typename _Alloc, typename _TraitsT,
 	   bool __dfs_mode>
     bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-    _M_lookahead(_State<_TraitsT> __state)
+    _M_lookahead(_StateIdT __next)
     {
       _ResultsVec __what(_M_cur_results.size());
       _Executor __sub(_M_current, _M_end, __what, _M_re, _M_flags);
-      __sub._M_states._M_start = __state._M_alt;
+      __sub._M_states._M_start = __next;
       if (__sub._M_search_from_first())
 	{
 	  for (size_t __i = 0; __i < __what.size(); __i++)
@@ -203,7 +203,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       const auto& __state = _M_nfa[__i];
       // Every change on _M_cur_results and _M_current will be rolled back after
       // finishing the recursion step.
-      switch (__state._M_opcode)
+      switch (__state._M_opcode())
 	{
 	// _M_alt branch is "match once more", while _M_next is "get me out
 	// of this quantifier". Executing _M_next first or _M_alt first don't
@@ -280,7 +280,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// Here __state._M_alt offers a single start node for a sub-NFA.
 	// We recursively invoke our algorithm to match the sub-NFA.
 	case _S_opcode_subexpr_lookahead:
-	  if (_M_lookahead(__state) == !__state._M_neg)
+	  if (_M_lookahead(__state._M_alt) == !__state._M_neg)
 	    _M_dfs(__match_mode, __state._M_next);
 	  break;
 	case _S_opcode_match:
diff --git a/libstdc++-v3/include/std/regex b/libstdc++-v3/include/std/regex
index 3dff372..b6fe4c7 100644
--- a/libstdc++-v3/include/std/regex
+++ b/libstdc++-v3/include/std/regex
@@ -53,6 +53,7 @@
 #include <map>
 #include <cstring>
 
+#include <ext/aligned_buffer.h>
 #include <bits/regex_constants.h>
 #include <bits/regex_error.h>
 #include <bits/regex_automaton.h>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Patch] Small refactor on <regex> _State<>
  2015-07-30  6:20               ` Tim Shen
@ 2015-07-30 10:30                 ` Jonathan Wakely
  0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Wakely @ 2015-07-30 10:30 UTC (permalink / raw)
  To: Tim Shen; +Cc: libstdc++, gcc-patches

On 29/07/15 21:55 -0700, Tim Shen wrote:
>-      typedef _Matcher<typename _TraitsT::char_type> _MatcherT;
>+      typedef _Matcher<_Char_type> _MatcherT;
>+      static_assert(sizeof(_MatcherT) == sizeof(_Matcher<char>),
>+		    "The aussmption std::function<bool(char)> has "
>+		    "the same size as std::function<bool(T)> is violated");
>+      static_assert(alignof(_MatcherT) == alignof(_Matcher<char>),
>+		    "The aussmption std::function<bool(char)> has "
>+		    "the same alignment as std::function<bool(T)> is violated");

Good idea adding these assertions, but "aussmption" :-)

In accordance with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41759
we should avoid negative wording ("X is not Y" or "X does not meet Y")
in static assertions, so maybe something like:

      static_assert(alignof(_MatcherT) == alignof(_Matcher<char>),
		    "std::function<bool(T)> has the same size as "
		    "std::function<bool(char)>");

or:

      static_assert(alignof(_MatcherT) == alignof(_Matcher<char>),
		    "std::function<bool(T)> must have the same size as "
		    "std::function<bool(char)>");


>-      _MatcherT      _M_matches;        // for _S_opcode_match
>+      _MatcherT&
>+      _M_get_matcher()
>+      { return *reinterpret_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
>
>-      explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
>+      const _MatcherT&
>+      _M_get_matcher() const
>+      { return *reinterpret_cast<const _MatcherT*>(this->_M_matcher_storage._M_addr()); }

These can use static_cast, because _M_addr() returns void*

OK for trunk with those two tweaks, thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-07-30 10:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAG4ZjNnYLoCYnjNpkJYHa2Q-ux0KnKmVXBDUOa33xk29MzeTKA@mail.gmail.com>
2015-07-25  8:05 ` [Patch] Small refactor on <regex> _State<> Tim Shen
     [not found] ` <20150725153121.GW21787@redhat.com>
     [not found]   ` <CAG4ZjNnvm=QAwihOUvMLtOezYprtasaJOY3i-MhJV0Tf4z5mpg@mail.gmail.com>
2015-07-28 15:42     ` Jonathan Wakely
2015-07-29  8:43       ` Jonathan Wakely
2015-07-29  9:08         ` Tim Shen
2015-07-29  9:34           ` Jonathan Wakely
2015-07-30  5:29             ` Tim Shen
2015-07-30  6:20               ` Tim Shen
2015-07-30 10:30                 ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).