public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jonathan Wakely <jwakely@redhat.com>
To: "François Dumont" <frs.dumont@gmail.com>
Cc: libstdc++ <libstdc++@gcc.gnu.org>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH][_Hashtable] Use RAII to restore Rehash state
Date: Thu, 26 Oct 2023 11:43:55 +0100	[thread overview]
Message-ID: <ZTpC6wxExIrq6Vme@redhat.com> (raw)
In-Reply-To: <7f61df18-dd99-4ff5-9fcd-8ca7820403d4@gmail.com>

On 26/10/23 07:18 +0200, François Dumont wrote:
>    libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
>
>    Replace usage of __try/__catch with a RAII type to restore rehash 
>functor
>    state when needed.

Generally I really like replacing try-catch with RAII but I have some
questions below.

>    libstdc++-v3/ChangeLog:
>
>            * include/bits/hashtable_policy.h (_RehashStateGuard): New.
>            (_Insert_base<>::_M_insert_range(_IIt, _IIt, const 
>_NodeGet&, false_type)):
>            Adapt.
>            * include/bits/hashtable.h (__rehash_guard_t): New.
>            (__rehash_state): Remove.
>            (_M_rehash): Remove.
>            (_M_rehash_aux): Rename into _M_rehash.
>            (_M_assign_elements, _M_insert_unique_node, 
>_M_insert_multi_node): Adapt.
>            (rehash): Adapt.
>
>
>Tested under Linux x64.
>
>Ok to commit ?
>
>François

>diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
>index 0857448f7ed..64071ac1fb2 100644
>--- a/libstdc++-v3/include/bits/hashtable.h
>+++ b/libstdc++-v3/include/bits/hashtable.h
>@@ -234,6 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 					      _RehashPolicy, _Traits>;
>       using __enable_default_ctor
> 	= _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
>+      using __rehash_guard_t
>+	= __detail::_RehashStateGuard<_RehashPolicy>;
> 
>     public:
>       typedef _Key						key_type;
>@@ -264,7 +266,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 
>     private:
>       using __rehash_type = _RehashPolicy;
>-      using __rehash_state = typename __rehash_type::_State;
> 
>       using __unique_keys = typename __traits_type::__unique_keys;
> 
>@@ -1200,14 +1201,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 
>     private:
>       // Helper rehash method used when keys are unique.
>-      void _M_rehash_aux(size_type __bkt_count, true_type __uks);
>+      void _M_rehash(size_type __bkt_count, true_type __uks);
> 
>       // Helper rehash method used when keys can be non-unique.
>-      void _M_rehash_aux(size_type __bkt_count, false_type __uks);
>-
>-      // Unconditionally change size of bucket array to n, restore
>-      // hash policy state to __state on exception.
>-      void _M_rehash(size_type __bkt_count, const __rehash_state& __state);
>+      void _M_rehash(size_type __bkt_count, false_type __uks);
>     };
> 
>   // Definitions of class template _Hashtable's out-of-line member functions.
>@@ -1337,7 +1334,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       {
> 	__buckets_ptr __former_buckets = nullptr;
> 	std::size_t __former_bucket_count = _M_bucket_count;
>-	const __rehash_state& __former_state = _M_rehash_policy._M_state();
>+	__rehash_guard_t __rehash_guard(_M_rehash_policy);
> 
> 	if (_M_bucket_count != __ht._M_bucket_count)
> 	  {
>@@ -1359,6 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	    _M_assign(std::forward<_Ht>(__ht), __roan);
> 	    if (__former_buckets)
> 	      _M_deallocate_buckets(__former_buckets, __former_bucket_count);
>+	    __rehash_guard._M_reset = false;

I find this confusing. Usually "reset" means that something is
cleared, so won't take an action in the destructor. e.g. if you use
std::unique_ptr::reset() then the object is destroyed immediately, and
then nothing happens in the destructor. Here it's the opposite,
_M_reset=true means that it _sould_ do something in the destructor.

The problem is the ambiguity between "reset the state in the
destructor later" and "reset the object to an empty state now".

If the member was called _M_guarded then it might be clearer.
_M_guarded=true means the guard is active, and will restore the state
later. Or _M_active, or _M_armed, or even _M_reset_in_dtor. Any of
those names avoids the confusion with the semantics of
std::unique_ptr::reset() and similar well-known APIs.

Or what I usually do is store a pointer to the guarded object in the
RAII guard type, and then just null the pointer to disarm the guard.
That means you don't need a separate bool member variable. If
_RehashStateGuard::_M_rehash_policy was called _M_guarded_obj and was
a _RehashPolicy* instead of _RehashPolicy& then disarming it would be:

        __rehash_guard._M_guarded_obj = nullptr;

This seems clear to me, as it says that the guard no longer has
anything to guard, so won't do anything in the destructor.


> 	  }
> 	__catch(...)
> 	  {
>@@ -1366,7 +1364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	      {
> 		// Restore previous buckets.
> 		_M_deallocate_buckets();
>-		_M_rehash_policy._M_reset(__former_state);
> 		_M_buckets = __former_buckets;
> 		_M_bucket_count = __former_bucket_count;
> 	      }
>@@ -2142,17 +2139,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 			  __node_ptr __node, size_type __n_elt)
>     -> iterator
>     {
>-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
>+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>       std::pair<bool, std::size_t> __do_rehash
> 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> 					  __n_elt);
> 
>       if (__do_rehash.first)
> 	{
>-	  _M_rehash(__do_rehash.second, __saved_state);
>+	  _M_rehash(__do_rehash.second, true_type{});
> 	  __bkt = _M_bucket_index(__code);
> 	}
> 
>+      __rehash_guard._M_reset = false;
>       this->_M_store_code(*__node, __code);

This changes behaviour. Previously the try-catch was inside the call to
_M_rehash. Now we're starting to guard before calling _M_need_rehash,
and we run the destructor unconditionally even if we never call
_M_rehash. I think that's OK, because _M_need_rehash is noexcept (but
it's confusing because we have a const overload of _M_need_rehash
which is noexcept(false) and a non-const overload which is
noexcept(true) ... should they both be noexcept?)

Can we do this instead:


-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
        std::pair<bool, std::size_t> __do_rehash
  	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
  					  __n_elt);
  
        if (__do_rehash.first)
  	{
+	  __rehash_guard_t __rehash_guard(_M_rehash_policy);
-	  _M_rehash(__do_rehash.second, __saved_state);
+	  _M_rehash(__do_rehash.second, true_type{});
+	  __rehash_guard._M_reset = false;
  	  __bkt = _M_bucket_index(__code);
  	}
  
        this->_M_store_code(*__node, __code);

This only creates a guard object if we're actually going to rehash, so
we don't run its destructor (and check if we need t restore anything)
when no rehash is needed.

N.B. _M_bucket_index is also confusing. There are two overloads of it,
one is noexcept and one isn't. The one that we call here is
noexcept(false), so that call could throw ... is it OK that we don't
restore the old state if that happens? Should that be guarded too? (I
have no idea).

Also, I see that the noexcept(true) overload of _M_bucket_index calls
__hash_code_base::_M_bucket_index, which is noexcept(/* condition */),
so the caller says it cannot throw, but it calls something that
sometimes throws. Is that correct?

> 
>       // Always insert at the beginning of the bucket.
>@@ -2172,13 +2170,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 			 __hash_code __code, __node_ptr __node)
>     -> iterator
>     {
>-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
>+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>       std::pair<bool, std::size_t> __do_rehash
> 	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count, 1);
> 
>       if (__do_rehash.first)
>-	_M_rehash(__do_rehash.second, __saved_state);
>+	_M_rehash(__do_rehash.second, false_type{});
> 
>+      __rehash_guard._M_reset = false;

Again, we're creating the guard object in a larger scope than needed.

>       this->_M_store_code(*__node, __code);
>       const key_type& __k = _ExtractKey{}(__node->_M_v());
>       size_type __bkt = _M_bucket_index(__code);
>@@ -2509,39 +2508,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>     rehash(size_type __bkt_count)
>     {
>-      const __rehash_state& __saved_state = _M_rehash_policy._M_state();
>+      __rehash_guard_t __rehash_guard(_M_rehash_policy);
>       __bkt_count
> 	= std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count + 1),
> 		   __bkt_count);
>       __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
> 
>       if (__bkt_count != _M_bucket_count)
>-	_M_rehash(__bkt_count, __saved_state);
>-      else
>-	// No rehash, restore previous state to keep it consistent with
>-	// container state.
>-	_M_rehash_policy._M_reset(__saved_state);
>-    }
>-
>-  template<typename _Key, typename _Value, typename _Alloc,
>-	   typename _ExtractKey, typename _Equal,
>-	   typename _Hash, typename _RangeHash, typename _Unused,
>-	   typename _RehashPolicy, typename _Traits>
>-    void
>-    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
>-	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>-    _M_rehash(size_type __bkt_count, const __rehash_state& __state)
>-    {
>-      __try
>-	{
>-	  _M_rehash_aux(__bkt_count, __unique_keys{});
>-	}
>-      __catch(...)
> 	{
>-	  // A failure here means that buckets allocation failed.  We only
>-	  // have to restore hash policy previous state.
>-	  _M_rehash_policy._M_reset(__state);
>-	  __throw_exception_again;
>+	  _M_rehash(__bkt_count, __unique_keys{});
>+	  __rehash_guard._M_reset = false;

And a larger scope again.

> 	}
>     }
> 
>@@ -2553,7 +2529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     void
>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>-    _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
>+    _M_rehash(size_type __bkt_count, true_type /* __uks */)
>     {
>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
>       __node_ptr __p = _M_begin();
>@@ -2596,7 +2572,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     void
>     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
>-    _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
>+    _M_rehash(size_type __bkt_count, false_type /* __uks */)
>     {
>       __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
>       __node_ptr __p = _M_begin();
>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
>index 5d162463dc3..8b9626b1575 100644
>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>@@ -715,6 +715,26 @@ namespace __detail
>     std::size_t	_M_next_resize;
>   };
> 
>+  template<typename _RehashPolicy>
>+    struct _RehashStateGuard
>+    {
>+      _RehashPolicy& _M_rehash_policy;
>+      typename _RehashPolicy::_State _M_prev_state;
>+      bool _M_reset = true;

This could be:

+      _RehashPolicy* _M_guarded_obj;
+      typename _RehashPolicy::_State _M_prev_state;

>+
>+      _RehashStateGuard(_RehashPolicy& __policy)
>+      : _M_rehash_policy(__policy)
>+      , _M_prev_state(__policy._M_state())

+      _RehashStateGuard(_RehashPolicy& __policy)
+      : _M_guarded_obj(std::__addressof(__policy))
+      , _M_prev_state(__policy._M_state())

>+      { }
>+      _RehashStateGuard(const _RehashStateGuard&) = delete;
>+
>+      ~_RehashStateGuard()
>+      {
>+	if (_M_reset)
>+	  _M_rehash_policy._M_reset(_M_prev_state);

+	if (_M_guarded_obj)
+	  _M_guarded_obj->_M_reset(_M_prev_state);

>+      }
>+    };
>+
>   // Base classes for std::_Hashtable.  We define these base classes
>   // because in some cases we want to do different things depending on
>   // the value of a policy class.  In some cases the policy class
>@@ -1007,7 +1027,7 @@ namespace __detail
> 		      const _NodeGetter& __node_gen, false_type __uks)
>       {
> 	using __rehash_type = typename __hashtable::__rehash_type;
>-	using __rehash_state = typename __hashtable::__rehash_state;
>+	using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
> 	using pair_type = std::pair<bool, std::size_t>;
> 
> 	size_type __n_elt = __detail::__distance_fw(__first, __last);
>@@ -1016,14 +1036,15 @@ namespace __detail
> 
> 	__hashtable& __h = _M_conjure_hashtable();
> 	__rehash_type& __rehash = __h._M_rehash_policy;
>-	const __rehash_state& __saved_state = __rehash._M_state();
>+	__rehash_guard_t __rehash_guard(__rehash);
> 	pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
> 							__h._M_element_count,
> 							__n_elt);
> 
> 	if (__do_rehash.first)
>-	  __h._M_rehash(__do_rehash.second, __saved_state);
>+	  __h._M_rehash(__do_rehash.second, __uks);
> 
>+	__rehash_guard._M_reset = false;

A larger scope again.

> 	for (; __first != __last; ++__first)
> 	  __h._M_insert(*__first, __node_gen, __uks);
>       }


  parent reply	other threads:[~2023-10-26 10:43 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-26  5:18 François Dumont
2023-10-26 10:14 ` Jonathan Wakely
2023-10-26 10:43 ` Jonathan Wakely [this message]
2023-10-26 20:52   ` François Dumont
2023-11-09  8:11     ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZTpC6wxExIrq6Vme@redhat.com \
    --to=jwakely@redhat.com \
    --cc=frs.dumont@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).