From: Jonathan Wakely <jwakely@redhat.com>
To: "François Dumont" <frs.dumont@gmail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH][_Hashtable] Use RAII to restore Rehash state
Date: Thu, 9 Nov 2023 08:11:31 +0000 [thread overview]
Message-ID: <CACb0b4kZvh7aBEbMmTzxQ4Q1AxKgLmX79AF8XUhYztgA7LGJow@mail.gmail.com> (raw)
In-Reply-To: <d55d6fe7-7d45-49eb-a8f0-544be67c5e89@gmail.com>
On Thu, 26 Oct 2023 at 21:52, François Dumont <frs.dumont@gmail.com> wrote:
>
>
> On 26/10/2023 12:43, Jonathan Wakely wrote:
> > On 26/10/23 07:18 +0200, François Dumont wrote:
> >> libstdc++: [_Hashtable] Use RAII type to manage rehash functor state
> >>
> >> Replace usage of __try/__catch with a RAII type to restore rehash
> >> functor
> >> state when needed.
> >
> > Generally I really like replacing try-catch with RAII but I have some
> > questions below.
> >
> >> libstdc++-v3/ChangeLog:
> >>
> >> * include/bits/hashtable_policy.h (_RehashStateGuard): New.
> >> (_Insert_base<>::_M_insert_range(_IIt, _IIt, const
> >> _NodeGet&, false_type)):
> >> Adapt.
> >> * include/bits/hashtable.h (__rehash_guard_t): New.
> >> (__rehash_state): Remove.
> >> (_M_rehash): Remove.
> >> (_M_rehash_aux): Rename into _M_rehash.
> >> (_M_assign_elements, _M_insert_unique_node,
> >> _M_insert_multi_node): Adapt.
> >> (rehash): Adapt.
> >>
> >>
> >> Tested under Linux x64.
> >>
> >> Ok to commit ?
> >>
> >> François
> >
> >> diff --git a/libstdc++-v3/include/bits/hashtable.h
> >> b/libstdc++-v3/include/bits/hashtable.h
> >> index 0857448f7ed..64071ac1fb2 100644
> >> --- a/libstdc++-v3/include/bits/hashtable.h
> >> +++ b/libstdc++-v3/include/bits/hashtable.h
> >> @@ -234,6 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> _RehashPolicy, _Traits>;
> >> using __enable_default_ctor
> >> = _Hashtable_enable_default_ctor<_Equal, _Hash, _Alloc>;
> >> + using __rehash_guard_t
> >> + = __detail::_RehashStateGuard<_RehashPolicy>;
> >>
> >> public:
> >> typedef _Key key_type;
> >> @@ -264,7 +266,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >> private:
> >> using __rehash_type = _RehashPolicy;
> >> - using __rehash_state = typename __rehash_type::_State;
> >>
> >> using __unique_keys = typename __traits_type::__unique_keys;
> >>
> >> @@ -1200,14 +1201,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >> private:
> >> // Helper rehash method used when keys are unique.
> >> - void _M_rehash_aux(size_type __bkt_count, true_type __uks);
> >> + void _M_rehash(size_type __bkt_count, true_type __uks);
> >>
> >> // Helper rehash method used when keys can be non-unique.
> >> - void _M_rehash_aux(size_type __bkt_count, false_type __uks);
> >> -
> >> - // Unconditionally change size of bucket array to n, restore
> >> - // hash policy state to __state on exception.
> >> - void _M_rehash(size_type __bkt_count, const __rehash_state&
> >> __state);
> >> + void _M_rehash(size_type __bkt_count, false_type __uks);
> >> };
> >>
> >> // Definitions of class template _Hashtable's out-of-line member
> >> functions.
> >> @@ -1337,7 +1334,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> {
> >> __buckets_ptr __former_buckets = nullptr;
> >> std::size_t __former_bucket_count = _M_bucket_count;
> >> - const __rehash_state& __former_state = _M_rehash_policy._M_state();
> >> + __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >>
> >> if (_M_bucket_count != __ht._M_bucket_count)
> >> {
> >> @@ -1359,6 +1356,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> _M_assign(std::forward<_Ht>(__ht), __roan);
> >> if (__former_buckets)
> >> _M_deallocate_buckets(__former_buckets,
> >> __former_bucket_count);
> >> + __rehash_guard._M_reset = false;
> >
> > I find this confusing. Usually "reset" means that something is
> > cleared, so won't take an action in the destructor. e.g. if you use
> > std::unique_ptr::reset() then the object is destroyed immediately, and
> > then nothing happens in the destructor. Here it's the opposite,
> > _M_reset=true means that it _sould_ do something in the destructor.
> >
> > The problem is the ambiguity between "reset the state in the
> > destructor later" and "reset the object to an empty state now".
> >
> > If the member was called _M_guarded then it might be clearer.
> > _M_guarded=true means the guard is active, and will restore the state
> > later. Or _M_active, or _M_armed, or even _M_reset_in_dtor. Any of
> > those names avoids the confusion with the semantics of
> > std::unique_ptr::reset() and similar well-known APIs.
> >
> > Or what I usually do is store a pointer to the guarded object in the
> > RAII guard type, and then just null the pointer to disarm the guard.
> > That means you don't need a separate bool member variable. If
> > _RehashStateGuard::_M_rehash_policy was called _M_guarded_obj and was
> > a _RehashPolicy* instead of _RehashPolicy& then disarming it would be:
> >
> > __rehash_guard._M_guarded_obj = nullptr;
> >
> > This seems clear to me, as it says that the guard no longer has
> > anything to guard, so won't do anything in the destructor.
> >
> Looks clearer to me too.
>
> I started with a _RehashStateScope and a _M_commit() method, it could
> have been worst :-)
>
>
> >
> >> }
> >> __catch(...)
> >> {
> >> @@ -1366,7 +1364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> {
> >> // Restore previous buckets.
> >> _M_deallocate_buckets();
> >> - _M_rehash_policy._M_reset(__former_state);
> >> _M_buckets = __former_buckets;
> >> _M_bucket_count = __former_bucket_count;
> >> }
> >> @@ -2142,17 +2139,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> __node_ptr __node, size_type __n_elt)
> >> -> iterator
> >> {
> >> - const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> + __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >> std::pair<bool, std::size_t> __do_rehash
> >> = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> >> __n_elt);
> >>
> >> if (__do_rehash.first)
> >> {
> >> - _M_rehash(__do_rehash.second, __saved_state);
> >> + _M_rehash(__do_rehash.second, true_type{});
> >> __bkt = _M_bucket_index(__code);
> >> }
> >>
> >> + __rehash_guard._M_reset = false;
> >> this->_M_store_code(*__node, __code);
> >
> > This changes behaviour. Previously the try-catch was inside the call to
> > _M_rehash. Now we're starting to guard before calling _M_need_rehash,
>
> We were capturing _M_rehash_policy state before the call to
> _M_need_rehash so it's not different.
>
> We really need to instantiate the guard before we call _M_need_rehash
> cause it is the method changing the _M_rehash_policy state.
>
> > and we run the destructor unconditionally even if we never call
> > _M_rehash.
>
> Here we only want to reset rehash state on exception.
>
> Note that with the 2 rehash policy implementations we provide it doesn't
> matter if we reset or not when we do not call _M_rehash because their
> state changes only when rehash is needed. But some user's fancy rehash
> policy might update their state even if no rehash is requested and in
> this case it is better to reset this state only on exception.
>
>
> > I think that's OK, because _M_need_rehash is noexcept (but
> > it's confusing because we have a const overload of _M_need_rehash
> > which is noexcept(false) and a non-const overload which is
> > noexcept(true) ... should they both be noexcept?)
>
> That's another subject but yes, _M_need_rehash should be non-const and
> noexcept on both rehash policy implementations. I considered doing such
> a thing but won't it impact abi as _Prime_rehash_policy._M_need_rehash
> symbols is coming from the lib ?
>
>
> >
> > Can we do this instead:
> >
> >
> > - const __rehash_state& __saved_state = _M_rehash_policy._M_state();
> > std::pair<bool, std::size_t> __do_rehash
> > = _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
> > __n_elt);
> >
> > if (__do_rehash.first)
> > {
> > + __rehash_guard_t __rehash_guard(_M_rehash_policy);
>
> Clearly no, we need to capture state before _M_need_rehash. Like it was
> done with the __saved_state above.
>
>
> > - _M_rehash(__do_rehash.second, __saved_state);
> > + _M_rehash(__do_rehash.second, true_type{});
> > + __rehash_guard._M_reset = false;
> > __bkt = _M_bucket_index(__code);
> > }
> >
> > this->_M_store_code(*__node, __code);
> >
> > This only creates a guard object if we're actually going to rehash, so
> > we don't run its destructor (and check if we need t restore anything)
> > when no rehash is needed.
> >
> > N.B. _M_bucket_index is also confusing. There are two overloads of it,
> > one is noexcept and one isn't. The one that we call here is
> > noexcept(false), so that call could throw ... is it OK that we don't
> > restore the old state if that happens? Should that be guarded too? (I
> > have no idea).
> >
> > Also, I see that the noexcept(true) overload of _M_bucket_index calls
> > __hash_code_base::_M_bucket_index, which is noexcept(/* condition */),
> > so the caller says it cannot throw, but it calls something that
> > sometimes throws. Is that correct?
>
> It can be noexcept because usage of hash code cache depends on either
> hash functor is noexcept or not.
>
> >
> >>
> >> // Always insert at the beginning of the bucket.
> >> @@ -2172,13 +2170,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> __hash_code __code, __node_ptr __node)
> >> -> iterator
> >> {
> >> - const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> + __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >> std::pair<bool, std::size_t> __do_rehash
> >> = _M_rehash_policy._M_need_rehash(_M_bucket_count,
> >> _M_element_count, 1);
> >>
> >> if (__do_rehash.first)
> >> - _M_rehash(__do_rehash.second, __saved_state);
> >> + _M_rehash(__do_rehash.second, false_type{});
> >>
> >> + __rehash_guard._M_reset = false;
> >
> > Again, we're creating the guard object in a larger scope than needed.
> >
> >> this->_M_store_code(*__node, __code);
> >> const key_type& __k = _ExtractKey{}(__node->_M_v());
> >> size_type __bkt = _M_bucket_index(__code);
> >> @@ -2509,39 +2508,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> rehash(size_type __bkt_count)
> >> {
> >> - const __rehash_state& __saved_state =
> >> _M_rehash_policy._M_state();
> >> + __rehash_guard_t __rehash_guard(_M_rehash_policy);
> >> __bkt_count
> >> = std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count
> >> + 1),
> >> __bkt_count);
> >> __bkt_count = _M_rehash_policy._M_next_bkt(__bkt_count);
> >>
> >> if (__bkt_count != _M_bucket_count)
> >> - _M_rehash(__bkt_count, __saved_state);
> >> - else
> >> - // No rehash, restore previous state to keep it consistent with
> >> - // container state.
> >> - _M_rehash_policy._M_reset(__saved_state);
> >> - }
> >> -
> >> - template<typename _Key, typename _Value, typename _Alloc,
> >> - typename _ExtractKey, typename _Equal,
> >> - typename _Hash, typename _RangeHash, typename _Unused,
> >> - typename _RehashPolicy, typename _Traits>
> >> - void
> >> - _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >> - _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> - _M_rehash(size_type __bkt_count, const __rehash_state& __state)
> >> - {
> >> - __try
> >> - {
> >> - _M_rehash_aux(__bkt_count, __unique_keys{});
> >> - }
> >> - __catch(...)
> >> {
> >> - // A failure here means that buckets allocation failed. We only
> >> - // have to restore hash policy previous state.
> >> - _M_rehash_policy._M_reset(__state);
> >> - __throw_exception_again;
> >> + _M_rehash(__bkt_count, __unique_keys{});
> >> + __rehash_guard._M_reset = false;
> >
> > And a larger scope again.
> >
> >> }
> >> }
> >>
> >> @@ -2553,7 +2529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> void
> >> _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >> _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> - _M_rehash_aux(size_type __bkt_count, true_type /* __uks */)
> >> + _M_rehash(size_type __bkt_count, true_type /* __uks */)
> >> {
> >> __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
> >> __node_ptr __p = _M_begin();
> >> @@ -2596,7 +2572,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> void
> >> _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
> >> _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
> >> - _M_rehash_aux(size_type __bkt_count, false_type /* __uks */)
> >> + _M_rehash(size_type __bkt_count, false_type /* __uks */)
> >> {
> >> __buckets_ptr __new_buckets = _M_allocate_buckets(__bkt_count);
> >> __node_ptr __p = _M_begin();
> >> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h
> >> b/libstdc++-v3/include/bits/hashtable_policy.h
> >> index 5d162463dc3..8b9626b1575 100644
> >> --- a/libstdc++-v3/include/bits/hashtable_policy.h
> >> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
> >> @@ -715,6 +715,26 @@ namespace __detail
> >> std::size_t _M_next_resize;
> >> };
> >>
> >> + template<typename _RehashPolicy>
> >> + struct _RehashStateGuard
> >> + {
> >> + _RehashPolicy& _M_rehash_policy;
> >> + typename _RehashPolicy::_State _M_prev_state;
> >> + bool _M_reset = true;
> >
> > This could be:
> >
> > + _RehashPolicy* _M_guarded_obj;
> > + typename _RehashPolicy::_State _M_prev_state;
> >
> >> +
> >> + _RehashStateGuard(_RehashPolicy& __policy)
> >> + : _M_rehash_policy(__policy)
> >> + , _M_prev_state(__policy._M_state())
> >
> > + _RehashStateGuard(_RehashPolicy& __policy)
> > + : _M_guarded_obj(std::__addressof(__policy))
> > + , _M_prev_state(__policy._M_state())
> >
> >> + { }
> >> + _RehashStateGuard(const _RehashStateGuard&) = delete;
> >> +
> >> + ~_RehashStateGuard()
> >> + {
> >> + if (_M_reset)
> >> + _M_rehash_policy._M_reset(_M_prev_state);
> >
> > + if (_M_guarded_obj)
> > + _M_guarded_obj->_M_reset(_M_prev_state);
> >
> >> + }
> >> + };
> >> +
> >> // Base classes for std::_Hashtable. We define these base classes
> >> // because in some cases we want to do different things depending on
> >> // the value of a policy class. In some cases the policy class
> >> @@ -1007,7 +1027,7 @@ namespace __detail
> >> const _NodeGetter& __node_gen, false_type __uks)
> >> {
> >> using __rehash_type = typename __hashtable::__rehash_type;
> >> - using __rehash_state = typename __hashtable::__rehash_state;
> >> + using __rehash_guard_t = typename __hashtable::__rehash_guard_t;
> >> using pair_type = std::pair<bool, std::size_t>;
> >>
> >> size_type __n_elt = __detail::__distance_fw(__first, __last);
> >> @@ -1016,14 +1036,15 @@ namespace __detail
> >>
> >> __hashtable& __h = _M_conjure_hashtable();
> >> __rehash_type& __rehash = __h._M_rehash_policy;
> >> - const __rehash_state& __saved_state = __rehash._M_state();
> >> + __rehash_guard_t __rehash_guard(__rehash);
> >> pair_type __do_rehash = __rehash._M_need_rehash(__h._M_bucket_count,
> >> __h._M_element_count,
> >> __n_elt);
> >>
> >> if (__do_rehash.first)
> >> - __h._M_rehash(__do_rehash.second, __saved_state);
> >> + __h._M_rehash(__do_rehash.second, __uks);
> >>
> >> + __rehash_guard._M_reset = false;
> >
> > A larger scope again.
>
> Note that there is only one place where we reset rehash policy state
> even if no exception took place. It is in the std rehash method when the
> user requests a rehash that has no effect.
>
> Here is the updated patch, ok to commit ?
Thanks for the explanations, OK for trunk.
prev parent reply other threads:[~2023-11-09 8:11 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-26 5:18 François Dumont
2023-10-26 10:14 ` Jonathan Wakely
2023-10-26 10:43 ` Jonathan Wakely
2023-10-26 20:52 ` François Dumont
2023-11-09 8:11 ` Jonathan Wakely [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CACb0b4kZvh7aBEbMmTzxQ4Q1AxKgLmX79AF8XUhYztgA7LGJow@mail.gmail.com \
--to=jwakely@redhat.com \
--cc=frs.dumont@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=libstdc++@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).