public inbox for gcc-patches@gcc.gnu.org
* [PATCH][Hashtable] Performance optimization through use of insertion hint
@ 2023-07-24 11:02 François Dumont
  2023-08-29 19:51 ` François Dumont
  0 siblings, 1 reply; 5+ messages in thread
From: François Dumont @ 2023-07-24 11:02 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8819 bytes --]

     libstdc++: [_Hashtable] Make more use of insertion hint


     When inserting an element into an empty bucket we currently insert the
     new node after the before-begin node, that is in first position. The
     drawback of doing this is that we are then forced to update the bucket
     that previously contained the before-begin node so that it points to
     the newly inserted node. Doing so requires at best a modulo operation
     to find that bucket and, when the hash code is not cached, also
     recomputing the hash code.

     To avoid this side effect it is better to insert after the last node.
     Adding a member to keep track of this last node would be an ABI
     breaking change. Still, we can ask the user to maintain and provide
     this last node as an insertion hint.

     Adapt the range insertion methods to try to detect the last node and
     then use it as the insertion hint.
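
To illustrate the intended usage pattern (this is only a sketch mirroring
the new performance tests, not part of the patch itself): keep the iterator
returned by the previous insertion as a hint only while it still designates
the last container node, which is the case the optimization targets.

    #include <iterator>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Sketch: maintain the last container node as the insertion hint.
    void fill(const std::vector<std::string>& strs,
              std::unordered_multiset<std::string>& ms)
    {
      auto hint = ms.end();
      for (const auto& str : strs)
        {
          auto pos = ms.insert(hint, str);
          // Keep the hint only while it points to the last container node.
          if (std::next(pos) == ms.end())
            hint = pos;
        }
    }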

     libstdc++-v3/ChangeLog:

             * include/bits/hashtable_policy.h (_Insert_base::try_emplace):
             Adapt to use the hint.
             (_Insert_base::_M_insert_range): Try to detect the container's
             last node and use it as hint for insertions.
             (_Hash_code_base::_M_hash_code(const _Hash&, const
             _Hash_node_value<>&)): Remove.
             (_Hash_code_base::_M_hash_code<_H2>(const _H2&, const
             _Hash_node_value<>&)): Remove.
             * include/bits/hashtable.h (_Hashtable<>::_M_insert_bucket_begin):
             Add hint parameter and use it when inserting into an empty bucket
             if the hint is the last container node.
             (_Hashtable<>::_InsertInfo): New struct.
             (_Hashtable<>::_M_get_insert_info): New, returns the latter.
             (_Hashtable<>::_M_insert_multi_node): Add _InsertInfo parameter.
             (_Hashtable<>::_M_insert_unique_node): Add __node_ptr hint
             parameter.
             (_Hashtable<>::_M_emplace_unique(__node_ptr, _Args&&...)): New.
             (_Hashtable<>::_M_emplace_multi(__node_ptr, _Args&&...)): New.
             (_Hashtable<>::_M_emplace()): Adapt to use the latter two.
             (_Hashtable<>::_M_insert_unique): Add __node_ptr parameter.
             (_Hashtable<>::_M_insert_unique_aux): Add __node_ptr parameter.
             (_Hashtable<>::_M_insert(__node_ptr, _Arg&&, const
             _NodeGenerator&, true_type)): Use the latter.
             (_Hashtable<>::_M_reinsert_node(const_iterator, node_type&&)):
             Add hint parameter, adapt to use it.
             (_Hashtable<>::_M_reinsert_node_multi): Use the hint parameter,
             if available, to extract the hash code.
             (_Hashtable<>::_M_compute_hash_code(const _Hash&, __node_ptr,
             __node_ptr, const key_type&)): New.
             (_Hashtable<>::_M_compute_hash_code<_H2>(const _H2&, __node_ptr,
             __node_ptr, const key_type&)): New.
             (_Hashtable<>::_M_merge_unique): Adapt to use the latter.
             Implement small size optimization.
             (_Hashtable<>::_M_get_insert_info(const _Hash&, __node_ptr,
             __node_ptr, const key_type&)): New.
             (_Hashtable<>::_M_get_insert_info<_H2>(const _H2&, __node_ptr,
             __node_ptr, const key_type&)): New.
             (_Hashtable<>::_M_merge_multi): Adapt to use the latter.
             * include/bits/unordered_map.h
             (unordered_map<>::insert(node_type&&)): Pass cend as hint.
             (unordered_map<>::insert(const_iterator, node_type&&)): Adapt to
             use the hint.
             * include/bits/unordered_set.h
             (unordered_set<>::insert(node_type&&)): Pass cend as hint.
             (unordered_set<>::insert(const_iterator, node_type&&)): Adapt to
             use the hint.
             * testsuite/23_containers/unordered_multimap/insert/hint.cc:
             Adapt implementation-specific tests.

Tested under Linux x86_64.
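
As a complementary illustration, the node-handle insert overloads of
unordered_map/unordered_set now forward the hint instead of ignoring it.
A hypothetical sketch (function and container names are illustrative only,
not from the patch):

    #include <string>
    #include <unordered_map>
    #include <utility>

    // Sketch: the hint passed along with a node handle is now forwarded
    // to _M_reinsert_node rather than being dropped.
    void steal_one(std::unordered_map<int, std::string>& from,
                   std::unordered_map<int, std::string>& to)
    {
      if (from.empty())
        return;
      auto nh = from.extract(from.begin());
      // cend() is also what the hint-less overload now passes internally.
      to.insert(to.cend(), std::move(nh));
    }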

Here are the performance results for this patch, before:

unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 insertions w/o hint      94r   94u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 insertions with perfect hint      95r   94u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 insertions with bad hint      94r   94u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 range insertions      88r   88u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 insertions w/o hint      91r   91u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 insertions with perfect hint      92r   93u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 insertions with bad hint      93r   93u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 range insertions      88r   88u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 insertions w/o hint      94r   95u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 insertions with perfect hint      94r   95u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 insertions with bad hint      94r   94u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 range insertions      90r   90u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 insertions w/o hint      68r   68u    0s 223999264mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 insertions with perfect hint      67r   67u    0s 223999264mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 insertions with bad hint      68r   68u    0s 223999264mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 range insertions      62r   62u    0s 223999264mem    0pf

After:

unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 insertions w/o hint      93r   93u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 insertions with perfect hint      58r   59u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 insertions with bad hint      93r   94u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 range insertions      52r   51u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 insertions w/o hint      96r   95u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 insertions with perfect hint      61r   62u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 insertions with bad hint      94r   94u    0s 191999712mem    0pf
unordered_multiset_hint.cc-thread    hash code cached 2000000 range insertions      52r   52u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 insertions w/o hint      96r   95u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 insertions with perfect hint      58r   59u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 insertions with bad hint      94r   94u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code NOT cached 2000000 range insertions      52r   52u    0s 191999712mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 insertions w/o hint      70r   69u    0s 223999264mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 insertions with perfect hint      67r   67u    0s 223999264mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 insertions with bad hint      67r   67u    0s 223999264mem    0pf
unordered_set_hint.cc-thread    hash code cached 2000000 range insertions      63r   62u    0s 223999264mem    0pf

OK to commit?

François

[-- Attachment #2: hashtable.patch --]
[-- Type: text/x-patch, Size: 50199 bytes --]

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 954a1c7a58d..4b84b566aea 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -835,30 +835,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       // Insert a node at the beginning of a bucket.
       void
-      _M_insert_bucket_begin(size_type __bkt, __node_ptr __node)
+      _M_insert_bucket_begin(__node_base_ptr __hint,
+			     size_type __bkt, __node_ptr __node)
       {
+	__node_base_ptr __prev;
 	if (_M_buckets[__bkt])
 	  {
 	    // Bucket is not empty, we just need to insert the new node
 	    // after the bucket before begin.
-	    __node->_M_nxt = _M_buckets[__bkt]->_M_nxt;
-	    _M_buckets[__bkt]->_M_nxt = __node;
+	    __prev = _M_buckets[__bkt];
 	  }
 	else
 	  {
-	    // The bucket is empty, the new node is inserted at the
-	    // beginning of the singly-linked list and the bucket will
-	    // contain _M_before_begin pointer.
-	    __node->_M_nxt = _M_before_begin._M_nxt;
-	    _M_before_begin._M_nxt = __node;
-
-	    if (__node->_M_nxt)
-	      // We must update former begin bucket that is pointing to
-	      // _M_before_begin.
-	      _M_buckets[_M_bucket_index(*__node->_M_next())] = __node;
-
-	    _M_buckets[__bkt] = &_M_before_begin;
+	    // Based on benchmarks, using the last node is not interesting when the
+	    // hash code is cached in the case of unique keys.
+	    if (!(__unique_keys::value && __hash_cached::value) &&
+		__hint && !__hint->_M_nxt)
+	      __prev = __hint;
+	    else
+	      {
+		// The bucket is empty, the new node is inserted at the
+		// beginning of the singly-linked list and the bucket will
+		// contain _M_before_begin pointer.
+		__prev = &_M_before_begin;
+
+		if (__prev->_M_nxt)
+		  {
+		    // We must update former begin bucket that is pointing to
+		    // _M_before_begin.
+		    size_type __bb_bkt = _M_bucket_index(
+				*static_cast<__node_ptr>(__prev->_M_nxt));
+		    _M_buckets[__bb_bkt] = __node;
+		  }
+	      }
+
+	    _M_buckets[__bkt] = __prev;
 	  }
+
+	__node->_M_nxt = __prev->_M_nxt;
+	__prev->_M_nxt = __node;
       }
 
       // Remove the bucket first node
@@ -884,44 +899,64 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       __node_base_ptr
       _M_get_previous_node(size_type __bkt, __node_ptr __n);
 
-      pair<__node_ptr, __hash_code>
-      _M_compute_hash_code(__node_ptr __hint, const key_type& __k) const;
+      struct _InsertInfo
+      {
+	__node_base_ptr	_M_prev;
+	__node_ptr	_M_equi_n;
+	__node_ptr	_M_hint;
+	__hash_code	_M_code;
+      };
+
+      _InsertInfo
+      _M_get_insert_info(__node_ptr __hint, const key_type& __k,
+			 __node_ptr __n = nullptr) const;
 
       // Insert node __n with hash code __code, in bucket __bkt if no
       // rehash (assumes no element with same key already present).
       // Takes ownership of __n if insertion succeeds, throws otherwise.
       iterator
-      _M_insert_unique_node(size_type __bkt, __hash_code,
+      _M_insert_unique_node(__node_ptr __hint, size_type __bkt, __hash_code,
 			    __node_ptr __n, size_type __n_elt = 1);
 
-      // Insert node __n with key __k and hash code __code.
+      // Insert node __n after __prev if any.
       // Takes ownership of __n if insertion succeeds, throws otherwise.
       iterator
-      _M_insert_multi_node(__node_ptr __hint,
-			   __hash_code __code, __node_ptr __n);
+      _M_insert_multi_node(const _InsertInfo& __info, __node_ptr __n);
 
       template<typename... _Args>
 	std::pair<iterator, bool>
-	_M_emplace(true_type __uks, _Args&&... __args);
+	_M_emplace_unique(__node_ptr __hint, _Args&&... __args);
 
       template<typename... _Args>
 	iterator
-	_M_emplace(false_type __uks, _Args&&... __args)
-	{ return _M_emplace(cend(), __uks, std::forward<_Args>(__args)...); }
+	_M_emplace_multi(__node_ptr __hint, _Args&&... __args);
+
+      template<typename... _Args>
+	std::pair<iterator, bool>
+	_M_emplace(true_type /*__uks*/, _Args&&... __args)
+	{ return _M_emplace_unique(nullptr, std::forward<_Args>(__args)...); }
+
+      template<typename... _Args>
+	iterator
+	_M_emplace(false_type /*__uks*/, _Args&&... __args)
+	{ return _M_emplace_multi(nullptr, std::forward<_Args>(__args)...); }
 
-      // Emplace with hint, useless when keys are unique.
       template<typename... _Args>
 	iterator
-	_M_emplace(const_iterator, true_type __uks, _Args&&... __args)
-	{ return _M_emplace(__uks, std::forward<_Args>(__args)...).first; }
+	_M_emplace(__node_ptr __hint, true_type /*__uks*/, _Args&&... __args)
+	{
+	  return _M_emplace_unique(__hint,
+				   std::forward<_Args>(__args)...).first;
+	}
 
       template<typename... _Args>
 	iterator
-	_M_emplace(const_iterator, false_type __uks, _Args&&... __args);
+	_M_emplace(__node_ptr __hint, false_type /*__uks*/, _Args&&... __args)
+	{ return _M_emplace_multi(__hint, std::forward<_Args>(__args)...); }
 
       template<typename _Kt, typename _Arg, typename _NodeGenerator>
 	std::pair<iterator, bool>
-	_M_insert_unique(_Kt&&, _Arg&&, const _NodeGenerator&);
+	_M_insert_unique(__node_ptr, _Kt&&, _Arg&&, const _NodeGenerator&);
 
       template<typename _Kt>
 	static __conditional_t<
@@ -941,9 +976,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       template<typename _Arg, typename _NodeGenerator>
 	std::pair<iterator, bool>
-	_M_insert_unique_aux(_Arg&& __arg, const _NodeGenerator& __node_gen)
+	_M_insert_unique_aux(__node_ptr __hint,
+			     _Arg&& __arg, const _NodeGenerator& __node_gen)
 	{
-	  return _M_insert_unique(
+	  return _M_insert_unique(__hint,
 	    _S_forward_key(_ExtractKey{}(std::forward<_Arg>(__arg))),
 	    std::forward<_Arg>(__arg), __node_gen);
 	}
@@ -955,7 +991,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	{
 	  using __to_value
 	    = __detail::_ConvertToValueType<_ExtractKey, value_type>;
-	  return _M_insert_unique_aux(
+	  return _M_insert_unique_aux(nullptr,
 	    __to_value{}(std::forward<_Arg>(__arg)), __node_gen);
 	}
 
@@ -966,25 +1002,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	{
 	  using __to_value
 	    = __detail::_ConvertToValueType<_ExtractKey, value_type>;
-	  return _M_insert(cend(),
+	  return _M_insert(nullptr,
 	    __to_value{}(std::forward<_Arg>(__arg)), __node_gen, __uks);
 	}
 
-      // Insert with hint, not used when keys are unique.
+      // Insert with hint when keys are unique.
       template<typename _Arg, typename _NodeGenerator>
 	iterator
-	_M_insert(const_iterator, _Arg&& __arg,
-		  const _NodeGenerator& __node_gen, true_type __uks)
+	_M_insert(__node_ptr __hint, _Arg&& __arg,
+		  const _NodeGenerator& __node_gen, true_type /* __uks */)
 	{
-	  return
-	    _M_insert(std::forward<_Arg>(__arg), __node_gen, __uks).first;
+	  using __to_value
+	    = __detail::_ConvertToValueType<_ExtractKey, value_type>;
+	  return _M_insert_unique_aux(__hint,
+	    __to_value{}(std::forward<_Arg>(__arg)), __node_gen).first;
 	}
 
       // Insert with hint when keys are not unique.
       template<typename _Arg, typename _NodeGenerator>
 	iterator
-	_M_insert(const_iterator, _Arg&&,
-		  const _NodeGenerator&, false_type __uks);
+	_M_insert(__node_ptr __hint, _Arg&& __arg,
+		  const _NodeGenerator& __node_gen, false_type /* __uks */);
 
       size_type
       _M_erase(true_type __uks, const key_type&);
@@ -1006,7 +1044,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	iterator
 	emplace_hint(const_iterator __hint, _Args&&... __args)
 	{
-	  return _M_emplace(__hint, __unique_keys{},
+	  return _M_emplace(__hint._M_cur, __unique_keys{},
 			    std::forward<_Args>(__args)...);
 	}
 
@@ -1041,7 +1079,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus > 201402L
       /// Re-insert an extracted node into a container with unique keys.
       insert_return_type
-      _M_reinsert_node(node_type&& __nh)
+      _M_reinsert_node(const_iterator __hint, node_type&& __nh)
       {
 	insert_return_type __ret;
 	if (__nh.empty())
@@ -1062,7 +1100,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    else
 	      {
 		__ret.position
-		  = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
+		  = _M_insert_unique_node(__hint._M_cur,
+					  __bkt, __code, __nh._M_ptr);
 		__nh._M_ptr = nullptr;
 		__ret.inserted = true;
 	      }
@@ -1080,11 +1119,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__glibcxx_assert(get_allocator() == __nh.get_allocator());
 
 	const key_type& __k = __nh._M_key();
-	auto __code = this->_M_hash_code(__k);
-	auto __ret
-	  = _M_insert_multi_node(__hint._M_cur, __code, __nh._M_ptr);
+	auto __info = _M_get_insert_info(__hint._M_cur, __k);
+	auto __it = _M_insert_multi_node(__info, __nh._M_ptr);
 	__nh._M_ptr = nullptr;
-	return __ret;
+	return __it;
       }
 
     private:
@@ -1130,6 +1168,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __nh;
       }
 
+    private:
+      __hash_code
+      _M_compute_hash_code(const _Hash&, __node_ptr __n, const key_type&)
+      { return this->_M_hash_code(*__n); }
+
+      template<typename _H2>
+	__hash_code
+	_M_compute_hash_code(const _H2&, __node_ptr, const key_type& __k)
+	{ return this->_M_hash_code(__k); }
+
+    public:
       /// Merge from a compatible container into one with unique keys.
       template<typename _Compatible_Hashtable>
 	void
@@ -1139,18 +1188,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      node_type>, "Node types are compatible");
 	  __glibcxx_assert(get_allocator() == __src.get_allocator());
 
+	  __node_ptr __hint = nullptr;
 	  auto __n_elt = __src.size();
 	  for (auto __i = __src.cbegin(), __end = __src.cend(); __i != __end;)
 	    {
 	      auto __pos = __i++;
 	      const key_type& __k = _ExtractKey{}(*__pos);
-	      __hash_code __code
-		= this->_M_hash_code(__src.hash_function(), *__pos._M_cur);
+	      if (size() <= __small_size_threshold())
+		{
+		  __node_ptr __last_n = nullptr;
+		  bool __found = false;
+		  for (auto __n = _M_begin(); __n;
+		       __last_n = __n, __n = __n->_M_next())
+		    if (this->_M_key_equals(__k, *__n))
+		      {
+			__found = true;
+			break;
+		      }
+
+		  if (__found)
+		    {
+		      if (__n_elt != 1)
+			--__n_elt;
+		      continue;
+		    }
+		  else if (!__hint)
+		    __hint = __last_n;
+		}
+
+	      __hash_code __code =
+		_M_compute_hash_code(__src.hash_function(), __pos._M_cur, __k);
 	      size_type __bkt = _M_bucket_index(__code);
-	      if (_M_find_node(__bkt, __k, __code) == nullptr)
+	      if (size() <= __small_size_threshold()
+		  || _M_find_node(__bkt, __k, __code) == nullptr)
 		{
 		  auto __nh = __src.extract(__pos);
-		  _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
+		  auto __pos = _M_insert_unique_node(
+		    __hint, __bkt, __code, __nh._M_ptr, __n_elt)._M_cur;
+
+		  // Keep only the last node as an insertion hint.
+		  if (!__pos->_M_nxt)
+		    __hint = __pos;
+
 		  __nh._M_ptr = nullptr;
 		  __n_elt = 1;
 		}
@@ -1159,6 +1238,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    }
 	}
 
+    private:
+      _InsertInfo
+      _M_get_insert_info(const _Hash&, __node_ptr __hint, __node_ptr __n,
+			 const key_type& __k)
+      {
+	// Same _Hash functor, we can reuse __n hash code.
+	return _M_get_insert_info(__hint, __k, __n);
+      }
+
+      template<typename _H2>
+	_InsertInfo
+	_M_get_insert_info(const _H2&, __node_ptr __hint, __node_ptr,
+			   const key_type& __k)
+	{ return _M_get_insert_info(__hint, __k); }
+
+    public:
       /// Merge from a compatible container into one with equivalent keys.
       template<typename _Compatible_Hashtable>
 	void
@@ -1173,10 +1268,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  for (auto __i = __src.cbegin(), __end = __src.cend(); __i != __end;)
 	    {
 	      auto __pos = __i++;
-	      __hash_code __code
-		= this->_M_hash_code(__src.hash_function(), *__pos._M_cur);
+	      const key_type& __k = _ExtractKey{}(*__pos);
+	      auto __info = _M_get_insert_info(__src.hash_function(),
+					       __hint, __pos._M_cur, __k);
 	      auto __nh = __src.extract(__pos);
-	      __hint = _M_insert_multi_node(__hint, __code, __nh._M_ptr)._M_cur;
+	      auto __inserted_n =
+		_M_insert_multi_node(__info, __nh._M_ptr)._M_cur;
+
+	      // Keep only the last node as an insertion hint.
+	      if (!__inserted_n->_M_nxt)
+		__hint = __inserted_n;
+
 	      __nh._M_ptr = nullptr;
 	    }
 	}
@@ -1252,9 +1354,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    _M_bucket_count = __bkt_count;
 	  }
 
+	__node_ptr __hint = nullptr;
 	__alloc_node_gen_t __node_gen(*this);
 	for (; __f != __l; ++__f)
-	  _M_insert(*__f, __node_gen, __uks);
+	  {
+	    __node_ptr __insert_pos
+	      = _M_insert(__hint, *__f, __node_gen, __uks)._M_cur;
+	    if (!__insert_pos->_M_nxt)
+	      __hint = __insert_pos;
+	  }
       }
 
   template<typename _Key, typename _Value, typename _Alloc,
@@ -1681,9 +1789,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       if (size() <= __small_size_threshold())
 	{
-	  for (auto __it = _M_begin(); __it; __it = __it->_M_next())
-	    if (this->_M_key_equals(__k, *__it))
-	      return iterator(__it);
+	  for (auto __n = _M_begin(); __n; __n = __n->_M_next())
+	    if (this->_M_key_equals(__k, *__n))
+	      return iterator(__n);
 	  return end();
 	}
 
@@ -1704,9 +1812,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       if (size() <= __small_size_threshold())
 	{
-	  for (auto __it = _M_begin(); __it; __it = __it->_M_next())
-	    if (this->_M_key_equals(__k, *__it))
-	      return const_iterator(__it);
+	  for (auto __n = _M_begin(); __n; __n = __n->_M_next())
+	    if (this->_M_key_equals(__k, *__n))
+	      return const_iterator(__n);
 	  return end();
 	}
 
@@ -2036,7 +2144,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       auto
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-      _M_emplace(true_type /* __uks */, _Args&&... __args)
+      _M_emplace_unique(__node_ptr __hint, _Args&&... __args)
       -> pair<iterator, bool>
       {
 	// First build the node to get access to the hash code
@@ -2044,10 +2152,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	const key_type& __k = _ExtractKey{}(__node._M_node->_M_v());
 	if (size() <= __small_size_threshold())
 	  {
-	    for (auto __it = _M_begin(); __it; __it = __it->_M_next())
-	      if (this->_M_key_equals(__k, *__it))
+	    __node_ptr __last_n = nullptr;
+	    for (auto __n = _M_begin(); __n;
+		 __last_n = __n, __n = __n->_M_next())
+	      if (this->_M_key_equals(__k, *__n))
 		// There is already an equivalent node, no insertion
-		return { iterator(__it), false };
+		return { iterator(__n), false };
+
+	    __hint = __last_n;
 	  }
 
 	__hash_code __code = this->_M_hash_code(__k);
@@ -2058,7 +2170,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    return { iterator(__p), false };
 
 	// Insert the node
-	auto __pos = _M_insert_unique_node(__bkt, __code, __node._M_node);
+	auto __pos = _M_insert_unique_node(__hint, __bkt, __code,
+					   __node._M_node);
 	__node._M_node = nullptr;
 	return { __pos, true };
       }
@@ -2071,17 +2184,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       auto
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-      _M_emplace(const_iterator __hint, false_type /* __uks */,
-		 _Args&&... __args)
+      _M_emplace_multi(__node_ptr __hint, _Args&&... __args)
       -> iterator
       {
 	// First build the node to get its hash code.
 	_Scoped_node __node { this, std::forward<_Args>(__args)...  };
 	const key_type& __k = _ExtractKey{}(__node._M_node->_M_v());
 
-	auto __res = this->_M_compute_hash_code(__hint._M_cur, __k);
-	auto __pos
-	  = _M_insert_multi_node(__res.first, __res.second, __node._M_node);
+	auto __info = _M_get_insert_info(__hint, __k);
+	auto __pos = _M_insert_multi_node(__info, __node._M_node);
 	__node._M_node = nullptr;
 	return __pos;
       }
@@ -2093,26 +2204,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     auto
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_compute_hash_code(__node_ptr __hint, const key_type& __k) const
-    -> pair<__node_ptr, __hash_code>
+    _M_get_insert_info(__node_ptr __hint, const key_type& __k,
+		       __node_ptr __equi_n) const
+    -> _InsertInfo
     {
       if (size() <= __small_size_threshold())
 	{
-	  if (__hint)
-	    {
-	      for (auto __it = __hint; __it; __it = __it->_M_next())
-		if (this->_M_key_equals(__k, *__it))
-		  return { __it, this->_M_hash_code(*__it) };
-	    }
-
-	  for (auto __it = _M_begin(); __it != __hint; __it = __it->_M_next())
-	    if (this->_M_key_equals(__k, *__it))
-	      return { __it, this->_M_hash_code(*__it) };
+	  __node_ptr __last_n = nullptr;
+	  __node_base_ptr __prev
+	    = const_cast<__node_base_ptr>(&_M_before_begin);
+	  for (auto __n = static_cast<__node_ptr>(__prev->_M_nxt); __n;
+	       __prev = __last_n = __n, __n = __n->_M_next())
+	    if (this->_M_key_equals(__k, *__n))
+	      return
+		{
+		  __prev, __n, __hint,
+		  __hash_cached::value ? this->_M_hash_code(*__n) : 0
+		};
 
-	  __hint = nullptr;
+	  __hint = __last_n;
 	}
 
-      return { __hint, this->_M_hash_code(__k) };
+      return
+	{
+	  nullptr, nullptr, __hint,
+	  __equi_n ? this->_M_hash_code(*__equi_n) : this->_M_hash_code(__k)
+	};
     }
 
   template<typename _Key, typename _Value, typename _Alloc,
@@ -2122,7 +2239,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     auto
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_insert_unique_node(size_type __bkt, __hash_code __code,
+    _M_insert_unique_node(__node_ptr __hint,
+			  size_type __bkt, __hash_code __code,
 			  __node_ptr __node, size_type __n_elt)
     -> iterator
     {
@@ -2140,7 +2258,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       this->_M_store_code(*__node, __code);
 
       // Always insert at the beginning of the bucket.
-      _M_insert_bucket_begin(__bkt, __node);
+      _M_insert_bucket_begin(__hint, __bkt, __node);
       ++_M_element_count;
       return iterator(__node);
     }
@@ -2152,8 +2270,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     auto
     _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	       _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-    _M_insert_multi_node(__node_ptr __hint,
-			 __hash_code __code, __node_ptr __node)
+    _M_insert_multi_node(const _InsertInfo& __info, __node_ptr __node)
     -> iterator
     {
       const __rehash_state& __saved_state = _M_rehash_policy._M_state();
@@ -2163,39 +2280,46 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       if (__do_rehash.first)
 	_M_rehash(__do_rehash.second, __saved_state);
 
+      auto __code = __info._M_code;
       this->_M_store_code(*__node, __code);
       const key_type& __k = _ExtractKey{}(__node->_M_v());
-      size_type __bkt = _M_bucket_index(__code);
+      auto __prev = __info._M_prev;
+      const bool __has_prev = __prev != nullptr;
+      size_type __bkt;
 
       // Find the node before an equivalent one or use hint if it exists and
       // if it is equivalent.
-      __node_base_ptr __prev
-	= __builtin_expect(__hint != nullptr, false)
-	  && this->_M_equals(__k, __code, *__hint)
-	    ? __hint
-	    : _M_find_before_node(__bkt, __k, __code);
+      if (!__has_prev)
+	{
+	  __bkt = _M_bucket_index(__code);
+	  __prev = _M_find_before_node(__bkt, __k, __code);
+	}
 
       if (__prev)
 	{
 	  // Insert after the node before the equivalent one.
 	  __node->_M_nxt = __prev->_M_nxt;
 	  __prev->_M_nxt = __node;
-	  if (__builtin_expect(__prev == __hint, false))
-	    // hint might be the last bucket node, in this case we need to
-	    // update next bucket.
-	    if (__node->_M_nxt
-		&& !this->_M_equals(__k, __code, *__node->_M_next()))
-	      {
-		size_type __next_bkt = _M_bucket_index(*__node->_M_next());
-		if (__next_bkt != __bkt)
-		  _M_buckets[__next_bkt] = __node;
-	      }
+
+	  // __hint might be the last bucket node, in this case we need to
+	  // update next bucket.
+	  if (__has_prev && __node->_M_nxt != __info._M_equi_n)
+	    {
+	      const auto __nxt = __node->_M_next();
+	      if (!this->_M_equals(__k, __code, *__nxt))
+		{
+		  size_type __next_bkt = _M_bucket_index(*__nxt);
+		  if (__next_bkt != _M_bucket_index(__code))
+		    _M_buckets[__next_bkt] = __node;
+		}
+	    }
 	}
       else
 	// The inserted node has no equivalent in the hashtable. We must
 	// insert the new node at the beginning of the bucket to preserve
 	// equivalent elements' relative positions.
-	_M_insert_bucket_begin(__bkt, __node);
+	_M_insert_bucket_begin(__info._M_hint, __bkt, __node);
+
       ++_M_element_count;
       return iterator(__node);
     }
@@ -2209,14 +2333,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       auto
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-      _M_insert_unique(_Kt&& __k, _Arg&& __v,
+      _M_insert_unique(__node_ptr __hint, _Kt&& __k, _Arg&& __v,
 		       const _NodeGenerator& __node_gen)
       -> pair<iterator, bool>
       {
 	if (size() <= __small_size_threshold())
-	  for (auto __it = _M_begin(); __it; __it = __it->_M_next())
-	    if (this->_M_key_equals_tr(__k, *__it))
-	      return { iterator(__it), false };
+	  {
+	    __node_ptr __last_n = nullptr;
+	    for (auto __n = _M_begin(); __n;
+		 __last_n = __n, __n = __n->_M_next())
+	      if (this->_M_key_equals_tr(__k, *__n))
+		return { iterator(__n), false };
+
+	    __hint = __last_n;
+	  }
 
 	__hash_code __code = this->_M_hash_code_tr(__k);
 	size_type __bkt = _M_bucket_index(__code);
@@ -2231,8 +2361,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 				     __node_gen),
 	  this
 	};
+
 	auto __pos
-	  = _M_insert_unique_node(__bkt, __code, __node._M_node);
+	  = _M_insert_unique_node(__hint, __bkt, __code, __node._M_node);
 	__node._M_node = nullptr;
 	return { __pos, true };
       }
@@ -2246,20 +2377,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       auto
       _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-      _M_insert(const_iterator __hint, _Arg&& __v,
-		const _NodeGenerator& __node_gen,
+      _M_insert(__node_ptr __hint, _Arg&& __v, const _NodeGenerator& __node_gen,
 		false_type /* __uks */)
       -> iterator
       {
 	// First allocate new node so that we don't do anything if it throws.
-	_Scoped_node __node{ __node_gen(std::forward<_Arg>(__v)), this };
+	_Scoped_node __node { __node_gen(std::forward<_Arg>(__v)), this };
 
 	// Second compute the hash code so that we don't rehash if it throws.
-	auto __res = this->_M_compute_hash_code(
-	  __hint._M_cur, _ExtractKey{}(__node._M_node->_M_v()));
+	auto __info =
+	  _M_get_insert_info(__hint, _ExtractKey{}(__node._M_node->_M_v()));
+	auto __pos = _M_insert_multi_node(__info, __node._M_node);
 
-	auto __pos
-	  = _M_insert_multi_node(__res.first, __res.second, __node._M_node);
 	__node._M_node = nullptr;
 	return __pos;
       }
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 347d468ea86..484d7a9eed2 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -818,7 +818,7 @@ namespace __detail
 	std::tuple<>()
       };
       auto __pos
-	= __h->_M_insert_unique_node(__bkt, __code, __node._M_node);
+	= __h->_M_insert_unique_node(nullptr, __bkt, __code, __node._M_node);
       __node._M_node = nullptr;
       return __pos->second;
     }
@@ -845,7 +845,7 @@ namespace __detail
 	std::tuple<>()
       };
       auto __pos
-	= __h->_M_insert_unique_node(__bkt, __code, __node._M_node);
+	= __h->_M_insert_unique_node(nullptr, __bkt, __code, __node._M_node);
       __node._M_node = nullptr;
       return __pos->second;
     }
@@ -933,13 +933,13 @@ namespace __detail
       insert(const_iterator __hint, const value_type& __v)
       {
 	__hashtable& __h = _M_conjure_hashtable();
-	__node_gen_type __node_gen(__h);	
-	return __h._M_insert(__hint, __v, __node_gen, __unique_keys{});
+	__node_gen_type __node_gen(__h);
+	return __h._M_insert(__hint._M_cur, __v, __node_gen, __unique_keys{});
       }
 
       template<typename _KType, typename... _Args>
 	std::pair<iterator, bool>
-	try_emplace(const_iterator, _KType&& __k, _Args&&... __args)
+	try_emplace(const_iterator __hint, _KType&& __k, _Args&&... __args)
 	{
 	  __hashtable& __h = _M_conjure_hashtable();
 	  auto __code = __h._M_hash_code(__k);
@@ -954,7 +954,8 @@ namespace __detail
 	    std::forward_as_tuple(std::forward<_Args>(__args)...)
 	    };
 	  auto __it
-	    = __h._M_insert_unique_node(__bkt, __code, __node._M_node);
+	    = __h._M_insert_unique_node(__hint._M_cur,
+					__bkt, __code, __node._M_node);
 	  __node._M_node = nullptr;
 	  return { __it, true };
 	}
@@ -985,9 +986,16 @@ namespace __detail
       _M_insert_range(_InputIterator __first, _InputIterator __last,
 		      const _NodeGetter& __node_gen, true_type __uks)
       {
+	using __node_ptr = typename __hashtable::__node_ptr;
 	__hashtable& __h = _M_conjure_hashtable();
+	__node_ptr __hint = nullptr;
 	for (; __first != __last; ++__first)
-	  __h._M_insert(*__first, __node_gen, __uks);
+	  {
+	    __node_ptr __insert_pos
+	      = __h._M_insert(__hint, *__first, __node_gen, __uks)._M_cur;
+	    if (!__insert_pos->_M_nxt)
+	      __hint = __insert_pos;
+	  }
       }
 
   template<typename _Key, typename _Value, typename _Alloc,
@@ -1002,6 +1010,7 @@ namespace __detail
       _M_insert_range(_InputIterator __first, _InputIterator __last,
 		      const _NodeGetter& __node_gen, false_type __uks)
       {
+	using __node_ptr = typename __hashtable::__node_ptr;
 	using __rehash_type = typename __hashtable::__rehash_type;
 	using __rehash_state = typename __hashtable::__rehash_state;
 	using pair_type = std::pair<bool, std::size_t>;
@@ -1020,8 +1029,14 @@ namespace __detail
 	if (__do_rehash.first)
 	  __h._M_rehash(__do_rehash.second, __saved_state);
 
+	__node_ptr __hint = nullptr;
 	for (; __first != __last; ++__first)
-	  __h._M_insert(*__first, __node_gen, __uks);
+	  {
+	    __node_ptr __insert_pos
+	      = __h._M_insert(__hint, *__first, __node_gen, __uks)._M_cur;
+	    if (!__insert_pos->_M_nxt)
+	      __hint = __insert_pos;
+	  }
       }
 
   /**
@@ -1076,7 +1091,7 @@ namespace __detail
       {
 	__hashtable& __h = this->_M_conjure_hashtable();
 	__node_gen_type __node_gen(__h);
-	return __h._M_insert(__hint, std::move(__v), __node_gen,
+	return __h._M_insert(__hint._M_cur, std::move(__v), __node_gen,
 			     __unique_keys{});
       }
     };
@@ -1126,7 +1141,7 @@ namespace __detail
 	insert(const_iterator __hint, _Pair&& __v)
 	{
 	  __hashtable& __h = this->_M_conjure_hashtable();
-	  return __h._M_emplace(__hint, __unique_keys{},
+	  return __h._M_emplace(__hint._M_cur, __unique_keys{},
 				std::forward<_Pair>(__v));
 	}
    };
@@ -1315,19 +1330,6 @@ namespace __detail
 	  return _M_hash()(__k);
 	}
 
-      __hash_code
-      _M_hash_code(const _Hash&,
-		   const _Hash_node_value<_Value, true>& __n) const
-      { return __n._M_hash_code; }
-
-      // Compute hash code using _Hash as __n _M_hash_code, if present, was
-      // computed using _H2.
-      template<typename _H2>
-	__hash_code
-	_M_hash_code(const _H2&,
-		const _Hash_node_value<_Value, __cache_hash_code>& __n) const
-	{ return _M_hash_code(_ExtractKey{}(__n._M_v())); }
-
       __hash_code
       _M_hash_code(const _Hash_node_value<_Value, false>& __n) const
       { return _M_hash_code(_ExtractKey{}(__n._M_v())); }
@@ -1999,19 +2001,20 @@ namespace __detail
       _Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&&... __args)
       -> __node_ptr
       {
-	auto __nptr = __node_alloc_traits::allocate(_M_node_allocator(), 1);
+	auto& __alloc = _M_node_allocator();
+	auto __nptr = __node_alloc_traits::allocate(__alloc, 1);
+
 	__node_ptr __n = std::__to_address(__nptr);
 	__try
 	  {
 	    ::new ((void*)__n) __node_type;
-	    __node_alloc_traits::construct(_M_node_allocator(),
-					   __n->_M_valptr(),
+	    __node_alloc_traits::construct(__alloc, __n->_M_valptr(),
 					   std::forward<_Args>(__args)...);
 	    return __n;
 	  }
 	__catch(...)
 	  {
-	    __node_alloc_traits::deallocate(_M_node_allocator(), __nptr, 1);
+	    __node_alloc_traits::deallocate(__alloc, __nptr, 1);
 	    __throw_exception_again;
 	  }
       }
diff --git a/libstdc++-v3/include/bits/unordered_map.h b/libstdc++-v3/include/bits/unordered_map.h
index 2f63bc5f1fa..d6becb83ee8 100644
--- a/libstdc++-v3/include/bits/unordered_map.h
+++ b/libstdc++-v3/include/bits/unordered_map.h
@@ -443,12 +443,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       /// Re-insert an extracted node.
       insert_return_type
       insert(node_type&& __nh)
-      { return _M_h._M_reinsert_node(std::move(__nh)); }
+      { return _M_h._M_reinsert_node(cend(), std::move(__nh)); }
 
       /// Re-insert an extracted node.
       iterator
-      insert(const_iterator, node_type&& __nh)
-      { return _M_h._M_reinsert_node(std::move(__nh)).position; }
+      insert(const_iterator __hint, node_type&& __nh)
+      { return _M_h._M_reinsert_node(__hint, std::move(__nh)).position; }
 
 #define __cpp_lib_unordered_map_try_emplace 201411L
       /**
diff --git a/libstdc++-v3/include/bits/unordered_set.h b/libstdc++-v3/include/bits/unordered_set.h
index f3b0c078baa..915cd4fb700 100644
--- a/libstdc++-v3/include/bits/unordered_set.h
+++ b/libstdc++-v3/include/bits/unordered_set.h
@@ -504,12 +504,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       /// Re-insert an extracted node.
       insert_return_type
       insert(node_type&& __nh)
-      { return _M_h._M_reinsert_node(std::move(__nh)); }
+      { return _M_h._M_reinsert_node(cend(), std::move(__nh)); }
 
       /// Re-insert an extracted node.
       iterator
-      insert(const_iterator, node_type&& __nh)
-      { return _M_h._M_reinsert_node(std::move(__nh)).position; }
+      insert(const_iterator __hint, node_type&& __nh)
+      { return _M_h._M_reinsert_node(__hint, std::move(__nh)).position; }
 #endif // C++17
 
       ///@{
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc b/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc
index d0ed60c6799..1db71b8f30b 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multimap/insert/hint.cc
@@ -40,11 +40,12 @@ void test01()
   VERIFY( it2 != m.end() );
   VERIFY( it2 != it1 );
   VERIFY( it2->second == 2 );
-  VERIFY( it2 == std::next(it1) );
+  VERIFY( std::next(it2) == it1 );
 
+  it1 = it2;
   Pair p(0, 1);
   it2 = m.insert(it1, p);
-  VERIFY( it2 == std::next(it1) );
+  VERIFY( std::next(it2) == it1 );
 }
 
 struct hasher
@@ -71,13 +72,13 @@ void test02()
   VERIFY( m.bucket_size(m.bucket(it3->first)) == 1 );
 
   auto it4 = m.insert(it1, Pair(0, 1));
-  VERIFY( it4 == std::next(it1) );
+  VERIFY( std::next(it4) == it1 );
   VERIFY( m.bucket_size(m.bucket(it1->first)) == 3 );
   VERIFY( m.bucket_size(m.bucket(it3->first)) == 1 );
 
   Pair p(1, 1);
   it4 = m.insert(it2, p);
-  VERIFY( it4 == std::next(it2) );
+  VERIFY( std::next(it4) == it2 );
   VERIFY( m.bucket_size(m.bucket(it1->first)) == 4 );
   auto range = m.equal_range(0);
   VERIFY( std::distance(range.first, range.second) == 2 );
@@ -104,11 +105,12 @@ void test03()
   VERIFY( it2 != m.end() );
   VERIFY( it2 != it1 );
   VERIFY( it2->second == 2 );
-  VERIFY( it2 == std::next(it1) );
+  VERIFY( std::next(it2) == it1 );
 
+  it1 = it2;
   Pair p(0, 1);
   it2 = m.emplace_hint(it1, p);
-  VERIFY( it2 == std::next(it1) );
+  VERIFY( std::next(it2) == it1 );
 }
 
 int main()
diff --git a/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_multiset_hint.cc b/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_multiset_hint.cc
index 125a8a615b7..1e2502209f1 100644
--- a/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_multiset_hint.cc
+++ b/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_multiset_hint.cc
@@ -27,310 +27,170 @@
 namespace
 {
   const int sz = 2000000;
-  const std::string pattern = "test string #";
-  const int nb_copies = 100;
+  const std::string pattern = "long enough test string #";
 
-  // Special std::string hasher based on std::hash<std::string> but not tag as
-  // slow so that hash code is not cached. It is easier to show impact of
-  // hinting in this context.
+  // Perfect std::string hasher knowing how the string instances have been
+  // generated. It is not tagged as slow so that the hash code is not cached.
+  // It is easier to show the impact of the hint in this context.
   struct hasher
   {
     std::size_t
     operator()(const std::string& str) const noexcept
     {
-      //std::istringstream istr(str.substr(pattern.size()));
-      //std::size_t str_index;
-      //istr >> str_index;
-      //return str_index;
       std::hash<std::string> std_hasher;
-      return std_hasher(str);
+      auto hash = std_hasher(pattern);
+      std::istringstream isstr(str.substr(pattern.length()));
+      int idx;
+      isstr >> idx;
+      return (std::size_t)(hash / sz) * sz + idx;
     }
   };
 
-  using ums_t = std::unordered_multiset<std::string, hasher>;
-
-  void
-  insert_with_perfect_hint1(const std::vector<std::string>& strs,
-			    ums_t& ms)
-  {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(str));
-
-    for (int j = 1; j != nb_copies; ++j)
-      for (std::size_t i = 0; i != strs.size(); ++i)
-	ms.insert(hints[i], strs[i]);
-  }
-
-  void
-  insert_with_perfect_hint2(const std::vector<std::string>& strs,
-			    ums_t& ms)
-  {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(str));
-
-    for (std::size_t i = 0; i != strs.size(); ++i)
-      for (int j = 1; j != nb_copies; ++j)
-	ms.insert(hints[i], strs[i]);
-  }
-
-  // Always insert with the result of the previous insertion. The result of
-  // the previous insertion will never be followed by an equivalent node
-  // resulting in a re-computation of its hash code which is expensive.
-  void
-  insert_with_good_hint(const std::vector<std::string>& strs,
-			ums_t& ms)
-  {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(str));
-
-    for (int j = 1; j != nb_copies; ++j)
-      for (std::size_t i = 0; i != strs.size(); ++i)
-	hints[i] = ms.insert(hints[i], strs[i]);
-  }
-
-  // Note that the following use case is particularly bad especially compared to
-  // the solution without hint because without hint the first insertion will put
-  // it first in the bucket and following insertions will detect it and insert
-  // just before. By giving a hint insertion will be done just after forcing to
-  // check if it has no impact on the following bucket.
-  void
-  insert_with_bad_hint(const std::vector<std::string>& strs,
-		       ums_t& ms)
-  {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(str));
-
-    for (std::size_t i = 0; i != strs.size(); ++i)
-      for (int j = 1; j != nb_copies; ++j)
-	hints[i] = ms.insert(hints[i], strs[i]);
-  }
-
-  void
-  insert_without_hint1(const std::vector<std::string>& strs,
-		       ums_t& ms)
-  {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(str));
-
-    for (int j = 1; j != nb_copies; ++j)
-      for (std::size_t i = 0; i != strs.size(); ++i)
-	hints[i] = ms.insert(strs[i]);
-  }
-
-  // This version is the best one if you insert all equivalent elements at the
-  // same time. It demonstrates that most of the time it is better not to give
-  // any hint unless you have written a benchmark for your application showing
-  // that it does have a positive effect.
-  void
-  insert_without_hint2(const std::vector<std::string>& strs,
-		       ums_t& ms)
+  // Like previous hasher but tagged as slow.
+  struct slow_hasher
   {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(str));
-
-    for (std::size_t i = 0; i != strs.size(); ++i)
-      for (int j = 1; j != nb_copies; ++j)
-	hints[i] = ms.insert(strs[i]);
-  }
+    std::size_t
+    operator()(const std::string& str) const noexcept
+    { return hasher{}(str); }
+  };
 
-  void
-  insert_with_any_hint1(const std::vector<std::string>& strs,
-			ums_t& ms)
-  {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(ms.begin(), str));
+  template<typename _Hash>
+    using ums_t = std::unordered_multiset<std::string, _Hash>;
 
-    std::size_t offset = strs.size() / 2;
-    for (int j = 1; j != nb_copies; ++j)
-      for (std::size_t i = 0; i != strs.size(); ++i)
+  template<typename _Hash>
+    void
+    insert_with_perfect_hint(const std::vector<std::string>& strs,
+			     ums_t<_Hash>& ms)
+    {
+      auto hint = ms.end();
+      for (auto str : strs)
 	{
-	  ms.insert(hints[(i + offset) % hints.size()], strs[i]);
-	  ++offset;
+	  auto insert_pos = ms.insert(hint, str);
+	  if (std::next(insert_pos) == ms.end())
+	    hint = insert_pos;
 	}
-  }
-
-  void
-  insert_with_any_hint2(const std::vector<std::string>& strs,
-			ums_t& ms)
-  {
-    std::vector<typename ums_t::iterator> hints;
-    hints.reserve(strs.size());
-    for (auto str : strs)
-      hints.push_back(ms.insert(ms.begin(), str));
+    }
 
-    std::size_t offset = strs.size() / 2;
-    for (std::size_t i = 0; i != strs.size(); ++i)
-      for (int j = 1; j != nb_copies; ++j)
+  template<typename _Hash>
+    void
+    insert_with_bad_hint(const std::vector<std::string>& strs,
+			 ums_t<_Hash>& ms)
+    {
+      auto hint = ms.begin();
+      for (auto str : strs)
 	{
-	  ms.insert(hints[(i + offset) % hints.size()], strs[i]);
-	  ++offset;
+	  auto insert_pos = ms.insert(hint, str);
+	  if (std::next(insert_pos) == hint)
+	    hint = ms.begin();
 	}
-  }
-}
-
-int main()
-{
-  using namespace __gnu_test;
-
-  const int nb_iter = 10;
-
-  std::vector<std::string> strs;
-  strs.reserve(sz / nb_copies);
+    }
 
-  for (int i = 0; i != sz / nb_copies; ++i)
+  template<typename _Hash>
+    void
+    insert_without_hint(const std::vector<std::string>& strs,
+			ums_t<_Hash>& ms)
     {
-      std::ostringstream osstr;
-      osstr << pattern << i;
-      strs.push_back(osstr.str());
+      for (auto str : strs)
+	ms.insert(str);
     }
 
-  ums_t ms;
-  // Use a large load factor to make the context ideal for using hint because we
-  // will have many elements per bucket.
-  ms.max_load_factor(10.0f);
-  ms.reserve(sz);
+  template<typename _Hash>
+    void
+    insert_range(const std::vector<std::string>& strs,
+		 ums_t<_Hash>& ms)
+    { ms.insert(strs.begin(), strs.end()); }
+}
 
-  // Warm up.
+template<typename _Hash>
+  void bench(const char* ctx)
   {
-    for (auto str : strs)
-      for (int j = 0; j != nb_copies; ++j)
-	ms.insert(str);
-  }
-
-  time_counter time_no_hint1, time_any_hint1, time_bad_hint, time_perfect_hint1;
-  time_counter time_no_hint2, time_any_hint2, time_good_hint, time_perfect_hint2;
-  resource_counter resource_no_hint1, resource_any_hint1, resource_bad_hint,
-    resource_perfect_hint1;
-  resource_counter resource_no_hint2, resource_any_hint2, resource_good_hint,
-    resource_perfect_hint2;
-
-  for (int i = 0; i != nb_iter; ++i)
-    {
-      // Bad hint
-      {
-	ms.clear();
-	start_counters(time_bad_hint, resource_bad_hint);
-	insert_with_bad_hint(strs, ms);
-	stop_counters(time_bad_hint, resource_bad_hint);
-      }
+    using namespace __gnu_test;
 
-      // No hint
-      {
-	ms.clear();
-	start_counters(time_no_hint1, resource_no_hint1);
-	insert_without_hint1(strs, ms);
-	stop_counters(time_no_hint1, resource_no_hint1);
-      }
-
-      // Any hint
-      {
-	ms.clear();
-	start_counters(time_any_hint1, resource_any_hint1);
-	insert_with_any_hint1(strs, ms);
-	stop_counters(time_any_hint1, resource_any_hint1);
-      }
+    const int nb_iter = 10;
 
-      // Good hint
-      {
-	ms.clear();
-	start_counters(time_good_hint, resource_good_hint);
-	insert_with_good_hint(strs, ms);
-	stop_counters(time_good_hint, resource_good_hint);
-      }
-
-      // No hint
-      {
-	ms.clear();
-	start_counters(time_no_hint2, resource_no_hint2);
-	insert_without_hint2(strs, ms);
-	stop_counters(time_no_hint2, resource_no_hint2);
-      }
+    std::vector<std::string> strs;
+    strs.reserve(sz);
 
-      // Perfect hint
+    for (int i = 0; i != sz; ++i)
       {
-	ms.clear();
-	start_counters(time_perfect_hint2, resource_perfect_hint2);
-	insert_with_perfect_hint2(strs, ms);
-	stop_counters(time_perfect_hint2, resource_perfect_hint2);
+	std::ostringstream osstr;
+	osstr << pattern << i;
+	strs.push_back(osstr.str());
       }
 
-      // Any hint
-      {
-	ms.clear();
-	start_counters(time_any_hint2, resource_any_hint2);
-	insert_with_any_hint2(strs, ms);
-	stop_counters(time_any_hint2, resource_any_hint2);
-      }
+    ums_t<_Hash> ms;
+    ms.reserve(sz);
 
-      // Perfect hint
-      {
-	ms.clear();
-	start_counters(time_perfect_hint1, resource_perfect_hint1);
-	insert_with_perfect_hint1(strs, ms);
-	stop_counters(time_perfect_hint1, resource_perfect_hint1);
-      }
+    // Warm up.
+    {
+      for (auto str : strs)
+	ms.insert(str);
     }
 
-  std::ostringstream ostr;
-  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
-       << " insertions w/o hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_no_hint1, resource_no_hint1);
-
-  ostr.str("");
-  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
-       << " insertions with any hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_any_hint1, resource_any_hint1);
+    time_counter time_no_hint, time_bad_hint, time_perfect_hint,
+      time_range;
+    resource_counter resource_no_hint, resource_bad_hint, resource_perfect_hint,
+      resource_range;
 
-  ostr.str("");
-  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
-       << " insertions with good hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_good_hint, resource_good_hint);
+    for (int i = 0; i != nb_iter; ++i)
+      {
+	// Bad hint
+	{
+	  ms.clear();
+	  start_counters(time_bad_hint, resource_bad_hint);
+	  insert_with_bad_hint(strs, ms);
+	  stop_counters(time_bad_hint, resource_bad_hint);
+	}
 
-  ostr.str("");
-  ostr << "unordered_set " << nb_copies << " X " << sz / nb_copies
-       << " insertions with perfect hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_perfect_hint1, resource_perfect_hint1);
+	// No hint
+	{
+	  ms.clear();
+	  start_counters(time_no_hint, resource_no_hint);
+	  insert_without_hint(strs, ms);
+	  stop_counters(time_no_hint, resource_no_hint);
+	}
 
-  ostr.str("");
-  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
-       << " insertions w/o hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_no_hint2, resource_no_hint2);
+	// Perfect hint
+	{
+	  ms.clear();
+	  start_counters(time_perfect_hint, resource_perfect_hint);
+	  insert_with_perfect_hint(strs, ms);
+	  stop_counters(time_perfect_hint, resource_perfect_hint);
+	}
 
-  ostr.str("");
-  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
-       << " insertions with any hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_any_hint2, resource_any_hint2);
+	// Range insert
+	{
+	  ms.clear();
+	  start_counters(time_range, resource_range);
+	  insert_range(strs, ms);
+	  stop_counters(time_range, resource_range);
+	}
+      }
 
-  ostr.str("");
-  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
-       << " insertions with bad hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_bad_hint, resource_bad_hint);
+    std::ostringstream ostr;
+    ostr << ctx << ' ' << sz << " insertions w/o hint";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_no_hint, resource_no_hint);
+
+    ostr.str("");
+    ostr << ctx << ' ' << sz << " insertions with perfect hint";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_perfect_hint, resource_perfect_hint);
+
+    ostr.str("");
+    ostr << ctx << ' ' << sz << " insertions with bad hint";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_bad_hint, resource_bad_hint);
+
+    ostr.str("");
+    ostr << ctx << ' ' << sz << " range insertions";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_range, resource_range);
+  }
 
-  ostr.str("");
-  ostr << "unordered_set " << sz / nb_copies << " X " << nb_copies
-       << " insertions with perfect hint";
-  report_performance(__FILE__, ostr.str().c_str(),
-		     time_perfect_hint2, resource_perfect_hint2);
+int main()
+{
+  bench<hasher>("hash code NOT cached");
+  bench<slow_hasher>("hash code cached");
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_set_hint.cc b/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_set_hint.cc
new file mode 100644
index 00000000000..0197ba31b8b
--- /dev/null
+++ b/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_set_hint.cc
@@ -0,0 +1,186 @@
+// { dg-do run { target c++11 } }
+
+#include <testsuite_performance.h>
+
+#include <sstream>
+#include <string>
+#include <vector>
+#include <unordered_set>
+
+namespace
+{
+  const int sz = 2000000;
+  const std::string pattern = "long enough test string #";
+
+  // Perfect std::string hasher knowing how the string instances have been
+  // generated. It is not tagged as slow so that the hash code is not cached.
+  // It is easier to show the impact of the hint in this context.
+  struct hasher
+  {
+    std::size_t
+    operator()(const std::string& str) const noexcept
+    {
+      std::hash<std::string> std_hasher;
+      auto hash = std_hasher(pattern);
+      std::istringstream isstr(str.substr(pattern.length()));
+      int idx;
+      isstr >> idx;
+      return (std::size_t)(hash / sz) * sz + idx;
+    }
+  };
+
+  // Like previous hasher but tagged as slow.
+  struct slow_hasher
+  {
+    std::size_t
+    operator()(const std::string& str) const noexcept
+    { return hasher{}(str); }
+  };
+
+  template<typename _Hash>
+    using us_t = std::unordered_set<std::string, _Hash>;
+
+  template<typename _Hash>
+    void
+    insert_with_perfect_hint(const std::vector<std::string>& strs,
+			     us_t<_Hash>& s)
+    {
+      auto hint = s.end();
+      for (auto str : strs)
+	{
+	  auto insert_pos = s.insert(hint, str);
+	  if (std::next(insert_pos) == s.end())
+	    hint = insert_pos;
+	}
+    }
+
+  template<typename _Hash>
+    void
+    insert_with_bad_hint(const std::vector<std::string>& strs,
+			 us_t<_Hash>& s)
+    {
+      auto hint = s.begin();
+      for (auto str : strs)
+	{
+	  auto insert_pos = s.insert(hint, str);
+	  if (std::next(insert_pos) == hint)
+	    hint = s.begin();
+	}
+    }
+
+  template<typename _Hash>
+    void
+    insert_without_hint(const std::vector<std::string>& strs,
+			us_t<_Hash>& s)
+    {
+      for (auto str : strs)
+	s.insert(str);
+    }
+
+  template<typename _Hash>
+    void
+    insert_range(const std::vector<std::string>& strs,
+		 us_t<_Hash>& s)
+    { s.insert(strs.begin(), strs.end()); }
+}
+
+template<typename _Hash>
+  void bench(const char* ctx)
+  {
+    using namespace __gnu_test;
+
+    const int nb_iter = 10;
+
+    std::vector<std::string> strs;
+    strs.reserve(sz);
+
+    for (int i = 0; i != sz; ++i)
+      {
+	std::ostringstream osstr;
+	osstr << pattern << i;
+	strs.push_back(osstr.str());
+      }
+
+    us_t<_Hash> s;
+    s.reserve(sz);
+
+    // Warm up.
+    {
+      for (auto str : strs)
+	s.insert(str);
+    }
+
+    time_counter time_no_hint, time_bad_hint, time_perfect_hint,
+      time_range;
+    resource_counter resource_no_hint, resource_bad_hint, resource_perfect_hint,
+      resource_range;
+
+    for (int i = 0; i != nb_iter; ++i)
+      {
+	// Bad hint
+	{
+	  s.clear();
+	  start_counters(time_bad_hint, resource_bad_hint);
+	  insert_with_bad_hint(strs, s);
+	  stop_counters(time_bad_hint, resource_bad_hint);
+	}
+
+	// No hint
+	{
+	  s.clear();
+	  start_counters(time_no_hint, resource_no_hint);
+	  insert_without_hint(strs, s);
+	  stop_counters(time_no_hint, resource_no_hint);
+	}
+
+	// Perfect hint
+	{
+	  s.clear();
+	  start_counters(time_perfect_hint, resource_perfect_hint);
+	  insert_with_perfect_hint(strs, s);
+	  stop_counters(time_perfect_hint, resource_perfect_hint);
+	}
+
+	// Range insert
+	{
+	  s.clear();
+	  start_counters(time_range, resource_range);
+	  insert_range(strs, s);
+	  stop_counters(time_range, resource_range);
+	}
+      }
+
+    std::ostringstream ostr;
+    ostr << ctx << ' ' << sz << " insertions w/o hint";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_no_hint, resource_no_hint);
+
+    ostr.str("");
+    ostr << ctx << ' ' << sz << " insertions with perfect hint";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_perfect_hint, resource_perfect_hint);
+
+    ostr.str("");
+    ostr << ctx << ' ' << sz << " insertions with bad hint";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_bad_hint, resource_bad_hint);
+
+    ostr.str("");
+    ostr << ctx << ' ' << sz << " range insertions";
+    report_performance(__FILE__, ostr.str().c_str(),
+		       time_range, resource_range);
+  }
+
+namespace std
+{
+  template<>
+    struct __is_fast_hash<slow_hasher> : std::false_type
+    { };
+}
+
+int main()
+{
+  bench<hasher>("hash code NOT cached");
+  bench<slow_hasher>("hash code cached");
+  return 0;
+}
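
For context when reading the numbers this test produces: the custom hasher maps the generated string "long enough test string #i" to (hash(pattern) / sz) * sz + i, i.e. sz distinct, consecutive codes, so after reserve(sz) every element should land in its own bucket and the measurements mostly isolate the per-insertion bookkeeping the patch targets. The caller-side idiom exercised by insert_with_perfect_hint boils down to the following standalone sketch (illustration only, not part of the patch; it assumes the patched behaviour where a hint designating the container's last node lets the implementation insert after it instead of after the before-begin node, so no other bucket needs updating):

#include <iterator>
#include <string>
#include <unordered_set>
#include <vector>

void
fill_with_hint(std::unordered_set<std::string>& s,
               const std::vector<std::string>& strs)
{
  auto hint = s.end();
  for (const auto& str : strs)
    {
      // insert(hint, value) returns an iterator to the inserted (or
      // already existing) element; while that element is the last one
      // in iteration order it remains a valid "last node" hint.
      auto pos = s.insert(hint, str);
      if (std::next(pos) == s.end())
        hint = pos;
    }
}

With an unpatched libstdc++ the hint is still accepted, it just brings no measurable benefit, which is what the "before" figures show.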

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][Hashtable] Performance optimization through use of insertion hint
  2023-07-24 11:02 [PATCH][Hashtable] Performance optimization through use of insertion hint François Dumont
@ 2023-08-29 19:51 ` François Dumont
  2023-09-01  8:59   ` Jonathan Wakely
  0 siblings, 1 reply; 5+ messages in thread
From: François Dumont @ 2023-08-29 19:51 UTC (permalink / raw)
  To: libstdc++; +Cc: gcc-patches

Hi

Any feedback regarding this patch?

François

On 24/07/2023 13:02, François Dumont wrote:
> libstdc++: [_Hashtable] Make more use of insertion hint
>
>
>     When inserting an element into an empty bucket we currently insert 
> the new node
>     after the before-begin node so in first position. The drawback of 
> doing this is
>     that we are forced to update the bucket that was containing this 
> before-begin
>     node to point to the newly inserted node. To do so we need at best 
> to do a modulo
>     to find this bucket and when hash code is not cached also compute it.
>
>     To avoid this side effect it is better to insert after the last 
> node. Adding
>     a member to keep track of this last node would be an abi breaking 
> change. Still we
>     can ask the user to maintain and provide this last node as an 
> insertion hint.
>
>     Adapt range insertion methods to try to detect the last node and 
> then use it as
>     the insertion hint.
>
>     libstdc++-v3/ChangeLog:
>
>             * include/bits/hashtable_policy.h:
>             (_Insert_base::try_emplace): Adapt to use hint.
>             (_Insert_base::_M_insert_range): Try to detect container 
> last node and use it
>             as hint for insertions.
>             (_Hash_code_base::_M_hash_code(const _Hash&, const 
> _Hash_node_value<>&)): Remove.
>             (_Hash_code_base::_M_hash_code<_H2>(const _H2&, const 
> _Hash_node_value<>&)): Remove.
>             * include/bits/hashtable.h
>             (_Hashtable<>::_M_insert_bucket_begin): Add hint parameter 
> and use it when inserting
>             into an empty bucket if hint is the last container node.
>             (_Hashtable<>::_InsertInfo): New struct.
>             (_Hashtable<>::_M_get_insert_info): New, return latter.
>             (_Hashtable<>::_M_insert_multi_node): Add _InsertInfo 
> parameter.
>             (_Hashtable<>::_M_insert_unique_node): Add __node_ptr hint 
> parameter.
>             (_Hashtable<>::_M_emplace_unique(__node_ptr, _Args&&...)): 
> New.
>             (_Hashtable<>::_M_emplace_multi(__node_ptr, _Args&&...)): 
> New.
>             (_Hashtable<>::_M_emplace()): Adapt to use latters.
>             (_Hashtable<>::_M_insert_unique): Add __node_ptr parameter.
>             (_Hashtable<>::_M_insert_unique_aux): Add __node_ptr 
> parameter.
>             (_Hashtable<>::_M_insert(__node_ptr, _Arg&&, const 
> _NodeGenerator&, true_type)):
>             Use latter.
>             (_Hashtable<>::_M_reinsert_node(const_iterator, 
> node_type&&)):
>             Add hint parameter, adapt to use it.
>             (_Hashtable<>::_M_reinsert_node_multi): Use hint parameter 
> if available to extract
>             hash code.
>             (_Hashtable<>::_M_compute_hash_code(const _Hash&, 
> __node_ptr, __node_ptr,
>             const key_type&)): New.
> (_Hashtable<>::_M_compute_hash_code<_H2>(const _H2&, __node_ptr, 
> __node_ptr,
>             const key_type&)): New.
>             (_Hashtable<>::_M_merge_unique): Adapt to use latter. 
> Implement small size
>             optimization.
>             (_Hashtable<>::_M_get_insert_info(const _Hash&, 
> __node_ptr, __node_ptr,
>             const key_type&)): New.
> (_Hashtable<>::_M_get_insert_info<_H2>(const _H2&, __node_ptr, 
> __node_ptr,
>             const key_type&)): New.
>             (_Hashtable<>::_M_merge_multi): Adapt to use latter.
>             * include/bits/unordered_map.h 
> (unordered_map<>::insert(node_type&&)): Pass cend as
>             hint.
>             (unordered_map<>::insert(const_iterator, node_type&&)): 
> Adapt to use hint.
>             * include/bits/unordered_set.h 
> (unordered_set<>::insert(node_type&&)): Pass cend as
>             hint.
>             (unordered_set<>::insert(const_iterator, node_type&&)): 
> Adapt to use hint.
>             * 
> testsuite/23_containers/unordered_multimap/insert/hint.cc: Adapt 
> implementation
>             specific tests.
>
> Tested under linux x86_64.
>
> Here is the performance results of this patch, before:
>
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> insertions w/o hint      94r   94u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> insertions with perfect hint      95r   94u    0s 191999712mem 0pf
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> insertions with bad hint      94r   94u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> range insertions      88r   88u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 
> insertions w/o hint      91r   91u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 
> insertions with perfect hint      92r   93u    0s 191999712mem 0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 
> insertions with bad hint      93r   93u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 range 
> insertions      88r   88u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 
> insertions w/o hint      94r   95u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 
> insertions with perfect hint      94r   95u    0s 191999712mem 0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 
> insertions with bad hint      94r   94u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 range 
> insertions      90r   90u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 insertions 
> w/o hint      68r   68u    0s 223999264mem    0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 insertions 
> with perfect hint      67r   67u    0s 223999264mem 0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 insertions 
> with bad hint      68r   68u    0s 223999264mem    0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 range 
> insertions      62r   62u    0s 223999264mem    0pf
>
> After:
>
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> insertions w/o hint      93r   93u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> insertions with perfect hint      58r   59u    0s 191999712mem 0pf
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> insertions with bad hint      93r   94u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code NOT cached 2000000 
> range insertions      52r   51u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 
> insertions w/o hint      96r   95u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 
> insertions with perfect hint      61r   62u    0s 191999712mem 0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 
> insertions with bad hint      94r   94u    0s 191999712mem    0pf
> unordered_multiset_hint.cc-thread    hash code cached 2000000 range 
> insertions      52r   52u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 
> insertions w/o hint      96r   95u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 
> insertions with perfect hint      58r   59u    0s 191999712mem 0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 
> insertions with bad hint      94r   94u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code NOT cached 2000000 range 
> insertions      52r   52u    0s 191999712mem    0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 insertions 
> w/o hint      70r   69u    0s 223999264mem    0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 insertions 
> with perfect hint      67r   67u    0s 223999264mem 0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 insertions 
> with bad hint      67r   67u    0s 223999264mem    0pf
> unordered_set_hint.cc-thread    hash code cached 2000000 range 
> insertions      63r   62u    0s 223999264mem    0pf
>
> Ok to commit ?
>
> François
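
As an aside for readers following the quoted ChangeLog: the node-handle insert overloads it touches already take a hint in the standard C++17 API, so the same "keep the last node as hint" idea applies when moving elements between containers. A minimal sketch under that assumption (illustration only, not part of the patch; the function and variable names are made up):

#include <iterator>
#include <string>
#include <unordered_set>
#include <utility>

void
move_all(std::unordered_set<std::string>& from,
         std::unordered_set<std::string>& to)
{
  auto hint = to.end();
  while (!from.empty())
    {
      auto nh = from.extract(from.begin());      // C++17 node handle
      auto pos = to.insert(hint, std::move(nh)); // hinted reinsertion
      // If the key already exists the extracted node is discarded when
      // nh goes out of scope; otherwise keep the hint pointing at the
      // container's last node, as in the attached benchmark.
      if (std::next(pos) == to.end())
        hint = pos;
    }
}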

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][Hashtable] Performance optimization through use of insertion hint
  2023-08-29 19:51 ` François Dumont
@ 2023-09-01  8:59   ` Jonathan Wakely
  2023-09-01  9:00     ` Jonathan Wakely
  2023-09-07 16:55     ` François Dumont
  0 siblings, 2 replies; 5+ messages in thread
From: Jonathan Wakely @ 2023-09-01  8:59 UTC (permalink / raw)
  To: François Dumont; +Cc: libstdc++, gcc-patches

On Tue, 29 Aug 2023 at 20:52, François Dumont via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
>
> Hi
>
> Any feedback regarding this patch ?

This is a fairly large patch and before we make any more changes to
unordered containers we have an ABI break to fix:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111050


>
> François
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][Hashtable] Performance optimization through use of insertion hint
  2023-09-01  8:59   ` Jonathan Wakely
@ 2023-09-01  9:00     ` Jonathan Wakely
  2023-09-07 16:55     ` François Dumont
  1 sibling, 0 replies; 5+ messages in thread
From: Jonathan Wakely @ 2023-09-01  9:00 UTC (permalink / raw)
  To: François Dumont; +Cc: libstdc++, gcc-patches

On Fri, 1 Sept 2023 at 09:59, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Tue, 29 Aug 2023 at 20:52, François Dumont via Libstdc++
> <libstdc++@gcc.gnu.org> wrote:
> >
> > Hi
> >
> > Any feedback regarding this patch ?
>
> This is a fairly large patch and before we make any more changes to
> unordered containers we have an ABI break to fix:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111050

Or at least decide what to do about it...



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][Hashtable] Performance optimization through use of insertion hint
  2023-09-01  8:59   ` Jonathan Wakely
  2023-09-01  9:00     ` Jonathan Wakely
@ 2023-09-07 16:55     ` François Dumont
  1 sibling, 0 replies; 5+ messages in thread
From: François Dumont @ 2023-09-07 16:55 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches


On 01/09/2023 10:59, Jonathan Wakely wrote:
> On Tue, 29 Aug 2023 at 20:52, François Dumont via Libstdc++
> <libstdc++@gcc.gnu.org> wrote:
>> Hi
>>
>> Any feedback regarding this patch ?
> This is a fairly large patch

I've decided to split it, at least in two.

So just ignore this one; I'll submit new ones once the ABI issue is fixed.

>   and before we make any more changes to
> unordered containers we have an ABI break to fix:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111050

^ permalink raw reply	[flat|nested] 5+ messages in thread
