On 23/12/21 2:03 pm, Jonathan Wakely wrote: > On 21/12/21 07:07 +0100, François Dumont wrote: >> Hi >> >> ??? Is there a chance for this patch to be integrated for next gcc >> release ? > > Yes, I think this can still make it for GCC 12 (the patch was first > posted long ago in stage 1 and it's just me being slow to review it). > > But I have some questions and comments ... > > >> diff --git a/libstdc++-v3/include/bits/hashtable.h >> b/libstdc++-v3/include/bits/hashtable.h >> index 6e2d4c10cfe..6b2c6b99fef 100644 >> --- a/libstdc++-v3/include/bits/hashtable.h >> +++ b/libstdc++-v3/include/bits/hashtable.h >> @@ -419,6 +419,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> _M_uses_single_bucket() const >> { return _M_uses_single_bucket(_M_buckets); } >> >> + static constexpr size_t >> + __small_size_threshold() >> + { >> + return >> + __detail::_Hashtable_hash_traits<_Hash>::__small_size_threshold(); >> + } >> + I've added the noexcept qualification as you asked. >> __hashtable_alloc& >> _M_base_alloc() { return *this; } >> >> @@ -788,6 +795,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> _M_bucket_index(__hash_code __c) const >> { return __hash_code_base::_M_bucket_index(__c, >> _M_bucket_count); } >> >> + __node_base_ptr >> + _M_find_before_node(const key_type&); >> + >> // Find and insert helper functions and types >> // Find the node before the one matching the criteria. >> __node_base_ptr >> @@ -831,6 +841,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> __node_base_ptr >> _M_get_previous_node(size_type __bkt, __node_ptr __n); >> >> + pair >> + _M_compute_hash_code(const_iterator __hint, const key_type& >> __k) const; >> + >> // Insert node __n with hash code __code, in bucket __bkt if no >> // rehash (assumes no element with same key already present). >> // Takes ownership of __n if insertion succeeds, throws otherwise. >> @@ -1126,7 +1139,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> void _M_rehash(size_type __bkt_count, const __rehash_state& >> __state); >> }; >> >> - >> // Definitions of class template _Hashtable's out-of-line member >> functions. >> template> typename _ExtractKey, typename _Equal, >> @@ -1628,6 +1640,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> find(const key_type& __k) >> -> iterator >> { >> + if (size() <= __small_size_threshold()) >> + { >> + for (auto __it = begin(); __it != end(); ++__it) >> + if (this->_M_key_equals(__k, *__it._M_cur)) >> + return __it; >> + return end(); >> + } > > This loop is repeated a few times, would it be better factored out > into its own function? _M_find_loop or something? The return type is > different in some cases, so maybe it's OK like this. Yes, I also thought about that but there is often small changes from one loop to another as you noticed. > > > >> + >> __hash_code __code = this->_M_hash_code(__k); >> std::size_t __bkt = _M_bucket_index(__code); >> return iterator(_M_find_node(__bkt, __k, __code)); >> @@ -1643,6 +1663,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> find(const key_type& __k) const >> -> const_iterator >> { >> + if (size() <= __small_size_threshold()) >> + { >> + for (auto __it = begin(); __it != end(); ++__it) >> + if (this->_M_key_equals(__k, *__it._M_cur)) >> + return __it; >> + return end(); >> + } >> + >> __hash_code __code = this->_M_hash_code(__k); >> std::size_t __bkt = _M_bucket_index(__code); >> return const_iterator(_M_find_node(__bkt, __k, __code)); >> @@ -1855,6 +1883,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> } >> #endif >> >> + // Find the node before the one whose key compares equal to k. >> + // Return nullptr if no node is found. >> + template> + typename _ExtractKey, typename _Equal, >> + typename _Hash, typename _RangeHash, typename _Unused, >> + typename _RehashPolicy, typename _Traits> >> + auto >> + _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, >> + _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>:: >> + _M_find_before_node(const key_type& __k) >> + -> __node_base_ptr >> + { >> + __node_base_ptr __prev_p = &_M_before_begin; > > This is OK now, but would need to be std::__addressof(_M_before_begin) > if/when the _Hash_code_base type becomes dependent on the allocator's > pointer. Yes, maybe after gcc release we will talk about those fancy pointer types again. > > __n_last = __n_last->_M_next(); >> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h >> b/libstdc++-v3/include/bits/hashtable_policy.h >> index 0b5443fc187..b4a8af081ce 100644 >> --- a/libstdc++-v3/include/bits/hashtable_policy.h >> +++ b/libstdc++-v3/include/bits/hashtable_policy.h >> @@ -246,6 +246,20 @@ namespace __detail >> using __unique_keys = __bool_constant<_Unique_keys>; >> }; >> >> + /** >> + * struct _Hashtable_hash_traits >> + * >> + * Important traits for hash tables depending on associated hasher. >> + * >> + */ >> + template >> + struct _Hashtable_hash_traits >> + { >> + static constexpr std::size_t >> + __small_size_threshold() >> + { return std::__is_fast_hash<_Hash>::value ? 0 : 20; } >> + }; > > Yet another trait that nobody is ever going to specialize make me sad. > I don't have a better idea though. Sure, but maybe I can document it ? I also wonder why you did not propose to make it a constant rather than requesting to add the noexcept. > > { >> @@ -1263,6 +1279,14 @@ namespace __detail >> const _Hash_node_value<_Value, __cache_hash_code>& __n) const >> { return _M_hash_code(_ExtractKey{}(__n._M_v())); } >> >> + __hash_code >> + _M_hash_code(const _Hash_node_value<_Value, false>& __n) const >> + { return _M_hash_code(_ExtractKey{}(__n._M_v())); } >> + >> + __hash_code >> + _M_hash_code(const _Hash_node_value<_Value, true>& __n) const >> + { return __n._M_hash_code; } >> + >> std::size_t >> _M_bucket_index(__hash_code __c, std::size_t __bkt_count) const >> { return _RangeHash{}(__c, __bkt_count); } >> @@ -1273,17 +1297,14 @@ namespace __detail >> noexcept( noexcept(declval()(declval())) >> && noexcept(declval()((__hash_code)0, >> (std::size_t)0)) ) >> - { >> - return _RangeHash{}(_M_hash_code(_ExtractKey{}(__n._M_v())), >> - __bkt_count); >> - } >> + { return _M_bucket_index(_M_hash_code(__n), __bkt_count); } > > Why add this extra level of indirection (and overload resolution)? > > We know this is a _Hash_node_value, why call _M_hash_code to > decide how to get the hash code for it? Just to avoid code duplication but indeed it introduces overload resolution so I reverted it. > > >> diff --git a/libstdc++-v3/testsuite/util/testsuite_performance.h >> b/libstdc++-v3/testsuite/util/testsuite_performance.h >> index cba3a0d4b17..4ca15ab0e71 100644 >> --- a/libstdc++-v3/testsuite/util/testsuite_performance.h >> +++ b/libstdc++-v3/testsuite/util/testsuite_performance.h >> @@ -239,7 +239,7 @@ namespace __gnu_test >> out << std::setw(4) << t.real_time() << "r" << space; >> out << std::setw(4) << t.user_time() << "u" << space; >> out << std::setw(4) << t.system_time() << "s" << space; >> - out << std::setw(8) << r.allocated_memory() << "mem" << space; >> + out << std::setw(9) << r.allocated_memory() << "mem" << space; > > One day I need to figure out why the reported memory is garbage so > often Ok with those changes ? François