public inbox for gcc-patches@gcc.gnu.org
* hashtable optimization
@ 2015-02-18  9:36 François Dumont
  2015-02-18  9:37 ` François Dumont
  2015-02-20 13:28 ` Jonathan Wakely
  0 siblings, 2 replies; 4+ messages in thread
From: François Dumont @ 2015-02-18  9:36 UTC (permalink / raw)
  To: libstdc++, gcc-patches

Hello

     I am still studying hashtable performance and especially how to 
reduce the overhead compared to the tr1 implementation. Most of the 
overhead comes from the additional modulo operations required by the 
new data model. Having a closer look at the PR 57885 benchmark, I 
realized that we can quite easily avoid an important modulo operation 
in _M_insert_bucket_begin thanks to an additional std::size_t in the 
container.

     The patch is quite straightforward: it optimizes insertion of a 
node into an empty bucket, which is the most important kind of 
insertion since, as long as the hash functor is doing a good job, by 
default we should have at most 1 element per bucket:

Without patch:
Container:std::unordered_map<int,int>  Key:int
Insertion: 1106 671 634 634 635  min=634 max=1106

Container:std::tr1::unordered_map<int,int>  Key:int
Insertion: 885 491 487 490 511  min=487 max=885

With patch:
Container:std::unordered_map<int,int>  Key:int
Insertion: 1099 581 583 584 592  min=581 max=1099

Container:std::tr1::unordered_map<int,int>  Key:int
Insertion: 889 491 519 492 493  min=491 max=889

     I prefer to propose it now because it impacts ABI.

2015-02-19  François Dumont  <fdumont@gcc.gnu.org>

     * include/bits/hashtable.h (_Hashtable<>::_M_bbegin_bkt): New, cached
     index of the bucket pointing to _M_before_begin.
     (_Hashtable<>): Maintain and use it.

Tested under Linux x86_64.

François


* Re: hashtable optimization
  2015-02-18  9:36 hashtable optimization François Dumont
@ 2015-02-18  9:37 ` François Dumont
  2015-02-20 13:28 ` Jonathan Wakely
  1 sibling, 0 replies; 4+ messages in thread
From: François Dumont @ 2015-02-18  9:37 UTC (permalink / raw)
  To: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1532 bytes --]

With patch.

On 18/02/2015 10:35, François Dumont wrote:
> Hello
>
>     I am still studying hashtable performance and especially how to 
> reduce the overhead compared to the tr1 implementation. Most of the 
> overhead comes from the additional modulo operations required by the 
> new data model. Having a closer look at the PR 57885 benchmark, I 
> realized that we can quite easily avoid an important modulo operation 
> in _M_insert_bucket_begin thanks to an additional std::size_t in the 
> container.
>
>     The patch is quite straightforward: it optimizes insertion of a 
> node into an empty bucket, which is the most important kind of 
> insertion since, as long as the hash functor is doing a good job, by 
> default we should have at most 1 element per bucket:
>
> Without patch:
> Container:std::unordered_map<int,int>  Key:int
> Insertion: 1106 671 634 634 635  min=634 max=1106
>
> Container:std::tr1::unordered_map<int,int>  Key:int
> Insertion: 885 491 487 490 511  min=487 max=885
>
> With patch:
> Container:std::unordered_map<int,int>  Key:int
> Insertion: 1099 581 583 584 592  min=581 max=1099
>
> Container:std::tr1::unordered_map<int,int>  Key:int
> Insertion: 889 491 519 492 493  min=491 max=889
>
>     I prefer to propose it now because it impacts ABI.
>
> 2015-02-19  François Dumont  <fdumont@gcc.gnu.org>
>
>     * include/bits/hashtable.h (_Hashtable<>::_M_bbegin_bkt): New, cached
>     index of the bucket pointing to _M_before_begin.
>     (_Hashtable<>): Maintain and use it.
>
> Tested under Linux x86_64.
>
> François
>


[-- Attachment #2: hashtable_optim.patch --]
[-- Type: text/x-patch, Size: 5022 bytes --]

Index: include/bits/hashtable.h
===================================================================
--- include/bits/hashtable.h	(revision 220780)
+++ include/bits/hashtable.h	(working copy)
@@ -324,6 +324,9 @@
       // numerous checks in the code to avoid 0 modulus.
       __bucket_type		_M_single_bucket	= nullptr;
 
+      // Cache index of the bucket pointing to _M_before_begin
+      size_type			_M_bbegin_bkt;
+
       bool
       _M_uses_single_bucket(__bucket_type* __bkts) const
       { return __builtin_expect(__bkts == &_M_single_bucket, false); }
@@ -965,7 +968,8 @@
 	    __node_type* __this_n = __node_gen(__ht_n);
 	    this->_M_copy_code(__this_n, __ht_n);
 	    _M_before_begin._M_nxt = __this_n;
-	    _M_buckets[_M_bucket_index(__this_n)] = &_M_before_begin;
+	    _M_bbegin_bkt = _M_bucket_index(__this_n);
+	    _M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 
 	    // Then deal with other nodes.
 	    __node_base* __prev_n = __this_n;
@@ -1029,12 +1033,13 @@
       _M_bucket_count = __ht._M_bucket_count;
       _M_before_begin._M_nxt = __ht._M_before_begin._M_nxt;
       _M_element_count = __ht._M_element_count;
+      _M_bbegin_bkt = __ht._M_bbegin_bkt;
       std::__alloc_on_move(this->_M_node_allocator(), __ht._M_node_allocator());
 
       // Fix buckets containing the _M_before_begin pointers that can't be
       // moved.
       if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
       __ht._M_reset();
     }
 
@@ -1131,7 +1136,8 @@
       _M_bucket_count(__ht._M_bucket_count),
       _M_before_begin(__ht._M_before_begin._M_nxt),
       _M_element_count(__ht._M_element_count),
-      _M_rehash_policy(__ht._M_rehash_policy)
+      _M_rehash_policy(__ht._M_rehash_policy),
+      _M_bbegin_bkt(__ht._M_bbegin_bkt)
     {
       // Update, if necessary, buckets if __ht is using its single bucket.
       if (__ht._M_uses_single_bucket())
@@ -1143,7 +1149,7 @@
       // Update, if necessary, bucket pointing to before begin that hasn't
       // moved.
       if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 
       __ht._M_reset();
     }
@@ -1183,7 +1189,8 @@
       _M_buckets(nullptr),
       _M_bucket_count(__ht._M_bucket_count),
       _M_element_count(__ht._M_element_count),
-      _M_rehash_policy(__ht._M_rehash_policy)
+      _M_rehash_policy(__ht._M_rehash_policy),
+      _M_bbegin_bkt(__ht._M_bbegin_bkt)
     {
       if (__ht._M_node_allocator() == this->_M_node_allocator())
 	{
@@ -1199,7 +1206,7 @@
 	  // Update, if necessary, bucket pointing to before begin that hasn't
 	  // moved.
 	  if (_M_begin())
-	    _M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	    _M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 	  __ht._M_reset();
 	}
       else
@@ -1265,15 +1272,15 @@
       std::swap(_M_before_begin._M_nxt, __x._M_before_begin._M_nxt);
       std::swap(_M_element_count, __x._M_element_count);
       std::swap(_M_single_bucket, __x._M_single_bucket);
+      std::swap(_M_bbegin_bkt, __x._M_bbegin_bkt);
 
       // Fix buckets containing the _M_before_begin pointers that can't be
       // swapped.
       if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	_M_buckets[_M_bbegin_bkt] = &_M_before_begin;
 
       if (__x._M_begin())
-	__x._M_buckets[__x._M_bucket_index(__x._M_begin())]
-	  = &__x._M_before_begin;
+	__x._M_buckets[__x._M_bbegin_bkt] = &__x._M_before_begin;
     }
 
   template<typename _Key, typename _Value,
@@ -1466,7 +1473,8 @@
 	  if (__node->_M_nxt)
 	    // We must update former begin bucket that is pointing to
 	    // _M_before_begin.
-	    _M_buckets[_M_bucket_index(__node->_M_next())] = __node;
+	    _M_buckets[_M_bbegin_bkt] = __node;
+	  _M_bbegin_bkt = __bkt;
 	  _M_buckets[__bkt] = &_M_before_begin;
 	}
     }
@@ -1974,7 +1982,7 @@
       __bucket_type* __new_buckets = _M_allocate_buckets(__n);
       __node_type* __p = _M_begin();
       _M_before_begin._M_nxt = nullptr;
-      std::size_t __bbegin_bkt = 0;
+
       while (__p)
 	{
 	  __node_type* __next = __p->_M_next();
@@ -1985,8 +1993,8 @@
 	      _M_before_begin._M_nxt = __p;
 	      __new_buckets[__bkt] = &_M_before_begin;
 	      if (__p->_M_nxt)
-		__new_buckets[__bbegin_bkt] = __p;
-	      __bbegin_bkt = __bkt;
+		__new_buckets[_M_bbegin_bkt] = __p;
+	      _M_bbegin_bkt = __bkt;
 	    }
 	  else
 	    {
@@ -2016,7 +2024,6 @@
 
       __node_type* __p = _M_begin();
       _M_before_begin._M_nxt = nullptr;
-      std::size_t __bbegin_bkt = 0;
       std::size_t __prev_bkt = 0;
       __node_type* __prev_p = nullptr;
       bool __check_bucket = false;
@@ -2064,8 +2071,8 @@
 		  _M_before_begin._M_nxt = __p;
 		  __new_buckets[__bkt] = &_M_before_begin;
 		  if (__p->_M_nxt)
-		    __new_buckets[__bbegin_bkt] = __p;
-		  __bbegin_bkt = __bkt;
+		    __new_buckets[_M_bbegin_bkt] = __p;
+		  _M_bbegin_bkt = __bkt;
 		}
 	      else
 		{


* Re: hashtable optimization
  2015-02-18  9:36 hashtable optimization François Dumont
  2015-02-18  9:37 ` François Dumont
@ 2015-02-20 13:28 ` Jonathan Wakely
  2015-02-22 18:45   ` François Dumont
  1 sibling, 1 reply; 4+ messages in thread
From: Jonathan Wakely @ 2015-02-20 13:28 UTC (permalink / raw)
  To: François Dumont; +Cc: libstdc++, gcc-patches

On 18/02/15 10:35 +0100, François Dumont wrote:
>Hello
>
>    I am still studying hashtable performance and especially how to 
>reduce the overhead compared to the tr1 implementation. Most of the 
>overhead comes from the additional modulo operations required by the 
>new data model. Having a closer look at the PR 57885 benchmark, I 
>realized that we can quite easily avoid an important modulo operation 
>in _M_insert_bucket_begin thanks to an additional std::size_t in the 
>container.
>
>    The patch is quite straightforward: it optimizes insertion of a 
>node into an empty bucket, which is the most important kind of 
>insertion since, as long as the hash functor is doing a good job, by 
>default we should have at most 1 element per bucket:
>
>Without patch:
>Container:std::unordered_map<int,int>  Key:int
>Insertion: 1106 671 634 634 635  min=634 max=1106
>
>Container:std::tr1::unordered_map<int,int>  Key:int
>Insertion: 885 491 487 490 511  min=487 max=885
>
>With patch:
>Container:std::unordered_map<int,int>  Key:int
>Insertion: 1099 581 583 584 592  min=581 max=1099
>
>Container:std::tr1::unordered_map<int,int>  Key:int
>Insertion: 889 491 519 492 493  min=491 max=889
>
>    I prefer to propose it now because it impacts ABI.
>
>2015-02-19  François Dumont  <fdumont@gcc.gnu.org>
>
>    * include/bits/hashtable.h (_Hashtable<>::_M_bbegin_bkt): New, cached
>    index of the bucket pointing to _M_before_begin.
>    (_Hashtable<>): Maintain and use it.
>
>Tested under Linux x86_64.

I'm nervous about this kind of change at this stage. The patch looks
right, but are you sure you've covered *every* case where the cached
value needs to be updated?

Since this only affects performance, not correctness, it might be hard
to convince the release managers it is necessary during stage 4, and
I'm not entirely convinced myself either.


* Re: hashtable optimization
  2015-02-20 13:28 ` Jonathan Wakely
@ 2015-02-22 18:45   ` François Dumont
  0 siblings, 0 replies; 4+ messages in thread
From: François Dumont @ 2015-02-22 18:45 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On 20/02/2015 14:22, Jonathan Wakely wrote:
> On 18/02/15 10:35 +0100, François Dumont wrote:
>> Hello
>>
>>    I am still studying hashtable performance and especially how to 
>> reduce the overhead compared to the tr1 implementation. Most of the 
>> overhead comes from the additional modulo operations required by the 
>> new data model. Having a closer look at the PR 57885 benchmark, I 
>> realized that we can quite easily avoid an important modulo operation 
>> in _M_insert_bucket_begin thanks to an additional std::size_t in the 
>> container.
>>
>>    The patch is quite straightforward: it optimizes insertion of a 
>> node into an empty bucket, which is the most important kind of 
>> insertion since, as long as the hash functor is doing a good job, by 
>> default we should have at most 1 element per bucket:
>>
>> Without patch:
>> Container:std::unordered_map<int,int>  Key:int
>> Insertion: 1106 671 634 634 635  min=634 max=1106
>>
>> Container:std::tr1::unordered_map<int,int>  Key:int
>> Insertion: 885 491 487 490 511  min=487 max=885
>>
>> With patch:
>> Container:std::unordered_map<int,int>  Key:int
>> Insertion: 1099 581 583 584 592  min=581 max=1099
>>
>> Container:std::tr1::unordered_map<int,int>  Key:int
>> Insertion: 889 491 519 492 493  min=491 max=889
>>
>>    I prefer to propose it now because it impacts ABI.
>>
>> 2015-02-19  François Dumont  <fdumont@gcc.gnu.org>
>>
>>    * include/bits/hashtable.h (_Hashtable<>::_M_bbegin_bkt): New, cached
>>    index of the bucket pointing to _M_before_begin.
>>    (_Hashtable<>): Maintain and use it.
>>
>> Tested under Linux x86_64.
>
> I'm nervous about this kind of change at this stage. The patch looks
> right, but are you sure you've covered *every* case where the cached
> value needs to be updated?
>
> Since this only affects performance, not correctness, it might be hard
> to convince the release managers it is necessary during stage 4, and
> I'm not entirely convinced myself either.
>
     Ok, I understand, let's keep it for later then. I am also still 
looking for the best implementation; maybe I or someone else will come 
up with the perfect one, with tr1 performance for insert/lookup and std 
performance for iteration/erase.

François


