[Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range

public inbox for gcc-bugs@sourceware.org

* [Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range_hashing is slow
@ 2015-03-31 14:20 j.breitbart at tum dot de
  2015-03-31 17:29 ` [Bug libstdc++/65641] " redi at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: j.breitbart at tum dot de @ 2015-03-31 14:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641

            Bug ID: 65641
           Summary: unordered_map - __detail::_Mod_range_hashing is slow
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: j.breitbart at tum dot de

Created attachment 35192
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35192&action=edit
Small benchmark for our unordered_map change

Hi,

we have been using std::unordered_map with a pointer as the key in one of our
applications and analysis showed that the find() function is one of two
performance bottlenecks. Further analysis showed that about 40% of the total
application runtime is spent in a single x86 divq instruction coming from
std::__detail::_Mod_range_hashing. We think that using a modulo operation
(translated to divq x86 instruction) all the time is suboptimal and have
attached a simple example to show the benefits that can be achieved by
replacing the modulo operation by masking.
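For readers without the attachment, the difference between the two ways of
reducing a hash to a bucket index can be sketched as follows. This is only an
illustration, not the attached code; the function names bucket_by_modulo and
bucket_by_mask are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Bucket index via modulo: valid for any nonzero bucket count, but with a
// divisor known only at run time it typically becomes a hardware divide
// (divq on x86-64).
inline std::size_t bucket_by_modulo(std::size_t hash, std::size_t bucket_count)
{
    assert(bucket_count != 0);
    return hash % bucket_count;
}

// Bucket index via masking: a single AND. Only valid when bucket_count is a
// power of two, because then hash % bucket_count == hash & (bucket_count - 1).
inline std::size_t bucket_by_mask(std::size_t hash, std::size_t bucket_count)
{
    assert(bucket_count != 0 && (bucket_count & (bucket_count - 1)) == 0);
    return hash & (bucket_count - 1);
}
```

For a power-of-two bucket count both functions return the same index, so the
mask is a drop-in replacement in that case only.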

Example code (attachment)
-------------------------
We specialized the _Hashtable template to insert our own implementation of
__detail::_Mod_range_hashing. In general the attached code should only be
considered a demo for the performance increase possible, and not be considered
a good solution.

Benchmark
---------
The example does 50,000,000 emplace and 50,000,000 find operations on an
unordered_map. The test system is an Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz
using gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6).

Here are the performance results for the current implementation:
$ g++ -Wall -Wextra -O3 -std=c++11 umap_test.cpp && ./a.out 
runtime(s) emplace = 3.09947
runtime(s) find = 6.67535

Here are the results with our optimization:
$ g++ -Wall -Wextra -O3 -std=c++11 -DLESSDIV umap_test.cpp && ./a.out 
runtime(s) emplace = 2.21004
runtime(s) find = 2.77398
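
The shape of such a measurement can be sketched roughly as below. This is not
the attached umap_test.cpp; the struct BenchResult, the function
run_pointer_key_bench and the choice of int storage for the keys are
assumptions made for illustration only.

```cpp
#include <chrono>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct BenchResult {
    double emplace_seconds;
    double find_seconds;
    std::size_t hits;  // successful lookups, kept so find() is not optimized away
};

// Times n emplace calls followed by n find calls on a pointer-keyed map.
inline BenchResult run_pointer_key_bench(std::size_t n)
{
    using Clock = std::chrono::steady_clock;
    std::vector<int> storage(n);  // provides n distinct addresses to use as keys
    std::unordered_map<const int*, std::size_t> map;

    auto t0 = Clock::now();
    for (std::size_t i = 0; i < n; ++i)
        map.emplace(&storage[i], i);
    auto t1 = Clock::now();

    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i)
        hits += (map.find(&storage[i]) != map.end()) ? 1 : 0;
    auto t2 = Clock::now();

    return { std::chrono::duration<double>(t1 - t0).count(),
             std::chrono::duration<double>(t2 - t1).count(),
             hits };
}
```

Calling run_pointer_key_bench(50000000) and printing the two durations would
mirror the operation counts above; absolute timings depend on the machine and
the library version.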

Related work
------------
Facebook's folly uses a similar approach to what we do, but relies on a fixed
bucket count. libcxx uses masking to compute the bucket number only if the
number of buckets is a power of two.

Getting the change upstream
---------------------------
If there is any interest we would be happy to help out, but we are afraid that
it requires an ABI change, as we must store a mask for every unordered_map
(unless using libcxx's approach).
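
To make the ABI concern concrete, here is a minimal sketch of why a stored
mask changes the object layout. All names (ModRange, MaskRange, ToyTable) are
hypothetical stand-ins, not libstdc++ types, and the last static_assert
assumes a compiler that applies the empty base optimization, as GCC and Clang
do.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stateless range-hashing functor: no data members, so it can be stored
// without taking up space in the table object.
struct ModRange {
    std::size_t operator()(std::size_t hash, std::size_t bucket_count) const
    {
        assert(bucket_count != 0);
        return hash % bucket_count;
    }
};

// Hypothetical stateful variant that caches the mask as a data member.
struct MaskRange {
    std::size_t mask;
    std::size_t operator()(std::size_t hash, std::size_t) const
    {
        return hash & mask;
    }
};

// Toy container that derives from its range-hashing policy.
template <typename RangeHash>
struct ToyTable : private RangeHash {
    void**      buckets = nullptr;
    std::size_t bucket_count = 0;
};

static_assert(std::is_empty<ModRange>::value, "stateless functor is empty");
static_assert(!std::is_empty<MaskRange>::value, "cached mask adds a member");
// Assumes the empty base optimization: the mask makes the table larger.
static_assert(sizeof(ToyTable<MaskRange>) > sizeof(ToyTable<ModRange>),
              "object layout changes");
```

Code compiled against the smaller layout would disagree with code compiled
against the larger one about the size and member offsets of the container,
which is the kind of break an ABI change implies.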



* [Bug libstdc++/65641] unordered_map - __detail::_Mod_range_hashing is slow
  2015-03-31 14:20 [Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range_hashing is slow j.breitbart at tum dot de
@ 2015-03-31 17:29 ` redi at gcc dot gnu.org
  2015-04-02 12:37 ` j.breitbart at tum dot de
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: redi at gcc dot gnu.org @ 2015-03-31 17:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
An ABI change is not an option, although an alternative functor could be
provided as an optional extension.

There was a related thread a year ago starting at
https://gcc.gnu.org/ml/libstdc++/2014-03/msg00024.html



* [Bug libstdc++/65641] unordered_map - __detail::_Mod_range_hashing is slow
  2015-03-31 14:20 [Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range_hashing is slow j.breitbart at tum dot de
  2015-03-31 17:29 ` [Bug libstdc++/65641] " redi at gcc dot gnu.org
@ 2015-04-02 12:37 ` j.breitbart at tum dot de
  2015-04-02 12:56 ` redi at gcc dot gnu.org
  2015-05-02 18:35 ` glisse at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: j.breitbart at tum dot de @ 2015-04-02 12:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641

--- Comment #2 from Jens Breitbart <j.breitbart at tum dot de> ---
Thanks for the link. I am not sure if there is really any benefit of using
libdivide instead of the masking.

I'll attach a first version of a patch in which the functor stores the mask. Any
comments welcome, I am not familiar with the library.

Another possible solution would be to allow the number of buckets to be a power
of two, as one can easily compute the mask for such cases. This could be
triggered by the user explicitly calling rehash() with a power of two as the
parameter. Increasing the number of buckets would only increase to another
power of two. _Mod_range_hashing could check if the number of buckets is a
power of two and use masking in that case. This would not require an ABI
change.
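
A minimal sketch of that idea, assuming nothing about the actual libstdc++
internals: the functor stays stateless and decides per call, and a growth
helper keeps the bucket count a power of two. The names
PowerOfTwoAwareRangeHashing, is_power_of_two and next_power_of_two are
invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

inline bool is_power_of_two(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Stateless range hashing: mask when the bucket count allows it, otherwise
// fall back to modulo. No mask is stored, so the functor stays empty.
struct PowerOfTwoAwareRangeHashing {
    std::size_t operator()(std::size_t hash, std::size_t bucket_count) const
    {
        assert(bucket_count != 0);
        if (is_power_of_two(bucket_count))
            return hash & (bucket_count - 1);
        return hash % bucket_count;
    }
};

// Growth step: round the requested minimum up to a power of two, so a table
// that starts with a power-of-two bucket count keeps one after rehashing.
inline std::size_t next_power_of_two(std::size_t min_buckets)
{
    // Guard against overflow: the largest representable power of two.
    const std::size_t top = static_cast<std::size_t>(1)
        << (std::numeric_limits<std::size_t>::digits - 1);
    assert(min_buckets <= top);
    std::size_t n = 1;
    while (n < min_buckets)
        n <<= 1;
    return n;
}
```

The extra branch costs a test on every lookup, so whether this beats plain
modulo for prime bucket counts would have to be measured.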

Any chance of getting such a change upstream? As far as I can see, there seems
to be no easy way to have the unordered_map use our folding functor instead of
_Mod_range_hashing or am I missing something?


* [Bug libstdc++/65641] unordered_map - __detail::_Mod_range_hashing is slow
  2015-03-31 14:20 [Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range_hashing is slow j.breitbart at tum dot de
  2015-03-31 17:29 ` [Bug libstdc++/65641] " redi at gcc dot gnu.org
  2015-04-02 12:37 ` j.breitbart at tum dot de
@ 2015-04-02 12:56 ` redi at gcc dot gnu.org
  2015-05-02 18:35 ` glisse at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: redi at gcc dot gnu.org @ 2015-04-02 12:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-04-02
                 CC|                            |fdumont at gcc dot gnu.org
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jens Breitbart from comment #2)
> Another possible solution would be to allow the number of buckets to be a
> power of two, as one can easily compute the mask for such cases. This could
> be triggered by the user explicitly calling rehash() with a power of two as
> the parameter. Increasing the number of buckets would only increase to
> another power of two. _Mod_range_hashing could check if the number of
> buckets is a power of two and use masking in that case. This would not
> require an ABI change.

That sounds promising, and worth pursuing.

> Any chance of getting such a change upstream?

I don't see why not, although unless you have a GCC copyright assignment on
file, or plan to get one (immediately, since it can take a while) it's better
*not* to give us a patch, because we can't use it anyway and there can be no
danger of using your code if we don't see it!

> As far as I can see, there
> seems to be no easy way to have the unordered_map use our folding functor
> instead of _Mod_range_hashing or am I missing something?

I think you would need to use the _Hashtable class template directly, rather
than via std::unordered_map. In theory that allows you to re-use the internals
with different policies, but in practice it's not very easy.



* [Bug libstdc++/65641] unordered_map - __detail::_Mod_range_hashing is slow
  2015-03-31 14:20 [Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range_hashing is slow j.breitbart at tum dot de
                   ` (2 preceding siblings ...)
  2015-04-02 12:56 ` redi at gcc dot gnu.org
@ 2015-05-02 18:35 ` glisse at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: glisse at gcc dot gnu.org @ 2015-05-02 18:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
Currently, the only implemented policy uses primes from a hard-coded list for
the number of buckets. This makes it easy to precompute (and hard-code in the
library) anything that may be helpful to speed-up modulo computation. With a
number of buckets that is a power of 2, modulo computation becomes trivial
(masking). However, the simplistic specialization of std::hash for pointers in
libstdc++ means that all double* hash to a multiple of 8. So we would need to
add some scrambling somewhere to avoid leaving most buckets empty in
unordered_set<double*>.
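
The effect can be illustrated with synthetic 8-byte-aligned addresses. The
sketch below models the pointer hash as the plain address value, as described
above, and compares it with a toy scrambler. The function names and the
xor-shift mixing step are assumptions chosen for easy verification, not a
proposed libstdc++ change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Models a pointer hash that simply returns the address value.
inline std::size_t identity_hash(std::uintptr_t address)
{
    return static_cast<std::size_t>(address);
}

// Toy scrambler: folds the bits above the 8-byte alignment into the low bits.
inline std::size_t scrambled_hash(std::uintptr_t address)
{
    const std::size_t h = static_cast<std::size_t>(address);
    return h ^ (h >> 3);
}

// Counts how many of bucket_count buckets (a power of two) are hit by
// key_count synthetic addresses spaced 8 bytes apart, like consecutive
// double* values.
template <typename Hash>
std::size_t buckets_used(Hash hash, std::size_t bucket_count,
                         std::size_t key_count)
{
    assert(bucket_count != 0 && (bucket_count & (bucket_count - 1)) == 0);
    std::set<std::size_t> used;
    for (std::size_t i = 0; i < key_count; ++i) {
        const std::uintptr_t address = static_cast<std::uintptr_t>(8 * i);
        used.insert(hash(address) & (bucket_count - 1));
    }
    return used.size();
}
```

With 64 buckets and 64 such keys, the identity hash reaches only 8 buckets
(the low three bits of every key are zero), while the toy scrambler reaches
all 64.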


