Return-Path: <gcc-bugs-return-482386-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 19603 invoked by alias); 31 Mar 2015 13:54:19 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 19548 invoked by uid 48); 31 Mar 2015 13:54:16 -0000
From: "j.breitbart at tum dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range_hashing is slow
Date: Tue, 31 Mar 2015 14:20:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: j.breitbart at tum dot de
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter attachments.created
Message-ID: <bug-65641-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-03/txt/msg03530.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641

            Bug ID: 65641
           Summary: unordered_map - __detail::_Mod_range_hashing is slow
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: j.breitbart at tum dot de

Created attachment 35192
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35192&action=edit
Small benchmark for our unordered_map change

Hi,

We have been using std::unordered_map with a pointer as the key in one of our
applications, and our analysis showed that find() is one of the two main
performance bottlenecks. Further analysis showed that about 40% of the total
application runtime is spent in a single x86 divq instruction coming from
std::__detail::_Mod_range_hashing. We think that always computing the bucket
index with a modulo operation (compiled to an x86 divq instruction) is
suboptimal, and we have attached a simple example that shows the benefit of
replacing the modulo operation with masking.
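To make the difference concrete: mapping a hash value h to one of n buckets with modulo needs a division, whereas masking needs only a bitwise AND, but the two agree only when n is a power of two. The following is a minimal sketch; the function names are hypothetical and not part of libstdc++.

```cpp
#include <cassert>
#include <cstddef>

// Bucket index via modulo: valid for any bucket count n > 0, but when n is
// not a compile-time constant, GCC on x86-64 typically emits a div instruction.
std::size_t bucket_by_modulo(std::size_t h, std::size_t n) {
    assert(n != 0);
    return h % n;
}

// Bucket index via masking: a single AND. It equals h % n only when n is a
// power of two, because then n - 1 has exactly the low log2(n) bits set.
std::size_t bucket_by_mask(std::size_t h, std::size_t n) {
    assert(n != 0);
    return h & (n - 1);
}
```

With 16 buckets both functions map the hash value 1000 to bucket 8. With 13 buckets (a prime; libstdc++'s default rehash policy uses prime bucket counts) modulo gives 12 while masking gives 8, and the mask 12 can only ever reach buckets 0, 4, 8 and 12.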

Example code (attachment)
-------------------------
We specialized the _Hashtable template to substitute our own implementation of
__detail::_Mod_range_hashing. The attached code should be considered only a
demonstration of the possible performance gain, not a proper solution.
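For orientation: in libstdc++, __detail::_Mod_range_hashing is a small function object that is called with a hash code and the bucket count and returns hash % bucket_count. A masking replacement with the same call shape might look like the sketch below. Mask_range_hashing and round_up_to_power_of_two are hypothetical names, not the attached code. Masking is only correct if every bucket count is a power of two, which libstdc++'s default __detail::_Prime_rehash_policy does not provide, so a real change would presumably also need a matching rehash policy; the rounding helper hints at what such a policy would compute.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical counterpart to __detail::_Mod_range_hashing: maps a hash code
// to a bucket index with a bitwise AND instead of a modulo.
// Precondition: bucket_count is a nonzero power of two.
struct Mask_range_hashing {
    std::size_t operator()(std::size_t hash, std::size_t bucket_count) const noexcept {
        return hash & (bucket_count - 1);
    }
};

// Hypothetical helper a power-of-two rehash policy could use: the smallest
// power of two >= n (returns 1 for n == 0).
std::size_t round_up_to_power_of_two(std::size_t n) {
    // Guard against overflow: the result must still fit in std::size_t.
    assert(n <= std::numeric_limits<std::size_t>::max() / 2 + 1);
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

One thing a real patch would likely need to consider: libstdc++'s std::hash for pointers returns the address itself, and aligned addresses have zero low bits, so a plain mask could leave many buckets unused unless the hash is mixed first.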

Benchmark
---------
The example performs 50,000,000 emplace and 50,000,000 find operations on an
unordered_map. The test system is an Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz
running gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6).
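The attachment itself is not reproduced here. As a rough, hypothetical sketch of the kind of measurement described, the harness below times n emplace calls followed by n find calls on a std::unordered_map keyed by pointers, using std::chrono::steady_clock. run_benchmark and BenchResult are illustrative names, not taken from umap_test.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Timings (in seconds) for one run of the hypothetical benchmark.
struct BenchResult {
    double emplace_s;
    double find_s;
    std::size_t found;  // successful lookups; also keeps the find() loop observable
};

// Inserts n pointer keys into an unordered_map, then looks each one up again.
BenchResult run_benchmark(std::size_t n) {
    using clock = std::chrono::steady_clock;
    std::vector<int> objects(n);  // their addresses serve as the keys
    std::unordered_map<const int*, std::size_t> map;

    const auto t0 = clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        map.emplace(&objects[i], i);
    }
    const auto t1 = clock::now();

    std::size_t found = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (map.find(&objects[i]) != map.end()) {
            ++found;
        }
    }
    const auto t2 = clock::now();

    // Convert the integral clock durations to fractional seconds.
    const std::chrono::duration<double> emplace_time = t1 - t0;
    const std::chrono::duration<double> find_time = t2 - t1;
    return {emplace_time.count(), find_time.count(), found};
}
```

The report used n = 50,000,000; a much smaller n is enough for a quick check. Building the same harness once with the stock library and once with a modified hash policy would give the kind of before/after comparison shown below.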

Here are the performance results for the current implementation:
$ g++ -Wall -Wextra -O3 -std=c++11 umap_test.cpp && ./a.out 
runtime(s) emplace = 3.09947
runtime(s) find = 6.67535

Here are the results with our optimization (enabled with -DLESSDIV):
$ g++ -Wall -Wextra -O3 -std=c++11 -DLESSDIV umap_test.cpp && ./a.out 
runtime(s) emplace = 2.21004
runtime(s) find = 2.77398

Related work
------------
Facebook's folly uses an approach similar to ours, but relies on a fixed
bucket count. libcxx computes the bucket number by masking only when the
number of buckets is a power of two, and uses modulo otherwise.
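The libcxx-style alternative can be sketched as a single function that checks the bucket count at run time: if it is a power of two it masks, otherwise it falls back to modulo, so nothing extra has to be stored in the container. This is a minimal sketch of the idea described above, not libc++'s actual source; bucket_index is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bucket-index helper in the spirit of libcxx's approach:
// mask when bucket_count is a power of two, otherwise use modulo.
// No per-container mask is stored; bucket_count - 1 is computed on the fly.
std::size_t bucket_index(std::size_t hash, std::size_t bucket_count) {
    assert(bucket_count != 0);
    const bool power_of_two = (bucket_count & (bucket_count - 1)) == 0;
    if (power_of_two) {
        return hash & (bucket_count - 1);  // cheap AND, no division
    }
    return hash % bucket_count;            // general case still divides
}
```

For example, bucket_index(1000, 16) masks to 8, while bucket_index(1000, 13) falls back to 1000 % 13, which is 12. The cost is a branch on every lookup, and the division is only avoided if the rehash policy actually chooses power-of-two bucket counts.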

Getting the change upstream
---------------------------
If there is any interest, we would be happy to help, but we are afraid that
this requires an ABI change, since a mask would have to be stored in every
unordered_map (unless libcxx's approach is used).