From: "j.breitbart at tum dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/65641] New: unordered_map - __detail::_Mod_range_hashing is slow
Date: Tue, 31 Mar 2015 14:20:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641

Bug ID: 65641
Summary: unordered_map - __detail::_Mod_range_hashing is slow
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: j.breitbart at tum dot de

Created attachment 35192
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35192&action=edit
Small benchmark for our unordered_map change

Hi,

we have been using std::unordered_map with a pointer as the key in one of our applications, and analysis showed that the find() function is one of two performance bottlenecks.
Further analysis showed that about 40% of the total application runtime is spent in a single x86 divq instruction coming from std::__detail::_Mod_range_hashing. We think that always using a modulo operation (which compiles to an x86 divq instruction) is suboptimal, and we have attached a simple example that shows the benefit of replacing the modulo operation with masking.

Example code (attachment)
-------------------------

We specialized the _Hashtable template to insert our own implementation of __detail::_Mod_range_hashing. The attached code should be considered only a demonstration of the possible performance gain, not a proper solution.

Benchmark
---------

The example performs 50,000,000 emplace and 50,000,000 find operations on an unordered_map. The test system is an Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz with gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6).

Results for the current implementation:

  $ g++ -Wall -Wextra -O3 -std=c++11 umap_test.cpp && ./a.out
  runtime(s) emplace = 3.09947
  runtime(s) find = 6.67535

Results with our optimization (enabled with -DLESSDIV):

  $ g++ -Wall -Wextra -O3 -std=c++11 -DLESSDIV umap_test.cpp && ./a.out
  runtime(s) emplace = 2.21004
  runtime(s) find = 2.77398

That is roughly a 1.4x speedup for emplace and a 2.4x speedup for find.

Related work
------------

Facebook's folly uses an approach similar to ours, but relies on a fixed bucket count. libcxx uses masking to compute the bucket index only when the number of buckets is a power of two, and falls back to modulo otherwise.

Getting the change upstream
---------------------------

If there is interest, we would be happy to help, but we are afraid that this requires an ABI change, because we must store a mask in every unordered_map (unless libcxx's approach is used).