From: "temporal at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
Date: Wed, 14 Aug 2013 11:16:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #2 from Kenton Varda ---
> The standard says average case O(1), worst case O(a.size()), so if every
> element in the container has the same key then it's O(n).

I'm not sure that follows. Yes, the standard says "worst case O(n)", but why should I expect worst-case performance just because I have lots of entries with the same key? In the absence of explicit language to the contrary, it seems entirely reasonable to expect unordered_multimap<K, V> to perform similarly to unordered_map<K, list<V>>, whereas in fact it turns out to be more like unordered_map<K, vector<V>>.

I guess the real bug here is with all of the C++ reference sites that fail to mention this rather large caveat: unordered_multimap supports duplicate keys, but only as long as there aren't "too many" duplicates.

I think part of the problem might be that people are thinking: "Well, duh, hash maps perform poorly if lots of keys have the same hash!" But that's not quite the same thing. A hash map with unique keys performs poorly in that case because it has to scan linearly through many entries to find the one that matches the key. But when looking up a key in a multimap, you can stop at the first match, so even if there are lots of entries with the same key, lookup can still be O(1). So I don't think it's intrinsically true that a hashtable-based multimap must perform badly when many entries have the same key.
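
To make the complaint concrete, here is a minimal sketch (mine, illustrative only; the key and count are arbitrary) of the behavior being reported: erasing entries one at a time when they all share a key takes far longer than the expected constant time per erase on the affected implementation.

  #include <chrono>
  #include <cstdio>
  #include <unordered_map>

  int main() {
    std::unordered_multimap<int, int> m;
    const int n = 100000;
    for (int i = 0; i < n; ++i)
      m.emplace(42, i);  // every entry shares the key 42

    auto start = std::chrono::steady_clock::now();
    while (!m.empty())
      m.erase(m.begin());  // expected ~O(1) per call; here each call
                           // ends up scanning the group of equal keys
    auto stop = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        stop - start).count();
    std::printf("erased %d equal-key entries in %lld ms\n", n,
                (long long)ms);
    return 0;
  }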
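
And to illustrate the lookup point above: find() on a multimap can stop at the first element whose key compares equal, so a large group of duplicates need not slow lookups down. A sketch (again mine, assuming a hash that puts the two keys in different buckets):

  #include <cassert>
  #include <unordered_map>

  int main() {
    std::unordered_multimap<int, int> m;
    for (int i = 0; i < 100000; ++i)
      m.emplace(1, i);  // a huge group of duplicates of key 1
    m.emplace(2, -1);   // one entry with a different key

    // Looking up key 2 need not walk the duplicates of key 1, and even
    // looking up key 1 itself can return the first element of the group
    // without scanning the rest of it.
    auto it = m.find(2);
    assert(it != m.end() && it->second == -1);
    assert(m.find(1) != m.end());
    return 0;
  }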