From: "temporal at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
Date: Wed, 14 Aug 2013 11:16:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #2 from Kenton Varda ---
> The standard says average case O(1), worst case O(a.size()), so if every
> element in the container has the same key then it's O(n).

I'm not sure that follows. Yes, the standard says "worst case O(n)", but why should I expect worst-case performance just because I have lots of entries with the same key? In the absence of explicit language to the contrary, it seems entirely reasonable to expect unordered_multimap<K, V> to perform similarly to unordered_map<K, list<V>>, whereas in fact it turns out to be more like unordered_map<K, vector<V>>.

I guess the real bug here is with all of the C++ reference sites that fail to mention this rather large caveat: unordered_multimap supports duplicate keys, but only as long as there aren't "too many" duplicates.

I think part of the problem might be that people are thinking: "Well, duh, hash maps perform poorly if lots of keys have the same hash!" But that's not quite the same thing. A hash map with unique keys performs poorly in that case because it has to scan linearly through many entries to find the one that matches the key. But when looking up a key in a multimap, you can stop at the first match, so even if there are lots of entries with the same key, lookup can still be O(1). So I don't think it's intrinsically true that a hashtable-based multimap must perform badly when many entries have the same key.
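
To make the complaint concrete, here is a minimal sketch (mine, illustrative only; the key and count are arbitrary) of the behavior being reported: erasing entries one at a time when they all share a key takes far longer than the expected constant time per erase on the affected implementation.

  #include <chrono>
  #include <cstdio>
  #include <unordered_map>

  int main() {
    std::unordered_multimap<int, int> m;
    const int n = 100000;
    for (int i = 0; i < n; ++i)
      m.emplace(42, i);  // every entry shares the key 42

    auto start = std::chrono::steady_clock::now();
    while (!m.empty())
      m.erase(m.begin());  // expected ~O(1) per call; here each call
                           // ends up scanning the group of equal keys
    auto stop = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        stop - start).count();
    std::printf("erased %d equal-key entries in %lld ms\n", n,
                (long long)ms);
    return 0;
  }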
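
And to illustrate the lookup point above: find() on a multimap can stop at the first element whose key compares equal, so a large group of duplicates need not slow lookups down. A sketch (again mine, assuming a hash that puts the two keys in different buckets):

  #include <cassert>
  #include <unordered_map>

  int main() {
    std::unordered_multimap<int, int> m;
    for (int i = 0; i < 100000; ++i)
      m.emplace(1, i);  // a huge group of duplicates of key 1
    m.emplace(2, -1);   // one entry with a different key

    // Looking up key 2 need not walk the duplicates of key 1, and even
    // looking up key 1 itself can return the first element of the group
    // without scanning the rest of it.
    auto it = m.find(2);
    assert(it != m.end() && it->second == -1);
    assert(m.find(1) != m.end());
    return 0;
  }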