public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
@ 2013-08-14 10:11 temporal at gmail dot com
  2013-08-14 10:35 ` [Bug libstdc++/58153] " redi at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: temporal at gmail dot com @ 2013-08-14 10:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

            Bug ID: 58153
           Summary: unordered_multimap::erase(iterator) is not
                    constant-time when many entries have the same key
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: temporal at gmail dot com

It appears that if an unordered_multimap has k entries with the same key, then
erase(iter) for any of those entries is O(k) rather than constant-time.  The
problem is that _Hashtable::erase() searches through all nodes in the bucket
looking for the one previous to the one being removed.  This is reasonable for
unordered_map, where buckets are expected to have no more than a couple
entries.  But it is surprising for unordered_multimap, whose whole purpose is
to support multiple entries with the same key (and therefore the same bucket).

I do not know exactly what the standard requires here, but all of the
references I can find claim that erase(iter) should be average-time O(1), and
none of them suggest that having a large number of entries with the same key
should cause trouble.

FWIW, it looks like libc++ has the same behavior.  Maybe my expectations were
wrong, and unordered_multimap was never meant to contain more than a couple
entries per key?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
  2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
@ 2013-08-14 10:35 ` redi at gcc dot gnu.org
  2013-08-14 11:16 ` temporal at gmail dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2013-08-14 10:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Kenton Varda from comment #0)
> I do not know exactly what the standard requires here, but all of the
> references I can find claim that erase(iter) should be average-time O(1),
> and none of them suggest that having a large number of entries with the same
> key should cause trouble.

The standard says average case O(1), worst case O(a.size()), so if every
element in the container has the same key then it's O(n).

> FWIW, it looks like libc++ has the same behavior.  Maybe my expectations
> were wrong, and unordered_multimap was never meant to contain more than a
> couple entries per key?

It supports any number of entries with the same key, you just can't expect O(1)
erase operations in that case.  If you have very many collisions maybe your
choice of key or your choice of container isn't ideal.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
  2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
  2013-08-14 10:35 ` [Bug libstdc++/58153] " redi at gcc dot gnu.org
@ 2013-08-14 11:16 ` temporal at gmail dot com
  2013-08-24 16:30 ` fdumont at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: temporal at gmail dot com @ 2013-08-14 11:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #2 from Kenton Varda <temporal at gmail dot com> ---
> The standard says average case O(1), worst case O(a.size()), so if every
> element in the container has the same key then it's O(n).

I'm not sure that follows.  Yes, the standard says "worst case O(n)", but why
should expect worst-case performance if I have lots of entries with the same
key?  In the absence of explicit language to the contrary, it seems entirely
reasonable to expect unordered_multimap<K, V> to perform similarly to
unordered_map<K, list<V>>, where in fact it turns out to be more like
unordered_map<K, slist<V>>.

I guess the real bug here is with all of the C++ reference sites that fail to
mention this rather large caveat...  that unordered_multimap supports duplicate
keys, but only as long as there aren't "too many" duplicates.

I think part of the problem might be that people are thinking: "Well, duh, hash
maps perform poorly if lots of keys have the same hash!".  But I don't think
that's quite the same thing.  Unique hash maps perform poorly in that case
because it's necessary to linearly scan through many entries to find the one
that matches the key.  But when looking for a key in a multimap, you stop at
the first match, so even if there are lots of entries with the same key, lookup
can still be O(1).  So, I don't think it's intrinsically true that a
hashtable-based multimap should perform badly when many entries have the same
key.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
  2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
  2013-08-14 10:35 ` [Bug libstdc++/58153] " redi at gcc dot gnu.org
  2013-08-14 11:16 ` temporal at gmail dot com
@ 2013-08-24 16:30 ` fdumont at gcc dot gnu.org
  2013-08-24 19:16 ` temporal at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: fdumont at gcc dot gnu.org @ 2013-08-24 16:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #3 from François Dumont <fdumont at gcc dot gnu.org> ---
This report entry made me wonder why iterators could not just be pointing to
the node just before the one containing the pointed to value. For instance
begin() iterator would contained the before_begin one. This is rather
consistent with the data structure as we are using some kind of forward_list
which only propose an O(1) erase_after method. In this case erase(iterator)
will work just like erase_after.

I am going to challenge this approach and will let you know if it has no
drawback.
>From gcc-bugs-return-428342-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Sat Aug 24 18:10:25 2013
Return-Path: <gcc-bugs-return-428342-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 21815 invoked by alias); 24 Aug 2013 18:10:25 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 21801 invoked by uid 48); 24 Aug 2013 18:10:23 -0000
From: "tammy at Cadence dot COM" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/58208] deque<std::string> 32-bit "-O3" bug
Date: Sat, 24 Aug 2013 18:10:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.8.1
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tammy at Cadence dot COM
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-58208-4-apTMmnTAsM@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-58208-4@http.gcc.gnu.org/bugzilla/>
References: <bug-58208-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-08/txt/msg01266.txt.bz2
Content-length: 440

http://gcc.gnu.org/bugzilla/show_bug.cgi?idX208

--- Comment #4 from Tammy Hsu <tammy at Cadence dot COM> ---
Sorry, I forgot to mention that I need to set LD_LIBRARY_PATH to "." to run
import.
I build the gcc473 and gcc481 by using the same configuration and on the same
RHEL 5.5 system, however the gcc473 version works and gcc481 with -m32
crash....

May I know how you configure your gcc481 or if you have added any extra bug
fixes?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
  2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
                   ` (2 preceding siblings ...)
  2013-08-24 16:30 ` fdumont at gcc dot gnu.org
@ 2013-08-24 19:16 ` temporal at gmail dot com
  2013-08-26 20:06 ` fdumont at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: temporal at gmail dot com @ 2013-08-24 19:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #4 from Kenton Varda <temporal at gmail dot com> ---
> This report entry made me wonder why iterators could not just be
> pointing to the node just before the one containing the pointed to value.

That's a neat idea.

I think there is an obscure issue, though.  If I have an iterator pointing at
item N, and then I (separately) erase item N - 1, what happens to my iterator?

But you bring up another, simpler point:  why not just have an erase_after()
method like forward_list does?  That would suit my use case (although at this
point I've rewritten it to do something different).


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
  2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
                   ` (3 preceding siblings ...)
  2013-08-24 19:16 ` temporal at gmail dot com
@ 2013-08-26 20:06 ` fdumont at gcc dot gnu.org
  2013-08-27  3:10 ` temporal at gmail dot com
  2015-04-09 14:49 ` redi at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: fdumont at gcc dot gnu.org @ 2013-08-26 20:06 UTC (permalink / raw)
  To: gcc-bugs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 6148 bytes --]

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #5 from François Dumont <fdumont at gcc dot gnu.org> ---
And your remark is good too and will avoid me to spend some time on this idea.
Standard requirements regarding validity of iterators won't let us have
iterators invalidated because another iterator is erased, I do not need to
challenge it. For information, on my side I was more concerned about how I
would represent the past-the-end iterator.

Regarding the idea of having an erase_after method, usage of a forward_list
data structure for the hashtable implementation is a libstdc++ implementation
detail, not a Standard requirement. So Standard committee hasn't design it with
such a design decision. Our choice of a forward_list data structure is at the
moment the best tradeoff we came too but of course you are free to help us find
a better one. Note that I remember having tried to use a doubly linked list
when reviewing hashtable implementation but the memory overhead was resulting
in worst performance.

As you seem to have a lot of equivalent keys in your unordered_multimap<key,
value> you should perhaps move to an unordered_map<key, Container<value>>. I
even wonder if the libstdc++ profile mode could detect such an opportunity,
potentially saying what type of container should be used.
>From gcc-bugs-return-428411-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Mon Aug 26 20:27:36 2013
Return-Path: <gcc-bugs-return-428411-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 8384 invoked by alias); 26 Aug 2013 20:27:36 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 8345 invoked by uid 48); 26 Aug 2013 20:27:32 -0000
From: "martin.konopka at stuba dot sk" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug regression/58244] global variable: many THOUSANDS times slower execution
Date: Mon, 26 Aug 2013 20:27:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: regression
X-Bugzilla-Version: 4.7.3
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: martin.konopka at stuba dot sk
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-58244-4-LKZDXIk8dQ@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-58244-4@http.gcc.gnu.org/bugzilla/>
References: <bug-58244-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-08/txt/msg01335.txt.bz2
Content-length: 332

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58244

--- Comment #8 from Martin Konôpka <martin.konopka at stuba dot sk> ---
Yes, I understand now. Thanks. The lines with the sin() functions were not
evaluated with the local declaration. My apologies for reporting the false bug.
I will try to close the bug if I am allowed.
>From gcc-bugs-return-428412-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org Mon Aug 26 22:02:32 2013
Return-Path: <gcc-bugs-return-428412-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Delivered-To: listarch-gcc-bugs@gcc.gnu.org
Received: (qmail 21640 invoked by alias); 26 Aug 2013 22:02:32 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Delivered-To: mailing list gcc-bugs@gcc.gnu.org
Received: (qmail 21598 invoked by uid 48); 26 Aug 2013 22:02:27 -0000
From: "dj at redhat dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug other/58238] cc1 crashes when built for ms-dos cross-compiling
Date: Mon, 26 Aug 2013 22:02:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: other
X-Bugzilla-Version: 4.7.3
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: dj at redhat dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: dj at redhat dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_status cf_reconfirmed_on cc assigned_to everconfirmed attachments.created
Message-ID: <bug-58238-4-czhzrmhFx5@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-58238-4@http.gcc.gnu.org/bugzilla/>
References: <bug-58238-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-08/txt/msg01336.txt.bz2
Content-length: 834

http://gcc.gnu.org/bugzilla/show_bug.cgi?idX238

DJ Delorie <dj at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-08-26
                 CC|                            |dj at redhat dot com
           Assignee|unassigned at gcc dot gnu.org      |dj at redhat dot com
     Ever confirmed|0                           |1

--- Comment #1 from DJ Delorie <dj at redhat dot com> ---
Created attachment 30702
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id0702&actioníit
remove "signed" from internal type names

It seems gcc no longer permits "signed" in internal type names, so this patch
takes them out.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
  2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
                   ` (4 preceding siblings ...)
  2013-08-26 20:06 ` fdumont at gcc dot gnu.org
@ 2013-08-27  3:10 ` temporal at gmail dot com
  2015-04-09 14:49 ` redi at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: temporal at gmail dot com @ 2013-08-27  3:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

--- Comment #6 from Kenton Varda <temporal at gmail dot com> ---
Yep, I realize that erase_after would need to be added to the standard.  I was
just speculating that it may be something the standard committee should
consider.

I've long since solved my problem by changing my approach, so this bug is just
meant to raise awareness.  Feel free to ignore it if you don't think there's
anything to be done.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/58153] unordered_multimap::erase(iterator) is not constant-time when many entries have the same key
  2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
                   ` (5 preceding siblings ...)
  2013-08-27  3:10 ` temporal at gmail dot com
@ 2015-04-09 14:49 ` redi at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2015-04-09 14:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58153

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I don't think we're doing anything wrong here, our implementation is standard
conforming, and changing it would be difficult.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-04-09 14:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-14 10:11 [Bug libstdc++/58153] New: unordered_multimap::erase(iterator) is not constant-time when many entries have the same key temporal at gmail dot com
2013-08-14 10:35 ` [Bug libstdc++/58153] " redi at gcc dot gnu.org
2013-08-14 11:16 ` temporal at gmail dot com
2013-08-24 16:30 ` fdumont at gcc dot gnu.org
2013-08-24 19:16 ` temporal at gmail dot com
2013-08-26 20:06 ` fdumont at gcc dot gnu.org
2013-08-27  3:10 ` temporal at gmail dot com
2015-04-09 14:49 ` redi at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).