public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.
@ 2020-05-12 11:00 redbeard0531 at gmail dot com
  2020-05-12 11:26 ` [Bug libstdc++/95079] " redi at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: redbeard0531 at gmail dot com @ 2020-05-12 11:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

            Bug ID: 95079
           Summary: unorderd_map::insert_or_assign and try_emplace should
                    only hash and mod once unless there is a rehash.
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redbeard0531 at gmail dot com
  Target Milestone: ---

Currently insert_or_assign() and try_emplace() call find(key) and fall back to
emplace(...) if that fails to find the key. The computed hash (and more
importantly in general) the modded bucket index computed by find() is thrown
away and recomputed by emplace(). Instead they should compute the hash once,
and unless there is a rehash, also only do the modulus once. This optimization
is already performed for operator[].

https://godbolt.org/z/cw82RC shows that the hasher is invoked once for
operator[] and twice for insert_or_assign().
http://quick-bench.com/ge8Suq7PcdRKm6IBQbjvwuXhW6Y shows that there is a
significant performance difference (20% in this test).

(I know std::unordered_map is always going to be less than fast on 64-bit
platforms, but it doesn't need to be slower than it needs to be...)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/95079] unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.
  2020-05-12 11:00 [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash redbeard0531 at gmail dot com
@ 2020-05-12 11:26 ` redi at gcc dot gnu.org
  2020-05-24 10:00 ` fdumont at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: redi at gcc dot gnu.org @ 2020-05-12 11:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-05-12

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Agreed, thanks for the analysis.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/95079] unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.
  2020-05-12 11:00 [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash redbeard0531 at gmail dot com
  2020-05-12 11:26 ` [Bug libstdc++/95079] " redi at gcc dot gnu.org
@ 2020-05-24 10:00 ` fdumont at gcc dot gnu.org
  2020-05-29 11:13 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: fdumont at gcc dot gnu.org @ 2020-05-24 10:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

François Dumont <fdumont at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |fdumont at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
                 CC|                            |fdumont at gcc dot gnu.org

--- Comment #2 from François Dumont <fdumont at gcc dot gnu.org> ---
I'd like to take this one and already started working on it.

I've already many patches awaiting for the hashtable implementation but none
fixing this problem.

I also don't get the point about std::unordered_map performance issue on
64-bits platform. Maybe you should elaborate as part of another bug report.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/95079] unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.
  2020-05-12 11:00 [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash redbeard0531 at gmail dot com
  2020-05-12 11:26 ` [Bug libstdc++/95079] " redi at gcc dot gnu.org
  2020-05-24 10:00 ` fdumont at gcc dot gnu.org
@ 2020-05-29 11:13 ` cvs-commit at gcc dot gnu.org
  2021-02-28 14:39 ` fdumont at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-05-29 11:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Franथà¤ois Dumont
<fdumont@gcc.gnu.org>:

https://gcc.gnu.org/g:7688e5e8c4d46102a0cc0b9c25191ac7dde0d285

commit r11-719-g7688e5e8c4d46102a0cc0b9c25191ac7dde0d285
Author: François Dumont <fdumont@gcc.gnu.org>
Date:   Sun May 24 12:04:38 2020 +0200

    libstdc++: Review unordered_map insert_or_assign/try_emplace (PR 95079)

    Those methods are making a double lookup in case of insertion, they can
    perform only one.

            PR libstdc++/95079
            * include/bits/hashtable_policy.h (_Insert_base<>::try_emplace):
New.
            * include/bits/unordered_map.h (unordered_map<>::try_emplace):
Adapt.
            (unordered_map<>::insert_or_assign): Adapt.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/95079] unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.
  2020-05-12 11:00 [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash redbeard0531 at gmail dot com
                   ` (2 preceding siblings ...)
  2020-05-29 11:13 ` cvs-commit at gcc dot gnu.org
@ 2021-02-28 14:39 ` fdumont at gcc dot gnu.org
  2021-03-02 13:40 ` redbeard0531 at gmail dot com
  2021-03-02 16:49 ` fdumont at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: fdumont at gcc dot gnu.org @ 2021-02-28 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

François Dumont <fdumont at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from François Dumont <fdumont at gcc dot gnu.org> ---
This should be considered as fixed now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/95079] unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.
  2020-05-12 11:00 [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash redbeard0531 at gmail dot com
                   ` (3 preceding siblings ...)
  2021-02-28 14:39 ` fdumont at gcc dot gnu.org
@ 2021-03-02 13:40 ` redbeard0531 at gmail dot com
  2021-03-02 16:49 ` fdumont at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: redbeard0531 at gmail dot com @ 2021-03-02 13:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

--- Comment #5 from Mathias Stearn <redbeard0531 at gmail dot com> ---
@François Dumont: Sorry I didn't see your question earlier. The reason that
unordered_map perf hurts on 64-bit platforms is because it is designed to do a
size_t modulus-by-prime on every lookup, and on most platforms that is *very*
expensive (up to 100 cycles for 64 bits vs 20ish for 32 bits). Some very modern
CPUs have made improvements here, but it is still much more expensive than just
using power-of-2 buckets and masking, even if you need to hash the hash if you
don't trust the low order bits to have enough entropy. Unfortunately, fixing
this is a pretty big ABI break, so it isn't going to change any time soon.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug libstdc++/95079] unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash.
  2020-05-12 11:00 [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash redbeard0531 at gmail dot com
                   ` (4 preceding siblings ...)
  2021-03-02 13:40 ` redbeard0531 at gmail dot com
@ 2021-03-02 16:49 ` fdumont at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: fdumont at gcc dot gnu.org @ 2021-03-02 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95079

--- Comment #6 from François Dumont <fdumont at gcc dot gnu.org> ---
Thanks for the feedback.

If this is still a problem for you after this enhancement you should perhaps
try the _Power2_rehash_policy provided as an extension. In
testsuite/23_containers/unordered_set/insert/hash_policy.cc you'll see an
example with a unordered_set like container defined as:

template<typename _Value, typename _Hash,
         typename _Pred, typename _Alloc>
  using unordered_set_power2_rehash =
  std::_Hashtable<_Value, _Value, _Alloc,
                  std::__detail::_Identity,
                  _Pred,
                  _Hash,
                  std::__detail::_Mask_range_hashing,
                  std::__detail::_Default_ranged_hash,
                  std::__detail::_Power2_rehash_policy,
                  std::__detail::_Hashtable_traits<false, true, true>>;

As stated by its name _Power2_rehash_policy will make sure that number buckets
is a power of 2 and so make the modulo much trivial.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-03-02 16:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-12 11:00 [Bug libstdc++/95079] New: unorderd_map::insert_or_assign and try_emplace should only hash and mod once unless there is a rehash redbeard0531 at gmail dot com
2020-05-12 11:26 ` [Bug libstdc++/95079] " redi at gcc dot gnu.org
2020-05-24 10:00 ` fdumont at gcc dot gnu.org
2020-05-29 11:13 ` cvs-commit at gcc dot gnu.org
2021-02-28 14:39 ` fdumont at gcc dot gnu.org
2021-03-02 13:40 ` redbeard0531 at gmail dot com
2021-03-02 16:49 ` fdumont at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).