public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
@ 2022-07-12 22:18 cuzdav at gmail dot com
  2022-07-12 22:19 ` [Bug libstdc++/106275] " cuzdav at gmail dot com
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: cuzdav at gmail dot com @ 2022-07-12 22:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

            Bug ID: 106275
           Summary: unordered_map with std::string key,
                    std::hash<std::string>, and custom equality predicate
                    weirdness
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cuzdav at gmail dot com
  Target Milestone: ---

g++12.1 (on linux), Does not occur on GCC 11 or earlier.

std::unordered_map<std::string, std::hash<std::string>, CustomPred> acts
strangely such that finds do not seem to use the hasher properly, and seem to
use a linear search, invoking the equality predicate against every key.  

A custom hasher implemented in terms of std::hash<string> fixes it, as does
using the default equality predicate.  It also does not happen with key type
of, say, "int".  I've only seen it for std::string (in my limited
experimentation.)

I added output to the predicate to indicate when it's called, and it shows
excessive calls printed when "SHOWBUG" macro is defined.


////////////////////////////////////////
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

struct EqualToWrapper{
    bool operator()(const std::string& key1, const std::string& key2) const {
        std::cout << "equal_to(key1=" << key1 << ", key2=" << key2 << ")\n";
        return std::equal_to<>{}(key1, key2);
    }
};

#ifdef SHOWBUG
using UsageMap =  std::unordered_map<std::string, int, std::hash<std::string>,
EqualToWrapper>;
#else
struct MyHash : public std::hash<std::string> {};
using UsageMap =  std::unordered_map<std::string, int, MyHash, EqualToWrapper>;
#endif

int main() {
    UsageMap m;
    m.insert(std::make_pair("A", 111));
    m.insert(std::make_pair("B", 222));
    m.insert(std::make_pair("C", 333));
    m.insert(std::make_pair("D", 444));
    m.insert(std::make_pair("E", 555));
    m.insert(std::make_pair("F", 666));
    m.find("foo");
}
////////////////////////////////


With a custom equality predicate and my derived-from-std::hash hasher, output
on g++ 12 is:

  equal_to(key1=C, key2=A)   

When run with -DSHOWBUG macro defined, output is:
  equal_to(key1=B, key2=A)
  equal_to(key1=C, key2=B)
  equal_to(key1=C, key2=A)
  equal_to(key1=D, key2=B)
  equal_to(key1=D, key2=C)
  equal_to(key1=D, key2=A)
  equal_to(key1=E, key2=B)
  equal_to(key1=E, key2=D)
  equal_to(key1=E, key2=C)
  equal_to(key1=E, key2=A)
  equal_to(key1=F, key2=E)
  equal_to(key1=F, key2=B)
  equal_to(key1=F, key2=D)
  equal_to(key1=F, key2=C)
  equal_to(key1=F, key2=A)
  equal_to(key1=foo, key2=F)
  equal_to(key1=foo, key2=E)
  equal_to(key1=foo, key2=B)
  equal_to(key1=foo, key2=D)
  equal_to(key1=foo, key2=C)
  equal_to(key1=foo, key2=A)



On Godbolt:
https://godbolt.org/z/GP5dox1qs



$ g++ -v 
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/opt/imc/gcc-12.1.0/libexec/gcc/x86_64-pc-linux-gnu/12.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-12.1.0/configure --prefix=/opt/imc/gcc-12.1.0
--enable-languages=c,c++,fortran,lto --disable-multilib
--with-build-time-tools=/build/INSTALLDIR//opt/imc/gcc-12.1.0/bin
--enable-libstdcxx-time=rt
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.1.0 (GCC)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
  2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
@ 2022-07-12 22:19 ` cuzdav at gmail dot com
  2022-07-12 22:49 ` redi at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: cuzdav at gmail dot com @ 2022-07-12 22:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #1 from Chris Uzdavinis <cuzdav at gmail dot com> ---
Created attachment 53293
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53293&action=edit
preprocessed code

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
  2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
  2022-07-12 22:19 ` [Bug libstdc++/106275] " cuzdav at gmail dot com
@ 2022-07-12 22:49 ` redi at gcc dot gnu.org
  2022-07-12 22:51 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2022-07-12 22:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The difference in behaviour is due to r12-6272-ge3ef832a9e8d6a

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
  2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
  2022-07-12 22:19 ` [Bug libstdc++/106275] " cuzdav at gmail dot com
  2022-07-12 22:49 ` redi at gcc dot gnu.org
@ 2022-07-12 22:51 ` redi at gcc dot gnu.org
  2022-07-12 22:53 ` redi at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2022-07-12 22:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
  template<typename _Hash>
    struct _Hashtable_hash_traits
    {
      static constexpr std::size_t
      __small_size_threshold() noexcept
      { return std::__is_fast_hash<_Hash>::value ? 0 : 20; }
    };

This new trait determines whether to just do a linear search in a small
container:

  template<typename _Key, typename _Value, typename _Alloc,
           typename _ExtractKey, typename _Equal,
           typename _Hash, typename _RangeHash, typename _Unused,
           typename _RehashPolicy, typename _Traits>
    auto
    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
               _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
    find(const key_type& __k)
    -> iterator
    {
      if (size() <= __small_size_threshold())
        {
          for (auto __it = begin(); __it != end(); ++__it)
            if (this->_M_key_equals(__k, *__it._M_cur))
              return __it;
          return end();
        }

      __hash_code __code = this->_M_hash_code(__k);
      std::size_t __bkt = _M_bucket_index(__code);
      return iterator(_M_find_node(__bkt, __k, __code));
    }

The __is_fast_hash trait is true for std::hash<std::string> but false for your
custom hash function.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
  2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
                   ` (2 preceding siblings ...)
  2022-07-12 22:51 ` redi at gcc dot gnu.org
@ 2022-07-12 22:53 ` redi at gcc dot gnu.org
  2022-07-12 23:00 ` redi at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2022-07-12 22:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #3)
> The __is_fast_hash trait is true for std::hash<std::string> but false for
> your custom hash function.

Oops, I mean the other way around! std::hash<std::string> is considered slow,
your custom hash function is not.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
  2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
                   ` (3 preceding siblings ...)
  2022-07-12 22:53 ` redi at gcc dot gnu.org
@ 2022-07-12 23:00 ` redi at gcc dot gnu.org
  2022-07-13 16:28 ` cuzdav at gmail dot com
  2022-07-13 20:14 ` redi at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2022-07-12 23:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This "bug" (i.e. doing a linear search for small containers using slow hash
functions) makes the find operation more than twice as fast:

BM_std_hash          4.67 ns         4.66 ns    150610579
BM_custom_hash       12.0 ns         12.0 ns     57529241

See PR 68303 for more details.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
  2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
                   ` (4 preceding siblings ...)
  2022-07-12 23:00 ` redi at gcc dot gnu.org
@ 2022-07-13 16:28 ` cuzdav at gmail dot com
  2022-07-13 20:14 ` redi at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cuzdav at gmail dot com @ 2022-07-13 16:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #6 from Chris Uzdavinis <cuzdav at gmail dot com> ---
Thank you for the information.  If the equality comparison function is slow
enough, the large number of extra calls may not be an optimization.  

While looking into it, the vastly different runtime behavior of it jumped out
to me as an indication of a problem, and seemed to be the answer to why
internal tests drew my attention to this in the first place.  (Behavior is
normal without this optimization, but tests complained with it.)  But now I
understand it better, being a combination of the new behavior and
characteristics of the custom predicate in "real code" of which this is a
minimization.  The extra calls are a symptom, not a cause.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug libstdc++/106275] unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness
  2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
                   ` (5 preceding siblings ...)
  2022-07-13 16:28 ` cuzdav at gmail dot com
@ 2022-07-13 20:14 ` redi at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: redi at gcc dot gnu.org @ 2022-07-13 20:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106275

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
You can specialize the std::__is_fast_hash trait for your custom hash if you
need to. Be aware that's an ABI change though (the container will cache the
hash code in each node if the hash function is "not fast").

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-07-13 20:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-12 22:18 [Bug libstdc++/106275] New: unordered_map with std::string key, std::hash<std::string>, and custom equality predicate weirdness cuzdav at gmail dot com
2022-07-12 22:19 ` [Bug libstdc++/106275] " cuzdav at gmail dot com
2022-07-12 22:49 ` redi at gcc dot gnu.org
2022-07-12 22:51 ` redi at gcc dot gnu.org
2022-07-12 22:53 ` redi at gcc dot gnu.org
2022-07-12 23:00 ` redi at gcc dot gnu.org
2022-07-13 16:28 ` cuzdav at gmail dot com
2022-07-13 20:14 ` redi at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).