Re: New power of 2 hash policy

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jonathan Wakely <jwakely@redhat.com>
To: "François Dumont" <frs.dumont@gmail.com>
Cc: Michael Matz <matz@suse.de>,
	       "libstdc++@gcc.gnu.org" <libstdc++@gcc.gnu.org>,
	       gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: New power of 2 hash policy
Date: Fri, 25 Sep 2015 13:58:00 -0000	[thread overview]
Message-ID: <20150925132815.GG12094@redhat.com> (raw)
In-Reply-To: <5604584D.5040904@gmail.com>

On 24/09/15 22:08 +0200, FranÃ§ois Dumont wrote:
>    Working on it I realised that despite the comment on _M_next_bkt
>saying "no smaller than n" the method can return a value smaller for big
>n values. This is not likely to happen but I prefer to take care of it.
>I just make sure we won't try to rehash again even if the drawback is
>that we won't respect max_load_factor anymore at those levels. But as I
>said we will surely have a memory issue before that.
>
>    Ok to commit ?
>
>FranÃ§ois
>
>
>

>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
>index a9ad7dd..4e1bc29 100644
>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>@@ -457,6 +457,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   /// smallest prime that keeps the load factor small enough.
>   struct _Prime_rehash_policy
>   {
>+    using __has_load_factor = std::true_type;
>+
>     _Prime_rehash_policy(float __z = 1.0) noexcept
>     : _M_max_load_factor(__z), _M_next_resize(0) { }
> 
>@@ -501,6 +503,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     mutable std::size_t	_M_next_resize;
>   };
> 
>+  /// Range hashing function considering that second args is a power of 2.

Does this mean "assuming" not "considering"?

>+  struct _Mask_range_hashing
>+  {
>+    typedef std::size_t first_argument_type;
>+    typedef std::size_t second_argument_type;
>+    typedef std::size_t result_type;
>+
>+    result_type
>+    operator()(first_argument_type __num,
>+	       second_argument_type __den) const noexcept
>+    { return __num & (__den - 1); }
>+  };
>+
>+
>+  /// Helper type to compute next power of 2.
>+  template<std::size_t _N>
>+    struct _NextPower2
>+    {
>+      static std::size_t
>+      _Get(std::size_t __n)
>+      {
>+	std::size_t __next = _NextPower2<(_N >> 1)>::_Get(__n);
>+	return __next |= __next >> _N;
>+      }
>+    };
>+
>+  template<>
>+    struct _NextPower2<1>
>+    {
>+      static std::size_t
>+      _Get(std::size_t __n)
>+      { return __n |= __n >> 1; }
>+    };

This doesn't seem to return the next power of 2, it returns one less.

_NextPower2<32>::_Get(2) returns 3, but 2 is already a power of 2.
_NextPower2<32>::_Get(3) returns 3, but the next power of 2 is 4.


I don't think this needs to be a recursive template, it can simply be
a function, can't it?


>+  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
>+  /// operations.
>+  struct _Power2_rehash_policy
>+  {
>+    using __has_load_factor = std::true_type;
>+
>+    _Power2_rehash_policy(float __z = 1.0) noexcept
>+    : _M_max_load_factor(__z), _M_next_resize(0) { }
>+
>+    float
>+    max_load_factor() const noexcept
>+    { return _M_max_load_factor; }
>+
>+    // Return a bucket size no smaller than n (as long as n is not above the
>+    // highest power of 2).

This says "no smaller than n" but it actually seems to guarantee
"greater than n" because _NextPower2<>::_Get(n)+1 is 2n when n is a
power of two.

>+    std::size_t
>+    _M_next_bkt(std::size_t __n) const
>+    {
>+      constexpr auto __max_bkt
>+	= (std::size_t(1) << (sizeof(std::size_t) * 8 - 1));
>+
>+      std::size_t __res
>+	= _NextPower2<((sizeof(std::size_t) * 8) >> 1)>::_Get(--__n) + 1;

You wouldn't need to add one to the result if the template actually
returned a power of two!

>+      if (__res == 0)
>+	__res = __max_bkt;
>+
>+      if (__res == __max_bkt)
>+	// Set next resize to the max value so that we never try to rehash again
>+	// as we already reach the biggest possible bucket number.
>+	// Note that it might result in max_load_factor not being respected.
>+	_M_next_resize = std::size_t(0) - 1;
>+      else
>+	_M_next_resize
>+	  = __builtin_floor(__res * (long double)_M_max_load_factor);
>+
>+      return __res;
>+    }

What are the requirements for this function, "no smaller than n" or
"greater than n"?

If "no smaller than n" is correct then the algorithm you want is
"round up to nearest power of 2", which you can find here (I wrote
this earlier this year for some reason I can't remember now):

https://gitlab.com/redistd/redistd/blob/master/include/redi/bits.h

The non-recursive version is only a valid constexpr function in C++14,
but since you don't need a constexpr function you could just that,
extended to handle 64-bit:

  std::size_t
  clp2(std::size_t n)
  {
    std::uint_least64_t x = n;
    // Algorithm from Hacker's Delight, Figure 3-3.
    x = x - 1;
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >>16);
    x = x | (x >>32);
    return x + 1;
  }

We could avoid the last shift when sizeof(size_t) == 32, I don't know
if the optimisers will take care of that anyway.


The rest of the patch looks fine.

next prev parent reply	other threads:[~2015-09-25 13:28 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-10 20:11 François Dumont
2015-09-11 13:18 ` Michael Matz
2015-09-11 13:19   ` Jonathan Wakely
2015-09-11 13:28     ` Jonathan Wakely
2015-09-13 20:28       ` François Dumont
2015-09-24 20:58       ` François Dumont
2015-09-25 13:58         ` Jonathan Wakely [this message]
2015-09-28 19:39           ` François Dumont
2015-10-19 20:23             ` François Dumont

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150925132815.GG12094@redhat.com \
    --to=jwakely@redhat.com \
    --cc=frs.dumont@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    --cc=matz@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).