From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jwakely@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [216.205.24.124])
 by sourceware.org (Postfix) with ESMTP id 089A33857C4C
 for <gcc-patches@gcc.gnu.org>; Thu, 29 Oct 2020 14:59:40 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 089A33857C4C
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-35-NiVwFOwNMNOwiTSxrxaa0w-1; Thu, 29 Oct 2020 10:59:37 -0400
X-MC-Unique: NiVwFOwNMNOwiTSxrxaa0w-1
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com
 [10.5.11.23])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 05A8F760C6;
 Thu, 29 Oct 2020 14:59:36 +0000 (UTC)
Received: from localhost (unknown [10.33.36.7])
 by smtp.corp.redhat.com (Postfix) with ESMTP id 95C9F1974D;
 Thu, 29 Oct 2020 14:59:35 +0000 (UTC)
Date: Thu, 29 Oct 2020 14:59:34 +0000
From: Jonathan Wakely <jwakely@redhat.com>
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Cc: Daniel Lemire <lemire@gmail.com>
Subject: [committed] libstdc++: Allow Lemire's algorithm to be used in more
 cases
Message-ID: <20201029145934.GA2368280@redhat.com>
MIME-Version: 1.0
X-Clacks-Overhead: GNU Terry Pratchett
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: multipart/mixed; boundary="0F1p//8PRICkK4MW"
Content-Disposition: inline
X-Spam-Status: No, score=-14.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Oct 2020 14:59:41 -0000

--0F1p//8PRICkK4MW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

This extends the fast path to also work when the URBG's range of
possible values is not the entire range of its result_type. Previously,
the slow path would be used for engines with a uint_fast32_t result type
if that type is actually a typedef for uint64_t rather than uint32_t.
After this change, the generator's result_type is not important, only
the range of possible value that generator can produce. If the
generator's range is exactly UINT64_MAX then the calculation will be
done using 128-bit and 64-bit integers, and if the range is UINT32_MAX
it will be done using 64-bit and 32-bit integers.

In practice, this benefits most of the engines and engine adaptors
defined in [rand.predef] on x86_64-linux and other 64-bit targets. This
is because std::minstd_rand0 and std::mt19937 and others use
uint_fast32_t, which is a typedef for uint64_t.

The code now makes use of the recently-clarified requirement that the
generator's min() and max() functions are usable in constant
expressions (see LWG 2154).

libstdc++-v3/ChangeLog:

	* include/bits/uniform_int_dist.h (_Power_of_two): Add
	constexpr.
	(uniform_int_distribution::_S_nd): Add static_assert to ensure
	the wider type is twice as wide as the result type.
	(uniform_int_distribution::__generate_impl): Add static_assert
	and declare variables as constexpr where appropriate.
	(uniform_int_distribution:operator()): Likewise. Only consider
	the uniform random bit generator's range of possible results
	when deciding whether _S_nd can be used, not the __uctype type.

Tested x86_64-linux. Committed to trunk.


--0F1p//8PRICkK4MW
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="patch.txt"

commit 822c1d21a3c710831af65a6e3bc83f558fb39044
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Oct 29 14:47:18 2020

    libstdc++: Allow Lemire's algorithm to be used in more cases
    
    This extends the fast path to also work when the URBG's range of
    possible values is not the entire range of its result_type. Previously,
    the slow path would be used for engines with a uint_fast32_t result type
    if that type is actually a typedef for uint64_t rather than uint32_t.
    After this change, the generator's result_type is not important, only
    the range of possible value that generator can produce. If the
    generator's range is exactly UINT64_MAX then the calculation will be
    done using 128-bit and 64-bit integers, and if the range is UINT32_MAX
    it will be done using 64-bit and 32-bit integers.
    
    In practice, this benefits most of the engines and engine adaptors
    defined in [rand.predef] on x86_64-linux and other 64-bit targets. This
    is because std::minstd_rand0 and std::mt19937 and others use
    uint_fast32_t, which is a typedef for uint64_t.
    
    The code now makes use of the recently-clarified requirement that the
    generator's min() and max() functions are usable in constant
    expressions (see LWG 2154).
    
    libstdc++-v3/ChangeLog:
    
            * include/bits/uniform_int_dist.h (_Power_of_two): Add
            constexpr.
            (uniform_int_distribution::_S_nd): Add static_assert to ensure
            the wider type is twice as wide as the result type.
            (uniform_int_distribution::__generate_impl): Add static_assert
            and declare variables as constexpr where appropriate.
            (uniform_int_distribution:operator()): Likewise. Only consider
            the uniform random bit generator's range of possible results
            when deciding whether _S_nd can be used, not the __uctype type.

diff --git a/libstdc++-v3/include/bits/uniform_int_dist.h b/libstdc++-v3/include/bits/uniform_int_dist.h
index cf6ba35c675..524593bb984 100644
--- a/libstdc++-v3/include/bits/uniform_int_dist.h
+++ b/libstdc++-v3/include/bits/uniform_int_dist.h
@@ -58,7 +58,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
     /* Determine whether number is a power of 2.  */
     template<typename _Tp>
-      inline bool
+      constexpr bool
       _Power_of_2(_Tp __x)
       {
 	return ((__x - 1) & __x) == 0;
@@ -242,9 +242,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	static _Up
 	_S_nd(_Urbg& __g, _Up __range)
 	{
-	  using __gnu_cxx::__int_traits;
-	  static_assert(!__int_traits<_Up>::__is_signed, "U must be unsigned");
-	  static_assert(!__int_traits<_Wp>::__is_signed, "W must be unsigned");
+	  using _Up_traits = __gnu_cxx::__int_traits<_Up>;
+	  using _Wp_traits = __gnu_cxx::__int_traits<_Wp>;
+	  static_assert(!_Up_traits::__is_signed, "U must be unsigned");
+	  static_assert(!_Wp_traits::__is_signed, "W must be unsigned");
+	  static_assert(_Wp_traits::__digits == (2 * _Up_traits::__digits),
+			"W must be twice as wide as U");
 
 	  // reference: Fast Random Integer Generation in an Interval
 	  // ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
@@ -260,7 +263,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  __low = _Up(__product);
 		}
 	    }
-	  return __product >> __gnu_cxx::__int_traits<_Up>::__digits;
+	  return __product >> _Up_traits::__digits;
 	}
     };
 
@@ -275,9 +278,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	typedef typename make_unsigned<result_type>::type __utype;
 	typedef typename common_type<_Gresult_type, __utype>::type __uctype;
 
-	const __uctype __urngmin = __urng.min();
-	const __uctype __urngmax = __urng.max();
-	const __uctype __urngrange = __urngmax - __urngmin;
+	static_assert( __urng.min() < __urng.max(),
+	    "Uniform random bit generator must define min() < max()");
+
+	constexpr __uctype __urngmin = __urng.min();
+	constexpr __uctype __urngmax = __urng.max();
+	constexpr __uctype __urngrange = __urngmax - __urngmin;
 	const __uctype __urange
 	  = __uctype(__param.b()) - __uctype(__param.a());
 
@@ -288,21 +294,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 	    const __uctype __uerange = __urange + 1; // __urange can be zero
 
-	    using __gnu_cxx::__int_traits;
+#if defined __UINT64_TYPE__ && defined __UINT32_TYPE__
 #if __SIZEOF_INT128__
-	    if (__int_traits<__uctype>::__digits == 64
-		&& __urngrange == __int_traits<__uctype>::__max)
+	    if _GLIBCXX17_CONSTEXPR (__urngrange == __UINT64_MAX__)
 	      {
-		__ret = _S_nd<unsigned __int128>(__urng, __uerange);
+		// __urng produces values that use exactly 64-bits,
+		// so use 128-bit integers to downscale to desired range.
+		__UINT64_TYPE__ __u64erange = __uerange;
+		__ret = _S_nd<unsigned __int128>(__urng, __u64erange);
 	      }
 	    else
 #endif
-	    if (__int_traits<__uctype>::__digits == 32
-		&& __urngrange == __int_traits<__uctype>::__max)
+	    if _GLIBCXX17_CONSTEXPR (__urngrange == __UINT32_MAX__)
 	      {
-		__ret = _S_nd<__UINT64_TYPE__>(__urng, __uerange);
+		// __urng produces values that use exactly 32-bits,
+		// so use 64-bit integers to downscale to desired range.
+		__UINT32_TYPE__ __u32erange = __uerange;
+		__ret = _S_nd<__UINT64_TYPE__>(__urng, __u32erange);
 	      }
 	    else
+#endif
 	      {
 		// fallback case (2 divisions)
 		const __uctype __scaling = __urngrange / __uerange;
@@ -361,9 +372,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	typedef typename make_unsigned<result_type>::type __utype;
 	typedef typename common_type<_Gresult_type, __utype>::type __uctype;
 
-	const __uctype __urngmin = __urng.min();
-	const __uctype __urngmax = __urng.max();
-	const __uctype __urngrange = __urngmax - __urngmin;
+	static_assert( __urng.min() < __urng.max(),
+	    "Uniform random bit generator must define min() < max()");
+
+	constexpr __uctype __urngmin = __urng.min();
+	constexpr __uctype __urngmax = __urng.max();
+	constexpr __uctype __urngrange = __urngmax - __urngmin;
 	const __uctype __urange
 	  = __uctype(__param.b()) - __uctype(__param.a());
 
@@ -417,7 +431,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	      {
 		do
 		  {
-		    const __uctype __uerngrange = __urngrange + 1;
+		    constexpr __uctype __uerngrange = __urngrange + 1;
 		    __tmp = (__uerngrange * operator()
 			     (__urng, param_type(0, __urange / __uerngrange)));
 		    __ret = __tmp + (__uctype(__urng()) - __urngmin);

--0F1p//8PRICkK4MW--