public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions)
@ 2019-09-08 16:10 Daniel Lemire
  2019-09-27 18:19 ` Daniel Lemire
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Lemire @ 2019-09-08 16:10 UTC (permalink / raw)
  To: libstdc++, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2364 bytes --]

Even on recent processors, integer division is relatively expensive.
The current implementation of  std::uniform_int_distribution typically
requires two divisions by invocation:

        // downscaling
        const __uctype __uerange = __urange + 1; // __urange can be zero
        const __uctype __scaling = __urngrange / __uerange;
        const __uctype __past = __uerange * __scaling;
        do
          __ret = __uctype(__urng()) - __urngmin;
        while (__ret >= __past);
        __ret /= __scaling;

We can achieve the same algorithmic result with at most one division,
and typically
no division at all without requiring more calls to the random number generator.
This was recently added to Swift (https://github.com/apple/swift/pull/25286)

The main challenge is that we need to be able to compute the "full"
product. E.g.,
given two 64-bit integers, we want the 128-bit result; given two 32-bit integers
we want the 64-bit result. This is fast on common processors. The
128-bit product
is not natively supported in C/C++ but can be achieved with the
__uint128_t extension
which is widely supported by GCC. The patch checks for __uint128_t support; when
support is lacking, we fallback on the existing approach.

For example, if we replace the above code by the following, we get a substantial
performance boost on skylake microarchitectures. E.g., it can
be twice as fast to shuffle arrays of 1 million elements (e.g., using
the following
benchmark: https://github.com/lemire/simple_cpp_shuffle_benchmark )


      const __uctype __uerange = __urange + 1; // __urange can be zero
      uint64_t __product = (__uctype(__urng()) - __urngmin) * __uerange;
      uint32_t __lsb = uint32_t(__product);
      if(__lsb < __uerange) {
        uint64_t __threshold = -__uerange % __uerange;
        while (__lsb < __threshold) {
          __product = (__uctype(__urng()) - __urngmin) * __uerange;
          __lsb = uint32_t(__product);
        }
      }

Included is a patch that would bring better performance (e.g., 2x gain) to
std::uniform_int_distribution  in some cases.

This patch proposal was previously submitted solely to the libc++ list
and improved following feedback by Jonathan Wakely.

Reference: Fast Random Integer Generation in an Interval, ACM Transactions on
Modeling and Computer Simulation 29 (1), 2019 https://arxiv.org/abs/1805.10941

[-- Attachment #2: patch_uniform_int_dist.txt --]
[-- Type: text/plain, Size: 2804 bytes --]

Index: trunk/libstdc++-v3/include/bits/uniform_int_dist.h
===================================================================
--- trunk/libstdc++-v3/include/bits/uniform_int_dist.h	(revision 275324)
+++ trunk/libstdc++-v3/include/bits/uniform_int_dist.h	(working copy)
@@ -33,6 +33,7 @@

 #include <type_traits>
 #include <limits>
+#include <cstdint>

 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -242,15 +243,59 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 	if (__urngrange > __urange)
 	  {
-	    // downscaling
-	    const __uctype __uerange = __urange + 1; // __urange can be zero
-	    const __uctype __scaling = __urngrange / __uerange;
-	    const __uctype __past = __uerange * __scaling;
-	    do
-	      __ret = __uctype(__urng()) - __urngmin;
-	    while (__ret >= __past);
-	    __ret /= __scaling;
-	  }
+		const __uctype __uerange = __urange + 1; // __urange can be zero
+#if _GLIBCXX_USE_INT128 == 1
+    if(sizeof(__uctype) == sizeof(uint64_t) and
+      (__urngrange == numeric_limits<uint64_t>::max()))
+    {
+      // 64-bit case
+      // reference: Fast Random Integer Generation in an Interval
+      // ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
+      // https://arxiv.org/abs/1805.10941
+      __uint128_t __product = (__uctype(__urng()) - __urngmin) * __uerange;
+      uint64_t __lsb = uint64_t(__product);
+      if(__lsb < __uerange)
+      {
+        uint64_t __threshold = -__uerange % __uerange;
+        while (__lsb < __threshold)
+        {
+          __product = (__uctype(__urng()) - __urngmin) * __uerange;
+          __lsb = uint64_t(__product);
+        }
+      }
+      __ret = __product >> 64;
+    }
+    else
+#endif // _GLIBCXX_USE_INT128 == 1
+    if(sizeof(__uctype) == sizeof(uint32_t)
+      and (__urngrange == numeric_limits<uint32_t>::max()) )
+    {
+      // 32-bit case
+      // reference: Fast Random Integer Generation in an Interval
+      // ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
+      // https://arxiv.org/abs/1805.10941
+      uint64_t __product = (__uctype(__urng()) - __urngmin) * __uerange;
+      uint32_t __lsb = uint32_t(__product);
+      if(__lsb < __uerange) {
+        uint64_t __threshold = -__uerange % __uerange;
+        while (__lsb < __threshold) {
+          __product = (__uctype(__urng()) - __urngmin) * __uerange;
+          __lsb = uint32_t(__product);
+        }
+      }
+      __ret = __product >> 32;
+    }
+    else
+    {
+      // fallback case (2 divisions)
+      const __uctype __scaling = __urngrange / __uerange;
+      const __uctype __past = __uerange * __scaling;
+      do
+        __ret = __uctype(__urng()) - __urngmin;
+      while (__ret >= __past);
+      __ret /= __scaling;
+    }
+  }
 	else if (__urngrange < __urange)
 	  {
 	    // upscaling

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-10-09 13:18 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-08 16:10 [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions) Daniel Lemire
2019-09-27 18:19 ` Daniel Lemire
2020-10-05 23:25   ` Jonathan Wakely
2020-10-05 23:38     ` Jonathan Wakely
2020-10-05 23:40     ` Jonathan Wakely
2020-10-06 19:09       ` Daniel Lemire
2020-10-06 19:55       ` Daniel Lemire
2020-10-06 20:04         ` Jonathan Wakely
2020-10-09 13:17         ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).