public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Fix random_sample_n and random_shuffle when RAND_MAX is small
@ 2018-12-12 21:31 Giovanni Bajo
  2019-01-14 20:38 ` Jonathan Wakely
  0 siblings, 1 reply; 3+ messages in thread
From: Giovanni Bajo @ 2018-12-12 21:31 UTC (permalink / raw)
  To: libstdc++, gcc-patches; +Cc: Andrea Griffini, mitalia

[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]

Hello,

we hit a bug today while cross-compiling a C++ program with mingw32:
if random_shuffle or random_sample_n are called with a sequence of
elements whose length is higher than RAND_MAX, the functions don't
behave as expected because they ignore elements beyond RAND_MAX. This
does not happen often on Linux where glibc defines RAND_MAX to 2**31,
but mingw32 (all released versions) relies on the very old msvcrt.lib,
where RAND_MAX is just 2**15.

I found mentions of this problem in 2011
(http://mingw-users.1079350.n2.nabble.com/RAND-MAX-still-16bit-td6299546.html)
and 2006 (https://mingw-users.narkive.com/gAIO4G5V/rand-max-problem-why-is-it-only-16-bit).

I'm attaching a proof-of-concept patch that fixes the problem by
introducing an embedded xorshift generator, seeded with std::rand (so
that the functions still depend on srand — it looks like this is not
strictly required by the standard, but it sounds like a good thing to
do for backward compatibility with existing programs). I was wondering
if this approach is OK or something else is preferred.

-- 
Giovanni Bajo   ::  rasky@develer.com
Develer S.r.l.  ::  http://www.develer.com

[-- Attachment #2: rand.diff --]
[-- Type: application/octet-stream, Size: 2433 bytes --]

Index: i686-w64-mingw32/include/c++/4.8.0/bits/stl_algo.h
===================================================================
--- i686-w64-mingw32/include/c++/4.8.0/bits/stl_algo.h	(revision 94918)
+++ i686-w64-mingw32/include/c++/4.8.0/bits/stl_algo.h	(working copy)
@@ -5214,9 +5214,21 @@
 	    _RandomAccessIterator>)
       __glibcxx_requires_valid_range(__first, __last);
 
-      if (__first != __last)
-	for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
-	  std::iter_swap(__i, __first + (std::rand() % ((__i - __first) + 1)));
+      if (__first != __last) {
+          // Bugfix: std::rand() may provide "small" numbers only and indeed it
+          //         does when using Windows libc (x < 32768).
+          //         We initialize and use a xorshift implementation seeded by
+          //         a couple of calls to rand() instead of using rand() for all
+          //         the random numbers needed.
+          unsigned __xss = (unsigned)std::rand() ^ ((unsigned)std::rand()<<15);
+          __xss += !__xss;
+          for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) {
+              __xss ^= __xss << 13;
+              __xss ^= __xss >> 17;
+              __xss ^= __xss << 5;
+              std::iter_swap(__i, __first + (__xss % ((__i - __first) + 1)));
+          }
+      }
     }
 
   /**
Index: i686-w64-mingw32/include/c++/4.8.0/ext/algorithm
===================================================================
--- i686-w64-mingw32/include/c++/4.8.0/ext/algorithm	(revision 94918)
+++ i686-w64-mingw32/include/c++/4.8.0/ext/algorithm	(working copy)
@@ -276,8 +276,18 @@
       _Distance __remaining = std::distance(__first, __last);
       _Distance __m = min(__n, __remaining);
 
+      // Bugfix: std::rand() may provide "small" numbers only and indeed it
+      //         does when using Windows libc (x < 32768).
+      //         We initialize and use a xorshift implementation seeded by
+      //         a couple of calls to rand() instead of using rand() for all
+      //         the random numbers needed.
+      unsigned __xss = (unsigned)std::rand() ^ ((unsigned)std::rand()<<15);
+      __xss += !__xss;
       while (__m > 0)
 	{
-	  if ((std::rand() % __remaining) < __m)
+      __xss ^= __xss << 13;
+      __xss ^= __xss >> 17;
+      __xss ^= __xss << 5;
+	  if ((__xss % __remaining) < __m)
 	    {
 	      *__out = *__first;
 	      ++__out;

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Fix random_sample_n and random_shuffle when RAND_MAX is small
  2018-12-12 21:31 Fix random_sample_n and random_shuffle when RAND_MAX is small Giovanni Bajo
@ 2019-01-14 20:38 ` Jonathan Wakely
  2019-01-21  5:59   ` Giovanni Bajo
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Wakely @ 2019-01-14 20:38 UTC (permalink / raw)
  To: Giovanni Bajo; +Cc: libstdc++, gcc-patches, Andrea Griffini, mitalia

On 12/12/18 22:31 +0100, Giovanni Bajo wrote:
>Hello,
>
>we hit a bug today while cross-compiling a C++ program with mingw32:
>if random_shuffle or random_sample_n are called with a sequence of
>elements whose length is higher than RAND_MAX, the functions don't
>behave as expected because they ignore elements beyond RAND_MAX. This
>does not happen often on Linux where glibc defines RAND_MAX to 2**31,
>but mingw32 (all released versions) relies on the very old msvcrt.lib,
>where RAND_MAX is just 2**15.
>
>I found mentions of this problem in 2011
>(http://mingw-users.1079350.n2.nabble.com/RAND-MAX-still-16bit-td6299546.html)
>and 2006 (https://mingw-users.narkive.com/gAIO4G5V/rand-max-problem-why-is-it-only-16-bit).
>
>I'm attaching a proof-of-concept patch that fixes the problem by
>introducing an embedded xorshift generator, seeded with std::rand (so
>that the functions still depend on srand — it looks like this is not
>strictly required by the standard, but it sounds like a good thing to
>do for backward compatibility with existing programs). I was wondering
>if this approach is OK or something else is preferred.

I'd prefer not to introduce that change unconditionally. The existing
code works fine when std::distance(first, last) < RAND_MAX, and as we
have random access iterators we can check that cheaply.

We'd prefer a bug report in Bugzilla with a testcase that demonstrates
the bug. A portable regression test for our testsuite might not be
practical if it needs more than RAND_MAX elements, but one that runs
for mingw and verifies the fix there would be needed.

See https://gcc.gnu.org/contribute.html#patches for guidelines for
submitting patches (and the rest of the page for other requirements,
like copyright assignment or disclaimers).


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Fix random_sample_n and random_shuffle when RAND_MAX is small
  2019-01-14 20:38 ` Jonathan Wakely
@ 2019-01-21  5:59   ` Giovanni Bajo
  0 siblings, 0 replies; 3+ messages in thread
From: Giovanni Bajo @ 2019-01-21  5:59 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches, Andrea Griffini, mitalia

Il giorno mar 15 gen 2019 alle ore 03:38 Jonathan Wakely <jwakely@redhat.com>
ha scritto:

> On 12/12/18 22:31 +0100, Giovanni Bajo wrote:
> >Hello,
> >
> >we hit a bug today while cross-compiling a C++ program with mingw32:
> >if random_shuffle or random_sample_n are called with a sequence of
> >elements whose length is higher than RAND_MAX, the functions don't
> >behave as expected because they ignore elements beyond RAND_MAX. This
> >does not happen often on Linux where glibc defines RAND_MAX to 2**31,
> >but mingw32 (all released versions) relies on the very old msvcrt.lib,
> >where RAND_MAX is just 2**15.
> >
> >I found mentions of this problem in 2011
> >(
> http://mingw-users.1079350.n2.nabble.com/RAND-MAX-still-16bit-td6299546.html
> )
> >and 2006 (
> https://mingw-users.narkive.com/gAIO4G5V/rand-max-problem-why-is-it-only-16-bit
> ).
> >
> >I'm attaching a proof-of-concept patch that fixes the problem by
> >introducing an embedded xorshift generator, seeded with std::rand (so
> >that the functions still depend on srand — it looks like this is not
> >strictly required by the standard, but it sounds like a good thing to
> >do for backward compatibility with existing programs). I was wondering
> >if this approach is OK or something else is preferred.
>
> I'd prefer not to introduce that change unconditionally. The existing
> code works fine when std::distance(first, last) < RAND_MAX, and as we
> have random access iterators we can check that cheaply.
>
> We'd prefer a bug report in Bugzilla with a testcase that demonstrates
> the bug. A portable regression test for our testsuite might not be
> practical if it needs more than RAND_MAX elements, but one that runs
> for mingw and verifies the fix there would be needed.
>
> See https://gcc.gnu.org/contribute.html#patches for guidelines for
> submitting patches (and the rest of the page for other requirements,
> like copyright assignment or disclaimers).
>

Thanks Jonathan. We have opened a Bugzilla report here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88935

In the bug, we highlighted that the current algorithm is also (less
severely) broken when the number of elements is less but close to RAND_MAX;
the farther you move away from RAND_MAX, the better it becomes. Would you
still prefer to have a different version of the algorithm, gated by a
comparison to RAND_MAX? Our patch fixes everything by switching to an
inline 64-bit PRNG which is seeded by std::rand().
-- 
Giovanni Bajo   ::  rasky@develer.com
Develer S.r.l.  ::  http://www.develer.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-01-21  5:59 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 21:31 Fix random_sample_n and random_shuffle when RAND_MAX is small Giovanni Bajo
2019-01-14 20:38 ` Jonathan Wakely
2019-01-21  5:59   ` Giovanni Bajo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).