* Fix random_sample_n and random_shuffle when RAND_MAX is small
@ 2018-12-12 21:31 Giovanni Bajo
2019-01-14 20:38 ` Jonathan Wakely
0 siblings, 1 reply; 3+ messages in thread
From: Giovanni Bajo @ 2018-12-12 21:31 UTC (permalink / raw)
To: libstdc++, gcc-patches; +Cc: Andrea Griffini, mitalia
[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]
Hello,
we hit a bug today while cross-compiling a C++ program with mingw32:
if random_shuffle or random_sample_n are called with a sequence of
elements whose length is higher than RAND_MAX, the functions don't
behave as expected because they ignore elements beyond RAND_MAX. This
does not happen often on Linux where glibc defines RAND_MAX to 2**31,
but mingw32 (all released versions) relies on the very old msvcrt.lib,
where RAND_MAX is just 2**15.
I found mentions of this problem in 2011
(http://mingw-users.1079350.n2.nabble.com/RAND-MAX-still-16bit-td6299546.html)
and 2006 (https://mingw-users.narkive.com/gAIO4G5V/rand-max-problem-why-is-it-only-16-bit).
I'm attaching a proof-of-concept patch that fixes the problem by
introducing an embedded xorshift generator, seeded with std::rand (so
that the functions still depend on srand — it looks like this is not
strictly required by the standard, but it sounds like a good thing to
do for backward compatibility with existing programs). I was wondering
if this approach is OK or something else is preferred.
--
Giovanni Bajo :: rasky@develer.com
Develer S.r.l. :: http://www.develer.com
[-- Attachment #2: rand.diff --]
[-- Type: application/octet-stream, Size: 2433 bytes --]
Index: i686-w64-mingw32/include/c++/4.8.0/bits/stl_algo.h
===================================================================
--- i686-w64-mingw32/include/c++/4.8.0/bits/stl_algo.h (revision 94918)
+++ i686-w64-mingw32/include/c++/4.8.0/bits/stl_algo.h (working copy)
@@ -5214,9 +5214,21 @@
_RandomAccessIterator>)
__glibcxx_requires_valid_range(__first, __last);
- if (__first != __last)
- for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
- std::iter_swap(__i, __first + (std::rand() % ((__i - __first) + 1)));
+ if (__first != __last) {
+ // Bugfix: std::rand() may provide "small" numbers only and indeed it
+ // does when using Windows libc (x < 32768).
+ // We initialize and use a xorshift implementation seeded by
+ // a couple of calls to rand() instead of using rand() for all
+ // the random numbers needed.
+ unsigned __xss = (unsigned)std::rand() ^ ((unsigned)std::rand()<<15);
+ __xss += !__xss;
+ for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i) {
+ __xss ^= __xss << 13;
+ __xss ^= __xss >> 17;
+ __xss ^= __xss << 5;
+ std::iter_swap(__i, __first + (__xss % ((__i - __first) + 1)));
+ }
+ }
}
/**
Index: i686-w64-mingw32/include/c++/4.8.0/ext/algorithm
===================================================================
--- i686-w64-mingw32/include/c++/4.8.0/ext/algorithm (revision 94918)
+++ i686-w64-mingw32/include/c++/4.8.0/ext/algorithm (working copy)
@@ -276,8 +276,18 @@
_Distance __remaining = std::distance(__first, __last);
_Distance __m = min(__n, __remaining);
+ // Bugfix: std::rand() may provide "small" numbers only and indeed it
+ // does when using Windows libc (x < 32768).
+ // We initialize and use a xorshift implementation seeded by
+ // a couple of calls to rand() instead of using rand() for all
+ // the random numbers needed.
+ unsigned __xss = (unsigned)std::rand() ^ ((unsigned)std::rand()<<15);
+ __xss += !__xss;
while (__m > 0)
{
- if ((std::rand() % __remaining) < __m)
+ __xss ^= __xss << 13;
+ __xss ^= __xss >> 17;
+ __xss ^= __xss << 5;
+ if ((__xss % __remaining) < __m)
{
*__out = *__first;
++__out;
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Fix random_sample_n and random_shuffle when RAND_MAX is small
2018-12-12 21:31 Fix random_sample_n and random_shuffle when RAND_MAX is small Giovanni Bajo
@ 2019-01-14 20:38 ` Jonathan Wakely
2019-01-21 5:59 ` Giovanni Bajo
0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Wakely @ 2019-01-14 20:38 UTC (permalink / raw)
To: Giovanni Bajo; +Cc: libstdc++, gcc-patches, Andrea Griffini, mitalia
On 12/12/18 22:31 +0100, Giovanni Bajo wrote:
>Hello,
>
>we hit a bug today while cross-compiling a C++ program with mingw32:
>if random_shuffle or random_sample_n are called with a sequence of
>elements whose length is higher than RAND_MAX, the functions don't
>behave as expected because they ignore elements beyond RAND_MAX. This
>does not happen often on Linux where glibc defines RAND_MAX to 2**31,
>but mingw32 (all released versions) relies on the very old msvcrt.lib,
>where RAND_MAX is just 2**15.
>
>I found mentions of this problem in 2011
>(http://mingw-users.1079350.n2.nabble.com/RAND-MAX-still-16bit-td6299546.html)
>and 2006 (https://mingw-users.narkive.com/gAIO4G5V/rand-max-problem-why-is-it-only-16-bit).
>
>I'm attaching a proof-of-concept patch that fixes the problem by
>introducing an embedded xorshift generator, seeded with std::rand (so
>that the functions still depend on srand â it looks like this is not
>strictly required by the standard, but it sounds like a good thing to
>do for backward compatibility with existing programs). I was wondering
>if this approach is OK or something else is preferred.
I'd prefer not to introduce that change unconditionally. The existing
code works fine when std::distance(first, last) < RAND_MAX, and as we
have random access iterators we can check that cheaply.
We'd prefer a bug report in Bugzilla with a testcase that demonstrates
the bug. A portable regression test for our testsuite might not be
practical if it needs more than RAND_MAX elements, but one that runs
for mingw and verifies the fix there would be needed.
See https://gcc.gnu.org/contribute.html#patches for guidelines for
submitting patches (and the rest of the page for other requirements,
like copyright assignment or disclaimers).
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Fix random_sample_n and random_shuffle when RAND_MAX is small
2019-01-14 20:38 ` Jonathan Wakely
@ 2019-01-21 5:59 ` Giovanni Bajo
0 siblings, 0 replies; 3+ messages in thread
From: Giovanni Bajo @ 2019-01-21 5:59 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: libstdc++, gcc-patches, Andrea Griffini, mitalia
Il giorno mar 15 gen 2019 alle ore 03:38 Jonathan Wakely <jwakely@redhat.com>
ha scritto:
> On 12/12/18 22:31 +0100, Giovanni Bajo wrote:
> >Hello,
> >
> >we hit a bug today while cross-compiling a C++ program with mingw32:
> >if random_shuffle or random_sample_n are called with a sequence of
> >elements whose length is higher than RAND_MAX, the functions don't
> >behave as expected because they ignore elements beyond RAND_MAX. This
> >does not happen often on Linux where glibc defines RAND_MAX to 2**31,
> >but mingw32 (all released versions) relies on the very old msvcrt.lib,
> >where RAND_MAX is just 2**15.
> >
> >I found mentions of this problem in 2011
> >(
> http://mingw-users.1079350.n2.nabble.com/RAND-MAX-still-16bit-td6299546.html
> )
> >and 2006 (
> https://mingw-users.narkive.com/gAIO4G5V/rand-max-problem-why-is-it-only-16-bit
> ).
> >
> >I'm attaching a proof-of-concept patch that fixes the problem by
> >introducing an embedded xorshift generator, seeded with std::rand (so
> >that the functions still depend on srand — it looks like this is not
> >strictly required by the standard, but it sounds like a good thing to
> >do for backward compatibility with existing programs). I was wondering
> >if this approach is OK or something else is preferred.
>
> I'd prefer not to introduce that change unconditionally. The existing
> code works fine when std::distance(first, last) < RAND_MAX, and as we
> have random access iterators we can check that cheaply.
>
> We'd prefer a bug report in Bugzilla with a testcase that demonstrates
> the bug. A portable regression test for our testsuite might not be
> practical if it needs more than RAND_MAX elements, but one that runs
> for mingw and verifies the fix there would be needed.
>
> See https://gcc.gnu.org/contribute.html#patches for guidelines for
> submitting patches (and the rest of the page for other requirements,
> like copyright assignment or disclaimers).
>
Thanks Jonathan. We have opened a Bugzilla report here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88935
In the bug, we highlighted that the current algorithm is also (less
severely) broken when the number of elements is less but close to RAND_MAX;
the farther you move away from RAND_MAX, the better it becomes. Would you
still prefer to have a different version of the algorithm, gated by a
comparison to RAND_MAX? Our patch fixes everything by switching to an
inline 64-bit PRNG which is seeded by std::rand().
--
Giovanni Bajo :: rasky@develer.com
Develer S.r.l. :: http://www.develer.com
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-01-21 5:59 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 21:31 Fix random_sample_n and random_shuffle when RAND_MAX is small Giovanni Bajo
2019-01-14 20:38 ` Jonathan Wakely
2019-01-21 5:59 ` Giovanni Bajo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).