public inbox for glibc-bugs@sourceware.org help / color / mirror / Atom feed
From: "bugdal at aerifal dot cx" <sourceware-bugzilla@sourceware.org> To: glibc-bugs@sourceware.org Subject: [Bug libc/15615] New: Poor quality output from rand_r Date: Wed, 12 Jun 2013 23:39:00 -0000 [thread overview] Message-ID: <bug-15615-131@http.sourceware.org/bugzilla/> (raw) http://sourceware.org/bugzilla/show_bug.cgi?id=15615 Bug ID: 15615 Summary: Poor quality output from rand_r Product: glibc Version: unspecified Status: NEW Severity: normal Priority: P2 Component: libc Assignee: unassigned at sourceware dot org Reporter: bugdal at aerifal dot cx CC: drepper.fsp at gmail dot com Created attachment 7075 --> http://sourceware.org/bugzilla/attachment.cgi?id=7075&action=edit test program to generate data for analysis by dieharder Implementing a decent rand_r is very tricky because the interface requirement forces the full PRNG state to fit in 32 bits; this rules out pretty much all good PRNGs. Nonetheless, glibc's rand_r is much worse than it needs to be. glibc's rand_r is based on the LCG published in the C standard: next = next * 1103515245 + 12345; return next / 65536 % 32768; This sample code in the standard assumes RAND_MAX == 32767. The lowest bit of the output has period 2^17 and the highest bit has period 2^31. As such, if only some of the bits are to be used, the highest bits should be kept and the lower bits discarded; this is the principle the sample code above is using for generating 15-bit output from a 32-bit internal state. However, glibc is trying to generate 31-bit output from a 32-bit state. Just discarding the lowest bit would not be acceptable (the new lowest bit would have period 4), so something different needs to be done. glibc's approach is to advance the LCG 3 times (since 3 and 2^32 are relatively prime, this does not hurt the period of the internal state) and take 10-11 bits from each iteration to make a 31-bit output. Unfortunately, this approach has 2 serious flaws: 1. The 10-11 bits taken are bits 16-25 or 16-26, which have moderately low period, rather than bits 21-31 or 22-31 which would have maximal period. Thus, a lot of the potential period length is being thrown away. 2. Concatenating the bits from multiple steps, rather than using a single step of the LCG, introduces bias into the output. The frequency of each possible output is not the same; instead, some outputs occur much more often, and some not at all. Bias is especially noticable when the period is short, which it necessarily is with a 32-bit state, and which is made much worse by item 1 above. If nothing else, item 1 should be fixed by adjusting the bitshifts to use the highest available bits, rather than the middle bits, to obtain the 3 10-11 bit values. This is straightforward and a pure improvement. To fully fix rand_r, the approach of concatenating multiple iterations should be abandoned in favor of a single-LCG-iteration approach followed by an invertable transformation on the output. Obviously a 32-bit cryptographic block cipher would give the best statistical properties, but it would be slow. In musl, we are now using the "temper" function from mersenne twister. Another solution that seems to work is performing a mix of bit rotations, bswap, and multiplication by well-chosen odd numbers, applied to the output of the LCG. I am attaching a test program for use with the dieharder PRNG test suite that demonstrates the low quality of glibc's rand_r. In particular, at least the OPSO, OQSO, and DNA tests show problems (tests 5, 6, and 7) and offhand it makes sense that this kind of bias would affect them. Note that, since dieharder requires 32-bit input as opposed to 31-bit input, I slightly modified the glibc algorithm to pull 11,11,10 bits instead of 11,10,10 bits, and included it in the test program rather than calling glibc's rand_r. Producing tests that work directly with glibc's rand_r to demonstrate the problem would be a lot more work. To run the test, compile glibc_rand_r.c and pipe output to dieharder: ./glibc_rand_r | dieharder -a -g 200 -- You are receiving this mail because: You are on the CC list for the bug.
next reply other threads:[~2013-06-12 23:39 UTC|newest] Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top 2013-06-12 23:39 bugdal at aerifal dot cx [this message] 2013-06-13 8:26 ` Ondřej Bílka 2013-06-13 8:26 ` [Bug libc/15615] " neleai at seznam dot cz 2013-06-13 12:38 ` bugdal at aerifal dot cx 2013-06-14 12:11 ` Ondřej Bílka 2013-06-14 12:11 ` neleai at seznam dot cz 2013-06-14 15:37 ` bugdal at aerifal dot cx 2013-06-25 6:58 ` Ondřej Bílka 2013-06-25 12:25 ` bugdal at aerifal dot cx 2014-06-13 15:07 ` fweimer at redhat dot com
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=bug-15615-131@http.sourceware.org/bugzilla/ \ --to=sourceware-bugzilla@sourceware.org \ --cc=glibc-bugs@sourceware.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).