From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 28487 invoked by alias); 12 Jun 2013 23:39:15 -0000 Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: glibc-bugs-owner@sourceware.org Received: (qmail 28458 invoked by uid 48); 12 Jun 2013 23:39:11 -0000 From: "bugdal at aerifal dot cx" To: glibc-bugs@sourceware.org Subject: [Bug libc/15615] New: Poor quality output from rand_r Date: Wed, 12 Jun 2013 23:39:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: glibc X-Bugzilla-Component: libc X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: bugdal at aerifal dot cx X-Bugzilla-Status: NEW X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc attachments.created Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2013-06/txt/msg00092.txt.bz2 http://sourceware.org/bugzilla/show_bug.cgi?id=15615 Bug ID: 15615 Summary: Poor quality output from rand_r Product: glibc Version: unspecified Status: NEW Severity: normal Priority: P2 Component: libc Assignee: unassigned at sourceware dot org Reporter: bugdal at aerifal dot cx CC: drepper.fsp at gmail dot com Created attachment 7075 --> http://sourceware.org/bugzilla/attachment.cgi?id=7075&action=edit test program to generate data for analysis by dieharder Implementing a decent rand_r is very tricky because the interface requirement forces the full PRNG state to fit in 32 bits; this rules out pretty much all good PRNGs. Nonetheless, glibc's rand_r is much worse than it needs to be. glibc's rand_r is based on the LCG published in the C standard: next = next * 1103515245 + 12345; return next / 65536 % 32768; This sample code in the standard assumes RAND_MAX == 32767. The lowest bit of the output has period 2^17 and the highest bit has period 2^31. As such, if only some of the bits are to be used, the highest bits should be kept and the lower bits discarded; this is the principle the sample code above is using for generating 15-bit output from a 32-bit internal state. However, glibc is trying to generate 31-bit output from a 32-bit state. Just discarding the lowest bit would not be acceptable (the new lowest bit would have period 4), so something different needs to be done. glibc's approach is to advance the LCG 3 times (since 3 and 2^32 are relatively prime, this does not hurt the period of the internal state) and take 10-11 bits from each iteration to make a 31-bit output. Unfortunately, this approach has 2 serious flaws: 1. The 10-11 bits taken are bits 16-25 or 16-26, which have moderately low period, rather than bits 21-31 or 22-31 which would have maximal period. Thus, a lot of the potential period length is being thrown away. 2. Concatenating the bits from multiple steps, rather than using a single step of the LCG, introduces bias into the output. The frequency of each possible output is not the same; instead, some outputs occur much more often, and some not at all. Bias is especially noticable when the period is short, which it necessarily is with a 32-bit state, and which is made much worse by item 1 above. If nothing else, item 1 should be fixed by adjusting the bitshifts to use the highest available bits, rather than the middle bits, to obtain the 3 10-11 bit values. This is straightforward and a pure improvement. To fully fix rand_r, the approach of concatenating multiple iterations should be abandoned in favor of a single-LCG-iteration approach followed by an invertable transformation on the output. Obviously a 32-bit cryptographic block cipher would give the best statistical properties, but it would be slow. In musl, we are now using the "temper" function from mersenne twister. Another solution that seems to work is performing a mix of bit rotations, bswap, and multiplication by well-chosen odd numbers, applied to the output of the LCG. I am attaching a test program for use with the dieharder PRNG test suite that demonstrates the low quality of glibc's rand_r. In particular, at least the OPSO, OQSO, and DNA tests show problems (tests 5, 6, and 7) and offhand it makes sense that this kind of bias would affect them. Note that, since dieharder requires 32-bit input as opposed to 31-bit input, I slightly modified the glibc algorithm to pull 11,11,10 bits instead of 11,10,10 bits, and included it in the test program rather than calling glibc's rand_r. Producing tests that work directly with glibc's rand_r to demonstrate the problem would be a lot more work. To run the test, compile glibc_rand_r.c and pipe output to dieharder: ./glibc_rand_r | dieharder -a -g 200 -- You are receiving this mail because: You are on the CC list for the bug.