public inbox for glibc-bugs@sourceware.org help / color / mirror / Atom feed
From: "aurelien at aurel32 dot net" <sourceware-bugzilla@sourceware.org> To: glibc-bugs@sources.redhat.com Subject: [Bug libc/4403] New: strfry() gives skewed distributions Date: Sat, 21 Apr 2007 21:36:00 -0000 [thread overview] Message-ID: <20070421223613.4403.aurelien@aurel32.net> (raw) strfry() tries to shuffle its string using random swaps, but it uses the wrong strategy, and thus not all permutations are equally likely. The code doing the shuffling itself looks like this: len = strlen (string); for (i = 0; i < len; ++i) { int32_t j; char c; __random_r (&rdata, &j); j %= len; c = string[i]; string[i] = string[j]; string[j] = c; } In other words, for the string 'abc' j will always be between 0 and 2 (inclusive), giving the following possibilities (all equally likely): j0 j1 j2 result 0 0 0 cab 0 0 1 bca 0 0 2 bac 0 1 0 cba 0 1 1 acb 0 1 2 abc 0 2 0 bca 0 2 1 abc 0 2 2 acb 1 0 0 cba 1 0 1 acb 1 0 2 abc 1 1 0 cab 1 1 1 bca 1 1 2 bac 1 2 0 acb 1 2 1 bac 1 2 2 bca 2 0 0 acb 2 0 1 bac 2 0 2 bca 2 1 0 abc 2 1 1 cab 2 1 2 cba 2 2 0 bac 2 2 1 cba 2 2 2 cab Sorting and counting gives us the following distribution: 4 abc 5 acb 5 bac 5 bca 4 cab 4 cba In other words, this is clearly skewed; some strings will appear 25% more often than others. Steinar H. Gunderson <sgunderson@bigfoot.com> proposed the following patch: --- string/strfry.c 2007-04-21 23:12:47.000000000 +0200 +++ string/strfry.c 2007-04-21 23:22:46.000000000 +0200 @@ -38,17 +38,17 @@ } len = strlen (string); - for (i = 0; i < len; ++i) + for (i = 0; i < len - 1; ++i) { int32_t j; char c; __random_r (&rdata, &j); - j %= len; + j %= (len - i); c = string[i]; - string[i] = string[j]; - string[j] = c; + string[i] = string[j + i]; + string[j + i] = c; } return string; It turns strfry() into a proper Fisher-Yates shuffle. This gives exactly n! paths for a string with N characters, and since there are n! possible permutations, this means a one-to-one mapping, and all possibilities are equally likely. -- Summary: strfry() gives skewed distributions Product: glibc Version: unspecified Status: NEW Severity: normal Priority: P2 Component: libc AssignedTo: drepper at redhat dot com ReportedBy: aurelien at aurel32 dot net CC: glibc-bugs at sources dot redhat dot com GCC build triplet: x86_64-unknown-linux-gnu GCC host triplet: x86_64-unknown-linux-gnu GCC target triplet: x86_64-unknown-linux-gnu http://sourceware.org/bugzilla/show_bug.cgi?id=4403 ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
next reply other threads:[~2007-04-21 21:36 UTC|newest] Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top 2007-04-21 21:36 aurelien at aurel32 dot net [this message] 2007-05-20 20:49 ` [Bug libc/4403] " aurelien at aurel32 dot net 2007-08-22 7:30 ` drepper at redhat dot com 2009-05-09 17:00 ` me at evancarroll dot com 2009-05-10 8:04 ` pasky at suse dot cz 2010-09-16 12:05 ` rrt at sc3d dot org
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20070421223613.4403.aurelien@aurel32.net \ --to=sourceware-bugzilla@sourceware.org \ --cc=glibc-bugs@sources.redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).