public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libc/4403] New: strfry() gives skewed distributions
@ 2007-04-21 21:36 aurelien at aurel32 dot net
  2007-05-20 20:49 ` [Bug libc/4403] " aurelien at aurel32 dot net
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: aurelien at aurel32 dot net @ 2007-04-21 21:36 UTC (permalink / raw)
  To: glibc-bugs

strfry() tries to shuffle its string using random swaps, but it uses the
wrong strategy, and thus not all permutations are equally likely. The
code doing the shuffling itself looks like this:

  len = strlen (string);
  for (i = 0; i < len; ++i)
    {
      int32_t j;
      char c;

      __random_r (&rdata, &j);
      j %= len;

      c = string[i];
      string[i] = string[j];
      string[j] = c;
    }

In other words, for the string 'abc' j will always be between 0 and 2
(inclusive), giving the following possibilities (all equally likely):

  j0 j1 j2  result
   0  0  0  cab
   0  0  1  bca
   0  0  2  bac
   0  1  0  cba
   0  1  1  acb
   0  1  2  abc
   0  2  0  bca
   0  2  1  abc
   0  2  2  acb
   1  0  0  cba
   1  0  1  acb
   1  0  2  abc
   1  1  0  cab
   1  1  1  bca
   1  1  2  bac
   1  2  0  acb
   1  2  1  bac
   1  2  2  bca
   2  0  0  acb
   2  0  1  bac
   2  0  2  bca
   2  1  0  abc
   2  1  1  cab
   2  1  2  cba
   2  2  0  bac
   2  2  1  cba
   2  2  2  cab

Sorting and counting gives us the following distribution:

   4  abc
   5  acb
   5  bac
   5  bca
   4  cab
   4  cba

In other words, this is clearly skewed; some strings will appear 25% more often
than others.

Steinar H. Gunderson <sgunderson@bigfoot.com> proposed the following patch:

--- string/strfry.c     2007-04-21 23:12:47.000000000 +0200
+++ string/strfry.c     2007-04-21 23:22:46.000000000 +0200
@@ -38,17 +38,17 @@
     }

   len = strlen (string);
-  for (i = 0; i < len; ++i)
+  for (i = 0; i < len - 1; ++i)
     {
       int32_t j;
       char c;

       __random_r (&rdata, &j);
-      j %= len;
+      j %= (len - i);

       c = string[i];
-      string[i] = string[j];
-      string[j] = c;
+      string[i] = string[j + i];
+      string[j + i] = c;
     }

   return string;

It turns strfry() into a proper Fisher-Yates shuffle. This gives exactly
n! paths for a string with N characters, and since there are n! possible
permutations, this means a one-to-one mapping, and all possibilities are
equally likely.

-- 
           Summary: strfry() gives skewed distributions
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: aurelien at aurel32 dot net
                CC: glibc-bugs at sources dot redhat dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://sourceware.org/bugzilla/show_bug.cgi?id=4403

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-09-21 17:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-4403-131@http.sourceware.org/bugzilla/>
2010-10-31 17:47 ` [Bug libc/4403] strfry() gives skewed distributions awreece at gmail dot com
2011-07-26 17:26 ` kirill at shutemov dot name
2012-05-06 18:58 ` aj at suse dot de
2014-02-16 19:36 ` jackie.rosen at hushmail dot com
2014-05-28 19:46 ` schwab at sourceware dot org
2014-06-23  3:42 ` allachan at au1 dot ibm.com
2014-06-24  8:30 ` fweimer at redhat dot com
2014-07-13 22:00 ` coleharrisjohnson at gmail dot com
2015-08-27 22:03 ` [Bug string/4403] " jsm28 at gcc dot gnu.org
2015-09-04  7:17 ` slashtrooll at misc dot lka.org.lu
2022-03-26 22:42 ` sam at gentoo dot org
2022-09-21 17:10 ` gabravier at gmail dot com
2007-04-21 21:36 [Bug libc/4403] New: " aurelien at aurel32 dot net
2007-05-20 20:49 ` [Bug libc/4403] " aurelien at aurel32 dot net
2007-08-22  7:30 ` drepper at redhat dot com
2009-05-09 17:00 ` me at evancarroll dot com
2009-05-10  8:04 ` pasky at suse dot cz
2010-09-16 12:05 ` rrt at sc3d dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).