Re: discret random distributions: test

public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed

From: Jerome BENOIT <jgmbenoit@wanadoo.fr>
Cc: gsl-discuss@sources.redhat.com
Subject: Re: discret random distributions: test
Date: Sat, 25 Dec 2004 20:26:00 -0000	[thread overview]
Message-ID: <41CDCCFC.3010003@wanadoo.fr> (raw)
In-Reply-To: <Pine.LNX.4.58.0412250139540.30203@lilith.rgb.private.net>

Thank you very much for the explanation and your time:
I have a clear image now, and I will certainly consult
the cited literature.

Thanks again,
Jerome

Robert G. Brown wrote:
> yOn Fri, 24 Dec 2004, Jerome BENOIT wrote:d
> 
> 
>>Thanks for the reply.
>>
>>Brian Gough wrote:
>>
>>>Jerome BENOIT writes:
>>> > I understood the sampling part and the comparing part.
>>> > What confuses me is the compatibility criteria.
>>> > In particular, why the undimensionless sigma
>>> > variable (a difference over a square root) is compare
>>> > to a dimensionless value (a constant) ?
>>>
>>>The number of counts in a bin is a dimensionless quantity.
>>>
>>
>>I guess that there is a missunderstanding on my side:
>>is there somewhere in the (classical) literature
>>something which can clarify my understanding of the criteria ?
> 
> 
> Most generators of random distributions (uniform or otherwise) are
> tested by comparing their output with the (or an) expected result.  As
> in: "Suppose I generate some large number of samples from this
> distribution, and they are truly random.  Then I >>know<< the actual
> distribution of values that I >>should<< get.  If I compare the
> distribution of values that I >>did<< get to the one I should have
> gotten, I can calculate the probability of getting the one I got.  If
> that probability is very, very low, there is a pretty good chance that
> my generator is broken."
> 
> For example, suppose I am generating heads or tails via a coin flip, or
> random bits with a generator.  If I generate a large number of them (N)
> and M of them turn out to be heads or 1's, I can compute very exactly
> from the binomial distribution what the probability is that I got the
> particular N/M pair that I did get.  If that probability is small
> (perhaps I generated 1000 samples and 900 turned out to be heads) then
> we would doubt the generator, or if it were a coin we would doubt that
> the coin was an unbiased coin.  We might begin to suspect that if we
> flipped it a second 1000 times, we would be significantly more likely to
> get heads than tails.
> 
> The value computed is called the "p-value" -- the probability of getting
> the distribution you got presuming a truly random distribution
> generator.  Of course, p-values are THEMSELVES distributed randomly,
> presumably uniformly, between 0 and 1.  Sometimes generators might fail
> by getting too CLOSE to the "expected result" -- like a coin that always
> flipped exactly 500 heads out of 1000 flips, you'd start to look to see
> if it were generating sequences like HTHTHT that aren't random at all.
> 
> So you can do a bit better by performing lots of trials and generating a
> distribution of p-values, and comparing that distribution to a uniform
> one to obtain ITS p-value.  Usually one uses a Kolmogorov-Smirnov test
> to do this (and/or to compare a nonuniform distribution generator to the
> expected nonuniform distribution in the first place).  Alternatively,
> one can plot a histogram of p-values and compare it to uniform, although
> that isn't quantitatively as sound.
> 
> This kind of testing (and more) is described in Knuth's The Art of
> Programming, volume II (Seminumerical Algorithms) and is also described
> in some detail in the white paper associated with the NIST STS suite for
> testing RNG's.  A less detailed description is given in the documents
> associated with George Marsaglia's Diehard suite of random number
> generator tests.  There are links to both of these sites near the top of
> the main project page for an RNG tester I've been writing here:
> 
>   http://www.phy.duke.edu/~rgb/General/dieharder.php
> 
> If you grab one of the source tarballs from this site, in the docs
> directory are both the STS white paper and diehard.txt from diehard (as
> well as several other white papers of interest from e.g. FNAL and CERN).
> 
> So in a nutshell, most tests are ultimately based on the central limit
> theorem.  From theory one gets a mean (expected value) and standard
> deviation for some sampled quantity at some sample size.  One generates
> a sample (large enough that the CLT has some validity, minimally more
> than 30, sometimes much larger).  One compares the difference between
> the (computed) value you get and the value you expected to get to the
> standard deviation, and use the error function (for example) or chisq
> distribution to determine the probability of getting what you got.  If
> (very) small, maybe bad.  If not, you can either accept it as good
> (really, as "not obviously bad") or work harder to resolve a problem.
> 
> Hope this helps...and Merry Christmas!
> 
>    rgb
> 

-- 
Dr. Jerome BENOIT
room A2-26
Complexo Interdisciplinar da U. L.
Av. Prof. Gama Pinto, 2
P-1649-003 Lisboa, Portugal
email: jgmbenoit@wanadoo.fr or benoit@cii.fc.ul.pt
--
If you are convinced by the necessity of a European research
initiative, please visit http://fer.apinc.org

     prev parent reply	other threads:[~2004-12-25 20:26 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-12-19 18:22 Jerome BENOIT
2004-12-21 17:00 ` Brian Gough
2004-12-21 17:12   ` Jerome BENOIT
2004-12-24 19:00     ` Brian Gough
2004-12-24 19:36       ` Jerome BENOIT
2004-12-25  6:16         ` Robert G. Brown
2004-12-25 20:26           ` Jerome BENOIT [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41CDCCFC.3010003@wanadoo.fr \
    --to=jgmbenoit@wanadoo.fr \
    --cc=gsl-discuss@sources.redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).