From: Jerome BENOIT <jgmbenoit@wanadoo.fr>
Cc: gsl-discuss@sources.redhat.com
Subject: Re: discret random distributions: test
Date: Sat, 25 Dec 2004 20:26:00 -0000
Message-ID: <41CDCCFC.3010003@wanadoo.fr>
In-Reply-To: <Pine.LNX.4.58.0412250139540.30203@lilith.rgb.private.net>
Thank you very much for the explanation and your time:
I have a clear picture now, and I will certainly consult
the cited literature.
Thanks again,
Jerome
Robert G. Brown wrote:
> On Fri, 24 Dec 2004, Jerome BENOIT wrote:
>
>
>>Thanks for the reply.
>>
>>Brian Gough wrote:
>>
>>>Jerome BENOIT writes:
>>> > I understood the sampling part and the comparing part.
>>> > What confuses me is the compatibility criteria.
>>> > In particular, why is the dimensional sigma
>>> > variable (a difference over a square root) compared
>>> > to a dimensionless value (a constant)?
>>>
>>>The number of counts in a bin is a dimensionless quantity.
>>>
>>
>>I guess that there is a misunderstanding on my side:
>>is there somewhere in the (classical) literature
>>something which can clarify my understanding of the criteria?
>
>
> Most generators of random distributions (uniform or otherwise) are
> tested by comparing their output with the (or an) expected result. As
> in: "Suppose I generate some large number of samples from this
> distribution, and they are truly random. Then I >>know<< the actual
> distribution of values that I >>should<< get. If I compare the
> distribution of values that I >>did<< get to the one I should have
> gotten, I can calculate the probability of getting the one I got. If
> that probability is very, very low, there is a pretty good chance that
> my generator is broken."
>
> For example, suppose I am generating heads or tails via a coin flip, or
> random bits with a generator. If I generate a large number of them (N)
> and M of them turn out to be heads or 1's, I can compute very exactly
> from the binomial distribution what the probability is that I got the
> particular N/M pair that I did get. If that probability is small
> (perhaps I generated 1000 samples and 900 turned out to be heads) then
> we would doubt the generator, or if it were a coin we would doubt that
> the coin was an unbiased coin. We might begin to suspect that if we
> flipped it a second 1000 times, we would be significantly more likely to
> get heads than tails.
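
The coin-flip example above can be worked through numerically. What follows is only a minimal sketch, assuming a fair coin and a two-sided test (count every outcome at least as far from N/2 as the observed M); the function name fair_coin_p_value is hypothetical and does not come from GSL or any other library.

```python
from math import comb


def fair_coin_p_value(n_flips, n_heads):
    """Two-sided binomial p-value for a fair coin (illustrative sketch).

    Returns the probability that a fair coin flipped n_flips times gives
    a head count at least as far from n_flips / 2 as n_heads is.
    """
    # Work with twice the distance from the mean so everything stays integer.
    deviation = abs(2 * n_heads - n_flips)
    # Count the equally likely flip sequences that are at least as extreme.
    extreme = sum(
        comb(n_flips, k)
        for k in range(n_flips + 1)
        if abs(2 * k - n_flips) >= deviation
    )
    # There are 2**n_flips sequences in total; Python divides big ints exactly.
    return extreme / 2**n_flips


if __name__ == "__main__":
    print(fair_coin_p_value(1000, 900))  # the suspicious case from the text
    print(fair_coin_p_value(1000, 510))  # an unremarkable case
```

For 900 heads out of 1000 the result is vanishingly small, which is the "doubt the coin" situation; for 510 heads it is large enough that nothing looks wrong.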
>
> The value computed is called the "p-value" -- the probability of getting
> a result at least as extreme as the one you got, presuming a truly random
> generator. Of course, p-values are THEMSELVES distributed randomly,
> presumably uniformly, between 0 and 1. Sometimes generators might fail
> by getting too CLOSE to the "expected result": if a coin always came up
> exactly 500 heads out of 1000 flips, you'd start to look to see
> if it were generating sequences like HTHTHT that aren't random at all.
>
> So you can do a bit better by performing lots of trials and generating a
> distribution of p-values, and comparing that distribution to a uniform
> one to obtain ITS p-value. Usually one uses a Kolmogorov-Smirnov test
> to do this (and/or to compare a nonuniform distribution generator to the
> expected nonuniform distribution in the first place). Alternatively,
> one can plot a histogram of p-values and compare it to uniform, although
> that isn't quantitatively as sound.
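
To make the "p-value of the p-values" step concrete, here is a rough sketch of a one-sample Kolmogorov-Smirnov test against the uniform distribution on [0, 1]. It is an illustration under stated assumptions, not the code used by any of the suites mentioned: the name ks_uniform is hypothetical, the tail probability uses the asymptotic Kolmogorov series with a commonly used small-sample correction to sqrt(n), and very small statistics are simply reported as probability 1 because the series converges slowly there.

```python
import math


def ks_uniform(p_values):
    """Return (D, approximate p-value) for a KS test against U(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    # D is the largest gap between the empirical CDF and F(x) = x,
    # checked just before and just after each sample point.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Effective argument with a small-sample correction (an approximation).
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        # The tail probability is indistinguishable from 1 in this regime.
        return d, 1.0
    # Asymptotic Kolmogorov series: Q(lam) = 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2)
    q = 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, 101)
    )
    return d, min(max(q, 0.0), 1.0)


if __name__ == "__main__":
    import random

    rng = random.Random(12345)  # arbitrary seed, for repeatability only
    print(ks_uniform([rng.random() for _ in range(100)]))
    print(ks_uniform([0.5] * 100))  # p-values piled up at 0.5: clearly not uniform
```

A set of p-values that all sit at the same value gives a large D and a tiny probability, which is the quantitative version of looking at a histogram and seeing that it is not flat.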
>
> This kind of testing (and more) is described in Knuth's The Art of
> Computer Programming, volume 2 (Seminumerical Algorithms) and is also
> described in some detail in the white paper associated with the NIST STS
> suite for testing RNGs. A less detailed description is given in the documents
> associated with George Marsaglia's Diehard suite of random number
> generator tests. There are links to both of these sites near the top of
> the main project page for an RNG tester I've been writing here:
>
> http://www.phy.duke.edu/~rgb/General/dieharder.php
>
> If you grab one of the source tarballs from this site, in the docs
> directory are both the STS white paper and diehard.txt from diehard (as
> well as several other white papers of interest from e.g. FNAL and CERN).
>
> So in a nutshell, most tests are ultimately based on the central limit
> theorem. From theory one gets a mean (expected value) and standard
> deviation for some sampled quantity at some sample size. One generates
> a sample (large enough that the CLT has some validity, minimally more
> than 30, sometimes much larger). One then compares the difference between
> the computed value and the expected value to the
> standard deviation, and uses the error function (for example) or the chisq
> distribution to determine the probability of getting what one got. If
> (very) small, maybe bad. If not, you can either accept it as good
> (really, as "not obviously bad") or work harder to resolve a problem.
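
The recipe in the paragraph above can be sketched for a single histogram bin. This is an illustration only, assuming the count in a bin is binomial with known probability p_bin; it is not GSL's actual test code, and the names bin_sigma and two_sided_tail are hypothetical. Note that the observed count, the expected count and the standard deviation are all pure numbers (counts), so their ratio sigma is dimensionless and can be compared to a constant.

```python
import math


def bin_sigma(observed, n_samples, p_bin):
    """Deviation of a bin count from its expectation, in standard deviations."""
    expected = n_samples * p_bin  # mean of a binomial count
    std_dev = math.sqrt(n_samples * p_bin * (1.0 - p_bin))  # its std. deviation
    return (observed - expected) / std_dev  # counts / counts: dimensionless


def two_sided_tail(sigma):
    """Normal-approximation probability of a deviation at least |sigma|."""
    return math.erfc(abs(sigma) / math.sqrt(2.0))


if __name__ == "__main__":
    s = bin_sigma(900, 1000, 0.5)  # 900 heads in 1000 flips
    print(s, two_sided_tail(s))
```

A threshold such as "flag anything beyond a few sigma" is then a statement about how improbable the deviation is under the normal approximation, which is only trustworthy when the expected count in the bin is reasonably large.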
>
> Hope this helps...and Merry Christmas!
>
> rgb
>
--
Dr. Jerome BENOIT
room A2-26
Complexo Interdisciplinar da U. L.
Av. Prof. Gama Pinto, 2
P-1649-003 Lisboa, Portugal
email: jgmbenoit@wanadoo.fr or benoit@cii.fc.ul.pt
--
If you are convinced by the necessity of a European research
initiative, please visit http://fer.apinc.org