Date: Mon, 01 Mar 2004 17:55:00 -0000
From: "Robert G. Brown"
To: Przemyslaw Sliwa
Cc: gsl-discuss@sources.redhat.com
Subject: Re: Random Number Seed

On Mon, 1 Mar 2004, Przemyslaw Sliwa wrote:

> Hi,
>
> I have a question:
> When one wants to use a random number seed different from the default
> one (equal to 0), one can set the environment variable
> GSL_RNG_SEED=seed on the command line. I would like to use the system
> time as the seed and have no idea how to do that from the command
> line. Therefore I want to use the function clock() in my C program.
> Could you help me with how the seed can be initialized from the
> function clock() within my C program?

This is getting to be a FAQ. Here is a short discursion on seeds yet
again.

Depending on the rng chosen, using the clock as a seed ranges from a
maybe-safe bad idea to a really BAD bad idea. Obviously the seeds on
all jobs started in (say) any given hour will have substantial
bit-level correlations: over an hour the seconds counter changes only
in roughly its lowest 12 bits, so the high-order bits of every seed are
identical.
Whether or not those bit-level correlations will cause supposedly
"independent" jobs started with nearby seeds to exhibit unexpected
correlations depends in part on the quality of the rng selected, but
LOTS of the GSL rngs are not terribly high quality and would be likely
to exhibit the problem.

Seeding by hand can also be problematic, as humans have a hard time
selecting random unsigned long integers from the full range of
available values.

The "best" solution (in my opinion) for seeding a rng to get unique rng
series in disparate computations (so one can, for example, apply
statistics safely to results from the computations under the assumption
that those results are "independent, identically distributed" numbers
according to the requirements of statistics and the central limit
theorem) is to do the following:

  a) Use an rng with a very, very, very,...very long period. The period
really should be long enough that all of your samplings from the rng
are "unlikely" to overlap.

  b) Use a "high quality" rng, one that passes the Diehard suite or
most of the NIST/FPE suite of tests of randomness.

The default GSL rng, mt19937, is a very good choice wrt both a) and b).
It has a period of 2^19937 - 1, which is, yes, a very large number, and
it has passed the Diehard tests. It is also pretty fast -- one of the
faster generators in the GSL suite.

  c) Seed the generator from /dev/random when it is available.
/dev/random is slow and unsuitable for Monte Carlo sampling in most
cases, but it is highly "unpredictable" and appears to do well on
bit-level randomness tests. It is almost certainly adequate and may
even be ideal. Note that EVEN mt19937 had problems with bit
correlations caused by certain seeds -- the current version is
supposedly fixed, but it still cannot hurt at all to use the most
random seed you have available.

  d) If you DO want to ensure that all your samplings drawn from each
seed are unique, record the seeds and use them to label your answers in
such a way that IF by any miracle you get two seeds that are identical,
the answer derived from those two runs is only counted once. In most
cases this will make no observable difference in the answer, of course,
if one is pulling seeds from bit-level-random unsigned long ints, but
it is still a good practice.

  e) Only if /dev/random is not available, consider using the clock. In
that case you can use a bit of common sense to determine whether or not
to take extra measures. If you're writing a game, don't bother. If
you're doing simulations, you MIGHT want to use the clock to seed one
(good) rng, and use that first rng to determine e.g. a bit shuffling or
other "randomization" of the original seed to create a new, less
obviously correlated seed for the second (better) rng. I don't have an
explicit theoretical foundation for this (although there may be one),
but intuitively doing this in two stages with good rngs will break up
bit-level correlations in the second seed while diluting overall
seed-based correlation by something like the product of the available
phase spaces.

A code snippet for seeding from /dev/random (with fallback to the
clock) is included below. It basically returns an unsigned long integer
with at least some of its bits set by the faster usec-scale clock in
gettimeofday. If you prefer, you could use only the seconds portion of
this. It is important to note that the addition it uses (seconds plus
microseconds) has a distinct nonzero probability of returning the same
seed for two different times, but is generally more "random"; using
seconds alone is very strongly correlated (and will OFTEN return the
same seed value if multiple jobs are started per second, or on a
cluster where there is a bit of clock drift).

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C.
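27708-0305 Phone: 1-919-660-2567  Fax: 919-660-2525  email:rgb@phy.duke.edu

#include <stdio.h>
#include <sys/time.h>

/*
 * verbose and D_SEED are debugging controls from the surrounding
 * program. The definitions here are placeholders (assumptions) so the
 * snippet compiles on its own; replace them with your own.
 */
#define D_SEED 1
static int verbose = 0;

unsigned long int random_seed(void)
{
  unsigned int seed = 0;
  struct timeval tv;
  FILE *devrandom;
  int have_seed = 0;

  /* Preferred source: one unsigned int read from /dev/random. */
  if ((devrandom = fopen("/dev/random", "r")) != NULL) {
    if (fread(&seed, sizeof(seed), 1, devrandom) == 1) {
      have_seed = 1;
      if (verbose == D_SEED) printf("Got seed %u from /dev/random\n", seed);
    }
    fclose(devrandom);
  }

  /* Fallback if /dev/random is missing or the read failed: the clock. */
  if (!have_seed) {
    gettimeofday(&tv, 0);
    seed = (unsigned int)(tv.tv_sec + tv.tv_usec);
    if (verbose == D_SEED) printf("Got seed %u from gettimeofday()\n", seed);
  }

  return(seed);
}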
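
To make the two warnings about clock seeds concrete (nearby start times
give seeds that share almost all of their bits, and adding seconds to
microseconds can give the same seed for two different times), here is a
minimal sketch. The helper names differing_bits and clock_seed are
hypothetical, made up for this illustration; they are not part of GSL
or of the snippet above, and the time values in the note that follows
are arbitrary.

```c
#include <assert.h>

/* Hypothetical helper: number of bit positions in which two seeds
   differ, computed as the population count of their XOR. */
static int differing_bits(unsigned long a, unsigned long b)
{
    unsigned long x = a ^ b;
    int n = 0;

    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: the clock fallback's seed formula,
   seconds plus microseconds, applied to explicit values. */
static unsigned long clock_seed(unsigned long sec, unsigned long usec)
{
    assert(usec < 1000000UL);   /* a microsecond field stays below one million */
    return sec + usec;
}
```

With these, differing_bits(1078163716UL, 1078163717UL) is 1: two
seconds-only seeds taken one second apart differ in a single bit here.
Likewise clock_seed(1078163716UL, 500000UL) and
clock_seed(1078163717UL, 499999UL) are the same number, which is the
kind of collision the seconds-plus-microseconds sum permits.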