Dirchlet RNG - Gavin Crooks

public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed

From: "Gavin Crooks" <gec@threeplusone.com>
To: gsl-discuss@sources.redhat.com
Subject: Dirchlet RNG
Date: Mon, 05 Aug 2002 12:10:00 -0000	[thread overview]
Message-ID: <E17bnF0-0004P7-00@thunder6.cwihosting.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 2725 bytes --]

Hi,

I have written a Dirchlet random number generator in the
style of gsl, which I thought might be of some use.

Gavin Crooks

gec@compbio.berkeley.edu
http://threeplusone.com/

Computational Biology Research Group
University of California, Berkeley

--

/* randist/dirchlet.c
 * 
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or (at
 * your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*#include <config.h> Is this needed?*/
#include <math.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_sf.h>

/* The Dirchlet probability distribution of order K-1 is 

     p(\theta_1,...,\theta_K) d\theta_1 ... d\theta_K = 
        (1/Z) \prod_i=1,K \theta_i^(alpha_i - 1) \delta(1 -\sum_i=1,K
\theta_i)

   The normalization factor Z can be expressed in terms of gamma
functions:

      Z = \prod_i=1,K \gamma(\alpha_i) / \gamma( \sum_i=1,K \alpha_i)  

   The K constants, \alpha_1,...,\alpha_K, must be positive. The K
parameters, 
   \theta_1,...,\theta_K are nonnegative and sum to 1.

   This code generates a Dirchlet by sampling K values from gamma
   distributions with parameters \alpha_i, and renormalizing. 
   See A.M. Law, W.D. Kelton, 1991, Simulation Modeling and Analysis

   Gavin E. Crooks <gec@compbio.berkeley.edu> (2002)
*/

void
gsl_ran_dirchlet (const gsl_rng * r, const size_t K, const double
alpha[], double theta[]) 
{
  size_t i;
  double norm=0.0;

  for(i = 0; i < K; i++) 
    theta[i] = gsl_ran_gamma(r, alpha[i], 1.0);

  for(i = 0; i < K; i++) 
    norm += theta[i];

  for(i = 0; i < K; i++) 
    theta[i] /= norm;
}

double
gsl_ran_dirchlet_pdf (const size_t K, const double alpha[], const double
theta[]) 
{
  /*We calculate the log of the pdf to minimize the possibility of
overflow */ 
  size_t i;
  double log_p = 0.0;
  double sum_alpha = 0.0;
  double log_Z  =0.0;

  for(i = 0; i < K; i++) 
    log_p += (alpha[i]-1.0) * log(theta[i]); 

  for(i = 0; i < K; i++) 
    sum_alpha += alpha[i];

  log_p += gsl_sf_lngamma ( sum_alpha ); 

  for(i = 0; i < K; i++) 
    log_p -= gsl_sf_lngamma ( alpha[i] ); 

  return exp(log_p);
}

[-- Attachment #2: dirchlet.c --]
[-- Type: application/octet-stream, Size: 2463 bytes --]

/* randist/dirchlet.c
 * 
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or (at
 * your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/*#include <config.h> Is this needed?*/
#include <math.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_sf.h>

/* The Dirchlet probability distribution of order K-1 is 

     p(\theta_1,...,\theta_K) d\theta_1 ... d\theta_K = 
        (1/Z) \prod_i=1,K \theta_i^(alpha_i - 1) \delta(1 -\sum_i=1,K \theta_i)

   The normalization factor Z can be expressed in terms of gamma functions:

      Z = \prod_i=1,K \gamma(\alpha_i) / \gamma( \sum_i=1,K \alpha_i)  

   The K constants, \alpha_1,...,\alpha_K, must be positive. The K parameters, 
   \theta_1,...,\theta_K are nonnegative and sum to 1.

   This code generates a Dirchlet by sampling K values from gamma
   distributions with parameters \alpha_i, and renormalizing. 
   See A.M. Law, W.D. Kelton, 1991, Simulation Modeling and Analysis

   Gavin E. Crooks <gec@compbio.berkeley.edu> (2002)
*/

void
gsl_ran_dirchlet (const gsl_rng * r, const size_t K, const double alpha[], double theta[]) 
{
  size_t i;
  double norm=0.0;

  for(i = 0; i < K; i++) 
    theta[i] = gsl_ran_gamma(r, alpha[i], 1.0);

  for(i = 0; i < K; i++) 
    norm += theta[i];

  for(i = 0; i < K; i++) 
    theta[i] /= norm;
}

double
gsl_ran_dirchlet_pdf (const size_t K, const double alpha[], const double theta[]) 
{
  /*We calculate the log of the pdf to minimize the possibility of overflow */ 
  size_t i;
  double log_p = 0.0;
  double sum_alpha = 0.0;
  double log_Z  =0.0;

  for(i = 0; i < K; i++) 
    log_p += (alpha[i]-1.0) * log(theta[i]); 

  for(i = 0; i < K; i++) 
    sum_alpha += alpha[i];

  log_p += gsl_sf_lngamma ( sum_alpha ); 

  for(i = 0; i < K; i++) 
    log_p -= gsl_sf_lngamma ( alpha[i] ); 

  return exp(log_p);
}

[-- Attachment #3: test_dirchlet.c --]
[-- Type: application/octet-stream, Size: 4811 bytes --]

/* modified randist/test.c
 * 
 * Copyright (C) 1996, 1997, 1998, 1999, 2000 James Theiler, Brian Gough
 * 
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or (at
 * your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */

/* #include <config.h> Is this needed? */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_test.h>
#include <gsl/gsl_ieee_utils.h>
#include <gsl/gsl_sf.h>

#define N 100000

void gsl_ran_dirchlet (const gsl_rng * r, const int K, const double alpha[], double theta[]);
double gsl_ran_dirchlet_pdf (const int K, double alpha[], double theta[]);

void test_dirchlet_moments(void);

double test_dirchlet(void);
double test_dirchlet_pdf(double x);
void testPDF (double (*f) (void), double (*pdf)(double), const char *name);

gsl_rng *r_global;

int
main (void)
{
  gsl_ieee_env_setup ();

  gsl_rng_env_setup() ;
  r_global = gsl_rng_alloc (gsl_rng_default);

#define FUNC(x)  test_ ## x,                     "test gsl_ran_" #x
#define FUNC2(x) test_ ## x, test_ ## x ## _pdf, "test gsl_ran_" #x

  testPDF (FUNC2(dirchlet));
  test_dirchlet_moments();

  exit (gsl_test_summary());
}

/* Check that the observed means of the dirchlet variables are
   within reasonable statistical errors of their correct values.
*/

#define DIRCHLET_K 10
void
test_dirchlet_moments(void) 
{
  double alpha[DIRCHLET_K];
  double theta[DIRCHLET_K];
  double theta_sum[DIRCHLET_K];

  double alpha_sum =0.0;
  double mean, obs_mean, sd, sigma;
  int status;  
  int k,n;

  for(k=0; k<DIRCHLET_K; k++)
    {
      alpha[k] = gsl_ran_exponential(r_global, 0.1);
      alpha_sum += alpha[k];
      theta_sum[k] = 0.0;
    }

  for(n=0; n<N; n++) 
    {
      gsl_ran_dirchlet(r_global, DIRCHLET_K, alpha, theta);
      for(k=0;k<DIRCHLET_K; k++) 
	theta_sum[k] += theta[k];
    }

  for(k=0; k<DIRCHLET_K; k++)
    {
      mean = alpha[k]/ alpha_sum;
      sd   = sqrt( (alpha[k] * (1. - alpha[k]/alpha_sum))/(alpha_sum*(alpha_sum+1.)));
      obs_mean = theta_sum[k] / N;
      sigma = sqrt(N)*fabs(mean-obs_mean)/sd;

      status = (sigma>3.);

      gsl_test (status, "test gsl_ran_dirchlet: mean %g expected, %g observed",
	    mean, obs_mean);
    }
}

double
test_dirchlet (void)
{
  /* This is a bit of a lame test, since when K=2, the dirchlet distribution
     becomes a beta distribution */

  int i;
  int K =2;
  double alpha[] = {2.5, 5.0};
  double theta[2];

  gsl_ran_dirchlet ( r_global, K, alpha, theta);

  return theta[0];
}

double
test_dirchlet_pdf (double x)
{
  int i;
  int K =2;
  double alpha[] = {2.5, 5.0};
  double theta[2];

  if( x<=0.0 || x>= 1.0) return 0.0; /* Out of range */

  theta[0]=x;
  theta[1] = 1.0 -x;

  return  gsl_ran_dirchlet_pdf (K, alpha, theta);
}

#define BINS 100

void
testPDF (double (*f) (void), double (*pdf)(double), const char *name)
{
  double count[BINS], p[BINS];
  double a = -5.0, b = +5.0 ;
  double dx = (b - a) / BINS ;
  int i,j,status = 0, status_i =0 ;

  for (i = 0; i < BINS; i++)
    count[i] = 0 ;

  for (i = 0; i < N; i++)
    {
      double r = f ();
      if (r < b && r > a)
	{ 
	  j =  (int)((r - a)/dx) ;
	  count[j]++;
	}
    }

  for (i = 0; i < BINS; i++)
    {
      /* Compute an approximation to the integral of p(x) from x to
         x+dx using Simpson's rule */

      double x = a + i * dx ;
#define STEPS 100
      double sum = 0 ;

      if (fabs(x) < 1e-10) /* hit the origin exactly */
	x = 0.0 ; 

      for (j = 1; j < STEPS; j++)
	sum += pdf(x + j * dx / STEPS) ;

      p[i] =  0.5 * (pdf(x) + 2*sum + pdf(x + dx - 1e-7)) * dx / STEPS ;
    }

  for (i = 0; i < BINS; i++)
    {
      double x = a + i * dx ;
      double d = fabs(count[i] - N*p[i]) ;
      if (p[i] != 0)
	{
	  double s = d / sqrt(N*p[i]) ;
	  status_i = (s > 5) && (d > 1) ;
	}
      else
	{
	  status_i = (count[i] != 0) ;
	}
      status |= status_i ;
      if (status_i) 
	gsl_test (status_i, "%s [%g,%g) (%g/%d=%g observed vs %g expected)", 
		  name, x, x+dx, count[i],N,count[i]/N, p[i]) ;
    }

  if (status == 0)
    gsl_test (status, "%s, sampling against pdf over range [%g,%g) ", 
	      name, a, b) ;
}

next             reply	other threads:[~2002-08-05 19:10 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-08-05 12:10 Gavin Crooks [this message]
2002-08-06  8:44 ` Strange output "calling fini: /usr/lib/libgsl.so.0" Alexei Meremianin
2002-08-08 14:33   ` Brian Gough
2002-08-09  1:42   ` Alexei Meremianin
2002-08-06 10:43 ` Dirchlet RNG Brian Gough

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E17bnF0-0004P7-00@thunder6.cwihosting.com \
    --to=gec@threeplusone.com \
    --cc=gsl-discuss@sources.redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).