multivariate gaussian distribution

public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed

* multivariate gaussian distribution
@ 2003-12-20 11:10 Emmanuel Benazera
  2003-12-22 14:21 ` Brian Gough
  0 siblings, 1 reply; 13+ messages in thread
From: Emmanuel Benazera @ 2003-12-20 11:10 UTC (permalink / raw)
  To: gsl-discuss

Hi,

I've written a few lines of code that implement a multivariate
gaussian distribution. The function takes the eigenvalues/vectors
decomposition of the covariance matrix as input, and outputs a
random vector. In dimension n, it proceeds to n gsl_ran_gaussian
calls, plus a blas level 2 product. It fits in the GSL framework.

I guess it may be useful to other GSL users. Should it be added
to the library ? I can provide the piece of code + doc + theoretical 
reference.

Regards,

Emmanuel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution
  2003-12-20 11:10 multivariate gaussian distribution Emmanuel Benazera
@ 2003-12-22 14:21 ` Brian Gough
  2003-12-28 10:29   ` multivariate gaussian distribution (Code) Emmanuel Benazera
  0 siblings, 1 reply; 13+ messages in thread
From: Brian Gough @ 2003-12-22 14:21 UTC (permalink / raw)
  To: Emmanuel Benazera; +Cc: gsl-discuss

Emmanuel Benazera writes:
 > I've written a few lines of code that implement a multivariate
 > gaussian distribution. The function takes the eigenvalues/vectors
 > decomposition of the covariance matrix as input, and outputs a
 > random vector. In dimension n, it proceeds to n gsl_ran_gaussian
 > calls, plus a blas level 2 product. It fits in the GSL framework.
 > 
 > I guess it may be useful to other GSL users. Should it be added
 > to the library ? I can provide the piece of code + doc + theoretical 
 > reference.

Certainly feel free to send it to the list or post it on a webpage
somewhere.

-- 
Brian Gough

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
  2003-12-22 14:21 ` Brian Gough
@ 2003-12-28 10:29   ` Emmanuel Benazera
  2003-12-29 11:41     ` Brian Gough
  0 siblings, 1 reply; 13+ messages in thread
From: Emmanuel Benazera @ 2003-12-28 10:29 UTC (permalink / raw)
  To: gsl-discuss

[-- Attachment #1: Type: text/plain, Size: 65 bytes --]

Hi,

here is a tarball. Hope it helps.

Happy holidays,

Emmanuel

[-- Attachment #2: gsl_mvg.tar.gz --]
[-- Type: application/x-tar-gz, Size: 59845 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
  2003-12-28 10:29   ` multivariate gaussian distribution (Code) Emmanuel Benazera
@ 2003-12-29 11:41     ` Brian Gough
  2003-12-29 16:47       ` Emmanuel Benazera
  2004-04-28  8:08       ` multivariate gaussian distribution (Code) / bugfix Emmanuel Benazera
  0 siblings, 2 replies; 13+ messages in thread
From: Brian Gough @ 2003-12-29 11:41 UTC (permalink / raw)
  To: Emmanuel Benazera; +Cc: gsl-discuss

Emmanuel Benazera writes:
 > Hi,
 > 
 > here is a tarball. Hope it helps.
 > 

Is there any advantage of LU over Cholesky decomposition?

-- 
Brian Gough

Network Theory Ltd
15 Royal Park
Bristol BS8 3AL
United Kingdom

Tel: +44 (0)117 3179309
Fax: +44 (0)117 9048108
Web: http://www.network-theory.co.uk/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
  2003-12-29 11:41     ` Brian Gough
@ 2003-12-29 16:47       ` Emmanuel Benazera
  2004-04-28  8:08       ` multivariate gaussian distribution (Code) / bugfix Emmanuel Benazera
  1 sibling, 0 replies; 13+ messages in thread
From: Emmanuel Benazera @ 2003-12-29 16:47 UTC (permalink / raw)
  To: Brian Gough; +Cc: Emmanuel Benazera, gsl-discuss

Hi Brian,

I remember reading that the matrix decomposition through
eigensystems was more 'stable', while Cholesky was
probably faster. See the implementation in R:
http://rweb.stat.umn.edu/R/library/MASS/html/mvrnorm.html

I used the LU for determinant and matrix inverse only.
I'm not an expert in this field though...

Emmanuel

P.S.: in my opinion the multivariate_gaussian_pdf_LU is rather
useless, because for repeated calls, you want the inverse
 to be computed once and for all.

On Mon, Dec 29, 2003 at 11:41:10AM +0000, Brian Gough wrote:
> Emmanuel Benazera writes:
>  > Hi,
>  > 
>  > here is a tarball. Hope it helps.
>  > 
> 
> Is there any advantage of LU over Cholesky decomposition?
> 
> -- 
> Brian Gough
> 
> Network Theory Ltd
> 15 Royal Park
> Bristol BS8 3AL
> United Kingdom
> 
> Tel: +44 (0)117 3179309
> Fax: +44 (0)117 9048108
> Web: http://www.network-theory.co.uk/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code) / bugfix
  2003-12-29 11:41     ` Brian Gough
  2003-12-29 16:47       ` Emmanuel Benazera
@ 2004-04-28  8:08       ` Emmanuel Benazera
  1 sibling, 0 replies; 13+ messages in thread
From: Emmanuel Benazera @ 2004-04-28  8:08 UTC (permalink / raw)
  To: Brian Gough; +Cc: gsl-discuss

[-- Attachment #1: Type: text/plain, Size: 735 bytes --]

Hi Brian,

There was an ugly bug in files I sent to this list a few months
ago. Someone kindly pointed it to me. So here is an updated version.

I've added a function that uses the Cholesky decomposition. On a simple
example, results appear deceiving compared to the LU decomposition.
Someone here would like to investigate further (code + results) before
using this function.

Btw, alpha doesn't appear in functions for triangular matrix that are 
described p.117 of gsl-ref.

cheers,

Emmanuel

On Mon, Dec 29, 2003 at 11:41:10AM +0000, Brian Gough wrote:
> Emmanuel Benazera writes:
>  > Hi,
>  > 
>  > here is a tarball. Hope it helps.
>  > 
> 
> Is there any advantage of LU over Cholesky decomposition?
> 
> -- 
> Brian Gough
> 

[-- Attachment #2: gsl_mvg.tar.gz --]
[-- Type: application/x-tar-gz, Size: 60578 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
  2003-12-30  9:47 Emmanuel Benazera
  2003-12-30 10:55 ` Przemyslaw Sliwa
@ 2003-12-30 11:31 ` Przemyslaw Sliwa
  1 sibling, 0 replies; 13+ messages in thread
From: Przemyslaw Sliwa @ 2003-12-30 11:31 UTC (permalink / raw)
  To: ebenazer; +Cc: gsl-discuss

Hi Emmanuel (again),

What kind of multivariate distributions are you interested in?
Did you think of a specific family, some skewed densities, etc.

I am strongly interested in this topic, so probably we both can benefit
from that.

Regards,
Przem

> It is easy to implement the Cholesky method. I'll do it if
> people are interested.
>
> Once again though, the R statistical package
> (that is a well-known library) uses the eigenvalues decomposition:
> http://rweb.stat.umn.edu/R/library/MASS/html/mvrnorm.html
>
> Please take time to read the messages !
>
> Emmanuel



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
  2003-12-30  9:47 Emmanuel Benazera
@ 2003-12-30 10:55 ` Przemyslaw Sliwa
  2003-12-30 11:31 ` Przemyslaw Sliwa
  1 sibling, 0 replies; 13+ messages in thread
From: Przemyslaw Sliwa @ 2003-12-30 10:55 UTC (permalink / raw)
  To: ebenazer; +Cc: gsl-discuss

Hi Emmanuel,

You do not have to implement the cholesky method. It has been already
implented. I gave you some C code.

I do not know how people do it in R. But as I said in the previous
posting. The Cholesky decomposition is very effifcient method and has been
already implemented within the GSL framework. If you use the
Eigendecomposition it might happen that someone inputs a covariance matrix
which is not positive definite (imdefinite?). In this case the
eigendecomposition will still work witout signaling any errors. I have no
idea what will be the result if you generate the random vectors. Probably
they will be normally distributed with imdefinite covariance matrix which
is a very bad case. The Cholesky decomp. will fail in this case indicating
an error.

But it is up to you what you want to do. I am just saying there are
methods better than the Eigen with inefficient code. The same about the LU
decomposition.

kind regards,

Przem

> It is easy to implement the Cholesky method. I'll do it if
> people are interested.
>
> Once again though, the R statistical package
> (that is a well-known library) uses the eigenvalues decomposition:
> http://rweb.stat.umn.edu/R/library/MASS/html/mvrnorm.html
>
> Please take time to read the messages !
>
> Emmanuel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
@ 2003-12-30  9:47 Emmanuel Benazera
  2003-12-30 10:55 ` Przemyslaw Sliwa
  2003-12-30 11:31 ` Przemyslaw Sliwa
  0 siblings, 2 replies; 13+ messages in thread
From: Emmanuel Benazera @ 2003-12-30  9:47 UTC (permalink / raw)
  To: sliwa; +Cc: gsl-discuss

It is easy to implement the Cholesky method. I'll do it if
people are interested.

Once again though, the R statistical package 
(that is a well-known library) uses the eigenvalues decomposition:
http://rweb.stat.umn.edu/R/library/MASS/html/mvrnorm.html

Please take time to read the messages !

Emmanuel



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
@ 2003-12-29 23:02 Przemyslaw Sliwa
  0 siblings, 0 replies; 13+ messages in thread
From: Przemyslaw Sliwa @ 2003-12-29 23:02 UTC (permalink / raw)
  To: gsl-discuss

Hi Emmanuel,

Ok. First thing - i hardly believe that it was written somewhere that
the eigendecomposition is more stable than the Cholesky one. The
Cholesky decomposition i so stable that it can be used to determine if
the matrix is positive definite or not (see "Numerical Reciepes in
..."). The only problem might arise when the eigenvalues of the matrix
are very close to zero. The NAG uses some special matrix balancing
procedure. This is what is missing in GSL.

I personally use the multivariate normal distribution for generating a
multivariate GARCH model. In my case the multivariate time series Xt has
(conditionally) normal distribution. This means that the covariance matrix
chabges all the time. Generating a time series of length 60 I have to
compute 60 various covariance matices. Since I use the monte carlo I
repeat this experiment many times (several milions). In one run I have to
compute about 1bln of random vectors which are normally distributed. Of
course additionally I have to decompose 1bln of covariance matrices. Now,
imagine I have to use the inefficient EIGNECODE from GSL. He, he, he...
Hundreds of years. And with the quick Cholesky code just couple of hours.

Now "my" algorithm:

1. suppose that one uses the nacr() function in order to generate the
(independent) random vector. Due to the property of the multivariate
normal distribution (if all marginal distributions are independent normal
then the join distribution is multivariate normal too) one just generates
the vector V[k] as

  for(i=0; i<DIMENSION; i++) V[i]=nacr();
Now the vector has the multivariate distribution with the cvariance matrix
given by the isentity matrix.
2. Decompose the desired covariance matrix Cov using the Cholesky
decomposition obtainig the lower triangular matrix L, so that Cov=LL'
3. Now the new vector P = LV is normally distributed with mean 0 and
covariance matrix Cov, formally P~N(0,Cov).

This algorithm can be applied only to the normal distribution since it
uses the above property. It holds only for the multivariate normal
distribution and for other distributions another algorithm must be used.
But I do not know if you are interested in it. Below one piece of code
which could help you to understand the three steps.

Cheeres,

Przem

      int rl;

      do
	{
	  rl++;
	  for (i = 0; i < PERMUTATION; i++)
	    gsl_vector_set (LAST_NU, i,
			    gsl_vector_get (LAST_E,
					    a[i]) * gsl_vector_get (LAST_E,
								    b[i]));

	  for (size_t ii = 0; ii < PERMUTATION; ii++)
	    {
	      gsl_vector_set (vector_temp_1, ii,
			      gsl_matrix_get (A, ii,
					      ii) * gsl_vector_get (LAST_NU,
								    ii));
	      gsl_vector_set (vector_temp_2, ii,
			      gsl_matrix_get (B, ii,
					      ii) * gsl_vector_get (LAST_h,
								    ii));
	      gsl_vector_set (h, ii,
			      gsl_vector_get (omega,
					      ii) +
			      gsl_vector_get (vector_temp_1,
					      ii) +
			      gsl_vector_get (vector_temp_2, ii));
	    }

	  index = 1;

	  for (long unsigned ii = 1; ii <= size; ii++)
	    for (long unsigned jj = ii; jj <= size; jj++, index++)
	      gsl_matrix_set (H, jj - 1, ii - 1,
			      gsl_vector_get (h, index - 1));

	  for (long unsigned ii = 0; ii < size; ii++)
	    for (long unsigned jj = ii; jj < size; jj++)
	      gsl_matrix_set (H, ii, jj, gsl_matrix_get (H, jj, ii));

	  ifail = gsl_linalg_cholesky_decomp (H);
	  assert (ifail == 0);

	  for (i = 0; i < DIMENSION; i++)
	    gsl_vector_set (XI, i, nacr ());

	  gsl_vector_memcpy (E, XI);

	  gsl_blas_dtrmv (CblasLower, CblasNoTrans, CblasNonUnit, H, E);
	  gsl_vector_memcpy (OBSERVED_E, E);

	  for (i = 0; i < PERMUTATION; i++)
	    gsl_vector_set (z, i,
			    (1.0 - lambda) * gsl_vector_get (z,
							     i) +
			    lambda * (gsl_vector_get (OBSERVED_E, a[i]) *
						  gsl_vector_get (OBSERVED_E, b[i])));

	  for (i = 0; i < PERMUTATION; i++)
	    gsl_vector_set (CENTRED, i,
			    gsl_vector_get (z, i) - gsl_vector_get (MU_TAU,
								    i));

	  if (rl <= MAXLENGTH)
	    gsl_blas_dgemv (CblasNoTrans, 1.0, SIGMA_INVERSE[rl - 1],
			    CENTRED, 0.0, vector_temp_1);
	  else
	    gsl_blas_dgemv (CblasNoTrans, 1.0, SIGMA_INVERSE[MAXLENGTH - 1],
			    CENTRED, 0.0, vector_temp_1);

	  gsl_blas_ddot (CENTRED, vector_temp_1, &mahalanobis);

	  if (mahalanobis < 0)
	    printf ("%f\n", mahalanobis);

	  /******ALL SHIFTS************/

	  gsl_vector_memcpy (LAST_h, h);
	  gsl_vector_memcpy (LAST_E, E);

      /***************************/

	}
      while (rl <= 999999);

PS. It simulates one milion of time series with condtional normally
distributed vectors Et.

Sorry for the mess in the code, but I have just used a part of one function.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
@ 2003-12-29 17:54 Emmanuel Benazera
  0 siblings, 0 replies; 13+ messages in thread
From: Emmanuel Benazera @ 2003-12-29 17:54 UTC (permalink / raw)
  To: sliwa; +Cc: gsl-discuss

Hi Przem,

> 1. Why did you (Emmanuel) use the Eigenvalue decomposition of the
> covariance matrix. This method is extremely inefficient with the gsl
> eigenvalues code. One shall (I do) use the Cholesky decomposition of the
> covariance matrix in order to compute the lower triangular matrix L of the
> form Cov = LL' This procedure is described in several books e.g.
> Harville's "Matrix Algebra from a Statistician's perspective".

As stated before, I read that eigenvalue decomposition was 'stablier' (...)
than Cholesky. However, I'll be interested implementing a faster 
algorithm. I don't have this book at hand. Could you describe this
procedure that uses the Cholesky decomposition ?

> 2. Why do you use this Box Mueller Algorithm? My point is: if you have a
> vector X of independent, normally distributed variables (Covariance matrix
> equals identity matrix) the product P = LX is always normally distributed
> with covariance matrix Cov, since (assuming E(X) = 0) E(PP') = E(LX(LX)')
> = E(LXX'L) = LIL' = Cov. Similar to this method one can simulate the whole
> family of elliptically countered distributions (like Bessel, generalized
> Lapalce, t-distributions).

Przem, I'm not sure I understand your point. The vector of independent variables 
needs to be generated at some point. Therefore the BM algorithm is used,
or the ratio method. Again, I'm not sure I got your point. I think the method
may be used for sampling from several other multi-dimensional distributions. Please
let me know your sources, I'll be happy to implement these algorithms.

Cheers,

Emmanuel

P.S.: Przem, please answer to me as well as to the list. Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
@ 2003-12-29 11:52 Przemyslaw Sliwa
  0 siblings, 0 replies; 13+ messages in thread
From: Przemyslaw Sliwa @ 2003-12-29 11:52 UTC (permalink / raw)
  To: gsl-discuss

Sure,

The Cholesky decomposition will fail for matrices which are not positive
definite (imdefinite). LU decomposition is applicable to all matrices.
So the general requirement (symmetric, positive definite) is not ensured
when using LU decomposition.

Cheers

Przem

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: multivariate gaussian distribution (Code)
@ 2003-12-29 10:41 Przemyslaw Sliwa
  0 siblings, 0 replies; 13+ messages in thread
From: Przemyslaw Sliwa @ 2003-12-29 10:41 UTC (permalink / raw)
  To: gsl-discuss

Dear All,

There are several thing I cannot understand in the code. I suppose it has
been designed to handle real random vectors for multivariate normal
distribution. Therefore at least two points:

1. Why did you (Emmanuel) use the Eigenvalue decomposition of the
covariance matrix. This method is extremely inefficient with the gsl
eigenvalues code. One shall (I do) use the Cholesky decomposition of the
covariance matrix in order to compute the lower triangular matrix L of the
form Cov = LL' This procedure is described in several books e.g.
Harville's "Matrix Algebra from a Statistician's perspective".

2. Why do you use this Box Mueller Algorithm? My point is: if you have a
vector X of independent, normally distributed variables (Covariance matrix
equals identity matrix) the product P = LX is always normally distributed
with covariance matrix Cov, since (assuming E(X) = 0) E(PP') = E(LX(LX)')
= E(LXX'L) = LIL' = Cov. Similar to this method one can simulate the whole
family of elliptically countered distributions (like Bessel, generalized
Lapalce, t-distributions).

Did I miss something?

Thanks for explanations,

Kind regards,

Przem

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-04-28  8:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-20 11:10 multivariate gaussian distribution Emmanuel Benazera
2003-12-22 14:21 ` Brian Gough
2003-12-28 10:29   ` multivariate gaussian distribution (Code) Emmanuel Benazera
2003-12-29 11:41     ` Brian Gough
2003-12-29 16:47       ` Emmanuel Benazera
2004-04-28  8:08       ` multivariate gaussian distribution (Code) / bugfix Emmanuel Benazera
2003-12-29 10:41 multivariate gaussian distribution (Code) Przemyslaw Sliwa
2003-12-29 11:52 Przemyslaw Sliwa
2003-12-29 17:54 Emmanuel Benazera
2003-12-29 23:02 Przemyslaw Sliwa
2003-12-30  9:47 Emmanuel Benazera
2003-12-30 10:55 ` Przemyslaw Sliwa
2003-12-30 11:31 ` Przemyslaw Sliwa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).