public inbox for gsl-discuss@sourceware.org
* covariance matrix data sampling
@ 2003-09-06  8:52 Pasquale Tricarico
  2003-09-08  8:05 ` Carsten Svaneborg
  0 siblings, 1 reply; 4+ messages in thread
From: Pasquale Tricarico @ 2003-09-06  8:52 UTC (permalink / raw)
  To: gsl-discuss

Hi All,

  The task is very simple: given a set of experimental data, you
can compute the mean value and standard deviation of each derived physical
quantity. The covariance matrix can also be computed without any
problem. If the number of physical quantities is, say, 6, we will have 6
mean values with the corresponding standard deviations, and the
covariance matrix will be a 6x6 real symmetric matrix. (I have in mind
an orbit, with 6 free parameters, and a set of sky observations.)
  Now suppose that you want to generate, using a program based on the
GSL, one million data points using the covariance matrix. Those
points must 'agree' with the experimental data, in the sense that the mean
values, standard deviations and covariance matrix computed from the
generated data points alone must be as close as possible to the original
experimental ones. (In the orbit analogy, each generated data point
represents a plausible orbit given the observations.) This is a common
problem in many simulation programs. At the moment I use a 'not very
reliable' set of source code found somewhere on the net to achieve this,
but I'm not satisfied with that solution.
  I don't know of any GSL function call to achieve this. Any ideas?

  Thanks.

   --Pasquale
     http://orsa.sf.net

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: covariance matrix data sampling
  2003-09-06  8:52 covariance matrix data sampling Pasquale Tricarico
@ 2003-09-08  8:05 ` Carsten Svaneborg
  0 siblings, 0 replies; 4+ messages in thread
From: Carsten Svaneborg @ 2003-09-08  8:05 UTC (permalink / raw)
  To: gsl-discuss

On Friday 05 September 2003 23:11, Pasquale Tricarico wrote:
>   Now suppose that you want to generate, using a program based on the
> GSL, one million of data points using the covariance matrix.

Diagonalise the covariance matrix to get the eigenvalues and -vectors.

In the eigenbasis of the covariance matrix, generate a 6D vector of
Gaussian random numbers, where the standard deviation along each
eigendirection is the square root of the corresponding eigenvalue (in
that basis the eigenvalues are the variances, not their inverses).

Then, with a bit of linear algebra, transform that vector back from the
eigenbasis into the observables you use, and translate it to fix the mean.

By construction an ensemble of datapoints generated by that process
will reproduce the mean position and the covariance matrix, and each
data point will be statistically independent, which would not be the
case if a Monte Carlo technique was used to generate them.
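As a sketch of this recipe in NumPy notation rather than GSL C (the covariance Sigma, mean mu, dimension, and seed below are made-up example inputs; the corresponding GSL routine is named in the comments):

```python
import numpy as np

rng = np.random.default_rng(42)

# Example 3x3 covariance matrix (symmetric positive definite) and mean.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])
mu = np.array([1.0, -2.0, 0.5])

# Diagonalise: Sigma = V diag(lam) V^T  (GSL: gsl_eigen_symmv).
lam, V = np.linalg.eigh(Sigma)

# In the eigenbasis the components are independent Gaussians whose
# variances are the eigenvalues, so the std deviations are sqrt(lam).
n = 100_000
z = rng.standard_normal((n, 3)) * np.sqrt(lam)

# Rotate back to the observable basis and translate to fix the mean.
samples = z @ V.T + mu
```

By construction the sample mean converges to mu and the sample covariance to Sigma as n grows.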

-- 
  Mvh. Carsten Svaneborg


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: covariance matrix data sampling
  2003-09-06 11:32 ` Pasquale Tricarico
@ 2003-09-06 13:59   ` Well Howell
  0 siblings, 0 replies; 4+ messages in thread
From: Well Howell @ 2003-09-06 13:59 UTC (permalink / raw)
  To: pasquale.tricarico; +Cc: gsl-discuss

If the Cholesky is blowing up, how about using Singular Value
Decomposition to factor it the hard way? That would also give you the
rank, or help decide whether there are really 5 or 6 coefficients,
based on the smallest singular value you wish to work with....
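As a hedged illustration of that SVD route, in NumPy rather than GSL C (Sigma here is a deliberately rank-deficient toy example, and the cutoff tol is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient 3x3 covariance: the third variable is an exact
# copy of the first, so a Cholesky decomposition would fail.
Sigma = np.array([[2.0, 0.3, 2.0],
                  [0.3, 1.0, 0.3],
                  [2.0, 0.3, 2.0]])
mu = np.zeros(3)

# SVD of a symmetric PSD matrix: Sigma = U diag(s) U^T
# (GSL: gsl_linalg_SV_decomp).
U, s, _ = np.linalg.svd(Sigma)

# Effective rank: keep only singular values above the tolerance.
tol = 1e-10 * s[0]
r = int(np.sum(s > tol))          # here r == 2

# Factor restricted to the kept directions: B @ B.T reproduces Sigma.
B = U[:, :r] * np.sqrt(s[:r])

# Draw r independent standard normals per point, map up to the
# full space; the result is a full-dimensional (degenerate) Gaussian.
samples = rng.standard_normal((50_000, r)) @ B.T + mu
```

Note that the generated points are full 3-vectors even though only r = 2 random numbers are drawn per point, which is one answer to the "recover a 6-vector from a 5-vector" question raised below.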

If you're fortunate enough to have the second edition of the Gelman et
al. text referenced previously, you'll find the basic recipe for
generating draws from a multivariate normal on page 578.
It seems most of us who care about such things are into MCMC
techniques for Bayesian analysis ~;)

Well Howell

Pasquale Tricarico wrote:

> Martin Jansche wrote:
>
>> Does this mean you want to sample from the multivariate normal
>> distribution?  If so, you're right that there is no code in the GSL
>> (except for the bivariate case), unless I'm missing something.  To
>> simulate a random draw from the multivariate normal, consult any
>> statistics book that discusses computational techniques.  For example,
>> Gelman, Carlin, Stern & Rubin ("Bayesian Data Analysis", 1995, p.478)
>> give the following algorithm.  The basic idea is to use the Cholesky
>> decomposition of the covariance matrix (GSL can do that) in
>> conjunction with several draws from the univariate normal (GSL can do
>> that too).
>>
>> More specifically, gsl_linalg_cholesky_decomp(Sigma) computes the
>> Cholesky decomposition of the covariance matrix Sigma and stores it in
>> the lower triangular part of Sigma.  Call that matrix (the lower
>> triangular one) A.  Perform d independent draws from a standard normal
>> (using gsl_ran_ugaussian()) and store them in a d-dimensional vector
>> x.  Then transform x into a new vector y by letting y = mu + A x,
>> where mu is the d-dimensional mean vector.  The d-dimensional vector y
>> is then a draw from the multivariate normal with covariance matrix
>> Sigma and mean vector mu.
>>
>> Does that answer your question?
>>
>> - martin
>>  
>>
>
> Dear Martin,
>
> Thank you very much! You pinpointed the problem exactly, and now I'm going
> to try the code you have outlined. Using the GSL gives me much more
> control over the generated data, e.g. the possibility to select a random
> seed for the generation of the x vector, and better error control.
>
> Unfortunately, sometimes the covariance matrix is not positive-definite,
> e.g. because two variables are too strongly correlated, and the
> decomposition fails. I've read that the correct approach to this problem
> should be to reduce the number of variables by 1, e.g. from 6 to 5,
> removing one of the two overly correlated variables, and then use the
> reduced covariance matrix to generate 5-vectors instead of 6-vectors.
> The only thing I don't understand is how to correctly recover, after the
> data generation, a 6-vector from a 5-vector; I suppose I should use a
> formula relating the mu, the standard deviation of the removed variable
> and the generated 5-vector to the 6th element. Maybe there is a better
> solution to this. The solution to this issue would definitely solve the
> problem.
>
> Best regards
>
> --Pasquale

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: covariance matrix data sampling
       [not found] <Pine.SOL.4.33.0309060457150.3033-100000@julius.ling.ohio-state.edu>
@ 2003-09-06 11:32 ` Pasquale Tricarico
  2003-09-06 13:59   ` Well Howell
  0 siblings, 1 reply; 4+ messages in thread
From: Pasquale Tricarico @ 2003-09-06 11:32 UTC (permalink / raw)
  To: gsl-discuss

Martin Jansche wrote:

>Does this mean you want to sample from the multivariate normal
>distribution?  If so, you're right that there is no code in the GSL
>(except for the bivariate case), unless I'm missing something.  To
>simulate a random draw from the multivariate normal, consult any
>statistics book that discusses computational techniques.  For example,
>Gelman, Carlin, Stern & Rubin ("Bayesian Data Analysis", 1995, p.478)
>give the following algorithm.  The basic idea is to use the Cholesky
>decomposition of the covariance matrix (GSL can do that) in
>conjunction with several draws from the univariate normal (GSL can do
>that too).
>
>More specifically, gsl_linalg_cholesky_decomp(Sigma) computes the
>Cholesky decomposition of the covariance matrix Sigma and stores it in
>the lower triangular part of Sigma.  Call that matrix (the lower
>triangular one) A.  Perform d independent draws from a standard normal
>(using gsl_ran_ugaussian()) and store them in a d-dimensional vector
>x.  Then transform x into a new vector y by letting y = mu + A x,
>where mu is the d-dimensional mean vector.  The d-dimensional vector y
>is then a draw from the multivariate normal with covariance matrix
>Sigma and mean vector mu.
>
>Does that answer your question?
>
>- martin
>  
>
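In NumPy notation rather than the GSL C calls named above, the quoted recipe amounts to the following (Sigma, mu, and the seed are made-up example inputs):

```python
import numpy as np

rng = np.random.default_rng(1)

# Example covariance Sigma and mean mu (d = 3 for brevity).
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
mu = np.array([10.0, -5.0, 0.0])

# Lower-triangular Cholesky factor A with A @ A.T == Sigma
# (GSL: gsl_linalg_cholesky_decomp).
A = np.linalg.cholesky(Sigma)

# d independent standard-normal draws (GSL: gsl_ran_ugaussian),
# then y = mu + A x is one multivariate-normal draw.
x = rng.standard_normal(3)
y = mu + A @ x

# Many draws at once: each row is one sample.
samples = rng.standard_normal((100_000, 3)) @ A.T + mu
```

Since cov(A x) = A A^T = Sigma, the ensemble reproduces the mean and covariance by construction.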

Dear Martin,

Thank you very much! You pinpointed the problem exactly, and now I'm going
to try the code you have outlined. Using the GSL gives me much more
control over the generated data, e.g. the possibility to select a random
seed for the generation of the x vector, and better error control.

Unfortunately, sometimes the covariance matrix is not positive-definite,
e.g. because two variables are too strongly correlated, and the
decomposition fails. I've read that the correct approach to this problem
should be to reduce the number of variables by 1, e.g. from 6 to 5,
removing one of the two overly correlated variables, and then use the
reduced covariance matrix to generate 5-vectors instead of 6-vectors.
The only thing I don't understand is how to correctly recover, after the
data generation, a 6-vector from a 5-vector; I suppose I should use a
formula relating the mu, the standard deviation of the removed variable
and the generated 5-vector to the 6th element. Maybe there is a better
solution to this. The solution to this issue would definitely solve the
problem.
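One possible answer to the 6-from-5 question, not taken from the thread itself: when the removed variable is (almost) linearly determined by the others, it can be reconstructed from each generated vector via the standard Gaussian conditional-mean formula. A NumPy sketch with a made-up 3-variable example, where variable 3 is exactly x1 + x2:

```python
import numpy as np

rng = np.random.default_rng(7)

# Covariance of the retained variables (x1, x2) and the linear
# relation x3 = w . (x1, x2) that made the full 3x3 matrix singular.
S11 = np.array([[1.0, 0.2],
                [0.2, 1.0]])          # cov of (x1, x2)
w = np.array([1.0, 1.0])              # x3 = x1 + x2
S12 = S11 @ w                         # cov((x1, x2), x3)
mu = np.array([0.0, 1.0, 1.0])        # consistent: mu3 = mu1 + mu2

# Sample the reduced 2-vectors with Cholesky, as in Martin's recipe.
A = np.linalg.cholesky(S11)
x12 = rng.standard_normal((50_000, 2)) @ A.T + mu[:2]

# Reconstruct x3 as the Gaussian conditional mean:
# x3 = mu3 + S12^T S11^{-1} (x12 - mu12); here this recovers x1 + x2.
coef = np.linalg.solve(S11, S12)
x3 = mu[2] + (x12 - mu[:2]) @ coef
samples = np.column_stack([x12, x3])
```

The reconstruction coefficients follow from the covariance blocks alone, so no extra random numbers are needed for the dependent variable.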

Best regards

--Pasquale



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2003-09-08  8:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-09-06  8:52 covariance matrix data sampling Pasquale Tricarico
2003-09-08  8:05 ` Carsten Svaneborg
     [not found] <Pine.SOL.4.33.0309060457150.3033-100000@julius.ling.ohio-state.edu>
2003-09-06 11:32 ` Pasquale Tricarico
2003-09-06 13:59   ` Well Howell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).