public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed
* Collecting statistics for time dependent data?
@ 2004-12-09 14:00 Raimondo Giammanco
  2004-12-09 19:31 ` John Lamb
  2004-12-09 21:09 ` Luke Stras
  0 siblings, 2 replies; 4+ messages in thread
From: Raimondo Giammanco @ 2004-12-09 14:00 UTC (permalink / raw)
  To: gsl-discuss

Hello,

 I was wondering if there is a way to compute "running" statistics with
gsl.

 Consider a time dependent simulation where at each time-step you
compute a vector/matrix of data, and you want to compute the 
sample mean, but using only one additional vector for data storage.

Is it possible to adapt the gsl code to work in this conditions to
produce a stable mean without before storing all the vectors
and after using gsl_stats_mean?

I apologize in advance if it is not an interesting question, or 
if it has already been answered elsewhere.

Best Regards

-- 
Raimondo Giammanco <rongten@member.fsf.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Collecting statistics for time dependent data?
  2004-12-09 14:00 Collecting statistics for time dependent data? Raimondo Giammanco
@ 2004-12-09 19:31 ` John Lamb
  2004-12-09 21:18   ` James Theiler
  2004-12-09 21:09 ` Luke Stras
  1 sibling, 1 reply; 4+ messages in thread
From: John Lamb @ 2004-12-09 19:31 UTC (permalink / raw)
  To: gsl-discuss

Raimondo Giammanco wrote:
> Hello,
> 
>  I was wondering if there is a way to compute "running" statistics with
> gsl.
> 
Yes you can do it, but there's nothing in GSL that does it and its eay 
enough that you don't need GSL. Something like (untested)

double update_mean( double* mean, int* n, double x ){
   if( *n == 1 )
     *mean = x;
   else
     *mean = (1 - (double)1 / *n ) * *mean + x / n;
}

will work and you can derive a similar method for updating the variance 
using the usual textbook formula.

var[x] = (1/n) sum x^2_i - mean(x)^2

I don't know if there is a method that avoids the rounding errors. I 
don't know why so many textbooks repeat this formula without the 
slightest warning that it can go so badly wrong.

-- 
JDL

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Collecting statistics for time dependent data?
  2004-12-09 14:00 Collecting statistics for time dependent data? Raimondo Giammanco
  2004-12-09 19:31 ` John Lamb
@ 2004-12-09 21:09 ` Luke Stras
  1 sibling, 0 replies; 4+ messages in thread
From: Luke Stras @ 2004-12-09 21:09 UTC (permalink / raw)
  To: gsl-discuss

On Thu, Dec 09, 2004 at 12:53:49PM +0100, Raimondo Giammanco wrote:
> Hello,
> 
>  I was wondering if there is a way to compute "running" statistics with
> gsl.
> 
>  Consider a time dependent simulation where at each time-step you
> compute a vector/matrix of data, and you want to compute the 
> sample mean, but using only one additional vector for data storage.

To the best of my knowledge, GSL doesn't have any routines to this.
However, I would recommend:

@Article{west79,
  author =       {D.H.D. West},
  title =        {Updating Mean and Variance Estimates: An Improved 
                  Method},
  journal =      {Comm. ACM},
  year =         1979,
  volume =       22,
  number =       9,
  pages =        {532-535},
  month =        {September}
}

@Article{chan79,
  author =       {T.F. Chan and J.G. Lewis},
  title =        {Computing Standard Deviations: Accuracy},
  journal =      {Comm. ACM},
  year =         1979,
  volume =       22,
  number =       9,
  pages =        {526-531},
  month =        {September}
}

I used to have some (non-GSL) code lying around to implement West's
method, but I can't find it anywhere.  It wasn't very complex, that's
for sure.

-- 
Luke Stras <stras@utias.toronto.edu>
"The meek can have the Earth; the rest of us have other plans" 
  --Henry Spencer

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Collecting statistics for time dependent data?
  2004-12-09 19:31 ` John Lamb
@ 2004-12-09 21:18   ` James Theiler
  0 siblings, 0 replies; 4+ messages in thread
From: James Theiler @ 2004-12-09 21:18 UTC (permalink / raw)
  To: John Lamb; +Cc: gsl-discuss

On Thu, 9 Dec 2004, John Lamb wrote:

] Raimondo Giammanco wrote:
] > Hello,
] > 
] >  I was wondering if there is a way to compute "running" statistics with
] > gsl.
] > 
] Yes you can do it, but there's nothing in GSL that does it and its eay 
] enough that you don't need GSL. Something like (untested)
] 
] double update_mean( double* mean, int* n, double x ){
]    if( *n == 1 )
]      *mean = x;
]    else
]      *mean = (1 - (double)1 / *n ) * *mean + x / n;
] }
] 
] will work and you can derive a similar method for updating the variance 
] using the usual textbook formula.
] 
] var[x] = (1/n) sum x^2_i - mean(x)^2
] 
] I don't know if there is a method that avoids the rounding errors. I 
] don't know why so many textbooks repeat this formula without the 
] slightest warning that it can go so badly wrong.
] 
]

Stably updating mean and variance is remarkably nontrivial.  There was
a series of papers in Comm ACM that discussed the issue; the final one
(that I know of) refers back to the earlier ones, and it can be found
in D.H.D. West, Updating mean and variance estimates: an improved
method, Comm ACM 22:9, 532 (1979)  [* I see Luke Stras just sent this
reference! *].  I'll just copy out the pseudocode since the paper is
old enough that it might not be easy to find.  This, by the way, is
generalized for weighted data, so it assumes that you get a weight and
a data value (W_i and X_i) that you use to update the estimates XBAR
and S2:

    SUMW = W_1
    M = X_1
    T = 0
    For i=2,3,...,n 
    {
       Q = X_i - M
       TEMP = SUM + W_i    // typo: He meant SUMW
       R = Q*W_i/TEMP
       M = M + R
       T = T + R*SUMW*Q
       SUMW = TEMP
    }
    XBAR = M
    S2 = T*n/((n-1)*SUMW)



jt 

-- 
James Theiler            Space and Remote Sensing Sciences
MS-B244, ISR-2, LANL     Los Alamos National Laboratory
Los Alamos, NM 87545     http://nis-www.lanl.gov/~jt



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-12-09 21:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-09 14:00 Collecting statistics for time dependent data? Raimondo Giammanco
2004-12-09 19:31 ` John Lamb
2004-12-09 21:18   ` James Theiler
2004-12-09 21:09 ` Luke Stras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).