public inbox for gsl-discuss@sourceware.org
* computing R-squared with gsl_multifit
From: Patrick Alken @ 2007-09-25 19:46 UTC
  To: gsl-discuss

Hi all,

  This continues the discussion from help-gsl on computing R^2, the
coefficient of determination, for multi-parameter fits. I think it
would be useful to have, and I propose computing it inside
gsl_multifit_linear and gsl_multifit_wlinear and storing it in the
gsl_multifit workspace. A function such as gsl_multifit_linear_Rsq
would then return the value to the user. This way we wouldn't need to
change the API, and the multifit_linear routines would remain
backwards compatible.

  Computing the value would require adding about 4-5 lines of
code and it can be computed at the same time as chisq so there
doesn't have to be an extra loop added in.
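
A minimal sketch of the single-loop idea, in plain C without GSL. The
function name chisq_and_rsq and its arguments are hypothetical, not an
existing GSL interface. It assumes the fitted values are already
available and uses the weighted form of R^2 given later in this thread
(set every weight to 1.0 for an unweighted fit). The total sum of
squares comes from running sums, which avoids a second pass but can
lose precision when the mean of y is large compared with its spread.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GSL: accumulates chisq and the sums
   needed for the weighted total sum of squares in a single loop.
   y = observations, yfit = fitted values, w = weights, n = count.
   Returns 0 on success, -1 if R^2 is undefined (no weight or no spread). */
int chisq_and_rsq(const double *y, const double *yfit, const double *w,
                  size_t n, double *chisq, double *rsq)
{
    double sw = 0.0;    /* sum of w_i           */
    double swy = 0.0;   /* sum of w_i * y_i     */
    double swyy = 0.0;  /* sum of w_i * y_i^2   */
    double chi2 = 0.0;  /* sum of w_i * r_i^2   */
    double tss;
    size_t i;

    for (i = 0; i < n; i++) {
        double r = y[i] - yfit[i];
        chi2 += w[i] * r * r;
        sw += w[i];
        swy += w[i] * y[i];
        swyy += w[i] * y[i] * y[i];
    }
    *chisq = chi2;

    if (sw <= 0.0)
        return -1;

    /* sum w_i (y_i - mean)^2 == sum w_i y_i^2 - (sum w_i y_i)^2 / sum w_i */
    tss = swyy - swy * swy / sw;
    if (tss <= 0.0)
        return -1;

    *rsq = 1.0 - chi2 / tss;
    return 0;
}
```

The caller passes pointers for both outputs; chisq is always written,
while rsq is only written when the function returns 0.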

Patrick Alken


* Re: computing R-squared with gsl_multifit
From: Brian Gough @ 2007-09-26 15:52 UTC
  To: gsl-discuss

At Tue, 25 Sep 2007 13:46:28 -0600,
Patrick Alken wrote:
>   Computing the value would require adding about 4-5 lines of
> code and it can be computed at the same time as chisq so there
> doesn't have to be an extra loop added in.

Hello,

I think I'd prefer not to have results passed back through the
workspace, it's not really part of the design -- a separate function
is better, as for the correlation coefficient.

A question: is the formula the same for both weighted and unweighted
fits?

-- 
Brian Gough


* Re: computing R-squared with gsl_multifit
From: Dirk Eddelbuettel @ 2007-09-26 16:18 UTC
  To: gsl-discuss

On Wed, Sep 26, 2007 at 04:51:19PM +0100, Brian Gough wrote:
> At Tue, 25 Sep 2007 13:46:28 -0600,
> Patrick Alken wrote:
> >   Computing the value would require adding about 4-5 lines of
> > code and it can be computed at the same time as chisq so there
> > doesn't have to be an extra loop added in.
> 
> Hello,
> 
> I think I'd prefer not to have results passed back through the
> workspace, it's not really part of the design -- a separate function
> is better, as for the correlation coefficient.
> 
> A question: is the formula the same for both weighted and unweighted
> fits?

Yes, as it relates 'explained sum of squares' to 'total sum of
squares', so it just uses residuals, irrespective of whether these
were computed with a unit vector of weights, or with actual weights.

Wikipedia is not a bad start:
http://en.wikipedia.org/wiki/R-squared

Hth, Dirk

-- 
Three out of two people have difficulties with fractions.


* Re: computing R-squared with gsl_multifit
From: Patrick Alken @ 2007-09-26 16:20 UTC
  To: gsl-discuss

> A question: is the formula the same for both weighted and unweighted
> fits?

No, unfortunately.

R^2 = 1 - chisq / Sum [ w_i * (y_i - mean(y))^2 ].

In the case of weighted data, the mean(y) is also a weighted mean:

mean(y) = 1/sum [ w_i ] * sum [ w_i y_i ]

I looked at the GNU R source to verify all this.
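
For reference, the two formulas above can be written out as follows.
This is only a restatement: the symbols \hat{y}_i (fitted value) and
\bar{y}_w (weighted mean) are notation introduced here, and chisq is
taken to be the weighted residual sum of squares. The unweighted case
corresponds to w_i = 1 for all i.

```latex
\[
  R^2 = 1 - \frac{\chi^2}{\sum_i w_i \,(y_i - \bar{y}_w)^2},
  \qquad
  \chi^2 = \sum_i w_i \,(y_i - \hat{y}_i)^2,
  \qquad
  \bar{y}_w = \frac{\sum_i w_i \, y_i}{\sum_i w_i}.
\]
```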



* Re: computing R-squared with gsl_multifit
From: Patrick Alken @ 2007-09-26 16:26 UTC
  To: gsl-discuss

> Yes, as it relates 'explained sum of squares' to 'total sum of
> squares', so it just uses residuals, irrespective of whether these
> were computed with a unit vector of weights, or with actual weights.
> 
> Wikipedia is not a bad start:
> http://en.wikipedia.org/wiki/R-squared

Well, that wiki page says:

"If fitting is by weighted least squares or generalized least squares,
alternative versions of R^2 can be calculated appropriate to those
statistical frameworks,"

If you use the weights in calculating the residuals then it seems
you'd have to use them in the total sum of squares too. In the GNU
R source, the file src/library/stats/R/lm.R has the code which
calculates R^2 and it definitely takes weights into account for
both the residuals and the total sum of squares.


* Re: computing R-squared with gsl_multifit
From: Dirk Eddelbuettel @ 2007-09-26 16:43 UTC
  To: Patrick Alken; +Cc: gsl-discuss

On Wed, Sep 26, 2007 at 10:26:29AM -0600, Patrick Alken wrote:
> > Yes, as it relates 'explained sum of squares' to 'total sum of
> > squares', so it just uses residuals, irrespective of whether these
> > were computed with a unit vector of weights, or with actual weights.
> > 
> > Wikipedia is not a bad start:
> > http://en.wikipedia.org/wiki/R-squared
> 
> Well, that wiki page says:
> 
> "If fitting is by weighted least squares or generalized least squares,
> alternative versions of R^2 can be calculated appropriate to those
> statistical frameworks,"
> 
> If you use the weights in calculating the residuals then it seems
> you'd have to use them in the total sum of squares too. In the GNU
> R source, the file src/library/stats/R/lm.R has the code which
> calculates R^2 and it definitely takes weights into account for
> both the residuals and the total sum of squares.

Aiee. Thanks for checking, and correcting my off-the-cuff hint.

Dirk

-- 
Three out of two people have difficulties with fractions.


* Re: computing R-squared with gsl_multifit
From: Brian Gough @ 2007-10-02  8:35 UTC
  To: gsl-discuss

At Wed, 26 Sep 2007 10:20:05 -0600,
Patrick Alken wrote:
> 
> > A question: is the formula the same for both weighted and unweighted
> > fits?
> 
> No, unfortunately.
> 
> R^2 = 1 - chisq / Sum [ w_i * (y_i - mean(y))^2 ].
> 
> In the case of weighted data, the mean(y) is also a weighted mean:
> 
> mean(y) = 1/sum [ w_i ] * sum [ w_i y_i ]
> 
> I looked at the GNU R source to verify all this.

As a starting point I've added gsl_stats_ss and gsl_stats_wss for
computing the unweighted/weighted sum of squares.  

Given chisq and the data this is sufficient for computing R^2 with the
formula above.
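
As a rough illustration of that last step, here is a sketch in plain
C. The names weighted_tss and rsq_from_chisq are hypothetical
stand-ins written for this example, not the GSL functions mentioned
above, and their argument order is an assumption. The first computes
the weighted sum of squares about the weighted mean in two passes; the
second applies the quoted formula.

```c
#include <stddef.h>

/* Hypothetical stand-in: weighted total sum of squares about the
   weighted mean, sum_i w_i * (y_i - wmean)^2, where
   wmean = sum(w_i * y_i) / sum(w_i). Pass w_i = 1.0 for the
   unweighted case. Returns 0.0 if the weights sum to zero or less. */
double weighted_tss(const double *w, const double *y, size_t n)
{
    double sw = 0.0, swy = 0.0, tss = 0.0, wmean;
    size_t i;

    /* First pass: weighted mean. */
    for (i = 0; i < n; i++) {
        sw += w[i];
        swy += w[i] * y[i];
    }
    if (sw <= 0.0)
        return 0.0;
    wmean = swy / sw;

    /* Second pass: weighted squared deviations from that mean. */
    for (i = 0; i < n; i++) {
        double d = y[i] - wmean;
        tss += w[i] * d * d;
    }
    return tss;
}

/* R^2 = 1 - chisq / tss. The caller must ensure tss > 0. */
double rsq_from_chisq(double chisq, double tss)
{
    return 1.0 - chisq / tss;
}
```

In use, chisq would come from the fit itself and tss from the same y
and weight arrays that were passed to the fit.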

-- 
Brian Gough
