public inbox for gsl-discuss@sourceware.org
* Re: spearman coefficient
From: Patrick Alken @ 2013-05-28 22:44 UTC
  To: timflutre, gsl-discuss

I've added gsl_stats_spearman to the repository and have tested it on a 
few sample datasets. I used the Octave and Numerical Recipes routines as 
examples but rewrote everything from scratch, so there are no copyright 
issues.

I added the function gsl_sort_vector2, similar to the Numerical Recipes 
sort2() function, which eliminates the need to allocate a permutation 
and sort vector. The workspace for the rank vectors is passed directly 
to the function, so there is no need to allocate a separate workspace now.
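
To illustrate the idea behind sorting two arrays together (this is not the
actual gsl_sort_vector2 code), here is a minimal sketch. The name cosort is
hypothetical, and insertion sort is used only for brevity. The point is that
the second array follows every move of the first, so no permutation or
sorted copy has to be allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GSL routine: sort `key` ascending in place
   and apply the same moves to `other`, so that pairs stay aligned.
   Insertion sort is O(n^2); it is chosen here only because it is short
   and needs no extra storage. */
void cosort(double *key, double *other, size_t n)
{
  size_t i;

  assert(key != NULL && other != NULL);

  for (i = 1; i < n; i++) {
    double k = key[i];
    double o = other[i];
    size_t j = i;

    /* shift larger keys (and their partners) one slot to the right */
    while (j > 0 && key[j - 1] > k) {
      key[j] = key[j - 1];
      other[j] = other[j - 1];
      j--;
    }
    key[j] = k;
    other[j] = o;
  }
}
```

After the call, key is in ascending order and other[i] is still the partner
that belonged to key[i] before sorting.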

It is possible to write the function to calculate the rank vectors 
in-place in the data vectors, but I opted to keep those inputs untouched 
to stay consistent with the rest of the statistics routines. The user 
must therefore pass in a workspace of size 2*n, where n is the length of 
each data vector.
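
A rough sketch of how a 2*n workspace can hold both rank vectors while the
inputs stay untouched. This is not the gsl_stats_spearman source: the names
sort_pair and spearman_sketch are made up for illustration, the data are
assumed to contain no tied values, and the last step uses the textbook
shortcut 1 - 6*sum(d^2)/(n*(n^2-1)), which is only valid under that
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sort `key` ascending in place, moving `other`
   along with it (insertion sort, for brevity only). */
void sort_pair(double *key, double *other, size_t n)
{
  size_t i;
  for (i = 1; i < n; i++) {
    double k = key[i];
    double o = other[i];
    size_t j = i;
    while (j > 0 && key[j - 1] > k) {
      key[j] = key[j - 1];
      other[j] = other[j - 1];
      j--;
    }
    key[j] = k;
    other[j] = o;
  }
}

/* Hypothetical sketch: Spearman coefficient for data without ties.
   `work` must have room for 2*n doubles; x and y are never modified. */
double spearman_sketch(const double *x, const double *y, size_t n,
                       double *work)
{
  double *rx = work;      /* first half: becomes the ranks of x  */
  double *ry = work + n;  /* second half: becomes the ranks of y */
  double sum_d2 = 0.0;
  size_t i;

  assert(n > 1 && work != NULL);

  /* copy the inputs so the caller's data stay untouched */
  for (i = 0; i < n; i++) {
    rx[i] = x[i];
    ry[i] = y[i];
  }

  /* order the pairs by x, then overwrite x with its rank 1..n */
  sort_pair(rx, ry, n);
  for (i = 0; i < n; i++)
    rx[i] = (double)(i + 1);

  /* order the pairs by y (x-ranks follow), then overwrite y with its rank */
  sort_pair(ry, rx, n);
  for (i = 0; i < n; i++)
    ry[i] = (double)(i + 1);

  /* no-ties shortcut formula on the rank differences */
  for (i = 0; i < n; i++) {
    double d = rx[i] - ry[i];
    sum_d2 += d * d;
  }
  return 1.0 - 6.0 * sum_d2 / ((double)n * ((double)n * (double)n - 1.0));
}
```

With tied values, the ranks would need to be averaged over each tie and the
coefficient computed as the Pearson correlation of the two rank vectors
instead of the shortcut above.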

I put the function in statistics/covariance_source.c so it will be 
defined for all the different types (float, double, int, short, etc.), 
and it's documented in the manual.
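
For readers unfamiliar with how one piece of source can yield a function
per type, here is a simplified stand-in. It uses a single hypothetical
macro, DEFINE_MEAN, rather than the library's own template machinery, and a
running mean instead of the Spearman routine, purely to keep the example
short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro (not part of GSL): stamps out one mean function
   per element type, each taking a stride like the statistics routines. */
#define DEFINE_MEAN(NAME, TYPE)                                      \
  double NAME(const TYPE data[], size_t stride, size_t n)            \
  {                                                                  \
    double mean = 0.0;                                               \
    size_t i;                                                        \
    assert(n > 0);                                                   \
    /* running update: mean += (x_i - mean) / (i + 1) */             \
    for (i = 0; i < n; i++)                                          \
      mean += ((double)data[i * stride] - mean) / (double)(i + 1);   \
    return mean;                                                     \
  }

/* One definition per type, all generated from the same body. */
DEFINE_MEAN(mean_double, double)
DEFINE_MEAN(mean_int, int)
DEFINE_MEAN(mean_short, short)
```

Each expansion produces an ordinary function, so mean_int and mean_short
can be called on integer data without the caller converting to double first.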

I'm sorry I wasn't able to directly use a lot of your code, but I do 
think this implementation is much more consistent with the rest of the 
library design. If you are using this function regularly in your work, I 
would appreciate any feedback you can give (e.g. from testing it with a 
wide range of inputs).

Patrick

On 05/25/2013 03:25 PM, Timothée Flutre wrote:
> Hi Patrick,
>
> thanks for your detailed reply. (I don't know why I didn't receive
> your email; I had to check the GSL mailing list archive to see it,
> which is why I'm replying directly to you this time.)
>
> About introducing a new workspace, I did it based on your advice from last year:
> http://sourceware.org/ml/gsl-discuss/2012-q1/msg00011.html
>
> I don't have a strong opinion on which is best, but someone else
> commented on my code and also thought that it would be better to have
> a workspace:
> https://gist.github.com/timflutre/1784199#comment-82458
>
> Maybe the code could offer two functions, with and without the
> workspace? In that case, are there any guidelines for naming the
> functions?
>
> I had a look at the implementation in R. The description of the
> interface is here:
> http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html
>
> Even though it indicates that the argument "method" can take the value
> "spearman", I don't see it anymore in the R code and thus I am a bit
> confused by their implementation:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/R/cor.R#L21
>
> Moreover, the R code calls C code:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L623
>
> The file with the C code has several macros and functions to compute
> covariance or correlation, to handle missing data in different ways,
> to deal with Pearson, Spearman and Kendall coefficients, etc. All this
> makes it really hard for me to understand.
>
> Finally, I looked at the algorithm in Numerical Recipes in C. The PDF
> of the book is available here:
> www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf
>
> However, the GSL web site says that we can't use algorithms from this
> book because of the non-free license.
>
> Also, it seems to me that spear() from Numerical Recipes (pdf page 641)
> uses the function sort2() (Quicksort with 2 arrays, page 334), which
> seems to require allocating another array, "istack". Therefore, in
> the end, it doesn't seem to me that it's much better than my d and
> perm vectors, which have the advantage of using other functions of the
> GSL (gsl_sort_vector and gsl_sort_vector_index).
>
> But again, I'm really not an expert programmer, in C or any other
> language. So I tried to see how I could change my code based on what
> you said, but I don't see any obvious way to do it (except copying the
> code from Numerical Recipes).
>
> If you don't want to include the code as it is in the next release
> of the GSL, I'm fine with that. Of course, if you have a better
> understanding of all this and you can explain to me what to do, I can
> try to help.
>
> Best,
>
> Timothée Flutre

* Re: spearman coefficient
From: Timothée Flutre @ 2013-05-29 14:44 UTC
  To: Patrick Alken; +Cc: gsl-discuss

Looks perfect, thanks a lot!

No problem. In fact, I don't use it much myself because I prefer
parametric modeling, but I did use it to reproduce results from other
people.

Timothée Flutre
