* Re: spearman coefficient
       [not found] <CAGJVmuL7j3Z1jDJudqyNJXqHHQ9f=g6MJqRykd5LA_0x5PT=xw@mail.gmail.com>
@ 2013-05-28 22:44 ` Patrick Alken
  2013-05-29 14:44   ` Timothée Flutre
From: Patrick Alken @ 2013-05-28 22:44 UTC
To: timflutre, gsl-discuss

I've added gsl_stats_spearman to the repository and have tested it on a
few sample datasets. I used the Octave and Numerical Recipes routines as
examples, but I wrote everything from scratch, so there are no copyright
issues.

I added the function gsl_sort_vector2, similar to the Numerical Recipes
sort2() function, which eliminates the need to allocate a permutation and
a sort vector. The workspace for the rank vectors is passed directly to
the function, so there is no need to allocate a separate workspace
object now.

It is possible to write the function to calculate the rank vectors
in-place in the data vectors, but I opted to keep those inputs untouched
to stay consistent with the rest of the statistics routines. The user
must pass in a workspace of size 2*n.

I put the function in statistics/covariance_source.c so it will be
defined for all the different types (float, double, int, short, etc.),
and it's documented in the manual.

I'm sorry I wasn't able to directly use a lot of your code, but I do
think this implementation is much more consistent with the rest of the
library design. If you are using this function regularly in your work, I
would appreciate any feedback you can give (i.e. testing it with a wide
range of inputs).

Patrick

On 05/25/2013 03:25 PM, Timothée Flutre wrote:
> Hi Patrick,
>
> thanks for your detailed reply. (I don't know why I didn't receive
> your email; I had to check the GSL mailing list archive to see it,
> which is why I'm answering directly to you this time.)
>
> About introducing a new workspace, I did it based on your advice from
> last year:
> http://sourceware.org/ml/gsl-discuss/2012-q1/msg00011.html
>
> I don't have a strong opinion on which is best, but someone else
> commented on my code and also thought that it would be better to have
> a workspace:
> https://gist.github.com/timflutre/1784199#comment-82458
>
> Maybe the code could offer two functions, with or without the
> workspace? In this case, are there any guidelines for naming the
> functions?
>
> I had a look at the implementation in R. The description of the
> interface is here:
> http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html
>
> Even though it indicates that the argument "method" can take the value
> "spearman", I don't see it anymore in the R code and thus I am a bit
> confused by their implementation:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/R/cor.R#L21
>
> Moreover, the R code calls C code:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L623
>
> The file with the C code has several macros and functions to compute
> covariance or correlation, to handle missing data in different ways,
> to deal with Pearson, Spearman and Kendall coefficients, etc. All this
> makes it really hard for me to understand...
>
> Finally, I looked at the algorithm in Numerical Recipes in C; the PDF
> of the book is available here:
> www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf
>
> However, the GSL web site says that we can't use algorithms from this
> book because of the non-free license.
>
> Also, it seems to me that spear() from Numerical Recipes (PDF page 641)
> uses the function srt2() (Quicksort with 2 arrays, page 334), which
> seems to require allocating another array, "istack". Therefore, in
> the end, it doesn't seem to me that it's much better than my d and
> perm vectors, which have the advantage of using other functions of the
> GSL (gsl_sort_vector and gsl_sort_vector_index).
>
> But again, I'm really not an expert programmer, in C or any other
> language. So I tried to see how I could change my code based on what
> you said, but I don't see any obvious way to do it (except copying the
> code from Numerical Recipes).
>
> If you don't want to include the code as it is in the next release
> of the GSL, I'm fine with that. Of course, if you have a better
> understanding of all this and you can explain to me what to do, I can
> try to help.
>
> Best,
>
> Timothée Flutre

* Re: spearman coefficient
  2013-05-28 22:44 ` spearman coefficient Patrick Alken
@ 2013-05-29 14:44   ` Timothée Flutre
From: Timothée Flutre @ 2013-05-29 14:44 UTC
To: Patrick Alken; +Cc: gsl-discuss

Looks perfect, thanks a lot!

No problem. In fact, I don't use it much myself because I prefer
parametric modeling, but I did use it to reproduce results from other
people.

Timothée Flutre

On Tue, May 28, 2013 at 5:44 PM, Patrick Alken
<patrick.alken@colorado.edu> wrote:
> I've added gsl_stats_spearman to the repository and have tested it on
> a few sample datasets. I used the Octave and Numerical Recipes
> routines as examples, but I wrote everything from scratch, so there
> are no copyright issues.
>
> I added the function gsl_sort_vector2, similar to the Numerical
> Recipes sort2() function, which eliminates the need to allocate a
> permutation and a sort vector. The workspace for the rank vectors is
> passed directly to the function, so there is no need to allocate a
> separate workspace object now.
>
> It is possible to write the function to calculate the rank vectors
> in-place in the data vectors, but I opted to keep those inputs
> untouched to stay consistent with the rest of the statistics routines.
> The user must pass in a workspace of size 2*n.
>
> I put the function in statistics/covariance_source.c so it will be
> defined for all the different types (float, double, int, short, etc.),
> and it's documented in the manual.
>
> I'm sorry I wasn't able to directly use a lot of your code, but I do
> think this implementation is much more consistent with the rest of the
> library design. If you are using this function regularly in your work,
> I would appreciate any feedback you can give (i.e. testing it with a
> wide range of inputs).
>
> Patrick