From: Timothée Flutre
Date: Wed, 29 May 2013 14:44:00 -0000
Subject: Re: spearman coefficient
To: Patrick Alken
Cc: "gsl-discuss@sourceware.org"

Looks perfect, thanks a lot!

No problem. In fact I'm not using it myself a lot because I prefer
parametric modeling, but I did use it to reproduce results from other
people.

Timothée Flutre

On Tue, May 28, 2013 at 5:44 PM, Patrick Alken wrote:
> I've added gsl_stats_spearman to the repository and have tested it on a few
> sample datasets. I essentially rewrote the routine using Octave and
> Numerical Recipes as examples, though I rewrote everything from scratch so
> there are no copyright issues.
>
> I added the function gsl_sort_vector2, similar to the Numerical Recipes
> sort2() function, which eliminates the need to allocate a permutation and
> sort vector. The workspace for the rank vectors is passed directly to the
> function, so there is no need to allocate a separate workspace now.
>
> It is possible to write the function to calculate the rank vectors in place
> in the data vectors, but I opted to keep those inputs untouched to stay
> consistent with the rest of the statistics routines. The user must pass in a
> workspace of size 2*n.
>
> I put the function in statistics/covariance_source.c so it will be defined
> for all the different types (float, double, int, short, etc.), and it's
> documented in the manual.
>
> I'm sorry I wasn't able to directly use a lot of your code, but I do think
> this implementation is much more consistent with the rest of the library
> design. If you are using this function regularly in your work, I would
> appreciate any feedback you can give (i.e., testing it with a wide range of
> inputs).
>
> Patrick
>
>
> On 05/25/2013 03:25 PM, Timothée Flutre wrote:
>>
>> Hi Patrick,
>>
>> thanks for your detailed reply. (I don't know why I didn't receive
>> your email; I had to check the GSL mailing list archive to see it,
>> which is why I'm answering you directly this time.)
>>
>> About introducing a new workspace, I did it based on your advice from last
>> year:
>> http://sourceware.org/ml/gsl-discuss/2012-q1/msg00011.html
>>
>> I don't have a strong opinion on which is best, but someone else
>> commented on my code and also thought that it would be better to have
>> a workspace:
>> https://gist.github.com/timflutre/1784199#comment-82458
>>
>> Maybe the code could offer two functions, with or without the
>> workspace? In this case, are there any guidelines for naming the
>> functions?
>>
>> I had a look at the implementation in R.
>> The description of the interface is here:
>> http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html
>>
>> Even though it indicates that the argument "method" can take the value
>> "spearman", I don't see it anymore in the R code, and thus I am a bit
>> confused by their implementation:
>> https://github.com/wch/r-source/blob/trunk/src/library/stats/R/cor.R#L21
>>
>> Moreover, the R code calls C code:
>>
>> https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L623
>>
>> The file with the C code has several macros and functions to compute
>> covariance or correlation, to handle missing data in different ways,
>> to deal with Pearson, Spearman and Kendall coefficients, etc. All this
>> makes it really hard for me to understand...
>>
>> Finally, I looked at the algorithm in Numerical Recipes in C; the PDF
>> of the book is available here:
>> www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf
>>
>> However, the GSL web site says that we can't use algorithms from this
>> book because of its non-free license.
>>
>> Also, it seems to me that spear() from Numerical Recipes (PDF page 641)
>> uses the function sort2() (Quicksort with 2 arrays, page 334), which
>> seems to require allocating another array, "istack". Therefore, in
>> the end, it doesn't seem to me that it's much better than my d and
>> perm vectors, which have the advantage of using other functions of the
>> GSL (gsl_sort_vector and gsl_sort_vector_index).
>>
>> But again, I'm really not an expert programmer, in C or any other
>> language. So I tried to see how I could change my code based on what
>> you said, but I don't see any obvious way to do it (except copying the
>> code from Numerical Recipes).
>>
>> If you don't want to include the code as it is in the next release
>> of the GSL, I'm fine with that. Of course, if you have a better
>> understanding of all this and you can explain to me what to do, I can
>> try to help.
>>
>> Best,
>>
>> Timothée Flutre