From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gsl-discuss-return-5776-listarch-gsl-discuss=sources.redhat.com@sourceware.org>
Received: (qmail 14553 invoked by alias); 28 May 2013 22:44:12 -0000
Mailing-List: contact gsl-discuss-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gsl-discuss.sourceware.org>
List-Subscribe: <mailto:gsl-discuss-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gsl-discuss/>
List-Post: <mailto:gsl-discuss@sourceware.org>
List-Help: <mailto:gsl-discuss-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gsl-discuss-owner@sourceware.org
Received: (qmail 14534 invoked by uid 89); 28 May 2013 22:44:11 -0000
X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00,KHOP_THREADED,RCVD_IN_HOSTKARMA_NO,RP_MATCHES_RCVD,SPF_PASS,TW_DN autolearn=ham version=3.3.1
Received: from ipmx5.colorado.edu (HELO ipmx5.colorado.edu) (128.138.128.235)
    by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Tue, 28 May 2013 22:44:09 +0000
From: Patrick Alken <patrick.alken@Colorado.EDU>
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AlEGAFsypVGMrLMp/2dsb2JhbABZgwkwgzu+a4EHFnSCIwEBBSMPAQUzAxsLGAICBRMOAgIPAkYGAQwIAQGICQyqAIloiAiBJow1gUmCQYETA4kfj0WEYos1gy4dgTU
Received: from bonanza.ngdc.noaa.gov ([140.172.179.41])
  by smtp.colorado.edu with ESMTP/TLS/DHE-RSA-CAMELLIA256-SHA; 28 May 2013 16:44:07 -0600
Message-ID: <51A53336.30801@colorado.edu>
Date: Tue, 28 May 2013 22:44:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
MIME-Version: 1.0
To: "timflutre@gmail.com" <timflutre@gmail.com>, 
 "gsl-discuss@sourceware.org" <gsl-discuss@sourceware.org>
Subject: Re: spearman coefficient
References: <CAGJVmuL7j3Z1jDJudqyNJXqHHQ9f=g6MJqRykd5LA_0x5PT=xw@mail.gmail.com>
In-Reply-To: <CAGJVmuL7j3Z1jDJudqyNJXqHHQ9f=g6MJqRykd5LA_0x5PT=xw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SW-Source: 2013-q2/txt/msg00012.txt.bz2

I've added gsl_stats_spearman to the repository and have tested it on a 
few sample datasets. I used the Octave and Numerical Recipes routines as 
references but wrote everything from scratch, so there are no copyright 
issues.

I added the function gsl_sort_vector2, similar to the Numerical Recipes 
sort2() function, which eliminates the need to allocate a separate 
permutation vector and sort vector. The workspace for the rank vectors is 
now passed directly to the function, so no separate workspace object has 
to be allocated.
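
To illustrate what "sorting two arrays together" means, here is a minimal 
sketch. It is not the gsl_sort_vector2 code: the name sort2_sketch is 
hypothetical, it works on plain arrays rather than gsl_vector objects, and 
it uses a simple insertion sort only to keep the example short.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the GSL implementation: sort key[] into
   ascending order and move companion[] in lockstep, so that each
   (key, companion) pair stays together. No permutation vector or
   temporary copy is allocated. Insertion sort, O(n^2). */
void
sort2_sketch (double key[], double companion[], const size_t n)
{
  size_t i, j;

  for (i = 1; i < n; i++)
    {
      const double k = key[i];
      const double c = companion[i];

      /* shift larger keys (and their companions) one slot to the right */
      for (j = i; j > 0 && key[j - 1] > k; j--)
        {
          key[j] = key[j - 1];
          companion[j] = companion[j - 1];
        }

      key[j] = k;
      companion[j] = c;
    }
}
```

If the companion array initially holds the indices 0..n-1, it ends up 
holding the sorting permutation, which is roughly what a rank computation 
needs.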

It is possible to write the function to calculate the rank vectors 
in-place in the data vectors, but I opted to keep those inputs untouched 
to stay consistent with the rest of the statistics routines. The user 
must pass in a workspace of size 2*n.

I put the function in statistics/covariance_source.c so it will be 
defined for all the different types (float, double, int, short, etc.), 
and it's documented in the manual.
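
For readers unfamiliar with the per-type sources: the idea is that one 
body of code is expanded once for each element type. The sketch below 
shows that general idea with a single hypothetical macro 
(DEFINE_MEAN_SKETCH); it is not GSL's actual template mechanism, and the 
generated function names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical macro, NOT GSL's template headers: expand one function
   body into a separate function per element type. */
#define DEFINE_MEAN_SKETCH(TYPE, SUFFIX)                        \
  double                                                        \
  mean_sketch_##SUFFIX (const TYPE data[], const size_t n)      \
  {                                                             \
    double sum = 0.0;                                           \
    size_t i;                                                   \
    for (i = 0; i < n; i++)                                     \
      sum += (double) data[i];                                  \
    return (n > 0) ? sum / (double) n : 0.0;                    \
  }

/* one expansion per supported type */
DEFINE_MEAN_SKETCH (double, double)
DEFINE_MEAN_SKETCH (float, float)
DEFINE_MEAN_SKETCH (int, int)
DEFINE_MEAN_SKETCH (short, short)
```

Each expansion yields its own function (mean_sketch_int, 
mean_sketch_float, and so on) from the same source text.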

I'm sorry I wasn't able to directly use a lot of your code, but I do 
think this implementation is much more consistent with the rest of the 
library design. If you are using this function regularly in your work, I 
would appreciate any feedback you can give (e.g. from testing it with a 
wide range of inputs).

Patrick

On 05/25/2013 03:25 PM, Timothée Flutre wrote:
> Hi Patrick,
>
> thanks for your detailed reply. (I don't know why I didn't receive
> your email; I had to check the GSL mailing list archive to see it,
> which is why I'm replying directly to you this time.)
>
> About introducing a new workspace, I did it based on your advice from last year:
> http://sourceware.org/ml/gsl-discuss/2012-q1/msg00011.html
>
> I don't have a strong opinion on which is best, but someone else
> commented on my code and also thought that it would be better to have
> a workspace:
> https://gist.github.com/timflutre/1784199#comment-82458
>
> Maybe the code could offer two functions, with or without the
> workspace? In this case, are there any guidelines for naming the
> functions?
>
> I had a look at the implementation in R. The description of the
> interface is here:
> http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html
>
> Even though it indicates that the argument "method" can take the value
> "spearman", I don't see it anymore in the R code and thus I am a bit
> confused by their implementation:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/R/cor.R#L21
>
> Moreover, the R code calls C code:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L623
>
> The file with the C code has several macros and functions to compute
> covariance or correlation, to handle missing data in different ways,
> to deal with Pearson, Spearman and Kendall coefficients, etc. All this
> makes it really hard for me to understand.
>
> Finally, I looked at the algorithm in Numerical Recipes in C, the pdf
> of the book is available here:
> www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf
>
> However, the GSL web site says that we can't use algorithms from this
> book because of the non-free license.
>
> Also, it seems to me that spear() from Numerical Recipes (pdf page 641)
> uses the function sort2() (Quicksort with 2 arrays, page 334), which
> seems to require allocating another array, "istack". Therefore, in
> the end, it doesn't seem to me that it's much better than my d and
> perm vectors, which have the advantage of using other functions of the
> GSL (gsl_sort_vector and gsl_sort_vector_index).
>
> But again, I'm really not an expert programmer, in C or any other
> language. So I tried to see how I could change my code based on what
> you said, but I don't see any obvious way to do it (except copying the
> code from Numerical Recipes).
>
> If you don't want to include the code as it is in the next release
> of the GSL, I'm fine with that. Of course, if you have a better
> understanding of all this and can explain to me what to do, I can try
> to help.
>
> Best,
>
> Timothée Flutre