public inbox for gsl-discuss@sourceware.org
From: Patrick Alken <patrick.alken@Colorado.EDU>
To: "timflutre@gmail.com" <timflutre@gmail.com>,
	 "gsl-discuss@sourceware.org" <gsl-discuss@sourceware.org>
Subject: Re: spearman coefficient
Date: Tue, 28 May 2013 22:44:00 -0000
Message-ID: <51A53336.30801@colorado.edu>
In-Reply-To: <CAGJVmuL7j3Z1jDJudqyNJXqHHQ9f=g6MJqRykd5LA_0x5PT=xw@mail.gmail.com>

I've added gsl_stats_spearman to the repository and have tested it on a
few sample datasets. I essentially rewrote the routine using Octave and
Numerical Recipes as examples, though I wrote everything from scratch
so there are no copyright issues.

I added the function gsl_sort_vector2, similar to the Numerical Recipes
sort2() function, which sorts one vector and applies the same
rearrangement to a second one. This eliminates the need to allocate a
permutation vector and a sort vector. The workspace for the rank vectors
is passed directly to the function, so a separate workspace object is
no longer needed.

It would be possible to compute the rank vectors in place, overwriting
the data vectors, but I opted to leave those inputs untouched to stay
consistent with the rest of the statistics routines. The user must
therefore pass in a workspace of size 2*n.

I put the function in statistics/covariance_source.c so it will be
defined for all the different types (float, double, int, short, etc.),
and it's documented in the manual.

I'm sorry I wasn't able to use more of your code directly, but I do
think this implementation is much more consistent with the rest of the
library design. If you are using this function regularly in your work, I
would appreciate any feedback you can give (i.e., testing it with a wide
range of inputs).

Patrick

On 05/25/2013 03:25 PM, Timothée Flutre wrote:
> Hi Patrick,
>
> thanks for your detailed reply. (I don't know why I didn't receive
> your email; I had to check the GSL mailing list archive to see it,
> which is why I'm replying directly to you this time.)
>
> About introducing a new workspace, I did it based on your advice from last year:
> http://sourceware.org/ml/gsl-discuss/2012-q1/msg00011.html
>
> I don't have a strong opinion on what is the best, but someone else
> commented on my code and also thought that it would be better to have
> a workspace:
> https://gist.github.com/timflutre/1784199#comment-82458
>
> Maybe the code could offer two functions, with and without the
> workspace? In that case, are there any guidelines for naming the
> functions?
>
> I had a look at the implementation in R. The description of the
> interface is here:
> http://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html
>
> Even though it indicates that the argument "method" can take the value
> "spearman", I no longer see it in the R code, so I am a bit
> confused by their implementation:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/R/cor.R#L21
>
> Moreover, the R code calls C code:
> https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L623
>
> The file with the C code has several macros and functions to compute
> covariance or correlation, to handle missing data in different ways,
> to deal with Pearson, Spearman and Kendall coefficients, etc. All this
> makes it really hard for me to understand...
>
> Finally, I looked at the algorithm in Numerical Recipes in C; the PDF
> of the book is available here:
> www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf
>
> However, the GSL web site says that we can't use algorithms from this
> book because of its non-free license.
>
> Also, it seems to me that spear() from Numerical Recipes (PDF page 641)
> uses the function sort2() (Quicksort with 2 arrays, page 334), which
> seems to require allocating another array, "istack". So in the end it
> doesn't seem much better to me than my d and perm vectors, which have
> the advantage of reusing other GSL functions (gsl_sort_vector and
> gsl_sort_vector_index).
>
> But again, I'm really not an expert programmer, in C or any other
> language. I tried to see how I could change my code based on what
> you said, but I don't see any obvious way to do it (except copying the
> code from Numerical Recipes).
>
> If you don't want to include the code as it is in the next release
> of the GSL, I'm fine with that. Of course, if you have a better
> understanding of all this and can explain to me what to do, I can try
> to help.
>
> Best,
>
> Timothée Flutre

Thread overview: 2+ messages
     [not found] <CAGJVmuL7j3Z1jDJudqyNJXqHHQ9f=g6MJqRykd5LA_0x5PT=xw@mail.gmail.com>
2013-05-28 22:44 ` Patrick Alken [this message]
2013-05-29 14:44   ` Timothée Flutre
