From: Patrick Alken <Patrick.Alken@Colorado.EDU>
To: timflutre@gmail.com,
"gsl-discuss@sourceware.org" <gsl-discuss@sourceware.org>
Subject: Re: [Help-gsl] Spearman rank correlation coefficient
Date: Fri, 10 Feb 2012 00:05:00 -0000
Message-ID: <4F345F28.3060102@colorado.edu>
In-Reply-To: <CAGJVmuKKL4HCXViqxAf3b=em05hb7ox5pABm5b_LdKNMxTLMkg@mail.gmail.com>
Hello,
It would be best to move this discussion over to gsl-discuss. I think
it would be very useful to have this function in GSL. Just a few
comments on your code:
1) The code looks clean and nicely commented. One issue is that since
you appear to have followed the Apache code very closely, there may be a
licensing issue - I don't know if the Apache license is compatible with
the GPL. On a quick check, it's possible we can use it, but it seems we
need to preserve the original copyright notice.
2) Dynamic allocation - it looks like you dynamically allocate 5
different arrays to do the calculation. It would be better to either
make functions like gsl_stats_spearman_alloc and
gsl_stats_spearman_free, or to pass in a pre-allocated workspace as one
of the function arguments. Since you're using workspaces of different
types (double, size_t), it's probably better to make the alloc/free functions.
3) One of your dynamically allocated arrays is realloc()'d in a loop. Is
this because the size of the array is unknown before the loop? Perhaps
there is a way to avoid the realloc's.
4) We also need to think of some automated tests that can be added to
statistics/test.c to test this function exhaustively and make sure it's
working correctly - even if that consists simply of known output values
for a few different input cases.
Good work,
Patrick Alken
On 02/09/2012 04:26 PM, Timothée Flutre wrote:
> Hello,
>
> I noticed that only the Pearson correlation coefficient is implemented
> in the GSL (http://www.gnu.org/software/gsl/manual/html_node/Correlation.html).
> However, in quantitative genetics, several authors use the
> Spearman coefficient (for instance, Stranger et al., "Population genomics
> of human gene expression", Nature Genetics, 2007) because it is less
> influenced by outliers.
>
> Current high-throughput data require computing such coefficients
> several million times. Thus I implemented the computation of the
> Spearman coefficient in GSL-like code. In fact, one just needs to rank
> the input vectors and then compute the Pearson coefficient on them. For
> the ranking, I was inspired by the code from the Apache Math module.
>
> I thought it could be useful to other users to add my piece of code
> to the file "covariance_source.c" of the GSL
> (http://bzr.savannah.gnu.org/lh/gsl/trunk/annotate/head:/statistics/covariance_source.c#L77).
> So here is the code: https://gist.github.com/1784199
>
> I am not very proficient in C, so even if it is not possible to
> include the code in the GSL, don't hesitate to give me advice.
>
> Thanks,
> Tim
>
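The approach in the quoted message (rank both vectors, then take the Pearson coefficient of the ranks) can be written out as below, where R(x_i) is the rank of x_i among x_1, ..., x_n, tied values receive the average of the ranks they span, and \bar{R} is the mean rank. The second line is the standard shortcut, which is exact only when there are no ties.

```latex
\rho_s = \frac{\sum_{i=1}^{n} \bigl(R(x_i) - \bar{R}_x\bigr)\bigl(R(y_i) - \bar{R}_y\bigr)}
              {\sqrt{\sum_{i=1}^{n} \bigl(R(x_i) - \bar{R}_x\bigr)^2}\,
               \sqrt{\sum_{i=1}^{n} \bigl(R(y_i) - \bar{R}_y\bigr)^2}}

% Without ties, \bar{R}_x = \bar{R}_y = (n+1)/2 and this reduces to
\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = R(x_i) - R(y_i)
```

As a worked example with a tie, take x = (1, 2, 3, 4, 5) and y = (5, 6, 7, 8, 7). The ranks of y are (1, 2, 3.5, 5, 3.5), and the Pearson formula on the ranks gives 8/sqrt(95), about 0.8208, whereas the shortcut gives 1 - 6(3.5)/120 = 0.825. This is why computing Pearson on averaged ranks is the safer general route.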