Message-ID: <4F345F28.3060102@colorado.edu>
Date: Fri, 10 Feb 2012 00:05:00 -0000
From: Patrick Alken
To: timflutre@gmail.com, "gsl-discuss@sourceware.org"
Subject: Re: [Help-gsl] Spearman rank correlation coefficient
X-SW-Source: 2012-q1/txt/msg00011.txt.bz2

Hello,

It would be best to move this discussion over to gsl-discuss. I think
it would be very useful to have this function in GSL. Just a few
comments on your code:

1) The code looks clean and nicely commented.
One issue is that since you appear to have followed the Apache code
very closely, there may be a licensing issue - I don't know if the
Apache license is compatible with the GPL. On a quick check, it's
possible we can use it, but it seems we need to preserve the original
copyright notice.

2) Dynamic allocation - it looks like you dynamically allocate 5
different arrays to do the calculation. It would be better to either
make functions like gsl_stats_spearman_alloc and
gsl_stats_spearman_free, or to pass in a pre-allocated workspace as one
of the function arguments. Since you're using workspace of different
types (double, size_t), it's probably better to make the alloc/free
functions.

3) One of your dynamically allocated arrays is realloc()'d in a loop.
Is this because the size of the array is unknown before the loop?
Perhaps there is a way to avoid the realloc calls.

4) We also need to think of some automated tests that can be added to
statistics/test.c to test this function exhaustively and make sure it's
working correctly - even if that consists simply of known output values
for a few different input cases.

Good work,
Patrick Alken

On 02/09/2012 04:26 PM, Timothée Flutre wrote:
> Hello,
>
> I noticed that only the Pearson correlation coefficient is implemented
> in the GSL (http://www.gnu.org/software/gsl/manual/html_node/Correlation.html).
> However, in quantitative genetics, several authors are using the
> Spearman coef (for instance, Stranger et al "Population genomics of
> human gene expression", Nature Genetics, 2007) as it is less
> influenced by outliers.
>
> Current high-throughput data requires to compute such coef several
> millions of times. Thus I implemented the computation of the Spearman
> coef in GSL-like code. In fact, one just need to rank the input
> vectors and then compute the Pearson coef on them. For the ranking, I
> got inspired by the code from the Apache Math module.
>
> I was thinking that it could be useful to other users to add my piece
> of code to the file "covariance_source.c" of the GSL
> (http://bzr.savannah.gnu.org/lh/gsl/trunk/annotate/head:/statistics/covariance_source.c#L77).
> So here is the code: https://gist.github.com/1784199
>
> I am not very proficient in C, so even if it is not possible to
> include the code in the GSL, don't hesitate to give me advice.
>
> Thanks,
> Tim
>