public inbox for gsl-discuss@sourceware.org
From: Patrick Alken <patrick.alken@Colorado.EDU>
To: "gsl-discuss@sourceware.org" <gsl-discuss@sourceware.org>
Subject: Re: Robust linear least squares
Date: Fri, 24 May 2013 20:12:00 -0000
Message-ID: <519FC995.1090303@colorado.edu>
In-Reply-To: <CAGJVmuLLgYvVG+H_2ccctxmrB0Esm+No6zC1=+N6YWCTWOpuqw@mail.gmail.com>

Hi Tim,

   Yes, I remember, and I still think it would be a very nice function 
to have in GSL. My main worry at this point is the need to call 
_alloc and _free functions in order to use it. Looking through the 
statistics chapter of the manual, none of the functions there require 
alloc/free calls, so it would be really nice to implement spearman in a 
similar way.

   One issue we need to worry about is that once we introduce a new 
workspace (gsl_stats_spearman) into GSL, it will be there for a long 
time, since we need to keep GSL binary compatible across releases (so 
that users won't have to recompile their code for a future release and 
can just link to the new library).

   If there is no other way to nicely implement this function then so be 
it - we will include a spearman workspace, but I'd really like to 
exhaust other options first.

   For example, I know you've looked at the Apache implementation. Have 
you looked at the R implementation as well to get any ideas? Also, 
Numerical Recipes implements this function by allocating 2 vectors 
(your ranks1/ranks2 vectors) inside the spear function. Numerical 
Recipes doesn't seem to need an additional sort vector (your d) or 
permutation vector (p). Do you think there is a way to eliminate these 2 
parameters? Then perhaps the user could simply pass in a double 
array of size 2*n which you could use as your ranks1/ranks2 vectors, 
eliminating the need for spearman_workspace.

   Alternatively, do you think there is any clever way to compute the 
ranks in-place in the data1/2 vectors, so you won't have to allocate 
additional ranks1/ranks2 vectors?

   Finally, I know I asked you before about the ties_trace realloc call 
- it looks like this variable is allocated to 'nties' on the fly. Is 
there any way to count the number of ties initially, so that this only 
needs to be allocated once?
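
   One way this could look, as a sketch only: it assumes the data are 
already sorted and that ties_trace records the size of each group of 
tied values, which is a guess about the patch rather than something 
stated here. The helper names count_tie_groups and tie_group_sizes are 
hypothetical. A first pass counts the groups, then the buffer is 
allocated exactly once and filled in a second pass.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count runs of two or more equal values in an ascending-sorted array. */
size_t count_tie_groups(const double *sorted, size_t n)
{
  size_t groups = 0, i = 0, j;
  while (i < n) {
    for (j = i + 1; j < n && sorted[j] == sorted[i]; j++)
      ;
    if (j - i > 1)
      groups++;
    i = j;
  }
  return groups;
}

/* Allocate the tie-size buffer once and fill it with the length of each
   tie group. Sets *ngroups; returns NULL if there are no ties
   (*ngroups == 0) or if malloc fails (*ngroups > 0). The caller frees
   the result. */
size_t *tie_group_sizes(const double *sorted, size_t n, size_t *ngroups)
{
  size_t *sizes;
  size_t i = 0, j, g = 0;

  *ngroups = count_tie_groups(sorted, n);
  if (*ngroups == 0)
    return NULL;

  sizes = malloc(*ngroups * sizeof *sizes);
  if (sizes == NULL)
    return NULL;

  while (i < n) {
    for (j = i + 1; j < n && sorted[j] == sorted[i]; j++)
      ;
    if (j - i > 1)
      sizes[g++] = j - i;
    i = j;
  }
  return sizes;
}
```

   The extra counting pass is O(n) on sorted data, so it costs little 
compared with the sort itself and removes every realloc call.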

   I don't see any realloc calls in the Numerical Recipes 
implementation, so I'd like to ask you to try to understand how they do 
it (also look at GNU R, which may be more professionally written). 
Perhaps look at Octave too?

   Sorry to be a stickler about this, but I do think it's worth trying to 
eliminate the alloc/free calls for this function. Even with the 
alloc/free calls there is still a performance hit due to the realloc 
calls of ties_trace. I may have some time next week to look into this a 
bit more myself.

Patrick

On 05/24/2013 01:23 PM, Timothée Flutre wrote:
> Hello Patrick,
>
> about the next release, a while ago I proposed some code (+tests) to
> compute the Spearman rank correlation coefficient. I uploaded my code
> on savannah (http://savannah.gnu.org/bugs/?36199) and it is also
> available on github (https://github.com/timflutre/spearman). At least
> one person asked on the mailing list if this coef was implemented
> (http://savannah.gnu.org/bugs/?37728) so I think it would be useful to
> add it.
> I tried to follow the GSL guidelines as closely as possible so that it
> should be possible to integrate the code easily into the next release.
> I would be glad to help in this matter if necessary.
>
> Best,
> Tim
