From: Patrick Alken <patrick.alken@Colorado.EDU>
To: Peter Teuben <teuben@astro.umd.edu>
Cc: "gsl-discuss@sourceware.org" <gsl-discuss@sourceware.org>
Subject: Re: Robust linear least squares
Date: Sun, 12 May 2013 18:14:00 -0000
Message-ID: <518FDC14.9070600@colorado.edu>
In-Reply-To: <518FD7B4.8070100@astro.umd.edu>

Hi Peter,

   The most common robust least squares algorithm is called 
"M-estimation", which is what I've implemented. At each step of the 
iteration, you calculate the residuals and apply a weighting function 
designed to assign large weights to small residuals and small 
weights to large residuals, so that the large residuals (outliers) 
contribute less and less to the model at each iteration. At each 
iteration, you also need an estimate of the residual standard deviation 
(sigma), and I am using the Median-Absolute-Deviation (MAD) of the 
n-p+1 largest residuals (where n is the number of observations and p is 
the number of model parameters). There are alternatives for computing 
sigma, but the MAD seems to be the most widely used.
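
   As a rough illustration of the iteration described above, here is a 
minimal sketch for a straight-line model y = a + b*x. It is not the GSL 
code: the function names are made up for this example, the Tukey 
bisquare weight with tuning constant 4.685 is just one common choice of 
weighting function, and sigma is taken as the median of the absolute 
residuals divided by 0.6745, which is an assumption of the sketch. It 
also skips the leverage adjustment that fuller implementations apply.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def bisquare_weight(e, tune=4.685):
    """Tukey bisquare: close to 1 for small scaled residuals, 0 past the cutoff."""
    u = e / tune
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def robust_line_fit(x, y, max_iter=50, tol=1e-10):
    """Iteratively reweighted least squares (hypothetical helper, not a GSL call)."""
    w = [1.0] * len(x)
    a, b = weighted_line_fit(x, y, w)  # start from ordinary least squares
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Assumed scale estimate: median absolute residual / 0.6745
        sigma = statistics.median(abs(ri) for ri in r) / 0.6745
        if sigma == 0.0:
            break  # at least half the points are fit exactly; nothing to rescale
        w = [bisquare_weight(ri / sigma) for ri in r]
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w


# Made-up data: points near y = 1 + 2x, plus one gross outlier at x = 7.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, 0.0, -0.15, 0.1]
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]
y[7] = 100.0  # the underlying line would give 15 here

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a, b, w = robust_line_fit(x, y)
print("OLS   :", a_ols, b_ols)
print("robust:", a, b)
print("weight of outlier:", w[7])
```

   On this data the plain least squares slope is pulled well away from 
2 by the single bad point, while the reweighted fit ends up giving that 
point zero weight and stays close to the underlying line.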

   If you check out the latest repository, have a look at the manual 
since I've documented everything including a description of the 
algorithm used. Let me know if you have more questions.

Patrick

On 05/12/2013 11:56 AM, Peter Teuben wrote:
> Patrick
> I agree, this is a useful option!
>
>     Can you say a little more about how you define robustness? The one
> I know takes the quartiles Q1 and Q3 (where Q2 would be the median),
> then defines D=Q3-Q1 and only uses points between Q1-1.5*D and
> Q3+1.5*D to define things like a robust mean and variance.
> Why 1.5 I don't know; I guess you could keep that a variable and tinker
> with it.
> For OLS you can imagine applying this in an iterative way to the Y
> values, since formally the errors in X are negligible compared to those
> in Y. I'm saying iterative, since in theory the 2nd iteration could have
> rejected points that should have been part of the "core points". For
> non-linear fitting this could be a lot more tricky.
>
> peter
>
>
> On 05/10/2013 06:01 PM, Patrick Alken wrote:
>> Hi all,
>>
>>    I just committed a significant chunk of code related to robust
>> linear regression into GSL and mainly wanted to update the other
>> developers and any other interested parties. The main idea here is
>> that ordinary least squares is very sensitive to data outliers, and
>> the robust algorithm tries to identify and downweight outlier points
>> so they don't drastically affect the model. I think this is something
>> that has been needed in GSL for a while.
>>
>>    I've been developing the code for a while and have been using it
>> successfully in my own work, and have also validated it pretty
>> extensively against the MATLAB implementation. I still need to write
>> some automated tests for it, which I should get to next week.
>>
>>    In the meantime, the code is very usable and working so feel free to
>> try it out.
>>
>> Patrick
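
For comparison, the quartile rule Peter describes in the quoted message 
(keep only points between Q1-1.5*D and Q3+1.5*D, with D=Q3-Q1) can be 
sketched as follows. The function names and the sample data are made up 
for illustration, the factor 1.5 is left as the parameter k as he 
suggests, and the quartiles use the default convention of Python's 
statistics.quantiles, which is only one of several in use.

```python
import statistics


def quartile_fences(values, k=1.5):
    """Return (low, high) = (Q1 - k*D, Q3 + k*D) with D = Q3 - Q1."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    d = q3 - q1
    return q1 - k * d, q3 + k * d


def core_mean_variance(values, k=1.5):
    """Mean and sample variance of the points inside the fences."""
    low, high = quartile_fences(values, k)
    core = [v for v in values if low <= v <= high]
    return statistics.mean(core), statistics.variance(core), core


# Made-up sample: eight values near 10 and one wild point.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0]
mean_core, var_core, core = core_mean_variance(data)
print("plain mean:", statistics.mean(data))
print("core mean :", mean_core, "from", len(core), "points")
```

Unlike the weighting approach, this is a hard cut: a point is either 
kept with full weight or dropped entirely.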
