From: Peter Teuben
Date: Sun, 12 May 2013 17:56:00 -0000
To: "gsl-discuss@sourceware.org"
CC: Patrick Alken
Subject: Re: Robust linear least squares
Message-ID: <518FD7B4.8070100@astro.umd.edu>
In-Reply-To: <518D6E3B.8080503@colorado.edu>

Patrick,

I agree, this is a useful option! Can you say a little more about how you define robustness? The one I know takes the quartiles Q1 and Q3 (where Q2 would be the median), defines D = Q3 - Q1, and uses only the points between Q1 - 1.5*D and Q3 + 1.5*D to define things like a robust mean and variance. Why 1.5 I don't know; I guess you could keep that as a variable and tinker with it.
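To make the quartile rule above concrete, here is a minimal sketch in Python (standard library only). The function name `iqr_clipped_stats` and the parameter `k` are made up for illustration, and the "inclusive" quartile convention is an assumption, since there are several ways to compute quartiles. This is not how the GSL code works; it only illustrates the Q1/Q3 rule described in this message.

```python
from statistics import mean, quantiles, variance


def iqr_clipped_stats(values, k=1.5):
    """Mean and sample variance of the points inside the quartile fences.

    Hypothetical helper: k is the multiplier on D = Q3 - Q1 (1.5 in the
    rule described above), kept as a parameter so it can be tinkered with.
    """
    # Quartiles Q1, Q2 (median), Q3; "inclusive" is an assumed convention.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    d = q3 - q1
    lo, hi = q1 - k * d, q3 + k * d
    # Only points between the fences contribute to the statistics.
    kept = [v for v in values if lo <= v <= hi]
    return mean(kept), variance(kept), kept


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
robust_mean, robust_var, kept = iqr_clipped_stats(data)
# The value 100 lies above Q3 + 1.5*D here, so it is left out of `kept`.
```

Raising `k` widens the fences and keeps more points; lowering it rejects more aggressively.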
For OLS you can imagine applying this iteratively to the Y values, since formally the errors in X are negligible compared to those in Y. I say iteratively because, in theory, the 2nd iteration could have rejected points that should have been part of the "core points". For non-linear fitting this could be a lot more tricky.

peter

On 05/10/2013 06:01 PM, Patrick Alken wrote:
> Hi all,
>
> I just committed a significant chunk of code related to robust
> linear regression into GSL and mainly wanted to update the other
> developers and any other interested parties. The main idea here is
> that ordinary least squares is very sensitive to data outliers, and
> the robust algorithm tries to identify and downweight outlier points
> so they don't drastically affect the model. I think this is something
> that has been needed in gsl for a while.
>
> I've been developing the code for a while and have been using it
> successfully in my own work, and also validated it pretty extensively
> against the matlab implementation. I still need to make some automated
> tests for it which I should get to next week.
>
> In the meantime, the code is very usable and working so feel free to
> try it out.
>
> Patrick
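A rough sketch of the iterative idea from the reply above, for a straight-line OLS fit, might look like the following. It assumes the quartile fences are applied to the Y residuals about the current fit, and that every point is re-tested on each pass so that a point rejected too early can rejoin the "core points". The names `fit_line` and `iterative_iqr_fit` are hypothetical, and this is plain rejection, not the downweighting scheme of the committed GSL code.

```python
from statistics import quantiles


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # fails if all kept x values coincide (not handled here)
    return my - b * mx, b


def iterative_iqr_fit(xs, ys, k=1.5, max_iter=20):
    """Refit until the set of points inside the residual fences is stable."""
    keep = list(range(len(xs)))
    for _ in range(max_iter):
        a, b = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
        # Residuals of ALL points, so earlier rejections can be revisited.
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        q1, _q2, q3 = quantiles(res, n=4, method="inclusive")
        d = q3 - q1
        new_keep = [i for i, r in enumerate(res)
                    if q1 - k * d <= r <= q3 + k * d]
        if new_keep == keep:
            break
        keep = new_keep
    # Final fit on the points that survived the last pass.
    a, b = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
    return a, b, keep
```

The `max_iter` cap is there because nothing guarantees that the kept set settles down; it could in principle oscillate between two sets.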