From: Ben Klemens
To: gsl-discuss@sourceware.org
Subject: Sample skew and kurtosis
Date: Thu, 15 Mar 2007 23:40:00 -0000
Message-ID: <20070315233956.GF1287@thebes.hss.caltech.edu>

And while I'm writing in, I thought I'd mention a little anomaly in the skew and kurtosis calculations. The documentation defines the kurtosis as

    kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3,

and similarly for the skew. This is inconsistent.

\Hat\sigma and \Hat\mu are estimated from a sample, so the unbiased variance estimate involves \sum(...)/(n-1), as opposed to the population variance, which involves \sum(...)/n. The same holds for the kurtosis and skew: if you have a sample and not a population, then the unbiased estimate is of the form \sum(...)/(n-1). But the formula above starts with 1/N, meaning we have population kurtosis normalized by sample variance squared.

If we have to choose only one kurtosis and skew function, it should probably be the sample and not the population version. The fix is trivial: just return kurtosis * n/(n-1.0) at the end of kurtosis_m_sd, and similarly for skew.

Regards,
BK
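
To make the mismatch concrete, here is a minimal sketch in Python (not GSL's C). The function names are hypothetical and are not part of the GSL API. It assumes the rescaling is applied to the fourth-moment sum before the 3 is subtracted, which is one reading of the proposal above, and it follows the \sum(...)/(n-1) form argued for in this message rather than any more elaborate bias correction.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _sum_sq_dev(xs, mu):
    # Sum of squared deviations from the mean.
    return sum((x - mu) ** 2 for x in xs)


def kurtosis_documented(xs):
    """Documented form: (1/N) * sum(((x - mu)/sd)^4) - 3,
    where sd is the sample standard deviation (divides by n - 1)."""
    n = len(xs)
    mu = _mean(xs)
    sd = math.sqrt(_sum_sq_dev(xs, mu) / (n - 1))
    return sum(((x - mu) / sd) ** 4 for x in xs) / n - 3.0


def kurtosis_population(xs):
    """Consistent population form: 1/n in both the fourth-moment
    average and the variance."""
    n = len(xs)
    mu = _mean(xs)
    sd = math.sqrt(_sum_sq_dev(xs, mu) / n)
    return sum(((x - mu) / sd) ** 4 for x in xs) / n - 3.0


def kurtosis_sample_rescaled(xs):
    """Assumed reading of the proposed fix: divide the fourth-moment
    sum by n - 1 instead of n, keeping the sample sd."""
    n = len(xs)
    mu = _mean(xs)
    sd = math.sqrt(_sum_sq_dev(xs, mu) / (n - 1))
    return sum(((x - mu) / sd) ** 4 for x in xs) / (n - 1) - 3.0


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    print("documented (mixed):", kurtosis_documented(data))
    print("population:        ", kurtosis_population(data))
    print("sample, rescaled:  ", kurtosis_sample_rescaled(data))
```

The three values differ only in their normalization, so the gap between them shrinks as n grows; for small samples it is large enough to matter.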