From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 15 Mar 2007 23:40:00 -0000
From: Ben Klemens <klemens@hss.caltech.edu>
To: gsl-discuss@sourceware.org
Subject: Sample skew and kurtosis
Message-ID: <20070315233956.GF1287@thebes.hss.caltech.edu>

And while I'm writing in, I thought I'd mention a little anomaly in the
skew and kurtosis calculations. The documentation defines the kurtosis as
    kurtosis = ((1/N) \sum_i ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3,
and similarly for the skew.

This is inconsistent. \Hat\sigma is estimated from a sample, so it is
the square root of the unbiased variance estimate, which has the form
\sum(...)/(n-1), as opposed to the population variance, which has the
form \sum(...)/n.

The same holds for the kurtosis and skew: if you have a sample and not a
population, then the unbiased estimate is of the form \sum(...)/(n-1). But
the above starts with 1/n, meaning we have population kurtosis normalized
by sample variance squared.

If we have to choose only one kurtosis and skew function, it should
probably be the sample and not the population version. The fix is trivial:
just multiply the averaged fourth-power term by n/(n-1.0) at the end of
kurtosis_m_sd, before the 3 is subtracted, and similarly for skew.

Regards,

BK