public inbox for systemtap@sourceware.org
From: "mcermak at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: systemtap@sourceware.org
Subject: [Bug runtime/11308] aggregate operations for @variance, @skew, @kurtosis
Date: Fri, 03 Jun 2016 11:41:00 -0000
Message-ID: <bug-11308-6586-VjKZU5jfpz@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-11308-6586@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=11308

--- Comment #1 from Martin Cermak <mcermak at redhat dot com> ---
Created attachment 9311
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9311&action=edit
proposed patch

The variance of N data points is V = S / (N - 1), where S is the sum of the
squared deviations from the mean.  Here is an attempt to implement the
@variance() operator using Knuth's online algorithm [1]:

=======
def online_variance(data):
    n = 0
    mean = 0.0
    M2 = 0.0

    for x in data:
        n += 1
        delta = x - mean
        mean += delta/n
        M2 += delta*(x - mean)

    if n < 2:
        return float('nan')
    else:
        return M2 / (n - 1)
=======

This patch builds on the current systemtap implementation of the aggregation
operators, which first pre-aggregates the data on each CPU (__stp_stat_add()).
Then, when the aggregates are actually read via e.g. @sum (or @variance), they
are aggregated again, this time across all CPUs (_stp_stat_get()), and printed.
This approach saves shared resources at collection time.  So, in this patch,
per-CPU variances are collected first and then aggregated across all CPUs to
give the resulting @variance.  N is assumed to be much greater than 1 (N >> 1),
so the resulting @variance() is computed as a simple mean of the per-CPU
variances.  Integer arithmetic is used.  With this patch, we get something
relatively small for data points clustered closely around the mean, and
something relatively large for data points spread widely around the mean.  So
it passes a rough sanity test:

=======
# stap -e 'global a probe oneshot { for(i=0; i<1000; i++) { a<<<42 } }  probe
end { printdln(", ", @count(a), @max(a), @variance(a)) }'
1000, 42, 1
# stap -e 'global a probe oneshot { for(i=0; i<1000; i++) { a<<<42 } for(i=0;
i<20; i++) { a<<<99 } }  probe end { printdln(", ", @count(a), @max(a),
@variance(a)) }'
1020, 99, 65
# stap -e 'global a probe oneshot { for(i=0; i<1000; i++) { a<<<i } }  probe
end { printdln(" ", @count(a), @max(a), @variance(a)) }'
1000 999 332833
# 
=======
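
For reference, the definition V = S / (N - 1) can also be evaluated directly
in floating point for the same three inputs.  The following is a minimal
two-pass sketch in Python, independent of the patch; the name
two_pass_variance and the variable names are made up for illustration.

```python
def two_pass_variance(data):
    """Sample variance V = S / (N - 1), straight from the definition."""
    data = list(data)
    n = len(data)
    if n < 2:
        return float('nan')
    mean = sum(data) / n
    # S: sum of the squared deviations from the mean
    s = sum((x - mean) ** 2 for x in data)
    return s / (n - 1)


# The three inputs fed to the aggregate in the stap runs above
constant = [42] * 1000
two_level = [42] * 1000 + [99] * 20
ramp = list(range(1000))

for name, data in (("constant", constant),
                   ("two_level", two_level),
                   ("ramp", ramp)):
    print(name, len(data), max(data), round(two_pass_variance(data), 2))
```

The values come out near 0, 62.5 and 83416.7 respectively, which gives a
baseline for judging how closely the integer results follow the exact ones.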

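The per-CPU scheme described above can be sketched in Python as follows.  This
is an illustration only, not the code from the patch: the names PerCpuStat,
mean_of_variances and merged_variance are hypothetical, floating point is used
where the patch uses integer arithmetic, and skipping CPUs with fewer than two
samples is an assumption.  mean_of_variances follows the simple-mean approach
taken in the patch; merged_variance shows, for comparison, the pairwise
combination formula from the "Parallel algorithm" section of [1], which also
accounts for differences between the per-CPU means.

```python
class PerCpuStat:
    """Running count, mean and M2 for one CPU (online update from [1])."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n >= 2 else float('nan')


def mean_of_variances(stats):
    """Simple mean of the per-CPU variances.

    Assumption: CPUs with fewer than two samples are skipped.
    """
    vs = [s.variance() for s in stats if s.n >= 2]
    return sum(vs) / len(vs) if vs else float('nan')


def merged_variance(stats):
    """Combine per-CPU (n, mean, M2) triples into one overall variance."""
    n, mean, m2 = 0, 0.0, 0.0
    for s in stats:
        if s.n == 0:
            continue
        total = n + s.n
        delta = s.mean - mean
        mean += delta * s.n / total
        # The extra term accounts for the gap between the two means
        m2 += s.m2 + delta * delta * n * s.n / total
        n = total
    return m2 / (n - 1) if n >= 2 else float('nan')


# Hypothetical workload: values 0..999 dealt round-robin to 4 CPUs
cpus = [PerCpuStat() for _ in range(4)]
for i in range(1000):
    cpus[i % 4].add(i)

print(mean_of_variances(cpus), merged_variance(cpus))
```

When every CPU sees a similar distribution, as in this round-robin example,
the two results are close.  They drift apart when the per-CPU means differ
widely, for instance if one CPU only ever sees 42 and another only 99.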

-------------------------------------------
[1]
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
