[Bug runtime/20820] another "soft lockup" BUG on RHEL7 ppc64

From: "mcermak at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: systemtap@sourceware.org
Subject: [Bug runtime/20820] another "soft lockup" BUG on RHEL7 ppc64
Date: Thu, 24 Nov 2016 16:06:00 -0000
Message-ID: <bug-20820-6586-VjiPuXHzHF@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-20820-6586@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=20820

--- Comment #4 from Martin Cermak <mcermak at redhat dot com> ---
I'll be inexact but terse:  The testcase ensures that the aggregation operator
'<<<' runs faster for stats that use only computationally simple operators, such
as @count, than for stats that use computationally complex operators, such as
@variance.  For a more verbose description, please refer to [1].

Currently we support 6 stat operators: @count, @sum, @min, @max, @avg, and
@variance.  The optimization in question relies on GCC optimizing the
following inlined function according to the values of its parameters:

=======
$ grep -A 33 __stp_stat_add runtime/stat-common.c
static inline void __stp_stat_add(Hist st, stat_data *sd, int64_t val,
                                  int stat_op_count, int stat_op_sum, int stat_op_min,
                                  int stat_op_max, int stat_op_variance)
{
        int n;
        int delta = 0;

        sd->shift = st->bit_shift;
        sd->stat_ops = st->stat_ops;
        if (sd->count == 0) {
                sd->count = 1;
                sd->sum = sd->min = sd->max = val;
                sd->avg_s = val << sd->shift;
                sd->_M2 = 0;
        } else {
                if(stat_op_count)
                        sd->count++;
                if(stat_op_sum)
                        sd->sum += val;
                if (stat_op_min && (val > sd->max))
                        sd->max = val;
                if (stat_op_max && (val < sd->min))
                        sd->min = val;
                /*
                 * Below, we use Welford's online algorithm for computing variance.
                 * https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
                 */
                if (stat_op_variance) {
                    delta = (val << sd->shift) - sd->avg_s;
                    sd->avg_s += _stp_div64(NULL, delta, sd->count);
                    sd->_M2 += delta * ((val << sd->shift) - sd->avg_s);
                    sd->variance_s = (sd->count < 2) ? -1 : _stp_div64(NULL, sd->_M2, (sd->count - 1));
                }
        }
$ 
=======

For example, if @variance isn't used with a given stat, stat_op_variance is
set to 0, and GCC is expected to optimize the respective computations out.  Looking
at the above code snippet, it's easy to see that optimizing out the
@variance computations has a much more significant effect than
optimizing out the other stat op computations.  Of course, the effect of such
optimizations also depends on the architecture and the compiler.
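
To illustrate the mechanism, here is a minimal sketch, not the SystemTap
code: struct demo_stat, demo_stat_add() and the two wrappers are hypothetical
names, and the squared sum is only a placeholder for the real variance math.
The point is that the flags arrive as literals at each call site, so after
inlining the compiler can treat each "if" as a constant condition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for stat_data (not the SystemTap type). */
struct demo_stat {
    int64_t count;
    int64_t sum;
    int64_t sq_sum;   /* stand-in for the costly @variance state */
};

/*
 * The op flags are plain int parameters, but each call site passes
 * literals.  Once the function is inlined, a test such as
 * "if (op_variance)" has a constant condition the compiler can fold.
 */
static inline void demo_stat_add(struct demo_stat *sd, int64_t val,
                                 int op_sum, int op_variance)
{
    assert(sd != NULL);
    sd->count++;
    if (op_sum)
        sd->sum += val;
    if (op_variance)
        sd->sq_sum += val * val;   /* placeholder for the expensive math */
}

/* op_variance is the literal 0 here: the branch is dead code after inlining. */
static void demo_add_cheap(struct demo_stat *sd, int64_t val)
{
    demo_stat_add(sd, val, 1, 0);
}

/* op_variance is the literal 1 here: the branch is always taken. */
static void demo_add_full(struct demo_stat *sd, int64_t val)
{
    demo_stat_add(sd, val, 1, 1);
}
```

Comparing the assembly that "gcc -O2 -S" emits for the two wrappers is one way
to check whether the multiplication is really absent from demo_add_cheap() on
a given arch and compiler version.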

The testcase tries to detect all of these optimizations and confirm they are there.
Detecting the optimizations for @count, @sum, @min, @max and @avg is relatively
tricky.  It's hard to distinguish their effect from the noise.
The test results are of low quality and the test generates lots of load.  On
the other hand, detecting and verifying the @variance optimization is
relatively simple, so testing it makes pretty good sense.
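
As a rough picture of why the @variance path stands out, the per-sample
Welford update can be sketched as follows.  This is a floating-point
illustration only: struct welford and both functions are hypothetical names,
whereas the runtime uses scaled integers (avg_s, _M2) and _stp_div64().  The
-1 result for fewer than two samples mirrors the snippet quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator for illustration; SystemTap uses scaled integers. */
struct welford {
    int64_t count;
    double  mean;
    double  m2;     /* running sum of squared deviations from the mean */
};

/* One Welford step: a division and a multiplication for every sample. */
static void welford_add(struct welford *w, double val)
{
    assert(w != NULL);
    double delta = val - w->mean;       /* deviation from the old mean */
    w->count++;
    w->mean += delta / (double)w->count;
    w->m2 += delta * (val - w->mean);   /* second factor uses the new mean */
}

/* Sample variance (n - 1 denominator); -1.0 with fewer than two samples. */
static double welford_variance(const struct welford *w)
{
    assert(w != NULL);
    return (w->count < 2) ? -1.0 : w->m2 / (double)(w->count - 1);
}
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 into a zero-initialized accumulator
gives a mean of 5 and a sample variance of 32/7.  Every sample costs a
division, which is far more than the single increment needed for @count.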

I've just run the testcase in its original form, and it gives all the
expected passes for most of the arches supported on RHEL 6 and 7.  But sometimes
'kernel:NMI watchdog: BUG: soft lockup' errors occur.  Note that this
was in the testsuite's serial mode, which certainly gives better results than
the parallel mode.

So I propose dropping the first subtest (optim_stats1.stp), which covers the
@count, @sum, @min and @max optimizations, altogether, since it's "not so much
fun for a lot of money", but keeping the second subtest (optim_stats2.stp) for
the @variance optimization.  Also, the high iteration counts in optim_stats2.stp
can be lowered (the values were copied from optim_stats1.stp, but appear to be
unnecessarily high).  The following seems to help:

=======
$ git diff
diff --git a/testsuite/systemtap.base/optim_stats.exp b/testsuite/systemtap.base/optim_stats.exp
index e46de40..1955853 100644
--- a/testsuite/systemtap.base/optim_stats.exp
+++ b/testsuite/systemtap.base/optim_stats.exp
@@ -8,7 +8,7 @@ if {![installtest_p]} {
     return
 }

-for {set i 1} {$i <= 2} {incr i} {
+for {set i 2} {$i <= 2} {incr i} {
     foreach runtime [get_runtime_list] {
        if {$runtime != ""} {
            spawn stap --runtime=$runtime -g --suppress-time-limits $srcdir/$subdir/$test$i.stp
diff --git a/testsuite/systemtap.base/optim_stats2.stp b/testsuite/systemtap.base/optim_stats2.stp
index 53bbc69..65fe06d 100644
--- a/testsuite/systemtap.base/optim_stats2.stp
+++ b/testsuite/systemtap.base/optim_stats2.stp
@@ -2,9 +2,9 @@
  * Analogy to optim_stats1.stp, but for pmaps.  See optim_stats1.stp for comments.
  */

-@define RANDCNT %( 200000 %)
+@define RANDCNT %( 2000 %)
 @define RANDMAX %( 1000 %)
-@define ITERS %( 1500 %)
+@define ITERS %( 15 %)

 @define feed(agg, tagg)
 %(
$ 
=======
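
For a sense of scale, the arithmetic behind the reduction can be sketched as
below.  It assumes that the number of '<<<' operations per aggregate scales
with RANDCNT * ITERS; the script body isn't quoted here, so that is only an
assumption, and feed_ops() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: '<<<' operations per aggregate ~ RANDCNT * ITERS. */
static int64_t feed_ops(int64_t randcnt, int64_t iters)
{
    assert(randcnt >= 0 && iters >= 0);
    return randcnt * iters;
}
```

Under that assumption, feed_ops(200000, 1500) is 300,000,000 and
feed_ops(2000, 15) is 30,000, i.e. the patched test does 10,000 times less
work per aggregate.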

Thoughts?

------------------------
[1]
https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/optim_stats1.stp;h=2144b7bb210ee8f0c620487ac63fffba14e0d1bf;hb=HEAD


