public inbox for systemtap@sourceware.org
From: fche@redhat.com (Frank Ch. Eigler)
To: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Josh Stone <jistone@redhat.com>, systemtap@sourceware.org
Subject: Re: Rapidly running systemtap causing hangs or oops
Date: Thu, 23 Jun 2011 14:13:00 -0000
Message-ID: <y0m1uykvdo2.fsf@fche.csb>
In-Reply-To: <20110623075126.GJ803@amd.home.annexia.org> (Richard W. M. Jones's message of "Thu, 23 Jun 2011 08:51:26 +0100")


Hi, Richard -

rjones wrote:

> [...]
>> Can you try running stap with "-D STP_ALIBI"?  This alibi mode compiles
>> out most of stap's code, so each probe handler is reduced to just an
>> atomic increment, then a final hit count is reported on exit.

> Adding -D STP_ALIBI [...]  did not change the behaviour.  The mount
> process crashed quickly with the oops below:
> [  159.454020]  [<ffffffffa00d0a3b>] ext2_fill_super+0x9b5/0xc3b [ext2]
> [  159.454020]  [<ffffffff8113a0df>] mount_bdev+0x155/0x1b7
> [  159.454020]  [<ffffffffa00d0086>] ? ext2_error+0x112/0x112 [ext2]
> [...]

OK, that does seem to implicate the kernel or our registration /
unregistration process.  Telling which is a bit tricky because the
kernel's own 'perf probe' widget cannot register/unregister as many
probes as quickly as we can.  So if the kernel has race conditions
in all that text-segment manipulation, we are more likely to hit
them than e.g. perf is.  This has happened before, and it's tough to
diagnose.

An intermediate option is to extract all the kprobe addresses from the
"stap -p2" processing loop, and modify systemtap source-tree
scripts/kprobes_test/gen_code.py to take a symbol+offset list rather
than just a symbol list, to generate a non-systemtap pure-kprobes
module.  Then one could insmod;test;rmmod in a tight loop to see if
the same problem reappears.  If it does, one punts to the kernel
folks.
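
A minimal sketch of that generator idea, in Python since gen_code.py is
a Python script.  This is not the actual gen_code.py: the input format
(one "symbol+0xOFFSET" or bare "symbol" per line), the function names
parse_probe_list / render_module, and the module template are all
assumptions made for illustration, and the emitted C has not been
compile-tested against any particular kernel.  Each probe handler is
just an atomic increment, with the hit count printed on module exit.

```python
import re

# Assumed input format: one "symbol+0xOFFSET" (or bare "symbol") per line.
_PROBE_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_.]*)(?:\+(0x[0-9a-fA-F]+|\d+))?$")

# Hypothetical module template; uses the kernel's bulk kprobes calls.
_MODULE_TEMPLATE = """\
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

static atomic_t hits = ATOMIC_INIT(0);

/* Count the hit and let the probed instruction run normally. */
static int count_hit(struct kprobe *p, struct pt_regs *regs)
{
	atomic_inc(&hits);
	return 0;
}

static struct kprobe probes[] = {
%(entries)s
};
static struct kprobe *probe_ptrs[ARRAY_SIZE(probes)];

static int __init kp_test_init(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(probes); i++)
		probe_ptrs[i] = &probes[i];
	return register_kprobes(probe_ptrs, ARRAY_SIZE(probes));
}

static void __exit kp_test_exit(void)
{
	unregister_kprobes(probe_ptrs, ARRAY_SIZE(probes));
	pr_info("kp_test: %%d hits\\n", atomic_read(&hits));
}

module_init(kp_test_init);
module_exit(kp_test_exit);
MODULE_LICENSE("GPL");
"""


def parse_probe_list(text):
    """Parse the assumed text format into a list of (symbol, offset) tuples."""
    probes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _PROBE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized probe spec: {line!r}")
        symbol, offset = match.group(1), match.group(2)
        if offset is None:
            value = 0
        elif offset.startswith("0x"):
            value = int(offset, 16)
        else:
            value = int(offset, 10)
        probes.append((symbol, value))
    return probes


def render_module(probes):
    """Return C source for a pure-kprobes module covering the given probes."""
    if not probes:
        raise ValueError("need at least one probe")
    entries = "\n".join(
        f'\t{{ .symbol_name = "{sym}", .offset = {off:#x}, .pre_handler = count_hit }},'
        for sym, off in probes
    )
    return _MODULE_TEMPLATE % {"entries": entries}
```

Feeding the text extracted from the stap run through parse_probe_list
and writing the result of render_module to a .c file would give a
module to build with the usual kbuild Makefile.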

Another hacky intermediate possibility is to put some deliberate
time delays here and there, e.g. between iterations of your
"while true; do stap; done" loop.  Or disable
runtime/autoconf-unregister-kprobes.c, so that stap doesn't use the
kernel's bulk-unregistration functions but rather unregisters probes
one by one.
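
The delay idea could look roughly like the sketch below.  The name
run_repeatedly and its parameters are made up for illustration, and
the stap command line is left to the caller since it depends on the
script under test.  The loop stops at the first non-zero exit status,
so a failing run is not buried under later ones.

```python
import subprocess
import time


def run_repeatedly(cmd, iterations, delay_s, run=subprocess.run, sleep=time.sleep):
    """Run cmd up to `iterations` times, sleeping delay_s between runs.

    Stops at the first non-zero exit status and returns the number of
    runs that exited with status 0.
    """
    completed = 0
    for i in range(iterations):
        if i > 0:
            sleep(delay_s)  # deliberate gap between unregister and re-register
        result = run(cmd)
        if result.returncode != 0:
            break
        completed += 1
    return completed
```

Varying delay_s (including 0) should show whether the hang or oops
depends on how quickly the probes are re-registered.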

- FChE


Thread overview: 8+ messages
2011-06-22 23:00 Richard W.M. Jones
2011-06-22 23:52 ` Josh Stone
2011-06-23  7:51   ` Richard W.M. Jones
2011-06-23 10:12     ` Flushing systemtap output without restarting (was: Re: Rapidly running systemtap causing hangs or oops) Richard W.M. Jones
2011-06-23 12:45       ` Richard W.M. Jones
2011-06-23 14:13     ` Frank Ch. Eigler [this message]
2011-06-23 16:16     ` Rapidly running systemtap causing hangs or oops Josh Stone
2011-06-23 16:28       ` Richard W.M. Jones
