From: "Frank Ch. Eigler" <fche@redhat.com>
To: ksummit-2008-discuss@lists.linux-foundation.org
Cc: systemtap@sources.redhat.com
Subject: DTrace
Date: Mon, 30 Jun 2008 13:57:00 -0000
Message-ID: <20080630010423.GA7068@redhat.com>
Hi -
Please forgive me for "crashing" the discussion party here. I would
like to clarify some systemtap-related issues that people have raised.
(I'm one of its developers.) I'll just list individual points,
roughly in the order they were raised. For a fuller treatment of any of
the topics, please involve our public <systemtap@sources.redhat.com>
mailing list.
* postgres, other dtrace-probe-instrumented userspace programs
We aim to piggyback on these efforts by reusing the dtrace
instrumentation calls embedded into postgres etc., if at all
possible.
* "klunky and prone to break in unexpected ways"
There's a germ of truth there, but OTOH the case James ran into
involved complications beyond normal symbolic debugging too
(possibly having to search separately compiled modules for
definitions of opaque struct-pointer types). We're working on it;
our bug/feature list is in public bugzilla.
* "unhappy week with dwarf"
Guilty as charged. :-)
* kprobes, markers
Performance of kprobes-based probes is about 1 us per hit overhead.
Markers are on the order of tens of nanoseconds, which makes a huge
difference for frequently-hit probes. We'd be happy to interface to
other event sources like ftrace or whatever, as long as they provide
suitable kernel-module-accessible APIs.
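
To make the "huge difference" concrete, here is a rough back-of-envelope
sketch using the figures above. The 1 us kprobe cost comes from the text;
the 50 ns marker cost is only an assumed stand-in for "tens of
nanoseconds", and the hit rates and the helper name overhead_fraction are
illustrative, not measurements.

```python
# Rough probe overhead estimate from the per-hit costs quoted above.
KPROBE_COST_S = 1e-6    # about 1 us per hit (figure from the text)
MARKER_COST_S = 50e-9   # ASSUMPTION: "tens of nanoseconds" taken as 50 ns


def overhead_fraction(hits_per_second, cost_per_hit_s):
    """Fraction of one CPU-second spent on probe overhead alone."""
    return hits_per_second * cost_per_hit_s


if __name__ == "__main__":
    # Hypothetical hit rates, chosen only to show the scaling.
    for rate in (1_000, 100_000, 1_000_000):
        kprobe = overhead_fraction(rate, KPROBE_COST_S)
        marker = overhead_fraction(rate, MARKER_COST_S)
        print(f"{rate:>9} hits/s: kprobe {kprobe:.2%}, marker {marker:.3%}")
```

Under these assumptions the gap is a constant factor of about 20, which
only starts to matter once a probe fires hundreds of thousands of times
per second.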
* user-space probing
We're finally getting very close on this. Yes, it'd use the IBM
uprobes prototype above the Red Hat utrace work as a lower layer,
which we hope will get upstream as soon as possible. It will behave
analogously to dtrace: executing probes in kernel space. If it can
be made safe (and we think it can), it's a huge performance win over
trying to do it in userspace (with some gang of debugging processes
or whatever).
* oprofile
It's a fine special-purpose tool. We hope to hook into the same
sorts of underlying hardware performance counters to enable the same
profiling capability in systemtap - except well integrated with the
rest of the probing events / scripts. perfmon2 upstream would be
very helpful.
* dtrace "just works"
Yeah, so I hear, but think about how different their target
environment is. Their kernel hardly changes (several fixed APIs,
ABIs): this has huge implications. Their kernel developers were
willing to insert probes (~ markers) and make a bunch of build
system changes (debug info subset transcribing). Here in linux land,
we suffer multifaceted tensions and it is hard to go toward a goal
without obstructions (well-meaning as they may be).
A bunch of third-party scripts are often conflated with "dtrace",
which is just a matter of growing the user community enough, and
giving them a good tool to build on top of. A growing set of
runnable end-user scripts is already packaged with systemtap,
intended for use by nonexperts. More help (e.g. concise problem
statements about what you'd like to measure/see) would be welcome.
* integrating systemtap runtime into kernel
We did some analysis of how much of the runtime code is novel &
relevant to the kernel. We came up with a fraction
like 20% (IIRC; still searching for a link to the thread). Some of
the code is indeed in need of some cleanup love.
Some of it has been necessary to work around kernel disruptions
(e.g., unexporting stuff like kallsyms_lookup). The parts that are
deeply kernel-version-sensitive (and would thus benefit from your
maintenance) are quite small. We're still open to trying to pursue
copying/upstreaming some of this code into the kernel.
* tapsets
Theodore is mistaken that we are deflecting the job of tapset (probe
macro; abstracting architecture and kernel version-change -
$foo->bar->baz, function names) authorship. We have asked for help,
and have received a little, but the group has in fact authored a
growing collection of this stuff.
We would welcome having tapsets be included with the kernel and
cared for by you guys.
* debuginfo
Yes, it's very helpful & necessary if one wants to place probes at
just about any statement and extract just about any data value.
It's the same prerequisite that crash or kgdb would have, since we
operate at a similar level of object/source code visibility. Other
distros are learning to package this admittedly bulky data up, so
it'll be a matter of a largish download for distro users. Kernel
developers will of course have the data generated locally already.
We've recently gained the ability to work on symbol table level data
only. It's a compromise technology: it shrinks the installation
footprint but we get only function-entry probes; we lose data
typing; can only get at ABI-dictated positional integral arguments.
* systemtap building
The only thing unusual with building the thing is the use of the
elfutils library to parse elf/dwarf data; links to that are provided
and one can link to a private copy if the system lacks it.
* systemtap releases
True, we've been spotty with formal releases, though they are
archived and available, and we're moving to a more regular release
schedule very shortly. The weekly snapshots have been good (except
a recent unfortunate regression that hits 2.6.25 kernels
particularly badly - that's holding up the new release plans).
Thanks for reading; sorry about the length.
- FChE