public inbox for systemtap@sourceware.org
* DTrace
@ 2008-06-30 13:57 Frank Ch. Eigler
  2008-06-30 19:00 ` [Ksummit-2008-discuss] DTrace Grant Grundler
                   ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Frank Ch. Eigler @ 2008-06-30 13:57 UTC (permalink / raw)
  To: ksummit-2008-discuss; +Cc: systemtap


Hi -


Please forgive me for "crashing" the discussion party here.  I would
like to clarify some systemtap-related issues that people have raised.
(I'm one of its developers.)  I'll just list individual points,
roughly in the order they were raised.  For a fuller treatment of any of
the topics, please involve our public <systemtap@sources.redhat.com>
mailing list.


* postgres, other dtrace-probe-instrumented userspace programs

  We aim to piggyback on these efforts by reusing the dtrace
  instrumentation calls embedded into postgres etc., if at all
  possible.

* "klunky and prone to break in unexpected ways"

  There's a germ of truth there, but OTOH the case James ran into
  involved complications beyond normal symbolic debugging too
  (possibly having to search separately compiled modules for
  definitions of opaque struct-pointer types).  We're working on it;
  our bug/feature list is in public bugzilla.

* "unhappy week with dwarf"

  Guilty as charged. :-)

* kprobes, markers

  Performance of kprobes-based probes is about 1 us per hit overhead.
  Markers are on the order of tens of nanoseconds, which makes a huge
  difference for frequently-hit probes.  We'd be happy to interface to
  other event sources like ftrace or whatever, as long as they provide
  suitable kernel-module-accessible APIs.

* user-space probing

  We're finally getting very close on this.  Yes, it'd use the IBM
  uprobes prototype above the Red Hat utrace work as a lower layer,
  which we hope will get upstream as soon as possible.  It will behave
  analogously to dtrace: executing probes in kernel space.  If it can
  be made safe (and we think it can), it's a huge performance win over
  trying to do it in userspace (with some gang of debugging processes
  or whatever).

* oprofile

  It's a fine special-purpose tool.  We hope to hook into the same
  sorts of underlying hardware performance counters to enable the same
  profiling capability in systemtap - except well integrated with the
  rest of the probing events / scripts.  perfmon2 upstream would be
  very helpful.

* dtrace "just works"

  Yeah, so I hear, but think about how different their target
  environment is.  Their kernel hardly changes (several fixed APIs,
  ABIs): this has huge implications.  Their kernel was willing to
  insert probes (~ markers), a bunch of build system changes (debug
  info subset transcribing).  Here in linux land, we suffer
  multifaceted tensions and it is hard to go toward a goal without
  obstructions (well-meaning as they may be).

  A bunch of third-party scripts are often conflated with "dtrace",
  which is just a matter of growing the user community enough, and
  giving them a good tool to build on top of.  A growing set of
  runnable end-user scripts is already packaged with systemtap,
  intended for use by nonexperts.  More help (e.g. concise problem
  statements about what you'd like to measure/see) would be welcome.

* integrating systemtap runtime into kernel

  We did some analysis about how much of the runtime code contains
  novel & relevant code to the kernel.  We came up with a fraction
  like 20% (IIRC; still searching for a link to the thread).  Some of
  the code is indeed in need of some cleanup love.  

  Some of it has been necessary to work around kernel disruptions
  (e.g., unexporting stuff like kallsyms_lookup).  The parts that are
  deeply kernel-version-sensitive (and would thus benefit from your
  maintenance) are quite small.  We're still open to trying to pursue
  copying/upstreaming some of this code into the kernel.

* tapsets

  Theodore is mistaken that we are deflecting the job of tapset (probe
  macro; abstracting architecture and kernel version-change -
  $foo->bar->baz, function names) authorship.  We have asked for help,
  and have received a little, but the group has in fact authored a
  growing collection of this stuff.

  We would welcome having tapsets be included with the kernel and
  cared for by you guys.

* debuginfo

  Yes, it's very helpful & necessary if one wants to place probes at
  just about any statement and extract just about any data value.
  It's the same prerequisite that crash or kgdb would have, since we
  operate at a similar level of object/source code visibility.  Other
  distros are learning to package this admittedly bulky data up, so
  it'll be a matter of a largish download for distro users. Kernel
  developers will of course have the data generated locally already.

  We've recently gained the ability to work on symbol table level data
  only.  It's a compromise technology: it shrinks the installation
  footprint but we get only function-entry probes; we lose data
  typing; can only get at ABI-dictated positional integral arguments.

* systemtap building

  The only thing unusual with building the thing is the use of the
  elfutils library to parse elf/dwarf data; links to that are provided
  and one can link to a private copy if the system lacks it.

* systemtap releases

  True, we've been spotty with formal releases, though they are
  archived and available, and we're moving to a more regular release
  schedule very shortly.  The weekly snapshots have been good (except
  a recent unfortunate regression that hits 2.6.25 kernels
  particularly badly - that's holding up the new release plans).


Thanks for reading; sorry about the length.


- FChE


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 13:57 DTrace Frank Ch. Eigler
@ 2008-06-30 19:00 ` Grant Grundler
  2008-06-30 19:40 ` Theodore Tso
  2008-06-30 19:59 ` James Bottomley
  2 siblings, 0 replies; 48+ messages in thread
From: Grant Grundler @ 2008-06-30 19:00 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Sun, Jun 29, 2008 at 6:04 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Please forgive me for "crashing" the discussion party here.

Frank,
Excellent reply! Thanks!

Just one observation here:

> * dtrace "just works"
...
> A growing set of
>  runnable end-user scripts is already packaged with systemtap,
>  intended for use by nonexperts.  More help (e.g. concise problem
>  statements about what you'd like to measure/see) would be welcome.

Would it be possible to look at prepackaged Dtrace "scripts"
and generate something comparable for systemtap?

I know the kernels are very different but the underlying functionality
(manage resources: CPU, memory, disk, network, etc) is basically
the same. In x86-64 case, it's the same HW doing essentially
the same things.

thanks,
grant


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 13:57 DTrace Frank Ch. Eigler
  2008-06-30 19:00 ` [Ksummit-2008-discuss] DTrace Grant Grundler
@ 2008-06-30 19:40 ` Theodore Tso
  2008-06-30 20:00   ` Frank Ch. Eigler
  2008-07-01  5:29   ` Ananth N Mavinakayanahalli
  2008-06-30 19:59 ` James Bottomley
  2 siblings, 2 replies; 48+ messages in thread
From: Theodore Tso @ 2008-06-30 19:40 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Sun, Jun 29, 2008 at 09:04:23PM -0400, Frank Ch. Eigler wrote:
> * tapsets
> 
>   Theodore is mistaken that we are deflecting the job of tapset (probe
>   macro; abstracting architecture and kernel version-change -
>   $foo->bar->baz, function names) authorship.  We have asked for help,
>   and have received a little, but the group has in fact authored a
>   growing collection of this stuff.

Well I've heard the line that it's up to the kernel subsystem experts
to write tapsets from Ulrich Drepper (on the ksummit-2008-discuss
list) and from Ananth N Mavinakayanahalli (private communication) so I
think it's fair to say that at least some people associated with
Systemtap have been placing the blame for the lack of tapsets on the
kernel developers.

As far as the growing collection of this stuff?  Where is it?  Do you
mean in the tapsets directory of the systemtap sources in the git
repository?  Is there any documentation or example usage scenarios for
these tapsets?

> * debuginfo
> 
>   Yes, it's very helpful & necessary if one wants to place probes at
>   just about any statement and extract just about any data value.
>   It's the same prerequisite that crash or kgdb would have, since we
>   operate at a similar level of object/source code visibility.  Other
>   distros are learning to package this admittedly bulky data up, so
>   it'll be a matter of a largish download for distro users. Kernel
>   developers will of course have the data generated locally already.

The problem is that kernel developers are often juggling multiple
kernels, so kernel developers need to learn how to package up this
bulky data as well.  And it's not obvious what needs to be done with,
for example, the module files, especially if they've been stripped so
they will fit into the /boot partition.  It would be useful if
http://sourceware.org/systemtap/wiki/SystemTapWithSelfBuiltKernel 
was a bit more explicit about exactly what SystemTap expects to find
in SYSTEMTAP_DEBUGINFO_PATH.  I'm sure it's blindingly obvious to a Systemtap
developer, but it isn't for someone who is just getting started with
Systemtap, and runs into one brick wall after another.

> * systemtap building
> 
>   The only thing unusual with building the thing is the use of the
>   elfutils library to parse elf/dwarf data; links to that are provided
>   and one can link to a private copy if the system lacks it.

So how do you link to a private copy?  There's nothing in the wiki
that describes this.  It seems to imply that you have to install the
elfutils globally, and I've been hesitant to do this lest it break
things that aren't expecting the latest bleeding edge library.  (I
have no idea whether the elfutils library developers worry about ABI
compatibility for applications that dynamically link with the
system-provided elfutils library.)

It would be nice if the Systemtap libraries had some provision where
you could either point to a source directory where the patched
elfutils libraries had been placed, and automatically used them for
static linking, or if you could download the patched elfutils library
into some directory in the Systemtap sources, and if present, the
build system would automatically use them.  This sort of minor thing
makes life much simpler for people who are trying to pull down the
latest Systemtap tree, especially since the Wiki is filled with
assertions (echoed by Ulrich in the recent ksummit-discuss thread)
about how Systemtap is a fast moving project, and why it's absolutely
necessary to grab the latest bleeding edge sources from the git tree.

I'm willing to send patches for these sorts of usability issues if it's
likely such patches would be accepted...

> * systemtap releases
> 
>   True, we've been spotty with formal releases, though they are
>   archived and available, and we're moving to a more regular release
>   schedule very shortly.  The weekly snapshots have been good (except
>   a recent unfortunate regression that hits 2.6.25 kernels
>   particularly badly - that's holding up the new release plans).

Does the regression hit 2.6.26-rc8 kernels?  (i.e., should I not
bother trying Systemtap until this gets cleared up, lest I waste hours
and hours again getting frustrated?)

						- Ted


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 13:57 DTrace Frank Ch. Eigler
  2008-06-30 19:00 ` [Ksummit-2008-discuss] DTrace Grant Grundler
  2008-06-30 19:40 ` Theodore Tso
@ 2008-06-30 19:59 ` James Bottomley
  2008-06-30 23:52   ` Masami Hiramatsu
  2008-07-08 23:32   ` Eric W. Biederman
  2 siblings, 2 replies; 48+ messages in thread
From: James Bottomley @ 2008-06-30 19:59 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Sun, 2008-06-29 at 21:04 -0400, Frank Ch. Eigler wrote:
> Please forgive me for "crashing" the discussion party here.  I would
> like to clarify some systemtap-related issues that people have raised.
> (I'm one of its developers.)  I'll just list individual points,
> roughly in the order they were raised.  For a fuller treatment of any of
> the topics, please involve our public <systemtap@sources.redhat.com>
> mailing list.

It's not a private party ... hence the "discuss" part of the list
naming ...

> * postgres, other dtrace-probe-instrumented userspace programs
> 
>   We aim to piggyback on these efforts by reusing the dtrace
>   instrumentation calls embedded into postgres etc., if at all
>   possible.
> 
> * "klunky and prone to break in unexpected ways"
> 
>   There's a germ of truth there, but OTOH the case James ran into
>   involved complications beyond normal symbolic debugging too
>   (possibly having to search separately compiled modules for
>   definitions of opaque struct-pointer types).  We're working on it;
>   our bug/feature list is in public bugzilla.

Well, let me give you another example, because it tripped me up for
days:  Return probes give access to the entry variables in the state the
routine was entered (not on return).  I ran into it because I was trying
to look at what a routine had done to the scsi command structure which
was passed as an input.

I've also found it very easy to crash the system under probe if you use
the wrong build tree for the running kernel (not a problem that
enterprise customers run into, I know, but a common one for kernel developers).
Since we have a kernel build version that increments with every build,
it would be useful to sanity check the one systemtap pulled out of the
debug with the one in the running kernel.
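
A minimal sketch of the kind of sanity check described above, in Python for illustration only.  It assumes both sides can be reduced to a "Linux version ..." banner string: /proc/version supplies one for the running kernel, and the kernel image carries the same text in linux_banner.  How systemtap would pull that string out of the debuginfo is not shown; parse_banner and same_build are hypothetical helper names, not part of systemtap.

```python
import re

def parse_banner(banner):
    """Split a Linux version banner into (release, build_number)."""
    # Expected shape:
    #   "Linux version 2.6.26-rc8 (user@host) (gcc ...) #42 SMP ..."
    m = re.search(r"Linux version (\S+) .*?#(\d+)", banner)
    if m is None:
        raise ValueError("unrecognised version banner: %r" % banner)
    return m.group(1), int(m.group(2))

def same_build(running_banner, debuginfo_banner):
    """True only if the release string and the build counter both agree."""
    return parse_banner(running_banner) == parse_banner(debuginfo_banner)

if __name__ == "__main__":
    # Both banners below are made-up examples, not real build output.
    running = ("Linux version 2.6.26-rc8 (dev@host) (gcc version 4.2.3) "
               "#42 SMP Mon Jun 30 12:00:00 EDT 2008")
    stale = ("Linux version 2.6.26-rc8 (dev@host) (gcc version 4.2.3) "
             "#41 SMP Sun Jun 29 09:00:00 EDT 2008")
    print("running vs running:", same_build(running, running))
    print("running vs stale tree:", same_build(running, stale))
```

Comparing the build counter as well as the release string is the point: two builds of the same tree share a release string but differ in the counter, which is exactly the mismatch that goes unnoticed today.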

> * "unhappy week with dwarf"
> 
>   Guilty as charged. :-)
> 
> * kprobes, markers
> 
>   Performance of kprobes-based probes is about 1 us per hit overhead.
>   Markers are on the order of tens of nanoseconds, which makes a huge
>   difference for frequently-hit probes.  We'd be happy to interface to
>   other event sources like ftrace or whatever, as long as they provide
>   suitable kernel-module-accessible APIs.

There were two specific latencies of concern to the financial trading
house type end user: One was the latency from execution to run.  This is
caused mostly by the module build and insertion.  I really can't see
this getting better except by divorcing systemtap from having to use the
whole of the kernel build infrastructure.  To do that, we need to begin
putting a lot of the C fragments that make up that infrastructure into
the kernel to lessen the load.  It would actually be nice finally to get
to the point where you simply link the probe routines with a special
module stub (built as part of the kernel) and insert it.

The other is the probe execution latency.  Since the institutions are
tracing transactions on the order of milliseconds, microsecond latencies
in the probes do give them cause for concern (it only takes a few probe
points to add up to a significant perturbation).
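
To put rough numbers on that concern, here is a small back-of-the-envelope sketch in Python.  The per-hit costs are the figures quoted above (about 1 us for kprobes, tens of nanoseconds for markers); the 50 ns marker cost, the 1 ms transaction time and the hit counts are assumptions picked purely for illustration.

```python
def probe_overhead_fraction(hits_per_txn, cost_per_hit_ns, txn_time_ns):
    """Fraction of one transaction's duration spent inside probe handlers."""
    return hits_per_txn * cost_per_hit_ns / txn_time_ns

KPROBE_NS = 1_000      # ~1 us per hit, the figure quoted above
MARKER_NS = 50         # "tens of nanoseconds"; 50 is an assumed midpoint
TXN_NS = 1_000_000     # a 1 ms transaction (assumed)

if __name__ == "__main__":
    for hits in (5, 20, 100):
        k = probe_overhead_fraction(hits, KPROBE_NS, TXN_NS)
        m = probe_overhead_fraction(hits, MARKER_NS, TXN_NS)
        print(f"{hits:3d} hits/txn: kprobes {k:.2%}, markers {m:.3%}")
```

Under these assumptions, 20 kprobe hits in a 1 ms transaction already cost 2% of the transaction, while the same number of marker hits costs 0.1%.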

> * user-space probing
> 
>   We're finally getting very close on this.  Yes, it'd use the IBM
>   uprobes prototype above the Red Hat utrace work as a lower layer,
>   which we hope will get upstream as soon as possible.  It will behave
>   analogously to dtrace: executing probes in kernel space.  If it can
>   be made safe (and we think it can), it's a huge performance win over
>   trying to do it in userspace (with some gang of debugging processes
>   or whatever).
> 
> * oprofile
> 
>   It's a fine special-purpose tool.  We hope to hook into the same
>   sorts of underlying hardware performance counters to enable the same
>   profiling capability in systemtap - except well integrated with the
>   rest of the probing events / scripts.  perfmon2 upstream would be
>   very helpful.
> 
> * dtrace "just works"
> 
>   Yeah, so I hear, but think about how different their target
>   environment is.  Their kernel hardly changes (several fixed APIs,
>   ABIs): this has huge implications.  Their kernel was willing to
>   insert probes (~ markers), a bunch of build system changes (debug
>   info subset transcribing).  Here in linux land, we suffer
>   multifaceted tensions and it is hard to go toward a goal without
>   obstructions (well-meaning as they may be).

The goal has to be well articulated and agreed to.  Open source is rapid
at progressing towards common goals ... it's when the goals aren't
common that progress gets bogged down.

>   A bunch of third-party scripts are often conflated with "dtrace",
>   which is just a matter of growing the user community enough, and
>   giving them a good tool to build on top of.  A growing set of
>   runnable end-user scripts is already packaged with systemtap,
>   intended for use by nonexperts.  More help (e.g. concise problem
>   statements about what you'd like to measure/see) would be welcome.
> 
> * integrating systemtap runtime into kernel
> 
>   We did some analysis about how much of the runtime code contains
>   novel & relevant code to the kernel.  We came up with a fraction
>   like 20% (IIRC; still searching for a link to the thread).  Some of
>   the code is indeed in need of some cleanup love.  
> 
>   Some of it has been necessary to work around kernel disruptions
>   (e.g., unexporting stuff like kallsyms_lookup).  The parts that are
>   deeply kernel-version-sensitive (and would thus benefit from your
>   maintenance) are quite small.  We're still open to trying to pursue
>   copying/upstreaming some of this code into the kernel.

Actually, this one is an example of a wrong approach.  What you're
effectively doing is trying to implement an ABI for staprun in these
files (as well as various helpers for the modules).  The work around for
kallsyms_lookup is pretty horrible as well ... especially as the kernel
has its own address to symbol string converter.

This is a lot of what needs to be cleaned up and simplified.  The
interface between systemtap and the kernel is essentially a private ABI
and we should treat it as such, so all the helpers for the modules and
the necessary implementers of the ABI should be in kernel ... there
shouldn't be any (if done right) carried around as C fragments with
kernel version ifdefs ...

> * tapsets
> 
>   Theodore is mistaken that we are deflecting the job of tapset (probe
>   macro; abstracting architecture and kernel version-change -
>   $foo->bar->baz, function names) authorship.  We have asked for help,
>   and have received a little, but the group has in fact authored a
>   growing collection of this stuff.
> 
>   We would welcome having tapsets be included with the kernel and
>   cared for by you guys.
> 
> * debuginfo
> 
>   Yes, it's very helpful & necessary if one wants to place probes at
>   just about any statement and extract just about any data value.
>   It's the same prerequisite that crash or kgdb would have, since we
>   operate at a similar level of object/source code visibility.  Other
>   distros are learning to package this admittedly bulky data up, so
>   it'll be a matter of a largish download for distro users. Kernel
>   developers will of course have the data generated locally already.
> 
>   We've recently gained the ability to work on symbol table level data
>   only.  It's a compromise technology: it shrinks the installation
>   footprint but we get only function-entry probes; we lose data
>   typing; can only get at ABI-dictated positional integral arguments.
> 
> * systemtap building
> 
>   The only thing unusual with building the thing is the use of the
>   elfutils library to parse elf/dwarf data; links to that are provided
>   and one can link to a private copy if the system lacks it.

That's true, just: I've done it but it's not exactly easy.  The
necessity of this undocumented --enable-staticdw flag stalled my
attempts to build it for a while.

> * systemtap releases
> 
>   True, we've been spotty with formal releases, though they are
>   archived and available, and we're moving to a more regular release
>   schedule very shortly.  The weekly snapshots have been good (except
>   a recent unfortunate regression that hits 2.6.25 kernels
>   particularly badly - that's holding up the new release plans).
> 
> 
> Thanks for reading; sorry about the length.

James



* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:40 ` Theodore Tso
@ 2008-06-30 20:00   ` Frank Ch. Eigler
  2008-06-30 20:19     ` Theodore Tso
  2008-07-01  5:29   ` Ananth N Mavinakayanahalli
  1 sibling, 1 reply; 48+ messages in thread
From: Frank Ch. Eigler @ 2008-06-30 20:00 UTC (permalink / raw)
  To: Theodore Tso; +Cc: ksummit-2008-discuss, systemtap

Hi -

On Mon, Jun 30, 2008 at 02:19:59PM -0400, Theodore Tso wrote:
> [...]
> >   Theodore is mistaken that we are deflecting the job of tapset (probe
> >   macro; abstracting architecture and kernel version-change -
> >   $foo->bar->baz, function names) authorship.  We have asked for help,
> >   and have received a little, but the group has in fact authored a
> >   growing collection of this stuff.
> 
> Well I've heard the line that it's up to the kernel subsystem experts
> to write tapsets from Ulrich Drepper (on the ksummit-2008-discuss
> list) and from Ananth N Mavinakayanahalli (private communication) so I
> think it's fair to say that at least some people associated with
> Systemtap have been placing the blame for the lack of tapsets on the
> kernel developers.

We wouldn't talk about blame.


> As far as the growing collection of this stuff?  Where is it?  Do you
> mean in the tapsets directory of the systemtap sources in the git
> repository?  

Yes.

> Is there any documentation or example usage scenarios for these
> tapsets?

Yes, documentation - where it exists - is in man pages (stapprobes, ...);
sample usage is in the example scripts, wiki, or the test suite itself.


> > * debuginfo
> > 
> >   Yes, it's very helpful & necessary if one wants to place probes at
> >   just about any statement and extract just about any data value.
> >   It's the same prerequisite that crash or kgdb would have, since we
> >   operate at a similar level of object/source code visibility.  Other
> >   distros are learning to package this admittedly bulky data up, so
> >   it'll be a matter of a largish download for distro users. Kernel
> >   developers will of course have the data generated locally already.
> 
> The problem is that kernel developers are often juggling multiple
> kernels, so kernel developers need to learn how to package up this
> bulky data as well.

They shouldn't have to repackage it at all - just leave it in the
build tree.

> It would be useful if
> http://sourceware.org/systemtap/wiki/SystemTapWithSelfBuiltKernel
> was a bit more explicit about exactly what SystemTap expects to find
> in SYSTEMTAP_DEBUGINFO_PATH.  [...]

That's a good point.  I'll make sure that the recipe for self-built
kernels is more complete.


> > * systemtap building
> > 
> >   The only thing unusual with building the thing is the use of the
> >   elfutils library to parse elf/dwarf data; links to that are provided
> >   and one can link to a private copy if the system lacks it.

> So how do you link to a private copy?  There's nothing in the wiki
> that describes this.  [...]  It would be nice if the Systemtap
> libraries had some provision where you could either point to a
> source directory where the patched elfutils libraries had been
> placed, and automatically used them for static linking,

That's exactly what the "--with-elfutils=DIRECTORY" systemtap autoconf
option does.

> [...] since the Wiki is filled with assertions (echoed by Ulrich in
> the recent ksummit-discuss thread) about how Systemtap is a fast
> moving project, and why it's absolutely necessary to grab the latest
> bleeding edge sources from the git tree.

That's been generally true - but that does not apply to elfutils.
Some of us run with rather old elfutils just fine.

> I'm willing to send patches for these sorts of usability issues if
> it's likely such patches would be accepted...

We would welcome any help with this stuff.

> > * systemtap releases
> > 
> >   True, we've been spotty with formal releases, though they are
> >   archived and available, and we're moving to a more regular release
> >   schedule very shortly.  The weekly snapshots have been good (except
> >   a recent unfortunate regression that hits 2.6.25 kernels
> >   particularly badly - that's holding up the new release plans).
> 
> Does the regression hit 2.6.26-rc8 kernels?  (i.e., should I not
> bother trying Systemtap until this gets cleared up, lest I waste hours
> and hours again getting frustrated?)

Early data suggests it's better under 2.6.26, so I recommend trying it
just once (don't spend hours).  If it fails, then please wait until
the 0.7 release -- or just try the older 0.6.2, which will almost
certainly work fine for you.

- FChE


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:00   ` Frank Ch. Eigler
@ 2008-06-30 20:19     ` Theodore Tso
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
                         ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Theodore Tso @ 2008-06-30 20:19 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Mon, Jun 30, 2008 at 03:25:33PM -0400, Frank Ch. Eigler wrote:
> > The problem is that kernel developers are often juggling multiple
> > kernels, so kernel developers need to learn how to package up this
> > bulky data as well.
> 
> They shouldn't have to repackage it at all - just leave it in the
> build tree.

The problem is that I am often juggling multiple kernel builds, and so
I don't want to keep the full build tree around.  So I just want to
extract out the specific files needed by Systemtap, especially because
they are so bulky.  So normally I actually do create specific packages
for the kernels I use (so I can give them to others or put them on my
server machines if they prove to be stable), and I want to be able to
package up the debuginfo files as well --- and only exactly the
debuginfo files which are needed to make systemtap work.

Stupid question --- has anyone thought about writing tools to strip
out specific debug information not needed by Systemtap?  For example,
I assume systemtap doesn't need the line number information, since you
can't set probes on arbitrary line numbers (and even if you could,
such tapsets would be so brittle that it wouldn't be funny); so would
the debuginfo files be smaller if that information were stripped out?
I understand that this would make the files less useful for
kdump/crash, but for systemtap only users, it might be quite useful.
What about stripping out the text segment of the object files, so you
aren't storing the information twice on disk, or compressing the
debuginfo files so they take up less room on disk?

> > [...] since the Wiki is filled with assertions (echoed by Ulrich in
> > the recent ksummit-discuss thread) about how Systemtap is a fast
> > moving project, and why it's absolutely necessary to grab the latest
> > bleeding edge sources from the git tree.
> 
> That's been generally true - but that does not apply to elfutils.
> Some of us run with rather old elfutils just fine.

Hmm, well it doesn't work with the version of elfutils shipped with
the latest (8.04) Ubuntu.  <Checking to get the exact message
configure blew up with...>  Ah, now it does.  The wiki didn't say
anything about needing --enable-staticdw, and I see with a recent
commit from last Friday you don't even need to specify
--enable-staticdw any more, it DTRT automatically.  Nice!  Thanks for
fixing this!

							- Ted


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:19     ` Theodore Tso
@ 2008-06-30 21:12       ` Arnaldo Carvalho de Melo
  2008-06-30 23:02         ` David Miller
  2008-06-30 21:13       ` James Bottomley
  2008-06-30 22:10       ` Frank Ch. Eigler
  2 siblings, 1 reply; 48+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-06-30 21:12 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frank Ch. Eigler, David S. Miller, ksummit-2008-discuss, systemtap

Em Mon, Jun 30, 2008 at 04:10:31PM -0400, Theodore Tso escreveu:
> On Mon, Jun 30, 2008 at 03:25:33PM -0400, Frank Ch. Eigler wrote:
> > > The problem is that kernel developers are often juggling multiple
> > > kernels, so kernel developers need to learn how to package up this
> > > bulky data as well.
> > 
> > They shouldn't have to repackage it at all - just leave it in the
> > build tree.
> 
> The problem is that I am often juggling multiple kernel builds, and so
> I don't want to keep the full build tree around.  So I just want to
> extract out the specific files needed by Systemtap, especially because
> they are so bulky.  So normally I actually do create specific packages
> for the kernels I use (so I can give them to others or put them on my
> server machines if they prove to be stable), and I want to be able to
> package up the debuginfo files as well --- and only exactly the
> debuginfo files which are needed to make systemtap work.
> 
> Stupid question --- has anyone thought about writing tools to strip
> out specific debug information not needed by Systemtap?  For example,
> I assume systemtap doesn't need the line number information, since you
> can't set probes on arbitrary line numbers (and even if you could,
> such tapsets would be so brittle that it wouldn't be funny); so would
> the debuginfo files be smaller if that information were stripped out?
> I understand that this would make the files less useful for
> kdump/crash, but for systemtap only users, it might be quite useful.
> What about stripping out the text segment of the object files, so you
> aren't storing the information twice on disk, or compressing the
> debuginfo files so they take up less room on disk?

Yes, it's called CTF, Compact C Type Format, in DTrace land:

http://opensolaris.org/os/project/ppc-dev/task_map/ctf/

DaveM wrote a CTF loader that I included in my dwarves package, so that
we can pretty-print and use all the other features in pahole on files
with CTF sections, such as the Open Solaris kernel and the userland
binaries, that all ship with CTF embedded, dispensing with the need for
-debuginfo packages, all AFAIK.

In my TODO I have "encode CTF from DWARF and make it a part of the
kernel building process" together with "publish results about the savings
obtained", i.e. how much would be added to the kernel image if we always
shipped the (by then compressed) debugging information with the kernel.

I hope to get back to working on this RSN.

- Arnaldo


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:19     ` Theodore Tso
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
@ 2008-06-30 21:13       ` James Bottomley
  2008-06-30 22:10       ` Frank Ch. Eigler
  2 siblings, 0 replies; 48+ messages in thread
From: James Bottomley @ 2008-06-30 21:13 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Mon, 2008-06-30 at 16:10 -0400, Theodore Tso wrote:
> Stupid question --- has anyone thought about writing tools to strip
> out specific debug information not needed by Systemtap?  For example,
> I assume systemtap doesn't need the line number information, since you
> can't set probes on arbitrary line numbers (and even if you could,
> such tapsets would be so brittle that it wouldn't be funny); so would
> the debuginfo files be smaller if that information were stripped out?
> I understand that this would make the files less useful for
> kdump/crash, but for systemtap only users, it might be quite useful.
> What about stripping out the text segment of the object files, so you
> aren't storing the information twice on disk, or compressing the
> debuginfo files so they take up less room on disk?

Actually, you can ... and I know it's brittle, but I do use this feature
a lot (there's no other way to get at local variables currently than by
specifying a line number through the statement interface).  I believe
the point of the markers project is to add pieces to the kernel that
identify useful (and invariant) internal points in the routines where
you can get at the local variables without having to specify line
numbers.

The debug information is bulky because dwarf is so damn wasteful.
Practically every kernel module will contain the dwarf definition of
certain central structures.  When a debugger works on an executable, it
first of all builds up an in-house view of all the dwarf statements,
combining all of the duplicate symbols.  If we could find a way of doing
that for the kernel and then spitting it out as a single file, it would
be far smaller than the debuginfo. Assuming we don't want a monolith,
but actual reduced files (so that modules can be added) we immediately
run across the other annoying thing with dwarf: it has a mechanism to
stub out definitions (DW_AT_declaration) but no way of providing input
about where the real definition is (you now have to search the entire
tree to find it).

Unfortunately, you really have to do these types of reduction tricks, and
strip really just won't do them usefully.

James


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 20:19     ` Theodore Tso
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
  2008-06-30 21:13       ` James Bottomley
@ 2008-06-30 22:10       ` Frank Ch. Eigler
  2008-07-01  2:42         ` Theodore Tso
  2 siblings, 1 reply; 48+ messages in thread
From: Frank Ch. Eigler @ 2008-06-30 22:10 UTC (permalink / raw)
  To: Theodore Tso; +Cc: ksummit-2008-discuss, systemtap

Hi -

On Mon, Jun 30, 2008 at 04:10:31PM -0400, Theodore Tso wrote:
> [...]
> > They shouldn't have to repackage it at all - just leave it in the
> > build tree.
> 
> The problem is that I am often juggling multiple kernel builds, and so
> I don't want to keep the full build tree around.  So I just want to
> extract out the specific files needed by Systemtap [...]

OK, we'll have to think about how to support that well.

> Stupid question --- has anyone thought about writing tools to strip
> out specific debug information not needed by Systemtap?

Yes, but nothing so simple/workable as to have been done already.

> For example, I assume systemtap doesn't need the line number
> information, since you can't set probes on arbitrary line numbers

Actually, we can - and now with wildcards too if you want
source-line-by-line tracing.  See the top of the NEWS file.

> (and even if you could, such tapsets would be so brittle that it
> wouldn't be funny); [...]

Yes, this is not a good fit for tapsets, but is handy for exploring
one's known version of code.  Also, we can now use relative line
numbers (line #10 within this function), which might be stable enough
for some tapset use.  (This is all very recent stuff, beware.)

> [...]  What about stripping out the text segment of the object
> files, so you aren't storing the information twice on disk, or
> compressing the debuginfo files so they take up less room on disk?

This is roughly what the Fedora/RHEL-style separated .ko.debug files
do, though I don't know if they are that complete.  (They'd need a
copy of the symbol tables, and probably other stuff.)


- FChE

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 21:12       ` Arnaldo Carvalho de Melo
@ 2008-06-30 23:02         ` David Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2008-06-30 23:02 UTC (permalink / raw)
  To: acme; +Cc: tytso, fche, ksummit-2008-discuss, systemtap

From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Mon, 30 Jun 2008 17:22:33 -0300

> Em Mon, Jun 30, 2008 at 04:10:31PM -0400, Theodore Tso escreveu:
> > On Mon, Jun 30, 2008 at 03:25:33PM -0400, Frank Ch. Eigler wrote:
> > Stupid question --- has anyone thought about writing tools to strip
> > out specific debug information not needed by Systemtap?  For example,
> > I assume systemtap doesn't need the line number information, since you
> > can't set probes on arbitrary line numbers (and even if you could,
> > such tapsets would be so brittle that it wouldn't be funny); so would
> > the debuginfo files be smaller if that information were stripped out?
> > I understand that this would make the files less useful for
> > kdump/crash, but for systemtap only users, it might be quite useful.
> > What about stripping out the text segment of the object files, so you
> > aren't storing the information twice on disk, or compressing the
> > debuginfo files so they take up less room on disk?
> 
> Yes, it's called CTF, the Compact C Type Format, in DTrace land:
> 
> http://opensolaris.org/os/project/ppc-dev/task_map/ctf/
> 
> DaveM wrote a CTF loader that I included in my dwarves package, so that
> we can pretty-print and use all the other features in pahole on files
> with CTF sections, such as the Open Solaris kernel and the userland
> binaries, that all ship with CTF embedded, dispensing the usage of
> -debuginfo packages, all AFAIK.

One thing you lose with CTF is the stack unwind tables,
and I don't know if systemtap needs that or not.

If someone can state what the absolute minimum requirement
is for systemtap to be able to analyze a binary properly,
we can figure out if CTF provides it.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:59 ` James Bottomley
@ 2008-06-30 23:52   ` Masami Hiramatsu
  2008-07-08 23:32   ` Eric W. Biederman
  1 sibling, 0 replies; 48+ messages in thread
From: Masami Hiramatsu @ 2008-06-30 23:52 UTC (permalink / raw)
  To: James Bottomley; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

Hi,

James Bottomley wrote:
> On Sun, 2008-06-29 at 21:04 -0400, Frank Ch. Eigler wrote:
>> * kprobes, markers
>>
>>   Performance of kprobes-based probes is about 1 us per hit overhead.
>>   Markers are on the order of tens of nanoseconds, which makes a huge
>>   difference for frequently-hit probes.  We'd be happy to interface to
>>   other event sources like ftrace or whatever, as long as they provide
>>   suitable kernel-module-accessible APIs.
> 
> There were two specific latencies of concern to the financial trading
> house type end user: One was the latency from execution to run.  This is
> caused mostly by the module build and insertion.  I really can't see
> this getting better except by divorcing systemtap from having to use the
> whole of the kernel build infrastructure.  To do that, we need to begin
> putting a lot of the C fragments that make up that infrastructure into
> the kernel to lessen the load.  It would actually be nice finally to get
> to the point where you simply link the probe routines with a special
> module stub (built as part of the kernel) and insert it.

I agree, compiling systemtap runtime code to an independent module (or
object file) could reduce building time.
(However, I think it depends on what script you write.  If you probe all
of the sys_* functions, the function search time becomes long.)

> The other is the probe execution latency.  Since the institutions are
> tracing transactions on the order of milliseconds, microsecond latencies
> in the probes do give them cause for concern (it only takes a few probe
> points to add up to a significant perturbation).

Markers have another benefit: they enable you to probe irq handlers.
Since kprobes uses exceptions and isn't recursive, it can't probe
irq-related functions.  Markers can probe them, because they don't use
any exceptions.
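
To put the quoted latency concern in rough numbers, here is a small
back-of-the-envelope sketch.  The 1 us kprobe cost is the figure from
this thread; the 50 ns marker cost, the 1 ms transaction, and the 10
probe hits per transaction are assumptions picked only for illustration.

```shell
# Assumed figures: 1000 ns per kprobe hit (from the thread), 50 ns per
# marker hit (an assumption standing in for "tens of nanoseconds"),
# a 1 ms transaction, and 10 probe hits per transaction.
txn_ns=1000000
hits=10

# overhead_ppm NS_PER_HIT: probe overhead in parts per million of the
# transaction time.
overhead_ppm() {
    echo $(( hits * $1 * 1000000 / txn_ns ))
}

overhead_ppm 1000   # kprobes: 10000 ppm, i.e. 1%
overhead_ppm 50     # markers: 500 ppm, i.e. 0.05%
```

Under these assumptions a handful of kprobes already costs about 1% of
a millisecond-scale transaction, while the same number of markers stays
well below 0.1%.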

[...]
>> * integrating systemtap runtime into kernel
>>
>>   We did some analysis about how much of the runtime code contains
>>   novel & relevant code to the kernel.  We came up with a fraction
>>   like 20% (IIRC; still searching for a link to the thread).  Some of
>>   the code is indeed in need of some cleanup love.  
>>
>>   Some of it has been necessary to work around kernel disruptions
>>   (e.g., unexporting stuff like kallsyms_lookup).  The parts that are
>>   deeply kernel-version-sensitive (and would thus benefit from your
>>   maintenance) are quite small.  We're still open to trying to pursue
>>   copying/upstreaming some of this code into the kernel.
> 
> Actually, this one is an example of a wrong approach.  What you're
> effectively doing is trying to implement an ABI for staprun in these
> files (as well as various helpers for the modules).  The work around for
> kallsyms_lookup is pretty horrible as well ... especially as the kernel
> has its own address to symbol string converter.
> 
> This is a lot of what needs to be cleaned up and simplified.  The
> interface between systemtap and the kernel is essentially a private ABI
> and we should treat it as such, so all the helpers for the modules and
> the necessary implementers of the ABI should be in kernel ... there
> shouldn't be any (if done right) carried around as C fragments with
> kernel version ifdefs ...

And also, some of them should be isolated from the kernel itself.
For example, systemtap can not call do_gettimeofday() because it
is not recursive. So, now, systemtap has its own time.c.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 22:10       ` Frank Ch. Eigler
@ 2008-07-01  2:42         ` Theodore Tso
  2008-07-01  7:08           ` Roland McGrath
  0 siblings, 1 reply; 48+ messages in thread
From: Theodore Tso @ 2008-07-01  2:42 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: ksummit-2008-discuss, systemtap

On Mon, Jun 30, 2008 at 04:42:19PM -0400, Frank Ch. Eigler wrote:
> > [...]  What about stripping out the text segment of the object
> > files, so you aren't storing the information twice on disk, or
> > compressing the debuginfo files so they take up less room on disk?
> 
> This is roughly what the Fedora/RHEL-style separated .ko.debug files
> do, though I don't know if they are that complete.  (They'd need a
> copy of the symbol tables, and probably other stuff.)

Do you have a pointer to whatever program is used to generate the
Fedora/RHEL-style separated .ko.debug files?

Thanks, regards,

						- Ted

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:40 ` Theodore Tso
  2008-06-30 20:00   ` Frank Ch. Eigler
@ 2008-07-01  5:29   ` Ananth N Mavinakayanahalli
  1 sibling, 0 replies; 48+ messages in thread
From: Ananth N Mavinakayanahalli @ 2008-07-01  5:29 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Mon, Jun 30, 2008 at 02:19:59PM -0400, Theodore Tso wrote:
> On Sun, Jun 29, 2008 at 09:04:23PM -0400, Frank Ch. Eigler wrote:
> > * tapsets
> > 
> >   Theodore is mistaken that we are deflecting the job of tapset (probe
> >   macro; abstracting architecture and kernel version-change -
> >   $foo->bar->baz, function names) authorship.  We have asked for help,
> >   and have received a little, but the group has in fact authored a
> >   growing collection of this stuff.
> 
> Well I've heard the line that it's up to the kernel subsystem experts
> to write tapsets from Ulrich Drepper (on the ksummit-2008-discuss
> list) and from Ananth N Mavinakayanahalli (private communication) so I
> think it's fair to say that at least some people associated with
> Systemtap have been placing the blame for the lack of tapsets on the
> kernel developers.

I wouldn't call that 'blame'. What I was trying to say simply was that
kernel subsystem experts are best suited to identify the location and
types of data one could extract from their subsystem. I believe that is
also what Ulrich was trying to say.

Apologies if you felt I was blaming anybody.

Ananth

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01  2:42         ` Theodore Tso
@ 2008-07-01  7:08           ` Roland McGrath
  2008-07-01 10:15             ` Theodore Tso
  0 siblings, 1 reply; 48+ messages in thread
From: Roland McGrath @ 2008-07-01  7:08 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

> Do you have a pointer to whatever program is used to generate the
> Fedora/RHEL-style separated .ko.debug files?

It's eu-strip -f (elfutils) or a combination of binutils tools with several
special options (I think it's two objcopy's and a strip or something).
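
As a rough sketch of what that looks like by hand (not the distro
scripts themselves): the function name split_debug and the FILE.debug
naming are made up for illustration, and the -g option is my choice
here so that both branches remove only the debug sections.

```shell
# split_debug FILE: move the debug sections of FILE into FILE.debug and
# leave a stripped FILE behind that points at the .debug file.
# (Hypothetical helper; the name and the .debug suffix are assumptions.)
split_debug() {
    f=$1
    [ -f "$f" ] || { echo "split_debug: no such file: $f" >&2; return 1; }
    if command -v eu-strip >/dev/null 2>&1; then
        # elfutils: strip debug info in place, saving it to $f.debug
        eu-strip -g -f "$f.debug" "$f"
    else
        # binutils: two objcopy's and a strip
        objcopy --only-keep-debug "$f" "$f.debug" &&
        strip --strip-debug "$f" &&
        objcopy --add-gnu-debuglink="$f.debug" "$f"
    fi
}
```

Usage would be something like "split_debug vmlinux" on a copy of an
unstripped file.  The real packaging scripts pass more options and
place the .debug files under a separate directory tree.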

In rpm-based distros, this is done automagically in rpmbuild and driven by
magic macros and shell scripts.  I had the impression Debian also did
parallel -debuginfo packages of the same sort, so I presume some scripts
using either objcopy/strip or eu-strip are buried in that build magic too.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01  7:08           ` Roland McGrath
@ 2008-07-01 10:15             ` Theodore Tso
  2008-07-01 11:04               ` Sam Ravnborg
  2008-07-01 20:06               ` Roland McGrath
  0 siblings, 2 replies; 48+ messages in thread
From: Theodore Tso @ 2008-07-01 10:15 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 12:07:46AM -0700, Roland McGrath wrote:
> In rpm-based distros, this is done automagically in rpmbuild and driven by
> magic macros and shell scripts.  I had the impression Debian also did
> parallel -debuginfo packages of the same sort, so I presume some scripts
> using either objcopy/strip or eu-strip are buried in that build magic too.

Debian doesn't have -debuginfo packages, hence my request to get a
pointer at the magic shell script to do the separation.  To the extent
that Systemtap will be used by more people (and hence grow its tapset
collection more quickly) it would be useful if more distributions
could figure out how to deal with the -debuginfo information in a more
sane fashion (where quadrupling or so the space needed in /boot for
each kernel is often not practical :-).

I've pulled apart RHEL's rpm macro magic before, and it's not a
pleasant wading through all of the files; maybe we can teach the
native kernel build infrastructure how to create debuginfo files so
that each distribution doesn't have to re-invent the wheel from
scratch, but rather can reuse common infrastructure in Kbuild....

							- Ted

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 10:15             ` Theodore Tso
@ 2008-07-01 11:04               ` Sam Ravnborg
  2008-07-01 12:13                 ` Theodore Tso
  2008-07-01 20:06               ` Roland McGrath
  1 sibling, 1 reply; 48+ messages in thread
From: Sam Ravnborg @ 2008-07-01 11:04 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 06:15:07AM -0400, Theodore Tso wrote:
> 
> I've pulled apart RHEL's rpm macro magic before, and it's not a
> pleasant wading through all of the files; maybe we can teach the
> native kernel build infrastructure how to create debuginfo files so
> that each distribution doesn't have to re-invent the wheel from
> scratch, but rather can reuse common infrastructure in Kbuild....

What is needed to create debuginfo files?
Seems like a simple thing to integrate in kbuild
if this is per file or per module.

If it is for the kernel as a whole things gets a bit more complex.

	Sam

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 11:04               ` Sam Ravnborg
@ 2008-07-01 12:13                 ` Theodore Tso
  2008-07-02 20:27                   ` Sam Ravnborg
  0 siblings, 1 reply; 48+ messages in thread
From: Theodore Tso @ 2008-07-01 12:13 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 01:05:17PM +0200, Sam Ravnborg wrote:
> On Tue, Jul 01, 2008 at 06:15:07AM -0400, Theodore Tso wrote:
> > 
> > I've pulled apart RHEL's rpm macro magic before, and it's not a
> > pleasant wading through all of the files; maybe we can teach the
> > native kernel build infrastructure how to create debuginfo files so
> > that each distribution doesn't have to re-invent the wheel from
> > scratch, but rather can reuse common infrastructure in Kbuild....
> 
> What is needed to create debuginfo files?
> Seems like a simple thing to integrate in kbuild
> if this is per file or per module.

Well, the simplest/stupidest thing we can do is simply have an alternate
target which installs the modules in

       $(INSTALL_MOD_PATH)/usr/lib/debug/$(KERNELRELEASE)

... while ignoring the INSTALL_MOD_STRIP option.  You may recall that
I submitted the patch to add INSTALL_MOD_STRIP (commit
ac031f26e); this was from an earlier attempt of mine to use
kdump/systemtap.

RHEL's rpm macro magic does some additional objcopy's which I'll have
to try to tease out to strip out the text segments and only leave the
debug information in debuginfo files, which helps slim them down a
little.  

> 
> If it is for the kernel as a whole things gets a bit more complex.
> 

It would be nice to do this for the base kernel as well (a vmlinux
with strip --strip-debug applied takes only 6 megs in /boot on my
system, but a vmlinux with full debugging information takes 66 megs;
so moving the unstripped vmlinux out of /boot to /usr/lib/debug would
be quite helpful for people who created their /boot partition not
allowing for the rather dramatic increase in size needed for kernels
built with debugging information.)
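
A quick worked example of why this matters, using the two vmlinux
sizes above.  The count of eight installed kernels is an assumption for
illustration, and module sizes are left out entirely.

```shell
# Figures from this message: 6 MB stripped vmlinux, 66 MB unstripped.
# The kernel count of 8 is an assumption; modules are not counted.
stripped_mb=6
unstripped_mb=66
kernels=8

# boot_usage_mb PER_KERNEL_MB: total space for all installed kernels.
boot_usage_mb() {
    echo $(( kernels * $1 ))
}

boot_usage_mb "$unstripped_mb"   # 528 MB if unstripped images sit in /boot
boot_usage_mb "$stripped_mb"     # 48 MB if only stripped images do
```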

						- Ted

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 10:15             ` Theodore Tso
  2008-07-01 11:04               ` Sam Ravnborg
@ 2008-07-01 20:06               ` Roland McGrath
  2008-07-01 23:13                 ` Theodore Tso
  1 sibling, 1 reply; 48+ messages in thread
From: Roland McGrath @ 2008-07-01 20:06 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

Like I said, the essential command is eu-strip -f.  It is simple to use.

For one's own local hacking purposes, there is no real reason to bother
with strip-to-file complexities.  You can just copy the unstripped files
before stripping them.  The effect is the same (or easier for you, with
most of the tools), and the extra 6M (stripped) where you have disk space
for the 60M (debuginfo) is never an issue (i.e. if it's 66M unstripped).

In the Fedora packaging, an ELF vmlinux file in /boot is treated the same
as the .ko files (and all installed binaries for any package) and gets the
strip-to-file treatment.  It works the same on ELF executables (be they
kernels or otherwise), DSOs, and .ko's.  There is a special case in the
kernel packaging when what's in /boot is not in ELF format (i.e. bzImage
format and such)--the strip-to-file convention requires having the stripped
ELF file intact and on hand too.  When there won't be any plain ELF vmlinux
in /boot, we just copy the unstripped vmlinux into /usr/src/debug.

I honestly don't think it's ever going to be useful to any distro build to
have kernel makefiles do .debug file splitting.  For purposes of separate
debuginfo, the kernel really isn't a very special package.  The distro
packaging magic needs to do its debuginfo diddling, strip-to-file, and
related cataloguing magic for all packages anyway.  All the packagers have
to do for each individual package is get it to compile with -g and not
strip the binaries it installs.  The packaging hooey takes care of the
rest, and having a package's "make install" try to "do it for you" would
just break everything.  Future distro magic will evolve with newer tools to
pack the .debug file data in different, better ways, etc.  It just is not
going to help packagers to have any version of such logic built into the
kernel build process.

That said, knock yourself out.  I'm glad to answer questions about the
tools.  But we have gone pretty darn far afield from this thread's topic
now.  This does not seem like the logical place to pursue those technical
details of the toolchain.


Thanks,
Roland

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 20:06               ` Roland McGrath
@ 2008-07-01 23:13                 ` Theodore Tso
  2008-07-02  2:23                   ` Frank Ch. Eigler
                                     ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Theodore Tso @ 2008-07-01 23:13 UTC (permalink / raw)
  To: Roland McGrath; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 01:06:32PM -0700, Roland McGrath wrote:
> Like I said, the essential command is eu-strip -f.  It is simple to use.
> 
> For one's own local hacking purposes, there is no real reason to bother
> with strip-to-file complexities.  You can just copy the unstripped files
> before stripping them.  The effect is the same (or easier for you, with
> most of the tools), and the extra 6M (stripped) where you have disk space
> for the 60M (debuginfo) is never an issue (i.e. if it's 66M unstripped).

Well, actually, it *does* matter, at least to me.  Sometimes when I am
trying to track down a problem, either using git-bisect or evaluating
multiple patches, I might have five, six, seven, eight or more kernels
installed.  And on a number of my systems, the amount of space on the
partitions where /boot and /lib live can't take the space demands of
compiling the kernel and modules with -g.

And remember, for the average kernel developer, the question is
whether using SystemTap is easier than inserting a bunch of printk's
and recompiling.  And one of the major flaws of the Linux's RAS tools
is that the LKML development community doesn't use them; and to the
extent that tapsets would be written more quickly if they are easy for
kernel developers who aren't depending on distro packaging and distro
building of systemtap.  (Especially if systemtap is so fast moving
that people shouldn't depend on stable releases but rather the git
repository or weekly automated snapshots.)  So actually, being able to
install stripped modules and vmlinux into /boot and /linux, and then
being able to put the unstripped binaries somewhere else, without
having to use the !@#@! complicated RPM macros by Fedora/RHEL is
actually **very** important to me.

I don't know how many people considered that a showstopper; but James
mentioned on another thread that figuring out the magic, undocumented
--enable-staticdw flag hit him as well.  Yes, I know that's been fixed
as of last Friday in the git repository, but again, it's these little
things that cause people to throw up their hands in frustration and
say, "Eh!  I'll just use printk's and recompilations instead; it's
easier."

In the past two years, I've on average tried Systemtap every 9 months
or so, and each time, I'd hit a different annoying roadblock, and then
I was so busy I would move on to a more productive way of solving my
problems.  And I've asked various different Systemtap developers and
architects (mostly inside IBM), and I'd get the same answers that
Ulrich spouted just recently on this list.  "Tapsets?  Yeah, we're
depending on kernel subsystem experts to write them; we don't know how
to get inside the internals of the various subsystems."  "Building it?
Stable releases?  That's a distro problem; just use what your distro
uses."  "Ooooh, sorry, that's an ancient version of Systemtap, blame
your distro provider for doing a sucky job."  And my reaction each
time was, "OK, back to printk debugging; and if you want me to write
tapsets for you, you're in fantasy land."

So I think this issue is very much a potential topic for the kernel
summit, namely --- why is it that so few kernel developers are using
RAS tools like Systemtap, and what can be done to improve this
situation?  Or if the Systemtap team doesn't need any help, and can
write all of these tapsets without kernel developer's participation,
or maybe assume that System administrators can write Systemtap scripts
that do things like:

probe kernel.function ("vfs_write"),
      kernel.function ("vfs_read")
{
  dev_nr = $file->f_dentry->d_inode->i_sb->s_dev
  inode_nr = $file->f_dentry->d_inode->i_ino

  if (dev_nr == ($1 << 20 | $2) # major/minor device
      && inode_nr == $3)
    printf ("%s(%d) %s 0x%x/%u\n",
      execname(), pid(), probefunc(), dev_nr, inode_nr)
}

and still be a credible competition to the audience served by DTrace,
hey, knock yourself out.  But I think there may be a connection
between problems which Systemtap developers seem to continually assert
a Somebody Else's Problem field around, and the lack of uptake by the
LKML community.  Maybe.  Just a guess on my part.

						- Ted

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
@ 2008-07-02  2:23                   ` Frank Ch. Eigler
  2008-07-02 19:27                   ` Frank Ch. Eigler
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 48+ messages in thread
From: Frank Ch. Eigler @ 2008-07-02  2:23 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

Hi -

On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:

> On Tue, Jul 01, 2008 at 01:06:32PM -0700, Roland McGrath wrote:
> > Like I said, the essential command is eu-strip -f.  It is simple to use.
> > 
> > For one's own local hacking purposes, there is no real reason to bother
> > with strip-to-file complexities.  You can just copy the unstripped files
> > before stripping them.  [...]

> Well, actually, it *does* matter, at least to me.  [...]  I might
> have five, six, seven, eight or more kernels installed.  And on a
> number of my systems, the amount of space on the partitions where
> /boot and /lib live can't take the space demands of compiling the
> kernel and modules with -g.

You simply misunderstood Roland's suggestion: that you save the
unstripped copies of vmlinux etc. someplace - anyplace - for
systemtap's use, and that you strip (as normal) the pieces that go
into /boot.  No one is asking you to enlarge your boot partition.
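
A minimal sketch of that workflow, assuming a kernel objdir built with
-g.  The helper name stash_unstripped and the destination layout are
hypothetical; this only copies files and is not something systemtap or
kbuild provides today.

```shell
# stash_unstripped OBJDIR DESTDIR: copy vmlinux and every .ko under
# OBJDIR into DESTDIR, preserving relative paths, before anything is
# stripped.  (Hypothetical helper; name and layout are assumptions.)
stash_unstripped() {
    objdir=$1
    dest=$2
    [ -f "$objdir/vmlinux" ] || { echo "no vmlinux in $objdir" >&2; return 1; }
    mkdir -p "$dest" || return 1
    cp "$objdir/vmlinux" "$dest/vmlinux" || return 1
    # List modules relative to OBJDIR so the tree shape is kept in DESTDIR.
    (cd "$objdir" && find . -name '*.ko' -type f) | while IFS= read -r ko; do
        mkdir -p "$dest/$(dirname "$ko")" && cp "$objdir/$ko" "$dest/$ko"
    done
}
```

After stashing the unstripped copies, the normal install can strip what
goes into /boot and /lib, for instance via the existing
INSTALL_MOD_STRIP option for the modules.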


> [...] And one of the major flaws of the Linux's RAS tools is that
> the LKML development community doesn't use them; and to the extent
> that tapsets would be written more quickly if they are easy for
> kernel developers [...]

Point taken (and applies broadly to all the other RAS tools).


> In the past two years, I've on average tried Systemtap every 9
> months or so, and each time, I'd hit a different annoying roadblock,
> and then I was so busy I would move on to a more productive way of
> solving my problems. [...]

Hearing about your problems at the time could well have steered us
toward focusing on their solution.

There has been a bit of a vicious circle in play: apparent lack of
interest from the LKML community drives focus toward customer
problem areas, which then apparently disappoints (members of) the LKML
community into more disinterest.  Let's break this.


- FChE

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
  2008-07-02  2:23                   ` Frank Ch. Eigler
@ 2008-07-02 19:27                   ` Frank Ch. Eigler
  2008-07-02 21:40                     ` Potential Systemtap topics for the Kernel Summit Theodore Tso
  2008-07-02 20:08                   ` [Ksummit-2008-discuss] DTrace Joel Becker
  2008-07-05  9:46                   ` Peter Zijlstra
  3 siblings, 1 reply; 48+ messages in thread
From: Frank Ch. Eigler @ 2008-07-02 19:27 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

Hi -


On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:

> [...]  And one of the major flaws of the Linux's RAS tools is that
> the LKML development community doesn't use them; and to the extent
> that tapsets would be written more quickly if they are easy for
> kernel developers who aren't depending on distro packaging and
> distro building of systemtap.  [...]

Please excuse my return to this point, but it meshes with something
else:

> probe kernel.function ("vfs_write"),
>       kernel.function ("vfs_read")
> {
>   dev_nr = $file->f_dentry->d_inode->i_sb->s_dev
>   inode_nr = $file->f_dentry->d_inode->i_ino
> 
>   if (dev_nr == ($1 << 20 | $2) # major/minor device
>       && inode_nr == $3)
>     printf ("%s(%d) %s 0x%x/%u\n",
>       execname(), pid(), probefunc(), dev_nr, inode_nr)
> }

So, one way a kernel developer could help write a tapset piece for us
is to encapsulate this into a tapset script fragment:

probe vfs.read = kernel.function ("vfs_read")
  {
    dev_nr = $...expression
    inode_nr = $...expression
  }

Then this definition would be shipped with the kernel or systemtap,
tested in one or the other build system for currency.  (Not by
coincidence, something much like that is already in our tapset, just
lacks those two values.)

Then the end user just does

   probe vfs.read { if (dev_nr != MKDEV(2,3)) printf ("whatever you want to print") }
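
For reference, the device number being compared is the kernel-internal
encoding from the quoted script, (major << 20) | minor.  A small sketch
of that arithmetic follows; the shell function name mkdev is just for
illustration.

```shell
# mkdev MAJOR MINOR: kernel-internal dev_t encoding, (major << 20) | minor,
# the same expression as ($1 << 20 | $2) in the quoted script.
mkdev() {
    echo $(( ($1 << 20) | $2 ))
}

mkdev 2 3     # prints 2097155
mkdev 8 1     # prints 8388609 (0x800001)
```

Hiding this packing behind a tapset helper is exactly the kind of
detail an end user should not have to know.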


****  or  ****


Kernel maintainers could add a marker or two right into their C code:

vfs_read() 
{
    /* ... */
    trace_mark (vfs_read, "dev %u inode %u whatever %s",
                          expression1, expression2, whatever);
    /* ... */
}

And that's it.  It's compiled-in, and checked as a part of your
routine builds.  Then the systemtap-side interpretation code is trivial,
and anyone can write it.  And it doesn't require debugging data.

   probe vfs.read = kernel.mark("vfs_read") { dev_nr = $arg1; inode_nr = $arg2 }


If people could get over the funny look of the markers (since
performance effects have been shown to be negligible), they could make
a significant contribution to this problem, with just a few lines of C
code.


- FChE

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
  2008-07-02  2:23                   ` Frank Ch. Eigler
  2008-07-02 19:27                   ` Frank Ch. Eigler
@ 2008-07-02 20:08                   ` Joel Becker
  2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-05  9:46                   ` Peter Zijlstra
  3 siblings, 1 reply; 48+ messages in thread
From: Joel Becker @ 2008-07-02 20:08 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:
> And remember, for the average kernel developer, the question is
> whether using SystemTap is easier than inserting a bunch of printk's

	I'll throw in a datapoint here.  I recently had to track a
problem down on a distro kernel, and rebuilding distro kernels takes a
lot of time.  So I decided to try SystemTap.  Once I'd discovered the
magic location of the distro's debuginfo package, systemtap was *WAY*
faster than printk+recompile.  I mean, we're talking 30 second turnaround
between "Oh, I'd like to print this other value" and actually printing
it.  In the core kernel, not a module.  No reboot, no nothin.  This is a
huge win.
	But I'll never replicate that for my normal work at this rate.
I'm usually floating multiple hand-built mainline kernels with new
work.  Just like Ted describes.

> repository or weekly automated snapshots.)  So actually, being able to
> install stripped modules and vmlinux into /boot and /linux, and then
> being able to put the unstripped binaries somewhere else, without
> having to use the !@#@! complicated RPM macros by Fedora/RHEL is
> actually **very** important to me.

	Me too.  I want to be able to say "make install; make
tap_install" in my kernel objdir.  "install" does what it always has
done - no change.  "tap_install" (or whatever) drops things in eg
/lib/modules/<version>/debug such that systemtap Just Works.  It can
error if systemtap isn't installed or is too old.  But I shouldn't have
to build a distro package of my kernel, or even understand the
mechanism for building 'debuginfo' bits (even if I do).

Joel

-- 

"I am working for the time when unqualified blacks, browns, and
 women join the unqualified men in running our government."
	- Sissy Farenthold

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:08                   ` [Ksummit-2008-discuss] DTrace Joel Becker
@ 2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-02 20:41                       ` Frank Ch. Eigler
                                         ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: J. Bruce Fields @ 2008-07-02 20:17 UTC (permalink / raw)
  To: Theodore Tso, Roland McGrath, ksummit-2008-discuss, systemtap

On Wed, Jul 02, 2008 at 01:06:51PM -0700, Joel Becker wrote:
> On Tue, Jul 01, 2008 at 07:13:27PM -0400, Theodore Tso wrote:
> > And remember, for the average kernel developer, the question is
> > whether using SystemTap is easier than inserting a bunch of printk's
> 
> 	I'll throw in a datapoint here.  I recently had to track a
> problem down on a distro kernel, and rebuilding distro kernels takes a
> lot of time.  So I decided to try SystemTap.  Once I'd discovered the
> magic location of the distro's debuginfo package, systemtap was *WAY*
> faster than printk+recompile.  I mean, we're talking 30 second turnaround
> between "Oh, I'd like to print this other value" and actually printing
> it.  In the core kernel, not a module.  No reboot, no nothin.  This is a
> huge win.
> 	But I'll never replicate that for my normal work at this rate.
> I'm usually floating multiple hand-built mainline kernels with new
> work.  Just like Ted describes.
> 
> > repository or weekly automated snapshots.)  So actually, being able to
> > install stripped modules and vmlinux into /boot and /linux, and then
> > being able to put the unstripped binaries somewhere else, without
> > having to use the !@#@! complicated RPM macros by Fedora/RHEL is
> > actually **very** important to me.
> 
> 	Me too.  I want to be able to say "make install; make
> tap_install" in my kernel objdir.  "install" does what it always has
> done - no change.  "tap_install" (or whatever) drops things in eg
> /lib/modules/<version>/debug such that systemtap Just Works.

That would be nice.  But I'm afraid I normally don't even have access to
the kernel tree on the machine I'm installing to--I usually build a
monolithic kernel and then scp it to the test machines.  Is there hope
for me?

--b.

> It can
> error if systemtap isn't installed or is too old.  But I shouldn't have
> to build a distro package of my kernel, or even understand the
> mechanism for building 'debuginfo' bits (even if I do).
> 
> Joel
> 
> -- 
> 
> "I am working for the time when unqualified blacks, browns, and
>  women join the unqualified men in running our government."
> 	- Sissy Farenthold
> 
> Joel Becker
> Principal Software Developer
> Oracle
> E-mail: joel.becker@oracle.com
> Phone: (650) 506-8127

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 12:13                 ` Theodore Tso
@ 2008-07-02 20:27                   ` Sam Ravnborg
  0 siblings, 0 replies; 48+ messages in thread
From: Sam Ravnborg @ 2008-07-02 20:27 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, Jul 01, 2008 at 08:12:43AM -0400, Theodore Tso wrote:
> On Tue, Jul 01, 2008 at 01:05:17PM +0200, Sam Ravnborg wrote:
> > On Tue, Jul 01, 2008 at 06:15:07AM -0400, Theodore Tso wrote:
> > > 
> > > I've pulled apart RHEL's rpm macro magic before, and it's not a
> > > pleasant wading through all of the files; maybe we can teach the
> > > native kernel build infrastructure how to create debuginfo files so
> > > that each distribution doesn't have to re-invent the wheel from
> > > scratch, but rather can reuse common infrastructure in Kbuild....
> > 
> > What is needed to create debuginfo files?
> > Seems like a simple thing to integrate in kbuild
> > if this is per file or per module.
> 
> Well, the simplest/stupidest thing we can do is simply have an alternate
> target which installs the modules in
> 
>        $(INSTALL_MOD_PATH)/usr/lib/debug/$(KERNELRELEASE)
> 
> ... while ignoring the INSTALL_MOD_STRIP option.  You may recall
> that I submitted the patch to add INSTALL_MOD_STRIP (commit
> ac031f26e); this was from an earlier attempt of mine to use
> kdump/systemtap.
> 
> RHEL's rpm macro magic does some additional objcopy's which I'll have
> to try to tease out to strip out the text segments and only leave the
> debug information in debuginfo files, which helps slim them down a
> little.  
> 
> > 
> > If it is for the kernel as a whole things get a bit more complex.
> > 
> 
> It would be nice to do this for the base kernel as well (a vmlinux
> with strip --strip-debug applied takes only 6 megs in /boot on my
> system, but a vmlinux with full debugging information takes 66 megs;
> so moving the unstripped vmlinux out of /boot to /usr/lib/debug would
> be quite helpful for people who created their /boot partition not
> allowing for the rather dramatic increase in size needed for kernels
> built with debugging information.)

It all seems quite simple to do with a bit of careful testing.
I can give it a try when I'm properly installed in my new house.
So let's hope someone jumps in and does it.

	Sam

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:17                     ` J. Bruce Fields
@ 2008-07-02 20:41                       ` Frank Ch. Eigler
  2008-07-02 21:19                       ` H. Peter Anvin
  2008-07-02 21:30                       ` Theodore Tso
  2 siblings, 0 replies; 48+ messages in thread
From: Frank Ch. Eigler @ 2008-07-02 20:41 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Theodore Tso, Roland McGrath, ksummit-2008-discuss, systemtap

>> [...]
>> 	Me too.  I want to be able to say "make install; make
>> tap_install" in my kernel objdir.  "install" does what it always has
>> done - no change.  "tap_install" (or whatever) drops things in eg
>> /lib/modules/<version>/debug such that systemtap Just Works.

OK, we'll try to work out something like that soon.


"J. Bruce Fields" <bfields@fieldses.org> writes:

> That would be nice.  But I'm afraid I normally don't even have
> access to the kernel tree on the machine I'm installing to--I
> usually build a monolithic kernel and then scp it to the test
> machines.  Is there hope for me?

You can cross-compile systemtap scripts today, if that kernel at least
is built with CONFIG_MODULES etc.  On your development host:

   % stap -p4 SCRIPT
   % scp RESULT.ko target-machine:
   % ssh root@target-machine staprun RESULT.ko

or something very close to that.  The Fedora/RHEL packages separate a
"systemtap-runtime" piece consisting of one or two small binaries that
need to go onto the target machine.  We're working on a target-side
driven network client/server widget to fully automate this.

- FChE

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-02 20:41                       ` Frank Ch. Eigler
@ 2008-07-02 21:19                       ` H. Peter Anvin
  2008-07-02 21:30                       ` Theodore Tso
  2 siblings, 0 replies; 48+ messages in thread
From: H. Peter Anvin @ 2008-07-02 21:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Theodore Tso, Roland McGrath, ksummit-2008-discuss, systemtap

J. Bruce Fields wrote:
> 
> That would be nice.  But I'm afraid I normally don't even have access to
> the kernel tree on the machine I'm installing to--I usually build a
> monolithic kernel and then scp it to the test machines.  Is there hope
> for me?
> 

I usually don't even do the scp bit.  There is always NFS, I guess.

	-hpa

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 20:17                     ` J. Bruce Fields
  2008-07-02 20:41                       ` Frank Ch. Eigler
  2008-07-02 21:19                       ` H. Peter Anvin
@ 2008-07-02 21:30                       ` Theodore Tso
  2008-07-02 21:46                         ` J. Bruce Fields
  2 siblings, 1 reply; 48+ messages in thread
From: Theodore Tso @ 2008-07-02 21:30 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Wed, Jul 02, 2008 at 04:16:37PM -0400, J. Bruce Fields wrote:
> 
> That would be nice.  But I'm afraid I normally don't even have access to
> the kernel tree on the machine I'm installing to--I usually build a
> monolithic kernel and then scp it to the test machines.  Is there hope
> for me?
> 

Well, that's why I build kernels packaged up in .deb format; it's much
easier to scp a .deb file over to a test machine and install it.  The
take-home here is that "make deb-pkg" and "make rpm-pkg" should
ideally create bare-bones linux-image, linux-headers, and
linux-debuginfo packages.  Maybe no distribution will ever use it, but
it will be invaluable for kernel developers.

The good news is that a member of the Debian kernel team
contacted me off-line and mentioned that he plans to work with Sam to
send a patch series to get "make deb-pkg" working again, and in the
long term with proper debuginfo support.  Maybe someone will be
inspired to enhance "rpm-pkg" to do the right thing.

	    	    	      	    	      - Ted

* Potential Systemtap topics for the Kernel Summit
  2008-07-02 19:27                   ` Frank Ch. Eigler
@ 2008-07-02 21:40                     ` Theodore Tso
  2008-07-02 21:51                       ` [Ksummit-2008-discuss] " Jonathan Corbet
                                         ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Theodore Tso @ 2008-07-02 21:40 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Wed, Jul 02, 2008 at 03:25:19PM -0400, Frank Ch. Eigler wrote:
> So, one way a kernel developer could help write a tapset piece for us
> is to encapsulate this into a tapset script fragment:
> 
> probe vfs.read = kernel.function ("vfs_read")
>   {
>     dev_nr = $...expression
>     inode_nr = $...expression
>   }
> 
> ****  or  ****
> 
> Kernel maintainers could add a marker or two right into their C code:
> 
> {
>     /* ... */
>     trace_mark (vfs_read, "dev %u inode %u whatever %s",
>                           expression1, expression2, whatever);
>     /* ... */
> }

So it sounds like potential Systemtap topics for the kernel summit
might include:

A)  Enhancements to kbuild to better support kernel developers who want
to use systemtap.

B) Discussion of list of tapsets and markers that would be useful for
System administrators wanting to use systemtap.  This is one place
where if someone could volunteer to examine some of the Dtrace
examples and blog entries where Dtrace users have raved about how
Dtrace saved their bacon by instantly identifying some performance
problem, and then assemble a "tapset or marker WANTED" bounty list,
that would be very useful.  One potential problem is that I suspect
kernel developers may not know or have the intuition of what sort of
markers or tapsets would be most useful.  Having a targeted wish list
would be very useful.  (We might then have some discussions about
whether a particular tapset or marker is too hard to maintain, or
represents too much of a performance hit, but at least would be
dealing with concrete requests.)

C) Whether tapsets/markers should be maintained inside the kernel, and
if so, how.

D) What is the right way to do user probes.

Of course, if some of these topics are handled via e-mail before the
kernel summit, even better.  But somehow, I'm guessing there will
still be more to talk about.  :-)

The bottom line is more communication between the kernel and systemtap
developers is a good thing (and getting more kernel developers to use
systemtap would be a good start).  And I do want to make sure I get
across that I wasn't trying to imply that all of the work and changes
should happen on the systemtap side.  In fact, if you look at some of
the topics that have come up on this thread, more than a few of them
involve changes in the kernel side....

Does this sound reasonable?

					- Ted

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-02 21:30                       ` Theodore Tso
@ 2008-07-02 21:46                         ` J. Bruce Fields
  0 siblings, 0 replies; 48+ messages in thread
From: J. Bruce Fields @ 2008-07-02 21:46 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Wed, Jul 02, 2008 at 05:29:28PM -0400, Theodore Tso wrote:
> On Wed, Jul 02, 2008 at 04:16:37PM -0400, J. Bruce Fields wrote:
> > 
> > That would be nice.  But I'm afraid I normally don't even have access to
> > the kernel tree on the machine I'm installing to--I usually build a
> > monolithic kernel and then scp it to the test machines.  Is there hope
> > for me?
> > 
> 
> Well, that's why I build kernels packaged up in .deb format; it's much
> easier to scp a .deb file over to a test machine and install it.  The
> take-home here is that "make deb-pkg" and "make rpm-pkg" should
> ideally create bare-bones linux-image, linux-headers, and
> linux-debuginfo packages.  Maybe no distribution will ever use it, but
> it will be invaluable for kernel developers.

Oh, sure, that's a good idea.

One other minor consideration: I do sometimes find myself building on my
laptop while away on a slow network, and wanting to install the result
on a test machine back at the office.  So it's nice if the resulting
debs or rpms aren't too much larger than they need to be.

Though scp'ing a kernel in that situation is already pretty slow, so
I've mostly gotten used to doing a git push to a machine in the same
room as the test machines, then building there.

--b.

> The good news is that a member of the Debian kernel team
> contacted me off-line and mentioned that he plans to work with Sam to
> send a patch series to get "make deb-pkg" working again, and in the
> long term with proper debuginfo support.  Maybe someone will be
> inspired to enhance "rpm-pkg" to do the right thing.
> 
> 	    	    	      	    	      - Ted

* Re: [Ksummit-2008-discuss] Potential Systemtap topics for the  Kernel Summit
  2008-07-02 21:40                     ` Potential Systemtap topics for the Kernel Summit Theodore Tso
@ 2008-07-02 21:51                       ` Jonathan Corbet
  2008-07-02 23:41                         ` Arnaldo Carvalho de Melo
  2008-07-02 22:38                       ` Masami Hiramatsu
  2008-07-02 22:54                       ` [Ksummit-2008-discuss] " Stephen Hemminger
  2 siblings, 1 reply; 48+ messages in thread
From: Jonathan Corbet @ 2008-07-02 21:51 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frank Ch. Eigler, ksummit-2008-discuss, Roland McGrath, systemtap

On Wed, 2 Jul 2008 17:39:38 -0400
Theodore Tso <tytso@mit.edu> wrote:

> The bottom line is more communication between the kernel and systemtap
> developers is a good thing (and getting more kernel developers to use
> systemtap would be a good start).

And that's perhaps the most important part of the discussion, in a way:
how can we get rid of the wall between the kernel and systemtap
developers and get everybody working toward the goal of solving this
persistent problem?  If we can do that, the rest will work out.

jon

* Re: Potential Systemtap topics for the Kernel Summit
  2008-07-02 21:40                     ` Potential Systemtap topics for the Kernel Summit Theodore Tso
  2008-07-02 21:51                       ` [Ksummit-2008-discuss] " Jonathan Corbet
@ 2008-07-02 22:38                       ` Masami Hiramatsu
  2008-07-02 22:54                       ` [Ksummit-2008-discuss] " Stephen Hemminger
  2 siblings, 0 replies; 48+ messages in thread
From: Masami Hiramatsu @ 2008-07-02 22:38 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frank Ch. Eigler, Roland McGrath, ksummit-2008-discuss, systemtap

Theodore Tso wrote:
> On Wed, Jul 02, 2008 at 03:25:19PM -0400, Frank Ch. Eigler wrote:
>> So, one way a kernel developer could help write a tapset piece for us
>> is to encapsulate this into a tapset script fragment:
>>
>> probe vfs.read = kernel.function ("vfs_read")
>>   {
>>     dev_nr = $...expression
>>     inode_nr = $...expression
>>   }
>>
>> ****  or  ****
>>
>> Kernel maintainers could add a marker or two right into their C code:
>>
>> {
>>     /* ... */
>>     trace_mark (vfs_read, "dev %u inode %u whatever %s",
>>                           expression1, expression2, whatever);
>>     /* ... */
>> }
> 
> So it sounds like potential Systemtap topics for the kernel summit
> might include:
> 
> A)  Enhancements to kbuild to better support kernel developers who want
> to use systemtap.
> 
> B) Discussion of list of tapsets and markers that would be useful for
> System administrators wanting to use systemtap.  This is one place
> where if someone could volunteer to examine some of the Dtrace
> examples and blog entries where Dtrace users have raved about how
> Dtrace saved their bacon by instantly identifying some performance
> problem, and then assemble a "tapset or marker WANTED" bounty list,
> that would be very useful.  One potential problem is that I suspect
> kernel developers may not know or have the intuition of what sort of
> markers or tapsets would be most useful.  Having a targeted wish list
> would be very useful.  (We might then have some discussions about
> whether a particular tapset or marker is too hard to maintain, or
> represents too much of a performance hit, but at least would be
> dealing with concrete requests.)

I think that not only tapsets, but also (a part of) the runtime and build
scripts should be merged into the kernel tree.  Some of them depend on kernel
features, and currently we use an autoconf-like method to check whether we
can use those features.  Obviously, this method has its limitations. :-(

> C) Whether tapsets/markers should be maintained inside the kernel, and
> if so, how.

We have been discussing on LKML how we can push markers upstream.
http://lkml.org/lkml/2008/6/2/463
http://lkml.org/lkml/2008/6/20/250

As far as I know, the current discussion points about markers are:
- Interface ugliness
  Markers need a printf-style format string, and some people
  complain that it is ugly.
- KABI exposure
  Some markers expose internal kernel structures to user space,
  which is not useful because those internal structures change rapidly.
  Also, would we have to treat them as stable ABIs?
- Maintenance issue
  We might have to consider the maintenance cost of markers that are
  scattered throughout the kernel.  If those markers perpetuate
  the logic and code, they can hinder Linux's evolution.
  Subsystem maintainers also might not want to get involved
  in maintaining markers embedded in their code.

To resolve the first and second issues technically, Mathieu
proposed a tracepoint layer.
http://lkml.org/lkml/2008/6/25/426
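
The general idea of a typed trace hook, as opposed to a printf-style
marker, can be sketched in user space as follows.  This is only an
illustration under my own assumptions, not Mathieu's proposal or any
kernel API: the names (trace_vfs_read, tp_vfs_read_register, count_reads,
struct read_stats) are all hypothetical.  The point is that the hook has
an ordinary C prototype, so the call site carries no format string and
the compiler checks the argument types.

```c
#include <stddef.h>

/* Hypothetical typed hook, loosely modelled on the idea of a tracepoint.
 * Not the kernel's tracepoint API. */
typedef void (*vfs_read_probe_fn)(unsigned int dev, unsigned long inode,
                                  void *data);

struct vfs_read_tracepoint {
    vfs_read_probe_fn probe;    /* NULL while disabled */
    void *data;                 /* private pointer handed back to the probe */
};

static struct vfs_read_tracepoint tp_vfs_read;

/* Attach a probe.  Only one at a time in this sketch; returns -1 if busy. */
static int tp_vfs_read_register(vfs_read_probe_fn fn, void *data)
{
    if (tp_vfs_read.probe)
        return -1;
    tp_vfs_read.data = data;
    tp_vfs_read.probe = fn;
    return 0;
}

static void tp_vfs_read_unregister(void)
{
    tp_vfs_read.probe = NULL;
    tp_vfs_read.data = NULL;
}

/* Call site: typed arguments, no format string, one test when disabled. */
static inline void trace_vfs_read(unsigned int dev, unsigned long inode)
{
    if (tp_vfs_read.probe)
        tp_vfs_read.probe(dev, inode, tp_vfs_read.data);
}

/* Example consumer: counts calls and remembers the last inode seen. */
struct read_stats {
    unsigned long calls;
    unsigned long last_inode;
};

static void count_reads(unsigned int dev, unsigned long inode, void *data)
{
    struct read_stats *s = data;

    (void)dev;                  /* unused in this example */
    s->calls++;
    s->last_inode = inode;
}

/* Stand-in for the instrumented function. */
static void fake_vfs_read(unsigned int dev, unsigned long inode)
{
    /* ... real work would go here ... */
    trace_vfs_read(dev, inode);
}
```

In this shape only scalar values cross the hook, so what the probe sees
does not have to change when the internal structures do.  That is the
design choice that speaks to the first two points above.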

I think we still have the maintenance issue.  So we need to agree on
clear policies for marker maintenance that don't hinder
Linux's evolution.

For example,
- Subsystem maintainers or developers can freely modify
  or remove trace points in their patches, if needed.
- Marker/Tracepoint maintainers should follow upstream changes.
- No one can NAK a patch on the grounds that it
  modifies or removes trace points.
etc.

> 
> D) What is the right way to do user probes.
> 
> Of course, if some of these topics are handled via e-mail before the
> kernel summit, even better.  But somehow, I'm guessing there will
> still be more to talk about.  :-)
> 
> The bottom line is more communication between the kernel and systemtap
> developers is a good thing (and getting more kernel developers to use
> systemtap would be a good start).

Yeah, we need to have a party to talk to each other. :-P


Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com

* Re: [Ksummit-2008-discuss] Potential Systemtap topics for the  Kernel Summit
  2008-07-02 21:40                     ` Potential Systemtap topics for the Kernel Summit Theodore Tso
  2008-07-02 21:51                       ` [Ksummit-2008-discuss] " Jonathan Corbet
  2008-07-02 22:38                       ` Masami Hiramatsu
@ 2008-07-02 22:54                       ` Stephen Hemminger
  2008-07-03  0:44                         ` Ulrich Drepper
  2 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2008-07-02 22:54 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Frank Ch. Eigler, ksummit-2008-discuss, Roland McGrath, systemtap

On Wed, 2 Jul 2008 17:39:38 -0400
Theodore Tso <tytso@mit.edu> wrote:

> On Wed, Jul 02, 2008 at 03:25:19PM -0400, Frank Ch. Eigler wrote:
> > So, one way a kernel developer could help write a tapset piece for us
> > is to encapsulate this into a tapset script fragment:
> > 
> > probe vfs.read = kernel.function ("vfs_read")
> >   {
> >     dev_nr = $...expression
> >     inode_nr = $...expression
> >   }
> > 
> > ****  or  ****
> > 
> > Kernel maintainers could add a marker or two right into their C code:
> > 
> > {
> >     /* ... */
> >     trace_mark (vfs_read, "dev %u inode %u whatever %s",
> >                           expression1, expression2, whatever);
> >     /* ... */
> > }

On a related topic. Why not figure out a way to embed the tapsets
into the kernel source (à la DocBook)? If the tapsets are maintained in
a far-far away repository or even in a separate directory, then kernel
developers are sure to break tapsets all the time. But if the tapset
is "in your face" when dealing with code, then there is a chance of
keeping both up to date.

* Re: [Ksummit-2008-discuss] Potential Systemtap topics for the  Kernel Summit
  2008-07-02 21:51                       ` [Ksummit-2008-discuss] " Jonathan Corbet
@ 2008-07-02 23:41                         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 48+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-07-02 23:41 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Theodore Tso, Frank Ch. Eigler, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Wed, Jul 02, 2008 at 03:50:50PM -0600, Jonathan Corbet wrote:
> On Wed, 2 Jul 2008 17:39:38 -0400
> Theodore Tso <tytso@mit.edu> wrote:
> 
> > The bottom line is more communication between the kernel and systemtap
> > developers is a good thing (and getting more kernel developers to use
> > systemtap would be a good start).
> 
> And that's perhaps the most important part of the discussion, in a way:
> how can we get rid of the wall between the kernel and systemtap
> developers and get everybody working toward the goal of solving this
> persistent problem?  If we can do that, the rest will work out.

Reducing the cost that every potential user of the debugging information has
to bear just because they may need the insurance (DWARF -> CTF) could
probably also drive acceptance.  And there is so much more cool stuff
we can do if we have type information always available... :)

- Arnaldo

* Re: [Ksummit-2008-discuss] Potential Systemtap topics for the  Kernel  Summit
  2008-07-02 22:54                       ` [Ksummit-2008-discuss] " Stephen Hemminger
@ 2008-07-03  0:44                         ` Ulrich Drepper
  2008-07-03  1:02                           ` H. Peter Anvin
  0 siblings, 1 reply; 48+ messages in thread
From: Ulrich Drepper @ 2008-07-03  0:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Theodore Tso, Frank Ch. Eigler, ksummit-2008-discuss,
	Roland McGrath, systemtap

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stephen Hemminger wrote:
> On a related topic. Why not figure out a way to embed the tapsets
> into the kernel source (à la DocBook)?

Sure, why not.  The only problem is to come up with a notation which
isn't too invasive.  Frank's example showed one problem tapsets are
supposed to solve: make parameters available under a name which doesn't
change.  Another goal for them is to make commonly used expressions
derived from the state when entering the function available (e.g., inode
belonging to a file descriptor).

This kind of information might need a single line for each convenience
variable defined in the tapset.  So, it should be manageable.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkhsIMoACgkQ2ijCOnn/RHThzgCeMIIR9KnHvgygE5/cCTdQfwfK
1u8AnAy9iqIb7XzgzWifMG0aWlZATjHR
=hNt1
-----END PGP SIGNATURE-----

* Re: [Ksummit-2008-discuss] Potential Systemtap topics for the Kernel  Summit
  2008-07-03  0:44                         ` Ulrich Drepper
@ 2008-07-03  1:02                           ` H. Peter Anvin
  2008-07-03  1:50                             ` Theodore Tso
  2008-07-03  1:51                             ` Ulrich Drepper
  0 siblings, 2 replies; 48+ messages in thread
From: H. Peter Anvin @ 2008-07-03  1:02 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Stephen Hemminger, ksummit-2008-discuss, Roland McGrath, systemtap

Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Stephen Hemminger wrote:
>> On a related topic. Why not figure out a way to embed the tapsets
>> into the kernel source (à la DocBook)?
> 
> Sure, why not.  The only problem is to come up with a notation which
> isn't too invasive.  Frank's example showed one problem tapsets are
> supposed to solve: make parameters available under a name which doesn't
> change.  Another goal for them is to make commonly used expressions
> derived from the state when entering the function available (e.g., inode
> belonging to a file descriptor).
> 
> This kind of information might need a single line for each convenience
> variable defined in the tapset.  So, it should be manageable.
> 

They wouldn't even have to be embedded in the C code directly (unless 
that makes them easier to write by being in situ), but even just having 
them as separate files in the kernel tarball should make keeping them in 
sync easier.

	-hpa

* Re: [Ksummit-2008-discuss] Potential Systemtap topics for the  Kernel Summit
  2008-07-03  1:02                           ` H. Peter Anvin
@ 2008-07-03  1:50                             ` Theodore Tso
  2008-07-03  1:51                             ` Ulrich Drepper
  1 sibling, 0 replies; 48+ messages in thread
From: Theodore Tso @ 2008-07-03  1:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ulrich Drepper, ksummit-2008-discuss, Stephen Hemminger,
	Roland McGrath, systemtap

On Wed, Jul 02, 2008 at 06:01:56PM -0700, H. Peter Anvin wrote:
> They wouldn't even have to be embedded in the C code directly (unless 
> that makes them easier to write by being in situ), but even just having 
> them as separate files in the kernel tarball should make keeping them in 
> sync easier.

Sure, as long as we have some automated way to make sure the tapsets
aren't broken.  But that should be relatively easy --- just have a
Makefile rule that collects all of the tapsets and runs them through
stap.

						- Ted

* Re: [Ksummit-2008-discuss] Potential Systemtap topics for the Kernel  Summit
  2008-07-03  1:02                           ` H. Peter Anvin
  2008-07-03  1:50                             ` Theodore Tso
@ 2008-07-03  1:51                             ` Ulrich Drepper
  1 sibling, 0 replies; 48+ messages in thread
From: Ulrich Drepper @ 2008-07-03  1:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Stephen Hemminger, ksummit-2008-discuss, Roland McGrath, systemtap

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

H. Peter Anvin wrote:
> They wouldn't even have to be embedded in the C code directly (unless
> that makes them easier to write by being in situ), but even just having
> them as separate files in the kernel tarball should make keeping them in
> sync easier.

Right, that's always possible.  I think Stephen meant his proposal to be
an additional improvement for the sake of maintainability.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkhsMH0ACgkQ2ijCOnn/RHRyLACdFGuNPsg+FFot2ACNgfw6WFCj
eroAn3tvHfyhR1DGuAxtBKU4gKg0AhPo
=NLDf
-----END PGP SIGNATURE-----

* Re: [Ksummit-2008-discuss] DTrace
  2008-07-01 23:13                 ` Theodore Tso
                                     ` (2 preceding siblings ...)
  2008-07-02 20:08                   ` [Ksummit-2008-discuss] DTrace Joel Becker
@ 2008-07-05  9:46                   ` Peter Zijlstra
  2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:34                     ` Theodore Tso
  3 siblings, 2 replies; 48+ messages in thread
From: Peter Zijlstra @ 2008-07-05  9:46 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Tue, 2008-07-01 at 19:13 -0400, Theodore Tso wrote:

.. snip Ted talking about how using Systemtap is a pain ..

Well said. Until the point where using Systemtap is less work than
recompile and boot I'm not remotely interested in even looking at it.

Also, it would be really great if you could write probes in regular C,
some pseudo C language just messes up my mind.



* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05  9:46                   ` Peter Zijlstra
@ 2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:12                       ` Frank Ch. Eigler
                                         ` (2 more replies)
  2008-07-05 12:34                     ` Theodore Tso
  1 sibling, 3 replies; 48+ messages in thread
From: Christoph Hellwig @ 2008-07-05 10:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Theodore Tso, ksummit-2008-discuss, Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> Also, it would be really great if you could write probes in regular C,
> some pseudo C language just messes up my mind.

You can write probes in plain C; in fact I do this all the time.  What's
missing is a nice and easy-to-use channel to get the traces to userspace
and interpret them, and helpers for poking at kernel data structures.


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 10:07                     ` Christoph Hellwig
@ 2008-07-05 12:12                       ` Frank Ch. Eigler
  2008-07-05 18:08                         ` Christoph Hellwig
  2008-07-05 13:50                       ` James Bottomley
  2008-07-05 18:05                       ` K.Prasad
  2 siblings, 1 reply; 48+ messages in thread
From: Frank Ch. Eigler @ 2008-07-05 12:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, ksummit-2008-discuss, Roland McGrath, systemtap

Hi -

On Sat, Jul 05, 2008 at 12:05:36PM +0200, Christoph Hellwig wrote:
> > Also, it would be really great if you could write probes in regular C,
> > some pseudo C language just messes up my mind.
> 
> You can write probes in plain C, in fact I do this all the time.  what's
> missing is a nice and easy to use channel to get the traces to userspace
> and interpret them, and helper for poking at kernel data structures.

Perhaps something like this:


probe kernel.function("foobar") // or kernel.statement("*@dir/file.c:222")
{
    probe_me_harder ($var1, $ptr->field)
}

%{
#include "linux/something.h"
%}
function probe_me_harder (v1, f1) 
%{
   struct something *s = (struct something*) THIS->v1;
   struct something_else *w = (struct something_else*) THIS->f1;
   call_some_safe_kernel_function__be_careful_out_there (s, w);
   _stp_printf ("I did something fun with %p and %p\n", s, w);
%}


# stap -g probe.stp
I did something fun with 0xdick and 0xjane.
^C


- FChE


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05  9:46                   ` Peter Zijlstra
  2008-07-05 10:07                     ` Christoph Hellwig
@ 2008-07-05 12:34                     ` Theodore Tso
  1 sibling, 0 replies; 48+ messages in thread
From: Theodore Tso @ 2008-07-05 12:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Roland McGrath, ksummit-2008-discuss, systemtap

On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> On Tue, 2008-07-01 at 19:13 -0400, Theodore Tso wrote:
> 
> .. snip Ted talking about how using Systemtap is a pain ..
> 
> Well said. Until the point where using Systemtap is less work than
> recompile and boot I'm not remotely interested in even looking at it.

To be fair, Systemtap has gotten a lot better.  About a week ago or so
the latest version in its git tree actually passes the "download,
configure, make" test on Ubuntu Hardy.  Apparently previous to that
you needed to know about the magic --enable-staticdw configure option,
but its build system has been made much smarter now, which is good.

Kernel compiles with symbols *do* take longer, since DWARF is just
such a bloated method of storing debugging information, and so the
files get much, much bigger.  However, if you are worried about your
root partition filling up due to bloated modules taking up a huge
amount of space, you can do this: "make INSTALL_MOD_STRIP=1 modules_install;
make INSTALL_MOD_PATH=/usr/lib/debug modules_install".

I have this partially automated with some patches to make-kpkg, on my
Ubuntu system.  (Available on request).  The real issue is that there
is a bunch of basic usability issues that aren't well documented, or
better yet, automated.  As a result, people hit these things, get
frustrated, and then go back to printk.  They're not going to file
bugs in some bugzilla; they're not going to switch their distribution
to Fedora; they're just going to move on to something more productive.

The really weird thing is that this might make sense if Systemtap were
primarily targeted at system administrators, and not kernel
developers.  If this were so, making a bunch of installation issues
distro-specific, and only worrying about distro-supplied kernels and
not about kernel developer workflow might make sense.  But given that
a common excuse I've heard for the paucity of tapsets is "well, we
need the kernel developer's expertise", and the fact that Systemtap
has various UI decisions that really seem to make it much more
optimized for developers (there are various things that in Dtrace can
be done with a single command line, but which in Systemtap require
much longer scripts, which is bad for system administrators who find
it much easier to use DTrace as a result), this just seems to be a
very puzzling gap.

> Also, it would be really great if you could write probes in regular C,
> some pseudo C language just messes up my mind.

You can.  Just surround the code with %{ and %} symbols.  In fact, you
have to do this a lot in tapsets.

							- Ted


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:12                       ` Frank Ch. Eigler
@ 2008-07-05 13:50                       ` James Bottomley
  2008-07-05 18:08                         ` Christoph Hellwig
  2008-07-05 18:05                       ` K.Prasad
  2 siblings, 1 reply; 48+ messages in thread
From: James Bottomley @ 2008-07-05 13:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, ksummit-2008-discuss, Roland McGrath, systemtap

On Sat, 2008-07-05 at 12:05 +0200, Christoph Hellwig wrote:
> On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> > Also, it would be really great if you could write probes in regular C,
> > some pseudo C language just messes up my mind.
> 
> You can write probes in plain C, in fact I do this all the time.  what's
> missing is a nice and easy to use channel to get the traces to userspace
> and interpret them, and helper for poking at kernel data structures.

To be fair, you can simply "just write" entry (jprobes) and return
probes (kretprobes).  For the entry probes, if you want access to the
function arguments you need to know the deep magic of the calling
conventions of your platform (pretty easy on x86, though).  However,
what you can't just write are the arbitrary kprobes in file x line y
because you need to know all the nasty details of dwarf to have a clue
what the absolute address is and where all the local variables you're
trying to look at are.  Now ... is using systemtap to do this easier
than printk?  For me, yes, since a recompile reboot sequence takes quite
a while, perhaps for someone with a faster machine ...

For getting information back, as you know, systemtap uses relayfs.  It's
not the most friendly or efficient thing in the world, so I'm happy to
have it wrappered by systemtap ...

James



* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 10:07                     ` Christoph Hellwig
  2008-07-05 12:12                       ` Frank Ch. Eigler
  2008-07-05 13:50                       ` James Bottomley
@ 2008-07-05 18:05                       ` K.Prasad
  2008-07-07 14:36                         ` Christoph Hellwig
  2 siblings, 1 reply; 48+ messages in thread
From: K.Prasad @ 2008-07-05 18:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Theodore Tso, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 12:05:36PM +0200, Christoph Hellwig wrote:
> On Sat, Jul 05, 2008 at 11:44:09AM +0200, Peter Zijlstra wrote:
> > Also, it would be really great if you could write probes in regular C,
> > some pseudo C language just messes up my mind.
> 
> You can write probes in plain C, in fact I do this all the time.  what's
> missing is a nice and easy to use channel to get the traces to userspace
> and interpret them, and helper for poking at kernel data structures.
>
As you might be aware the "trace" interface which is part of the -mm
tree was meant to satisfy such needs.

Moreover, an enhancement over the "trace" interface, introduced in the
form of relay_printk() [for string output] and relay_dump() [for binary
output] and meant to give the user a printk()-like interface with the
advantages of relay (such as per-cpu buffers), is still waiting for
acceptance and use-cases (refer to http://lkml.org/lkml/2008/5/28/212). A
typical example illustrating the brevity of the code when using the
relay_printk() interface can be found in the
samples/trace/fork_new_trace.c file in the patch.

Thanks,
K.Prasad
 


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 13:50                       ` James Bottomley
@ 2008-07-05 18:08                         ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2008-07-05 18:08 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Peter Zijlstra, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 08:49:49AM -0500, James Bottomley wrote:
> To be fair, you can simply "just write" entry (jprobes) and return
> probes (kretprobes).  For the entry probes, if you want access to the
> function arguments you need to know the deep magic of the calling
> conventions of your platform (pretty easy on x86, though).

For jprobes your probe just has the same signature as the probed
function, and for kretprobes you use the regs_return_value()
arch-provided helper.  It's really quite easy and covers 90% of what
I need as a kernel developer.  Well, except for the cases where gcc
gets too smart and inlines enormous callchains, but at least in XFS
we've just declared everything noinline..


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 12:12                       ` Frank Ch. Eigler
@ 2008-07-05 18:08                         ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2008-07-05 18:08 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Christoph Hellwig, Peter Zijlstra, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 08:10:39AM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> On Sat, Jul 05, 2008 at 12:05:36PM +0200, Christoph Hellwig wrote:
> > > Also, it would be really great if you could write probes in regular C,
> > > some pseudo C language just messes up my mind.
> > 
> > You can write probes in plain C, in fact I do this all the time.  what's
> > missing is a nice and easy to use channel to get the traces to userspace
> > and interpret them, and helper for poking at kernel data structures.
> 
> Perhaps something like this:

But that means I need to pull in all the systemtap crap, which is exactly
what I want to avoid.


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-05 18:05                       ` K.Prasad
@ 2008-07-07 14:36                         ` Christoph Hellwig
  2008-07-07 17:44                           ` K.Prasad
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2008-07-07 14:36 UTC (permalink / raw)
  To: K.Prasad
  Cc: Christoph Hellwig, Peter Zijlstra, Theodore Tso,
	ksummit-2008-discuss, Roland McGrath, systemtap

On Sat, Jul 05, 2008 at 11:34:32PM +0530, K.Prasad wrote:
> > You can write probes in plain C, in fact I do this all the time.  what's
> > missing is a nice and easy to use channel to get the traces to userspace
> > and interpret them, and helper for poking at kernel data structures.
> >
> As you might be aware the "trace" interface which is part of the -mm
> tree was meant to satisfy such needs.

From the interface POV it's a step in the right direction.  But to make
adhoc kprobe tracing viable for anyone but hardcore kernel hackers we
basically want a /debug/trace always available so that the traces just
need to relay_printk() on it.  We also want some helpers to encode
complex structures.  See the current printk on steroids discussion on
lkml which would be pretty helpful for it.


* Re: [Ksummit-2008-discuss] DTrace
  2008-07-07 14:36                         ` Christoph Hellwig
@ 2008-07-07 17:44                           ` K.Prasad
  0 siblings, 0 replies; 48+ messages in thread
From: K.Prasad @ 2008-07-07 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Theodore Tso, ksummit-2008-discuss,
	Roland McGrath, systemtap

On Sun, Jul 06, 2008 at 02:36:47PM +0200, Christoph Hellwig wrote:
> On Sat, Jul 05, 2008 at 11:34:32PM +0530, K.Prasad wrote:
> > > You can write probes in plain C, in fact I do this all the time.  what's
> > > missing is a nice and easy to use channel to get the traces to userspace
> > > and interpret them, and helper for poking at kernel data structures.
> > >
> > As you might be aware the "trace" interface which is part of the -mm
> > tree was meant to satisfy such needs.
> 
> From the interface POV it's a step in the right direction.  But to make
> adhoc kprobe tracing viable for anyone but hardcore kernel hackers we
> basically want a /debug/trace always available so that the traces just
> need to relay_printk() on it.

The relay_printk() creates a directory structure that contains control
and data files under the 'debugfs' mounted path, while the user is kept
oblivious of the internals that are required to set it up. The user
simply does a relay_printk(relay_structure, "my desired string") and the
output is sent to the data file(s).

> We also want some helpers to encode
> complex structures.  See the current printk on steroids discussion on
> lkml which would be pretty helpful for it.

On a related note, one of the improvements envisaged for the
"relay_printk()" interface was the ability to have user-defined
callback functions, which would be invoked before the "trace"
infrastructure actually accesses the data for output.
Typical uses could range from defining callbacks for
obtaining/releasing locks on each protected variable, without the user
having to bother about it, to automatically prefixing/appending a
user-defined string to the data through the callback function
every time before it is output.

Such callback functions may also be used to encode/decode data into/from
complex data structures before they are sent to user-space without the
need to explicitly program them before every *printk(). Say something
like - also print i_ino when printing i_count etc. Not sure if the
example represents your case of a need for "helpers to encode complex
structures".
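
To make the callback idea concrete, here is a small userspace model in
C.  It is only a sketch: struct trace_channel, pre_output_fn,
prefix_with_tag() and channel_format() are hypothetical names invented
for illustration and are not taken from the relay_printk() patch.  It
assumes the callback receives the message and writes the decorated
form into the output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical hook type: write a decorated copy of msg into out. */
typedef int (*pre_output_fn)(char *out, size_t out_len, const char *msg);

/* Hypothetical stand-in for the relay structure; only holds the hook. */
struct trace_channel {
    pre_output_fn pre_output;   /* may be NULL for "no decoration" */
};

/* Example hook: prefix every message with a fixed, user-chosen tag. */
int prefix_with_tag(char *out, size_t out_len, const char *msg)
{
    return snprintf(out, out_len, "[mytrace] %s", msg);
}

/*
 * Format msg for output.  The hook, if set, runs before the data is
 * emitted, so callers never repeat the decoration themselves.
 * Returns the snprintf() result (length the full text would need).
 */
int channel_format(const struct trace_channel *ch,
                   char *out, size_t out_len, const char *msg)
{
    assert(ch && out && msg);
    if (ch->pre_output)
        return ch->pre_output(out, out_len, msg);
    return snprintf(out, out_len, "%s", msg);
}
```

With prefix_with_tag installed as the hook, formatting "forked" yields
"[mytrace] forked"; with no hook the message passes through unchanged.
A hook that prints i_ino alongside i_count would follow the same shape.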

Thanks,
K.Prasad


* Re: [Ksummit-2008-discuss] DTrace
  2008-06-30 19:59 ` James Bottomley
  2008-06-30 23:52   ` Masami Hiramatsu
@ 2008-07-08 23:32   ` Eric W. Biederman
  1 sibling, 0 replies; 48+ messages in thread
From: Eric W. Biederman @ 2008-07-08 23:32 UTC (permalink / raw)
  To: James Bottomley; +Cc: Frank Ch. Eigler, ksummit-2008-discuss, systemtap

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> I've also found it very easy to crash the system under probe if you use
> the wrong build tree for the running kernel (not a problem, I know that
> enterprise customers run into, but a common one for kernel developers).
> Since we have a kernel build version that increments with every build,
> it would be useful to sanity check the one systemtap pulled out of the
> debug with the one in the running kernel.

Interesting.  I wonder how much the DTrace in-kernel interpreter
protects them from that.
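
The sanity check suggested in the quoted text could look roughly like
the following userspace C sketch.  It assumes the release and version
strings have already been pulled out of the debug info by some other
means (for instance from the linux_banner string in vmlinux); the
function names are hypothetical and not part of systemtap.  The build
counter is the "#N" at the start of the version string that uname(2)
reports.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Return 1 when both release and version strings agree, else 0. */
int build_ids_match(const char *run_release, const char *run_version,
                    const char *dbg_release, const char *dbg_version)
{
    assert(run_release && run_version && dbg_release && dbg_version);
    return strcmp(run_release, dbg_release) == 0 &&
           strcmp(run_version, dbg_version) == 0;
}

/*
 * Compare the running kernel, as reported by uname(2), against the
 * strings taken from the debug info.  Returns 1 on a match, 0 on a
 * mismatch, and -1 if uname() itself fails.
 */
int running_kernel_matches(const char *dbg_release, const char *dbg_version)
{
    struct utsname running;

    if (uname(&running) != 0)
        return -1;
    return build_ids_match(running.release, running.version,
                           dbg_release, dbg_version);
}
```

A tool could refuse to insert the probe module whenever
running_kernel_matches() returns anything other than 1, since a
different build counter means the debug info describes another build.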

>> * dtrace "just works"
>> 
>>   Yeah, so I hear, but think about how different their target
>>   environment is.  Their kernel hardly changes (several fixed APIs,
>>   ABIs): this has huge implications.  Their kernel was willing to
>>   insert probes (~ markers), a bunch of build system changes (debug
>>   info subset transcribing).  Here in linux land, we suffer
>>   multifaceted tensions and it is hard to go toward a goal without
>>   obstructions (well-meaning as they may be).
>
> The goal has to be well articulated and agreed to.  Open source is rapid
> at progressing towards common goals ... it's when the goals aren't
> common that progress gets bogged down.

In addition to a well-articulated goal, a feature-poor implementation
that works and gives people lots of itches to scratch helps.

What is the goal with SystemTap?

The goal with Dtrace (and I am badly paraphrasing) is to allow a system
level look into what is happening.

There was a talk at OLS in 2006 entitled "Why user space sucks,"
which pointed out all of the silly things that were happening during
boot from a system level perspective.  How do we make that kind of
tracing easy with SystemTap?

Those are the interesting questions, and in general they shouldn't
need complicated tapsets and other prebuilt knowledge.  It is the
global view of how things are connected that looks most interesting.

Eric


Thread overview: 48+ messages
2008-06-30 13:57 DTrace Frank Ch. Eigler
2008-06-30 19:00 ` [Ksummit-2008-discuss] DTrace Grant Grundler
2008-06-30 19:40 ` Theodore Tso
2008-06-30 20:00   ` Frank Ch. Eigler
2008-06-30 20:19     ` Theodore Tso
2008-06-30 21:12       ` Arnaldo Carvalho de Melo
2008-06-30 23:02         ` David Miller
2008-06-30 21:13       ` James Bottomley
2008-06-30 22:10       ` Frank Ch. Eigler
2008-07-01  2:42         ` Theodore Tso
2008-07-01  7:08           ` Roland McGrath
2008-07-01 10:15             ` Theodore Tso
2008-07-01 11:04               ` Sam Ravnborg
2008-07-01 12:13                 ` Theodore Tso
2008-07-02 20:27                   ` Sam Ravnborg
2008-07-01 20:06               ` Roland McGrath
2008-07-01 23:13                 ` Theodore Tso
2008-07-02  2:23                   ` Frank Ch. Eigler
2008-07-02 19:27                   ` Frank Ch. Eigler
2008-07-02 21:40                     ` Potential Systemtap topics for the Kernel Summit Theodore Tso
2008-07-02 21:51                       ` [Ksummit-2008-discuss] " Jonathan Corbet
2008-07-02 23:41                         ` Arnaldo Carvalho de Melo
2008-07-02 22:38                       ` Masami Hiramatsu
2008-07-02 22:54                       ` [Ksummit-2008-discuss] " Stephen Hemminger
2008-07-03  0:44                         ` Ulrich Drepper
2008-07-03  1:02                           ` H. Peter Anvin
2008-07-03  1:50                             ` Theodore Tso
2008-07-03  1:51                             ` Ulrich Drepper
2008-07-02 20:08                   ` [Ksummit-2008-discuss] DTrace Joel Becker
2008-07-02 20:17                     ` J. Bruce Fields
2008-07-02 20:41                       ` Frank Ch. Eigler
2008-07-02 21:19                       ` H. Peter Anvin
2008-07-02 21:30                       ` Theodore Tso
2008-07-02 21:46                         ` J. Bruce Fields
2008-07-05  9:46                   ` Peter Zijlstra
2008-07-05 10:07                     ` Christoph Hellwig
2008-07-05 12:12                       ` Frank Ch. Eigler
2008-07-05 18:08                         ` Christoph Hellwig
2008-07-05 13:50                       ` James Bottomley
2008-07-05 18:08                         ` Christoph Hellwig
2008-07-05 18:05                       ` K.Prasad
2008-07-07 14:36                         ` Christoph Hellwig
2008-07-07 17:44                           ` K.Prasad
2008-07-05 12:34                     ` Theodore Tso
2008-07-01  5:29   ` Ananth N Mavinakayanahalli
2008-06-30 19:59 ` James Bottomley
2008-06-30 23:52   ` Masami Hiramatsu
2008-07-08 23:32   ` Eric W. Biederman
