[RFC] SystemTap future direction

public inbox for systemtap@sourceware.org

* [RFC] SystemTap future direction
@ 2010-08-04  5:19 Masami Hiramatsu
  2010-08-04  7:39 ` Mark Wielaard
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-04  5:19 UTC (permalink / raw)
  To: systemtap; +Cc: Satoshi Oshima

Hi,

As you may know (of course I Cc'd discussion on LKML), Ingo and
Christoph said that (at least) uprobes (but also kprobes) should
not support out-of-tree modules.
This means that if we succeed in merging uprobes into the kernel,
SystemTap can't use uprobes itself. Even worse, if someone tries
to remove kprobes' module support, that could shake the foundation
of SystemTap.

At least, to add kernel module support to uprobes, I think we have two
options: one is pushing systemtap itself and useful scripts into the
kernel tree, and the other is finding a very useful use case of *probes
which requires an out-of-tree module. (But the first one is hard because
Linus hates C++, and systemtap is too huge to push into the kernel.)

Anyway, I think it's time to discuss together how we can get over this
situation and what the future direction of SystemTap should be.
Since we already have many users, we are responsible for supporting them.

I'd like to suggest some directions here;

- Merge runtime and module-source generator into linux kernel.
 This will require rewriting the whole of the systemtap code from C++ to
 C or another LL (perl or python).

- Port SystemTap onto perf/ftrace and extend perf/ftrace to support
 extended handlers which are provided by modules.

- Port SystemTap onto perf/ftrace but drop embedded-C support.
 This means enhancing perf/ftrace to support sufficiently flexible data
 filters/modifiers (including a fault injection feature). In this case,
 SystemTap scripts will handle the data in user-space (not on-line).

- Or, just do nothing and wait for the kernel maintainers to choke
 our necks...

I don't think the last one is the best one.
What would you think about that?

BTW, is anyone attending LinuxCon 2010 in Boston?
I'll be there next week...

Best Regards,

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-04  5:19 [RFC] SystemTap future direction Masami Hiramatsu
@ 2010-08-04  7:39 ` Mark Wielaard
  2010-08-04  9:13   ` Masami Hiramatsu
  2010-08-04  9:39   ` Srikar Dronamraju
  2010-08-04  9:59 ` Andi Kleen
  2010-08-04 16:32 ` Frank Ch. Eigler
  2 siblings, 2 replies; 17+ messages in thread
From: Mark Wielaard @ 2010-08-04  7:39 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: systemtap, Satoshi Oshima

Hi Masami,

On Wed, 2010-08-04 at 14:19 +0900, Masami Hiramatsu wrote:
> As you may know (of course I Cc'd discussion on LKML), Ingo and
> Christoph said that (at least) uprobes (but also kprobes) should
> not support out-of-tree modules.

I thought there were already modules using kprobes. And I think module
support for uprobes will be beneficial too.

> This means that if we succeed in merging uprobes into the kernel,
> SystemTap can't use uprobes itself.

:) So helping push things upstream means not using them yourself.
If that happens we can always do what we do now of course, ship our own
version. But it would be ideal if we could reuse the upstreamed code of
course.

>  Even worse, if someone tries
> to remove kprobes' module support, that could shake the foundation
> of SystemTap.

kprobes are just one event source. An important one. But there are
others and people do write scripts that never touch kprobes. They are
very nice to have though. Especially if you want cross kernel/user space
observability.

> At least, to add kernel module support to uprobes, I think we have two
> options: one is pushing systemtap itself and useful scripts into the
> kernel tree, and the other is finding a very useful use case of *probes
> which requires an out-of-tree module. (But the first one is hard because
> Linus hates C++, and systemtap is too huge to push into the kernel.)

That would be nice. The c++ part is just the user space translator
anyway. So that doesn't have to be pushed (and doesn't really make sense
IMHO) in the kernel sources. But maybe it can sit next to the user space
perf tools if that is a nicer repository to hack in.

> Anyway, I think it's time to discuss together how we can get over this
> situation and what the future direction of SystemTap should be.
> Since we already have many users, we are responsible for supporting them.

Yes. I was at GUADEC last week and was happily surprised to meet
multiple Gnome hackers who were happy systemtap users. glib and gobject
have their own static markers (dtrace compatible) and tapsets now.

> I'd like to suggest some directions here;
> 
> - Merge runtime and module-source generator into linux kernel.
>  This will require rewriting the whole of the systemtap code from C++ to
>  C or another LL (perl or python).

If that requires rewriting the whole translator that seems very
unattractive. The translator is just the script parser and translator,
so I don't see why it matters what language it is written in. But
merging some of the runtime, specifically the utrace/task-finder code so
it can be reused by others to get better user space task/process
observability seems like a nice thing to have.

> - Port SystemTap onto perf/ftrace and extend perf/ftrace to support
>  extended handlers which are provided by modules.

That would be nice. If we can attach systemtap probe handlers to
perf/ftrace events in kernel then those would be really nice event
sources.

> - Port SystemTap onto perf/ftrace but drop embedded-C support.
>  This means enhancing perf/ftrace to support sufficiently flexible data
>  filters/modifiers (including a fault injection feature). In this case,
>  SystemTap scripts will handle the data in user-space (not on-line).

I think the "not on-line" part is a bit of a showstopper. Since that
kills the main idea of having powerful scriptable observability. Simple
filters are too restrictive IMHO. It might be enough for simple
profiling, where you analyze the data off-line afterwards. But that
isn't an option for everybody (you need to store/push the data
somewhere), and not very efficient in some cases.

But we could try translating to something not-C for the runtime. That is
the approach that the fish project seems to be going with extended GDB
agent expressions (see the archer and utrace mailing lists for the
discussion).

> - Or, just do nothing and wait for the kernel maintainers to choke
>  our necks...

The kernel maintainers can make our lives easier by letting us upstream
more stuff that we can then reuse. But if not, we can upstream and still
carry our own copy if necessary. That is far from ideal, but if it is
the only option, at least the user experience wouldn't be worse than
what we have now. But I hope we can convince them otherwise of course.

> I don't think the last one is the best one.
> What would you think about that?

Personally I would like to push for an in-kernel interpreter/jit that
our translator can translate to. And make it powerful enough so that it
can be used not just for systemtap probe handlers, but also for
perf/ftrace/gdb-agent-expressions. But that is a lot of work. It is the
most flexible one though.
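
To make the interpreter idea concrete, here is a minimal sketch in Python,
not kernel code. The opcode names, `MAX_PROGRAM_LEN`, and `run` are all
hypothetical, made up for illustration; none of this is an existing
SystemTap, perf, or GDB agent-expression interface. It only shows the
general shape: a translator emits a small checkable program, and the
evaluator does nothing but push, pop, and read named event fields.

```python
# Hypothetical opcodes for illustration only; not an existing kernel,
# SystemTap, or GDB agent-expression instruction set.
PUSH, LOAD, ADD, GT, EQ, AND = range(6)

MAX_PROGRAM_LEN = 64  # assumed limit so every program has bounded run time


def run(program, event):
    """Evaluate a straight-line stack program against one event's fields."""
    if len(program) > MAX_PROGRAM_LEN:
        raise ValueError("program too long")
    stack = []
    for op, arg in program:
        if op == PUSH:
            stack.append(arg)
        elif op == LOAD:
            stack.append(event[arg])  # read a named field of the event
        elif op in (ADD, GT, EQ, AND):
            if len(stack) < 2:
                raise ValueError("stack underflow")
            right = stack.pop()
            left = stack.pop()
            if op == ADD:
                stack.append(left + right)
            elif op == GT:
                stack.append(int(left > right))
            elif op == EQ:
                stack.append(int(left == right))
            else:  # AND
                stack.append(int(bool(left) and bool(right)))
        else:
            raise ValueError(f"unknown opcode {op}")
    if len(stack) != 1:
        raise ValueError("program must leave exactly one result")
    return stack[0]


# Filter equivalent to: pid == 42 and bytes > 4096
FILTER = [
    (LOAD, "pid"), (PUSH, 42), (EQ, None),
    (LOAD, "bytes"), (PUSH, 4096), (GT, None),
    (AND, None),
]
```

There are no jump opcodes here, so evaluation always ends after at most
`MAX_PROGRAM_LEN` steps. A real design would need loops, maps, and string
handling, which is where most of the work would be.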

I do realize that the current SystemTap design comes from the fact that
years ago the kernel maintainers rejected such an interpreter out of
hand. But now that we have so many alternative observability techniques
that can use kernel support I hope they will now be more accepting.

> BTW, is anyone attending LinuxCon 2010 in Boston?
> I'll be there next week...

Sorry, wrong continent for me. I currently live in Europe.

Cheers,

Mark


* Re: [RFC] SystemTap future direction
  2010-08-04  7:39 ` Mark Wielaard
@ 2010-08-04  9:13   ` Masami Hiramatsu
  2010-08-04 12:50     ` Mark Wielaard
  2010-08-04  9:39   ` Srikar Dronamraju
  1 sibling, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-04  9:13 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap, Satoshi Oshima

Hi Mark,

Mark Wielaard wrote:
> Hi Masami,
> 
> On Wed, 2010-08-04 at 14:19 +0900, Masami Hiramatsu wrote:
>> As you may know (of course I Cc'd discussion on LKML), Ingo and
>> Christoph said that (at least) uprobes (but also kprobes) should
>> not support out-of-tree modules.
> 
> I thought there were already modules using kprobes. And I think module
> support for uprobes will be beneficial too.

Yeah, but they can be (and Christoph said, should be) replaced by
tracepoints.

>> This means that if we succeed in merging uprobes into the kernel,
>> SystemTap can't use uprobes itself.
> 
> :) So helping push things upstream means not using them yourself.
> If that happens we can always do what we do now of course, ship our own
> version. But it would be ideal if we could reuse the upstreamed code of
> course.

Hmm, that just makes things worse... Kernel developers might just think
of us as rogues :(.

>>  Even worse, if someone tries
>> to remove kprobes' module support, that could shake the foundation
>> of SystemTap.
> 
> kprobes are just one event source. An important one. But there are
> others and people do write scripts that never touch kprobes. They are
> very nice to have though. Especially if you want cross kernel/user space
> observability.

Yeah, but without dynamic tracing, SystemTap loses an advantage.

>> At least, to add kernel module support to uprobes, I think we have two
>> options: one is pushing systemtap itself and useful scripts into the
>> kernel tree, and the other is finding a very useful use case of *probes
>> which requires an out-of-tree module. (But the first one is hard because
>> Linus hates C++, and systemtap is too huge to push into the kernel.)
> 
> That would be nice. The c++ part is just the user space translator
> anyway. So that doesn't have to be pushed (and doesn't really make sense
> IMHO) in the kernel sources. But maybe it can sit next to the user space
> perf tools if that is a nicer repository to hack in.

Yeah, maybe under tools/systemtap/.

>> Anyway, I think it's time to discuss together how we can get over this
>> situation and what the future direction of SystemTap should be.
>> Since we already have many users, we are responsible for supporting them.
> 
> Yes. I was at GUADEC last week and was happily surprised to meet
> multiple Gnome hackers who were happy systemtap users. glib and gobject
> have their own static markers (dtrace compatible) and tapsets now.

That's good news. Is it possible for perf to support static markers too?

>> I'd like to suggest some directions here;
>>
>> - Merge runtime and module-source generator into linux kernel.
>>  This will require rewriting the whole of the systemtap code from C++ to
>>  C or another LL (perl or python).
> 
> If that requires rewriting the whole translator that seems very
> unattractive. The translator is just the script parser and translator,
> so I don't see why it matters what language it is written in.

Because that's the policy of the kernel majority. :P

> But
> merging some of the runtime, specifically the utrace/task-finder code so
> it can be reused by others to get better user space task/process
> observability seems like a nice thing to have.

Yes, that will be the next step of uprobes. Christoph already argued
that a pid-only uprobe is hard to use widely.

>> - Port SystemTap onto perf/ftrace and extend perf/ftrace to support
>>  extended handlers which are provided by modules.
> 
> That would be nice. If we can attach systemtap probe handlers to
> perf/ftrace events in kernel then those would be really nice event
> sources.
> 
>> - Port SystemTap onto perf/ftrace but drop embedded-C support.
>>  This means enhancing perf/ftrace to support sufficiently flexible data
>>  filters/modifiers (including a fault injection feature). In this case,
>>  SystemTap scripts will handle the data in user-space (not on-line).
> 
> I think the "not on-line" part is a bit of a showstopper. Since that
> kills the main idea of having powerful scriptable observability. Simple
> filters are too restrictive IMHO. It might be enough for simple
> profiling, where you analyze the data off-line afterwards. But that
> isn't an option for everybody (you need to store/push the data
> somewhere), and not very efficient in some cases.

Efficiency is the key, and perf and systemtap aim at
different kinds of efficiency. SystemTap focuses on the efficiency of
transporting data, but perf focuses on efficiency at
probing time. What they are trying to do is reduce the overhead
of recording data to buffers, because that disturbs the performance
of target processes less.

> But we could try translating to something not-C for the runtime. That is
> the approach that the fish project seems to be going with extended GDB
> agent expressions (see the archer and utrace mailing lists for the
> discussion).

Ah, that's a good idea. Linux already has a gdb command parser in kgdb.
So we can reuse it (or share a new one with kgdb).

>> - Or, just do nothing and wait for the kernel maintainers to choke
>>  our necks...
> 
> The kernel maintainers can make our lives easier by letting us upstream
> more stuff that we can then reuse. But if not, we can upstream and still
> carry our own copy if necessary. That is far from ideal, but if it is
> the only option, at least the user experience wouldn't be worse than
> what we have now. But I hope we can convince them otherwise of course.

Anyway, it is important that we show our effort to move things forward.

>> I don't think the last one is the best one.
>> What would you think about that?
> 
> Personally I would like to push for an in-kernel interpreter/jit that
> our translator can translate to. And make it powerful enough so that it
> can be used not just for systemtap probe handlers, but also for
> perf/ftrace/gdb-agent-expressions. But that is a lot of work. It is the
> most flexible one though.

Agreed, that would greatly help us, and it is a good way to go.

> I do realize that the current SystemTap design comes from the fact that
> years ago the kernel maintainers rejected such an interpreter out of
> hand. But now that we have so many alternative observability techniques
> that can use kernel support I hope they will now be more accepting.
> 
>> BTW, is anyone attending LinuxCon 2010 in Boston?
>> I'll be there next week...
> 
> Sorry, wrong continent for me. I currently live in Europe.

Thank you!

> 
> Cheers,
> 
> Mark
> 


-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-04  7:39 ` Mark Wielaard
  2010-08-04  9:13   ` Masami Hiramatsu
@ 2010-08-04  9:39   ` Srikar Dronamraju
  2010-08-04 13:07     ` Mark Wielaard
  2010-08-05 10:26     ` Masami Hiramatsu
  1 sibling, 2 replies; 17+ messages in thread
From: Srikar Dronamraju @ 2010-08-04  9:39 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Masami Hiramatsu, systemtap, Satoshi Oshima

> 
> Yes. I was at GUADEC last week and was happily surprised to meet
> multiple Gnome hackers who were happy systemtap users. glib and gobject
> have their own static markers (dtrace compatible) and tapsets now.
> 

This is nice to hear. Would it help if some of these folks
talked about how they used SystemTap with some key kernel developers
whenever they meet, say at conferences like Plumbers, end
user summits, etc.?

> > I'd like to suggest some directions here;
> > 
> > - Merge runtime and module-source generator into linux kernel.
> >  This will require rewriting the whole of the systemtap code from C++ to
> >  C or another LL (perl or python).
> 
> If that requires rewriting the whole translator that seems very
> unattractive. The translator is just the script parser and translator,
> so I don't see why it matters what language it is written in. But
> merging some of the runtime, specifically the utrace/task-finder code so
> it can be reused by others to get better user space task/process
> observability seems like a nice thing to have.
> 

I think the task-finder would be gated by utrace.
I am working on file-based uprobing that provides very
minimal task-finder-like features.

> 
> The kernel maintainers can make our lives easier by letting us upstream
> more stuff that we can then reuse. But if not, we can upstream and still
> carry our own copy if necessary. That is far from ideal, but if it is
> the only option, at least the user experience wouldn't be worse than
> what we have now. But I hope we can convince them otherwise of course.
> 

But Mark, that may not provide the out-of-box experience that most
users, esp. first-timers, would look for. And it would
certainly cap our user base.


* Re: [RFC] SystemTap future direction
  2010-08-04  5:19 [RFC] SystemTap future direction Masami Hiramatsu
  2010-08-04  7:39 ` Mark Wielaard
@ 2010-08-04  9:59 ` Andi Kleen
  2010-08-05 12:33   ` Masami Hiramatsu
  2010-08-04 16:32 ` Frank Ch. Eigler
  2 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2010-08-04  9:59 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: systemtap, Satoshi Oshima

Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:

> As you may know (of course I Cc'd discussion on LKML), Ingo and
> Christoph said that (at least) uprobes (but also kprobes) should
> not support out-of-tree modules.

They don't necessarily have the last word on this. Last word would be 
Linus.

> This means that if we succeed in merging uprobes into the kernel,
> SystemTap can't use uprobes itself. Even worse, if someone tries
> to remove kprobes' module support, that could shake the foundation
> of SystemTap.

There are already kprobes modules in tree I believe, which
would be broken too.

Also traditionally patches that broke a widely used free software
out of tree module haven't been merged in the past.

> At least, to add kernel module support to uprobes, I think we have two
> options: one is pushing systemtap itself and useful scripts into the
> kernel tree, and the other is finding a very useful use case of *probes
> which requires an out-of-tree module. (But the first one is hard because
> Linus hates C++, and systemtap is too huge to push into the kernel.)

One thing that might work is to move at least larger parts of the
systemtap kernel runtime library into the kernel tree and then have
a couple of example modules written in C that exercise all the
interfaces (and ideally do something useful in the process too)

In principle even compiler generated modules could be (at least
partially) used for this, but I suppose they would need
quite some cleanups. It might be easier to do this with handwritten
C.

Then make sure the compiler output mostly only uses these interfaces.
That is they would need to be useful higher level interfaces, not just a
thin abstraction layer. I suspect 100% coverage wouldn't be possible
and the compiler would also use some other interfaces,
but as long as those are widely used driver interfaces there's
usually no problem.

Basically it's important to have testing coverage in the kernel
for everything that can be used by compiler output (minus guru mode)

This would probably need significant work to clean the library
up for kernel coding style etc. I think some of the code
could also be simplified a lot, especially if there were some
minor changes in the main kernel for this.

(I still don't understand how one needs 1.6kLOC to find a task :)

But you could start in staging with this, so it doesn't require
doing all that work outside mainline.

I think this would also largely fix the problem that systemtap often
breaks with new kernel versions.

For example one of the things I really like in systemtap is the
easy histograms. So if there was a histogram library function
in the kernel I assume that could even find other users.
Now that's only a small part of the code, but there could be more
of this.
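
As a rough idea of what such a histogram helper does, here is a sketch in
Python rather than kernel C. The function names `log2_bucket`, `hist_log`,
and `render` are hypothetical and are not taken from the systemtap runtime;
the sketch assumes non-negative integer samples and power-of-two buckets.

```python
from collections import Counter


def log2_bucket(value):
    """Return the lower bound of the power-of-two bucket that holds value."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative samples")
    return 0 if value == 0 else 1 << (value.bit_length() - 1)


def hist_log(samples):
    """Count samples per power-of-two bucket, ordered by bucket."""
    counts = Counter(log2_bucket(v) for v in samples)
    return dict(sorted(counts.items()))


def render(hist, width=40):
    """Draw one text row per bucket, scaled to the fullest bucket."""
    if not hist:
        return ""
    peak = max(hist.values())
    rows = []
    for bucket, count in hist.items():
        bar = "@" * max(1, count * width // peak)
        rows.append(f"{bucket:>8} |{bar:<{width}} {count}")
    return "\n".join(rows)
```

Calling `print(render(hist_log(latencies)))` on a list of integer latencies
would give a quick distribution view. Only the per-bucket counters need to
live near the probe; the rendering can happen elsewhere.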

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [RFC] SystemTap future direction
  2010-08-04  9:13   ` Masami Hiramatsu
@ 2010-08-04 12:50     ` Mark Wielaard
  2010-08-05 12:28       ` Masami Hiramatsu
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Wielaard @ 2010-08-04 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: systemtap, Satoshi Oshima

On Wed, 2010-08-04 at 18:13 +0900, Masami Hiramatsu wrote:
> Mark Wielaard wrote:
> >> This means that if we succeed in merging uprobes into the kernel,
> >> SystemTap can't use uprobes itself.
> > 
> > :) So helping push things upstream means not using them yourself.
> > If that happens we can always do what we do now of course, ship our own
> > version. But it would be ideal if we could reuse the upstreamed code of
> > course.
> 
> Hmm, that just makes things worse... Kernel developers might just think
> of us as rogues :(.

Some might already consider us as such :)

> >> Anyway, I think it's time to discuss together how we can get over this
> >> situation and what the future direction of SystemTap should be.
> >> Since we already have many users, we are responsible for supporting them.
> > 
> > Yes. I was at GUADEC last week and was happily surprised to meet
> > multiple Gnome hackers who were happy systemtap users. glib and gobject
> > have their own static markers (dtrace compatible) and tapsets now.
> 
> That's good news. Is it possible for perf to support static markers too?

Certainly. They are now also in java, python, mysql, postgresql and
firefox. Ray and Roland are trying to add some to glibc for mutex
monitoring. And I hope gdb will also support them as a consumer, so you
can place breakpoints on them. Basically they are just a mapping from a
high-level event name to addresses plus a description of where to find
the arguments. We have had two versions based on including an
allocated .probes elf section in the executable/library: V1, which
required the dwarf location expressions for locating the arguments, and
V2, which includes the locations of the arguments directly (see the
description of the structs in sys/sdt.h). Both are source compatible
with the DTRACE static markers, so people only have to instrument their
executables/libraries once independent of the underlying observation
tool. Roland has proposed a refinement of the way we store the names,
addresses and arguments in a new elf note which doesn't have to be
allocated and which doesn't need relocations (see the "revamp sdt.h"
email on this list from a couple of days ago).

gdb or perf could parse this section (or the new elf note) and select
the addresses to watch for a given event name. It might be good to go
over the proposal from Roland since it includes a "stand alone" parser
that might just be reused as is.
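
The "mapping from event name to addresses" can be pictured with a small
Python sketch. This is only an in-memory model under assumptions: the
`Marker` fields, the argument location strings, and the addresses are made
up for illustration and do not reflect the actual .probes section or the
proposed elf note layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    provider: str   # e.g. the library that defines the marker
    name: str       # high-level event name
    address: int    # where the marker sits in the executable/library
    args: tuple     # hypothetical descriptions of where arguments live


def index_markers(markers):
    """Group markers by (provider, name); one event may have many sites."""
    table = defaultdict(list)
    for marker in markers:
        table[(marker.provider, marker.name)].append(marker)
    return table


def addresses_for(table, provider, name):
    """Return the sorted addresses a tool would watch for one event name."""
    return sorted(m.address for m in table.get((provider, name), []))


# Made-up sample data standing in for a parsed marker section.
MARKERS = [
    Marker("glib", "mem_alloc", 0x4020F0, ("reg:rdi", "reg:rsi")),
    Marker("glib", "mem_alloc", 0x401000, ("reg:rdi", "reg:rsi")),
    Marker("glib", "mem_free", 0x403300, ("reg:rdi",)),
]
```

A consumer such as a debugger or profiler would then place a breakpoint or
probe at each returned address and use the argument descriptions to fetch
the values, without needing the full debuginfo.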

Of course it is harder to reuse the tapsets in other programs based on
these events. But those provide higher-level functionality (mapping
arguments to names, keeping track of arguments/strings constructed,
generating backtraces through jitted code, etc.). For perf just being
able to select and profile the addresses (through uprobes) and record
the arguments is probably a better match for their profiler interfaces
anyway. In GDB you already have access to much more high-level
information anyway, since you will be able to consult the debuginfo and
have the thread under inspection stopped already.

> >> - Port SystemTap onto perf/ftrace but drop embedded-C support.
> >>  This means enhancing perf/ftrace to support sufficiently flexible data
> >>  filters/modifiers (including a fault injection feature). In this case,
> >>  SystemTap scripts will handle the data in user-space (not on-line).
> > 
> > I think the "not on-line" part is a bit of a showstopper. Since that
> > kills the main idea of having powerful scriptable observability. Simple
> > filters are too restrictive IMHO. It might be enough for simple
> > profiling, where you analyze the data off-line afterwards. But that
> > isn't an option for everybody (you need to store/push the data
> > somewhere), and not very efficient in some cases.
> 
> Efficiency is the key, and perf and systemtap aim at
> different kinds of efficiency. SystemTap focuses on the efficiency of
> transporting data, but perf focuses on efficiency at
> probing time. What they are trying to do is reduce the overhead
> of recording data to buffers, because that disturbs the performance
> of target processes less.

Right. It just comes down to priorities of the different goals.
Profiling (with offline analysis) versus scriptable tracing (with some
debugging elements). But making either efficient will help both cases.
We just have to be careful not to trade in one completely for the other,
or we kill useful use cases at probing time.

Cheers,

Mark


* Re: [RFC] SystemTap future direction
  2010-08-04  9:39   ` Srikar Dronamraju
@ 2010-08-04 13:07     ` Mark Wielaard
  2010-08-05 10:34       ` Masami Hiramatsu
  2010-08-05 10:26     ` Masami Hiramatsu
  1 sibling, 1 reply; 17+ messages in thread
From: Mark Wielaard @ 2010-08-04 13:07 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Masami Hiramatsu, systemtap, Satoshi Oshima

On Wed, 2010-08-04 at 15:06 +0530, Srikar Dronamraju wrote:
> > 
> > Yes. I was at GUADEC last week and was happily surprised to meet
> > multiple Gnome hackers who were happy systemtap users. glib and gobject
> > have their own static markers (dtrace compatible) and tapsets now.
> > 
> This is nice to hear. Would it help if some of these folks
> talked about how they used SystemTap with some key kernel developers
> whenever they meet, say at conferences like Plumbers, end
> user summits, etc.?

I don't know if these users and kernel hackers hang out at the same
conferences. But there have been some blog posts by people using
systemtap for this kind of static marker in gnome libraries:
http://tecnocode.co.uk/2010/07/13/reference-count-debugging-with-systemtap/
http://blogs.gnome.org/alexl/2010/01/04/tracing-glib/

> > The kernel maintainers can make our lives easier by letting us upstream
> > more stuff that we can then reuse. But if not, we can upstream and still
> > carry our own copy if necessary. That is far from ideal, but if it is
> > the only option, at least the user experience wouldn't be worse than
> > what we have now. But I hope we can convince them otherwise of course.
> > 
> 
> But Mark, that may not provide the out-of-box experience that most
> users, esp. first-timers, would look for. And it would
> certainly cap our user base.

I am not saying it is ideal. But it wouldn't be worse than the current
experience. If the kernel maintainers really don't want to export the
functionality and we don't want to ship a parallel module of our own,
then we could also use the exported kallsyms support in newer kernels to
call the function addresses directly. That might actually be slightly
nicer for the user, although a bit more fragile.
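
A rough Python sketch of the lookup side of the kallsyms idea follows. It
parses text in the /proc/kallsyms line format (address, type, name, and an
optional module tag); the helper names and the sample addresses are made up
for illustration. The fragility shows up directly: a symbol can be missing
or can appear more than once, and then there is no safe address to call.

```python
def parse_kallsyms(text):
    """Map each symbol name to the list of addresses listed for it."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address, _symbol_type, name = fields[:3]
        symbols.setdefault(name, []).append(int(address, 16))
    return symbols


def resolve(symbols, name):
    """Return the single address for name, or fail loudly if ambiguous."""
    addresses = symbols.get(name, [])
    if len(addresses) != 1:
        raise LookupError(f"{name}: expected 1 address, found {len(addresses)}")
    return addresses[0]


# Made-up sample in the /proc/kallsyms format; real addresses differ.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff810a1b20 T register_example_probe
ffffffff810b0000 t helper
ffffffffa0001000 t helper\t[some_module]
"""
```

On a live system the text would come from reading /proc/kallsyms. The
resolved address is only valid for that exact running kernel, which is the
fragile part.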

Cheers,

Mark


* Re: [RFC] SystemTap future direction
  2010-08-04  5:19 [RFC] SystemTap future direction Masami Hiramatsu
  2010-08-04  7:39 ` Mark Wielaard
  2010-08-04  9:59 ` Andi Kleen
@ 2010-08-04 16:32 ` Frank Ch. Eigler
  2010-08-05 11:48   ` Masami Hiramatsu
  2 siblings, 1 reply; 17+ messages in thread
From: Frank Ch. Eigler @ 2010-08-04 16:32 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: systemtap, Satoshi Oshima

masami.hiramatsu.pt wrote:

> As you may know (of course I Cc'd discussion on LKML), Ingo and
> Christoph said that (at least) uprobes (but also kprobes) should not
> support out-of-tree modules.

This is unfortunate, but is in a way useful evidence about what sorts
of effects we can expect from further outreach efforts.  You and
Srikar have spent at least a year working on mini-stap functionality
for the benefit of the perf side of the house, and yet the amount of
goodwill offered in return is ... where?

> This means that if we succeed in merging uprobes into the kernel,
> SystemTap can't use uprobes itself.

Well, there are always various social and technical measures to work
around gratuitous obstacles.

> Even worse, if someone tries to remove kprobes' module support, that
> could shake the foundation of SystemTap.

It is hard to imagine someone deliberately hurting linux users that way.

> At least, to add kernel module support to uprobes, I think we have two
> options: one is pushing systemtap itself and useful scripts into the
> kernel tree, and the other is finding a very useful use case of *probes
> which requires an out-of-tree module.  [...]

Or #3, coming up with one more substantial in-tree uprobes example
than the one hch instructed srikar to drop.

> [...]
> I'd like to suggest some directions here;
>
> - Merge runtime and module-source generator into linux kernel.
>   This will require rewriting the whole of the systemtap code from C++ to
>   C or another LL (perl or python).

More concretely, to rewrite and LKML-code-standardize the lot, but
retain current architecture?  Do you sense that there's any interest
in this sort of solution by Linus?

> - Port SystemTap onto perf/ftrace and extend perf/ftrace to support
>   extended handlers which are provided by modules.

More concretely, to make a version of systemtap that, instead of
generating stand-alone kernel modules that operate independently of
perf/etc., generates modules bound to perf event sources & infrastructure?
But retain the power of our system by still executing arbitrary
generated code from those callbacks?  Do you sense that there's any
interest in this sort of solution by the perf people?

Now if we're talking about a module-encased bytecode interpreter / JIT
rich enough to encompass our runtime/language features, I have some
interest in this sort of solution, whether coupled or decoupled from
perf.  But this is a large amount of effort.  But we're tempted.

> - Port SystemTap onto perf/ftrace but drop embedded-C support.
>   This means enhancing perf/ftrace to support sufficiently flexible data
>   filters/modifiers (including a fault injection feature). In this case,
>   SystemTap scripts will handle the data in user-space (not on-line).

I get the sense the perf people believe they are on this course
already, without needing any help.

> - Or, just do nothing and wait for the kernel maintainers to choke
>   our necks...

I don't think the situation is in fact deteriorating.  We're shipping
decent releases, growing our user base, within and without the kernel
developer community, and still have plenty of major feature areas to
work on.  We have not seen regressive LKML obstructions, though
admittedly that is a low standard when it comes to serving the
community.

- FChE


* Re: [RFC] SystemTap future direction
  2010-08-04  9:39   ` Srikar Dronamraju
  2010-08-04 13:07     ` Mark Wielaard
@ 2010-08-05 10:26     ` Masami Hiramatsu
  1 sibling, 0 replies; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-05 10:26 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Mark Wielaard, systemtap, Satoshi Oshima

Srikar Dronamraju wrote:
>> Yes. I was at GUADEC last week and was happily surprised to meet
>> multiple Gnome hackers who were happy systemtap users. glib and gobject
>> have their own static markers (dtrace compatible) and tapsets now.
>>
> 
> This is nice to hear. Would it help if some of these folks
> talked about how they used SystemTap with some key kernel developers
> whenever they meet, say at conferences like Plumbers, end
> user summits, etc.?

I'm asking some systemtap users in Japan to join us at
LinuxCon Japan, in the tracing track. I found some PostgreSQL
developers who are using systemtap for profiling transactions
instead of Dtrace ;)

So, we may be able to convince kernel people if we can bring
systemtap users voice to them. If they know not only kernel
developers but also application developers and users use
systemtap, they need to consider features for those users.

>>> I'd like to suggest some directions here;
>>>
>>> - Merge runtime and module-source generator into linux kernel.
>>>  This will requires rewriting whole of systemtap code from C++ to
>>>  C or other LL (perl or python)
>> If that requires rewriting the whole translator that seems very
>> unattractive. The translator is just the script parser and translator,
>> so I don't see why it matters what language it is written in. But
>> merging some of the runtime, specifically the utrace/task-finder code so
>> it can be reused by others to get better user space task/process
>> observability seems like a nice thing to have.
>>
> 
> I think the task-finder would be gated by utrace.
> I am working on a file based uprobing stuff that provides very
> minimal task-finder like features.

Nice! :)

Thank you,

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-04 13:07     ` Mark Wielaard
@ 2010-08-05 10:34       ` Masami Hiramatsu
  2010-08-05 11:03         ` Mark Wielaard
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-05 10:34 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: Srikar Dronamraju, systemtap, Satoshi Oshima

Mark Wielaard wrote:
> On Wed, 2010-08-04 at 15:06 +0530, Srikar Dronamraju wrote:
>>> Yes. I was at GUADEC last week and was happily surprised to meet
>>> multiple Gnome hackers who were happy systemtap users. glib and gobject
>>> have their own static markers (dtrace compatible) and tapsets now.
>>>
>> This is nice to hear. Probably would it help if some of these folks
>> talk about how they used SystemTap with some key kernel developers
>> whenever they meet let say in conferences like say Plumbers, end
>> user summits etc ??
> 
> I don't know if these users and kernel hackers hang out at the same
> conferences. But there have been some blog posts by people using
> systemtap for these kind of static markers in gnome libraries:
> http://tecnocode.co.uk/2010/07/13/reference-count-debugging-with-systemtap/
> http://blogs.gnome.org/alexl/2010/01/04/tracing-glib/

Hm, these are very interesting stories.
We should let kernel people know that application people have already
started using systemtap/dtrace in their development process.


>>> The kernel maintainers can make our lives easier by letting us upstream
>>> more stuff that we can then reuse. But if not, we can upstream and still
>>> carry our own copy if necessary. That is far from ideal, but if it is
>>> the only option, at least the user experience wouldn't be worse than
>>> what we have now. But I hope we can convince them otherwise of course.
>>>
>> But Mark, that may not provide the out-of-box experience that most
>> of the users esp the first timers would look for. And it would
>> certainly cap our users.
> 
> I am not saying it is ideal. But it wouldn't be worse than the current
> experience. If the kernel maintainers really don't want to export the
> functionality and we don't want to ship a parallel module of our own,
> then we could also use the exported kallsyms support in newer kernels to
> call the function addresses directly. That might actually be slightly
> nicer for the user, although a bit more fragile.

Uh, that's really, really the last resort.
We have to talk with kernel people about what is best for users,
and try hard to find a compromise as far as we can.

Thank you,

> 
> Cheers,
> 
> Mark
> 

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-05 10:34       ` Masami Hiramatsu
@ 2010-08-05 11:03         ` Mark Wielaard
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Wielaard @ 2010-08-05 11:03 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Srikar Dronamraju, systemtap, Satoshi Oshima

On Thu, 2010-08-05 at 19:34 +0900, Masami Hiramatsu wrote:
> Mark Wielaard wrote:
> > I don't know if these users and kernel hackers hang out at the same
> > conferences. But there have been some blog posts by people using
> > systemtap for these kind of static markers in gnome libraries:
> > http://tecnocode.co.uk/2010/07/13/reference-count-debugging-with-systemtap/
> > http://blogs.gnome.org/alexl/2010/01/04/tracing-glib/
> 
> Hm, these are very interesting stories.
> We should let kernel people know that application people have already
> started using systemtap/dtrace in their development process.

I thought that was already widely known. But perhaps not. Personally I
am using systemtap much more for user space observability than for any
kernel stuff. Just because I am mainly a user space hacker.

Some other stories of people using systemtap for normal user space
introspection that I could find with a quick search:

- Postgresql:
http://www.fosslc.org/drupal/content/probing-postgresql-dtrace-and-systemtap

- Mysql:
http://assets.en.oreilly.com/1/event/36/Monitoring%20Drizzle%20or%20MySQL%20With%20DTrace%20and%20SystemTap%20Presentation.pdf

- Mozilla:
http://blog.mozilla.com/tglek/2009/10/23/studying-library-io-systemtap-style/
http://blog.mozilla.com/tglek/2010/07/22/file-fragmentation/
http://blog.mozilla.com/tglek/2010/05/24/teethig-troubles-assigning-blame-for-pagefaults/

- Python:
http://press.redhat.com/2010/04/27/fedora-13-spotlight-feature-exploring-new-frontiers-of-python-development/

- Java:
http://icedtea.classpath.org/~vanaltj/stapexamples/

Cheers,

Mark


* Re: [RFC] SystemTap future direction
  2010-08-04 16:32 ` Frank Ch. Eigler
@ 2010-08-05 11:48   ` Masami Hiramatsu
  0 siblings, 0 replies; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-05 11:48 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap, Satoshi Oshima

Frank Ch. Eigler wrote:
> masami.hiramatsu.pt wrote:
> 
>> As you may know (of course I Cc'd discussion on LKML), Ingo and
>> Christoph said that (at least) uprobes (but also kprobes) should not
>> support out-of-tree module.
> 
> This is unfortunate, but is in a way useful evidence about what sorts
> of effects we can expect from further outreach efforts.  You and
> Srikar have spent at least a year working on mini-stap functionality
> for the benefit of the perf side of the house, and yet the amount of
> goodwill offered in return is ... where?

Well..., they may still know of only a few use-cases.

>> This means that if we succeed to merge uprobes into kernel,
>> SystemTap can't use uprobes itself.
> 
> Well, there are always various social and technical measures to work
> around gratuitious obstacles.
> 
>> Even worse, if someone tries to remove kprobes' module support, that
>> could shake the foundation of SystemTap.
> 
> It is hard to imagine someone deliberately hurting linux users that way.
> 

I hope so. And now I know we already have many users and use-cases.
I think the next step is to let them know how useful systemtap is,
with actual use-case examples.

> 
>> At least, to add support kmodules to uprobes, I think we have two
>> options, one is pushing systemtap itself and useful scripts into
>> kernel tree, or the other is finding very useful use-case of *probes
>> which requires out-of-tree module.  [...]
> 
> Or #3, coming up with one more substantial in-tree uprobes example
> than the one hch instructed srikar to drop.

OK.

>> [...]
>> I'd like to suggest some directions here;
>>
>> - Merge runtime and module-source generator into linux kernel.
>>   This will requires rewriting whole of systemtap code from C++ to
>>   C or other LL (perl or python)
> 
> More concretely, to rewrite and LKML-code-standarize the lot, but
> retain current architecture?  Do you sense that there's any interest
> in this sort of solution by Linus?

Not sure. If on-line scripting (which requires a module generator
or an in-kernel interpreter) is as useful as you think, you can convince
him to support it in-kernel.

Again, we already have several users who are using systemtap (with
DTrace tracepoints) aggressively. I think that means there is real
demand for application tracing. (I also think we still need to discuss
what the best implementation for that is.)

>> - Port SystemTap on the perf/ftrace and extend perf/ftrace to support
>>   extend handlers which provided by modules.
> 
> More concretely, to make a version of systemtap that instead of
> generating stand-alone kernel modules that operate independently of
> perf/etc., that they be bound to perf event sources & infrastructure?

Right.

> But retain the power of our system by still executing arbitrary
> generated code from those callbacks?  Do you sense that there's any
> interest in this sort of solution by the perf people?

I think so. If we can demonstrate the flexibility of on-line scripting
and its power, we may convince them.

> Now if we're talking about a module-encased bytecode interpreter / JIT
> rich enough to encompass our runtime/language features, I have some
> interest in this sort of solution, whether coupled or decoupled from
> perf.  But this is a large amount of effort.  But we're tempted.

Yeah, me too :)
Actually, Ingo once suggested building an interpreter in the kernel.

>> - Port SystemTap on the perf/ftrace but drop embedded-C support.
>>   This will enhance perf/ftrace to support enough flexible data
>>   filter/modifier (including fault injection feature). In this case,
>>   SystemTap scripts will handle the data in user-space (not on-line).
> 
> I get the sense the perf people believe they are on this course
> already, without needing any help.

Yeah, I would just like to help systemtap users run their
stap scripts even if the perf people choose this way.

>> - Or, just do nothing and wait for kernel maintainers choking
>>   our necks...
> 
> I don't think the situation is in fact deteriorating.  We're shipping
> decent releases, growing our user base, within and without the kernel
> developer community, and still have plenty of major feature areas to
> work on.  We have not seen regressive LKML obstructions, though
> admittedly that is a low standard when it comes to serving the
> community.

Maybe I'm paranoid. However, I can't see things getting better
either, at least on the kernel mailing list. perf and ftrace are
becoming the de-facto standard tracing tools in the linux kernel
community. Indeed, systemtap has a growing user base and major features.
So why NOT promote that? I couldn't see any systemtap developers
at the collaboration summit this year, or on the LinuxCon attendee list.
Recently I feel that we are losing presence in the linux kernel
community, even though systemtap runs ONLY on linux.

Anyway, I plan to hold a tracing panel discussion with systemtap
users and perf/ftrace developers at LinuxCon Japan and will try
to clarify what features users need. I hope that helps move things
forward.

Thank you,

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-04 12:50     ` Mark Wielaard
@ 2010-08-05 12:28       ` Masami Hiramatsu
  2010-08-05 13:37         ` Mark Wielaard
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-05 12:28 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap, Satoshi Oshima

Mark Wielaard wrote:
>>>> - Port SystemTap on the perf/ftrace but drop embedded-C support.
>>>>  This will enhance perf/ftrace to support enough flexible data
>>>>  filter/modifier (including fault injection feature). In this case,
>>>>  SystemTap scripts will handle the data in user-space (not on-line).
>>> I think the "not on-line" part is a bit of a showstopper. Since that
>>> kills the main idea of having powerful scriptable observability. Simple
>>> filters are too restrictive IMHO. It might be enough for simple
>>> profiling, where you analyze the data off-line afterwards. But that
>>> isn't an option for everybody (you need to store/push the data
>>> somewhere), and not very efficient some cases.
>> The efficiency is the key, and perf and systemtap aim to
>> different efficiency. SystemTap focuses on the efficiency of
>> transporting data, but perf focuses on the efficiency of
>> probing time. What they are trying to is reducing the overhead
>> of recording data to buffers, because it is less disturbance for
>> the performance of target processes.
> 
> Right. It just comes down to priorities of the different goals.
> Profiling (with offline analysis) versus scriptable tracing (with some
> debugging elements). But making either efficient will help both cases.
> We just have to be careful not to trade in one completely for the other,
> or we kill useful use cases at probing time.

Hmm, can you think of a useful use-case for scriptable (on-line) tracing?
I know a scriptable interface generally gives us great flexibility, but
does it have to be done on-line? I mean, perf already has a scriptable
interface (python and perl) for off-line analysis. Why isn't that enough?

The only one I know of is complicated fault injection, which is a really
useful example of scriptable on-line tracing.

e.g.
http://ols.fedoraproject.org/OLS/Reprints-2008/tanaka-reprint.pdf

Thank you,

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-04  9:59 ` Andi Kleen
@ 2010-08-05 12:33   ` Masami Hiramatsu
  0 siblings, 0 replies; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-05 12:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: systemtap, Satoshi Oshima

Andi Kleen wrote:
> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
> 
>> As you may know (of course I Cc'd discussion on LKML), Ingo and
>> Christoph said that (at least) uprobes (but also kprobes) should
>> not support out-of-tree module.
> 
> They don't necessarily have the last word on this. Last word would be 
> Linus.
> 
>> This means that if we succeed to merge uprobes into kernel,
>> SystemTap can't use uprobes itself. Even worse, if someone tries
>> to remove kprobes' module support, that could shake the foundation
>> of SystemTap.
> 
> There are already kprobes modules in tree I believe, which
> would be broken too.

Sure, there are some kprobe examples and jprobe users in
the kernel tree.

> Also traditionally patches that broke a widely used free software
> out of tree module haven't been merged in the past.

I hope so.

>> At least, to add support kmodules to uprobes, I think we have two
>> options, one is pushing systemtap itself and useful scripts into
>> kernel tree, or the other is finding very useful use-case of *probes
>> which requires out-of-tree module. (But the first one is hard because
>> Linus hates C++, and systemtap is too huge to push into the kernel)
> 
> One thing that might work is to move at least larger parts of the
> systemtap kernel runtime library into the kernel tree and then have
> a couple of example modules written in C that exercise all the
> interfaces (and ideally do something useful in the process too)
> 
> In principle even compiler generated modules could be (at least
> partially used for this), but I suppose they would need
> quite some cleanups. It might be easier to do this with handwritten
> C.

Yeah, it should be :) Nowadays systemtap-generated modules are
very complicated, for safety reasons.

> Then make sure the compiler output mostly only uses these interfaces.
> That is they would need to be useful higher level interfaces, not just an 
> thin abstraction layer. I suspect 100% coverage wouldn't be possible
> and also the compiler also would use use some other interfaces,
> but as long as those are widely used driver interfaces there's
> usually no problem.
> 
> Basically it's important to have testing coverage in the kernel
> for everything that can be used by compiler output (minus guru mode)
> 
> This would probably need significant work to clean the library
> up for kernel coding style etc. I think some of the code
> could be also a lot simplified, especially if there were some
> minor changes in the main kernel for this 

That's one reason why I've started to push trace-kprobe into
the kernel, which provides a standard set of interfaces for reading
registers, stacks, and memory. I hope it can eventually be
shared with systemtap modules.

> (I still don't understand how one needs 1.6kLOC to find a task :)
> 
> But you could start in staging with this, so it doesn't require
> doing all that work outside mainline.
> 
> I think this would also largely fix the problem that systemtap often
> breaks with new kernel versions.
> 
> For example one of things I really like in systemtap are the 
> easy histograms. So if there was a histogram library function
> in the kernel I assume that could even find other users.
> Now that's only a small part of the code, but there could be more
> of this.

Thanks for the good advice. :) It should be a good candidate for
inclusion in the kernel.
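
To make the histogram idea concrete, here is a minimal Python sketch of a
power-of-two bucket histogram, roughly the kind of small helper being
discussed. It is only an illustration: the function names (log2_bucket,
log2_histogram, render) are hypothetical and are not taken from the
systemtap runtime or from any kernel interface, and mapping non-positive
values to bucket 0 is an assumption of this sketch.

```python
def log2_bucket(value):
    """Return the lower bound of the power-of-two bucket holding value.

    Non-positive values go to bucket 0 (an assumption of this sketch).
    """
    if value <= 0:
        return 0
    return 1 << (value.bit_length() - 1)


def log2_histogram(samples):
    """Count integer samples per power-of-two bucket, sorted by bucket."""
    counts = {}
    for value in samples:
        bucket = log2_bucket(value)
        counts[bucket] = counts.get(bucket, 0) + 1
    return dict(sorted(counts.items()))


def render(hist, width=40):
    """Format the histogram as text rows scaled to the largest bucket."""
    peak = max(hist.values(), default=1)
    rows = []
    for bucket, count in hist.items():
        bar = "@" * (count * width // peak)
        rows.append("%8d |%s %d" % (bucket, bar, count))
    return "\n".join(rows)


if __name__ == "__main__":
    latencies = [1, 2, 3, 4, 5, 8, 9, 1000]  # synthetic sample values
    print(render(log2_histogram(latencies)))
```

The point of the power-of-two buckets is that a fixed, small number of
counters covers a very wide value range, which is what makes such a
helper cheap enough to update at probe time.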

Thank you,

> 
> -Andi
> 

-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-05 12:28       ` Masami Hiramatsu
@ 2010-08-05 13:37         ` Mark Wielaard
  2010-08-06  9:31           ` Masami Hiramatsu
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Wielaard @ 2010-08-05 13:37 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: systemtap, Satoshi Oshima

On Thu, 2010-08-05 at 21:27 +0900, Masami Hiramatsu wrote:
> Mark Wielaard wrote:
> >>>> - Port SystemTap on the perf/ftrace but drop embedded-C support.
> >>>>  This will enhance perf/ftrace to support enough flexible data
> >>>>  filter/modifier (including fault injection feature). In this case,
> >>>>  SystemTap scripts will handle the data in user-space (not on-line).
> >>> I think the "not on-line" part is a bit of a showstopper. Since that
> >>> kills the main idea of having powerful scriptable observability. Simple
> >>> filters are too restrictive IMHO. It might be enough for simple
> >>> profiling, where you analyze the data off-line afterwards. But that
> >>> isn't an option for everybody (you need to store/push the data
> >>> somewhere), and not very efficient some cases.
> >> The efficiency is the key, and perf and systemtap aim to
> >> different efficiency. SystemTap focuses on the efficiency of
> >> transporting data, but perf focuses on the efficiency of
> >> probing time. What they are trying to is reducing the overhead
> >> of recording data to buffers, because it is less disturbance for
> >> the performance of target processes.
> > 
> > Right. It just comes down to priorities of the different goals.
> > Profiling (with offline analysis) versus scriptable tracing (with some
> > debugging elements). But making either efficient will help both cases.
> > We just have to be careful not to trade in one completely for the other,
> > or we kill useful use cases at probing time.
> 
> Hmm, can you think of a useful use-case for scriptable (on-line) tracing?
> I know a scriptable interface generally gives us great flexibility, but
> does it have to be done on-line? I mean, perf already has a scriptable
> interface (python and perl) for off-line analysis. Why isn't that enough?

But you should know already! :) You wrote the kprobe-based event tracer!
It allows on-line filtering based on simple register and memory
values, so you don't need to do the filtering afterwards off-line.

SystemTap just extends that basic idea by making on-line tracing
decisions based on global variables, associative arrays, statistical
variables and some context gathering functions. All so you can easily
zoom into what is really important in your particular environment.

The off-line analysis through perf/python/perl is fine if you can afford
recording all the data you might be using at all the probe trace events.
And for some kinds of statistical profiling that is precisely what you
need anyway. Dump all data and analyse it off-line.

But this obviously has a cost. Say, you are interested in getting a
backtrace for when a certain event triggers, in the reference counting
gnome example when the count goes up to > 1024 for a particular object
type, but only if allocated from a particular module after startup,
which you mark as when the number of different objects created passes a
certain threshold. You can easily express that with systemtap using a
global variable and associative array. Only when the "context" is right do
you gather all the context data at the probe point, dump a stack trace
and the aggregated data. You can make that decision on-line before
gathering all the possibly relevant data at the probe point.

If you need to post-process off-line and don't have "on-line storage"
for keeping the statistics/associative arrays then you need to dump the
full context with every probe, including for example the backtraces
(which is expensive) since you don't know whether or not you might need
it. The post-processing off-line tools can of course then filter out all
unneeded data for you. But you still need to generate all the data
first, and then process it all (from a different context - context
switches aren't cheap either). That does disturb your system, especially
with larger data sets. If you are unlucky you could actually get into a
situation where the data dumping & post-processing tools generate more
i/o, syscalls, memory usage, etc. than the issue you are trying to
observe in the first place.
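
The trade-off described above can be sketched in a few lines of Python.
This is not SystemTap code and measures nothing real; it only counts how
often an "expensive" context capture (standing in for a backtrace) runs
under each strategy. All names here (capture_context, trace_online,
trace_offline, make_events) and the synthetic event stream are
hypothetical, and the sketch keeps only the reference-count condition
from the example, leaving out the module and startup conditions.

```python
from collections import defaultdict

THRESHOLD = 1024  # reference count of interest, as in the example above


def capture_context(event, stats):
    """Stand-in for expensive work at the probe point (e.g. a backtrace)."""
    stats["captures"] += 1
    return {"obj": event["obj"], "backtrace": "<stack for %s>" % event["obj"]}


def trace_online(events):
    """Keep counters at probe time; capture context only when needed."""
    stats = {"captures": 0}
    refcount = defaultdict(int)  # plays the role of a global associative array
    hits = []
    for ev in events:
        refcount[ev["obj"]] += ev["delta"]
        if refcount[ev["obj"]] > THRESHOLD:
            hits.append(capture_context(ev, stats))
    return hits, stats["captures"]


def trace_offline(events):
    """Capture context at every event, then filter in a second pass."""
    stats = {"captures": 0}
    log = [(ev, capture_context(ev, stats)) for ev in events]
    refcount = defaultdict(int)
    hits = []
    for ev, ctx in log:
        refcount[ev["obj"]] += ev["delta"]
        if refcount[ev["obj"]] > THRESHOLD:
            hits.append(ctx)
    return hits, stats["captures"]


def make_events():
    """Synthetic stream: object 'a' is referenced 1030 times, 'b' 10 times."""
    events = [{"obj": "a", "delta": 1} for _ in range(1030)]
    events += [{"obj": "b", "delta": 1} for _ in range(10)]
    return events


if __name__ == "__main__":
    print("on-line captures:", trace_online(make_events())[1])
    print("off-line captures:", trace_offline(make_events())[1])
```

With this synthetic stream both strategies report the same six hits, but
the off-line variant performs 1040 captures where the on-line one
performs 6.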

Cheers,

Mark


* Re: [RFC] SystemTap future direction
  2010-08-05 13:37         ` Mark Wielaard
@ 2010-08-06  9:31           ` Masami Hiramatsu
  2010-08-06 16:50             ` Frank Ch. Eigler
  0 siblings, 1 reply; 17+ messages in thread
From: Masami Hiramatsu @ 2010-08-06  9:31 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: systemtap, Satoshi Oshima

Mark Wielaard wrote:
> On Thu, 2010-08-05 at 21:27 +0900, Masami Hiramatsu wrote:
>> Mark Wielaard wrote:
>>>>>> - Port SystemTap on the perf/ftrace but drop embedded-C support.
>>>>>>  This will enhance perf/ftrace to support enough flexible data
>>>>>>  filter/modifier (including fault injection feature). In this case,
>>>>>>  SystemTap scripts will handle the data in user-space (not on-line).
>>>>> I think the "not on-line" part is a bit of a showstopper. Since that
>>>>> kills the main idea of having powerful scriptable observability. Simple
>>>>> filters are too restrictive IMHO. It might be enough for simple
>>>>> profiling, where you analyze the data off-line afterwards. But that
>>>>> isn't an option for everybody (you need to store/push the data
>>>>> somewhere), and not very efficient some cases.
>>>> The efficiency is the key, and perf and systemtap aim to
>>>> different efficiency. SystemTap focuses on the efficiency of
>>>> transporting data, but perf focuses on the efficiency of
>>>> probing time. What they are trying to is reducing the overhead
>>>> of recording data to buffers, because it is less disturbance for
>>>> the performance of target processes.
>>> Right. It just comes down to priorities of the different goals.
>>> Profiling (with offline analysis) versus scriptable tracing (with some
>>> debugging elements). But making either efficient will help both cases.
>>> We just have to be careful not to trade in one completely for the other,
>>> or we kill useful use cases at probing time.
>> Hmm, can you think of a useful use-case for scriptable (on-line) tracing?
>> I know a scriptable interface generally gives us great flexibility, but
>> does it have to be done on-line? I mean, perf already has a scriptable
>> interface (python and perl) for off-line analysis. Why isn't that enough?
> 
> But you should know already! :) You wrote the kprobe-based event tracer!
> It allows on-line filtering based on simple register and memory
> values, so you don't need to do the filtering afterwards off-line.

Hm,

> SystemTap just extends that basic idea by making on-line tracing
> decisions based on global variables, associative arrays, statistical
> variables and some context gathering functions. All so you can easily
> zoom into what is really important in your particular environment.
> 
> The off-line analysis through perf/python/perl is fine if you can afford
> recording all the data you might be using at all the probe trace events.
> And for some kinds of statistical profiling that is precisely what you
> need anyway. Dump all data and analyse it off-line.

Agreed.

> But this obviously has a cost. Say, you are interested in getting a
> backtrace for when a certain event triggers, in the reference counting
> gnome example when the count goes up to > 1024 for a particular object
> type, but only if allocated from a particular module after startup,
> which you mark as when the number of different objects created passes a
> certain threshold. You can easily express that with systemtap using a
> global variable and associative array. Only when the "context" is right do
> you gather all the context data at the probe point, dump a stack trace
> and the aggregated data. You can make that decision on-line before
> gathering all the possibly relevant data at the probe point.

Yeah, that's a good case where systemtap's on-line analysis works well!

> If you need to post-process off-line and don't have "on-line storage"
> for keeping the statistics/associative arrays then you need to dump the
> full context with every probe, including for example the backtraces
> (which is expensive) since you don't know whether or not you might need
> it. The post-processing off-line tools can of course then filter out all
> unneeded data for you. But you still need to generate all the data
> first, and then process it all (from a different context - context
> switches aren't cheap either). That does disturb your system, especially
> with larger data sets. If you are unlucky you could actually get into a
> situation where the data dumping & post-processing tools generate more
> i/o, syscalls, memory usage, etc. than the issue you are trying to
> observe in the first place.

So we had better start with useful concrete examples which show
the actual performance differences between on-line and off-line.

Thank you for giving us good ideas! :)

> 
> Cheers,
> 
> Mark
> 
-- 
Masami HIRAMATSU
2nd Research Dept.
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


* Re: [RFC] SystemTap future direction
  2010-08-06  9:31           ` Masami Hiramatsu
@ 2010-08-06 16:50             ` Frank Ch. Eigler
  0 siblings, 0 replies; 17+ messages in thread
From: Frank Ch. Eigler @ 2010-08-06 16:50 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Mark Wielaard, systemtap, Satoshi Oshima

Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:

> [...]
> So we had better start with useful concrete examples which show
> the actual performance differences between on-line and off-line.

It's also not a matter of mere performance difference.  Sometimes the
data being collected - probes being activated - are themselves a
function of history.  Sometimes action needs to be taken.  Sometimes
tetris needs to be played. :-) These are all beyond an offline parsing
approach.

- FChE


end of thread, other threads:[~2010-08-06 16:50 UTC | newest]

Thread overview: 17+ messages
2010-08-04  5:19 [RFC] SystemTap future direction Masami Hiramatsu
2010-08-04  7:39 ` Mark Wielaard
2010-08-04  9:13   ` Masami Hiramatsu
2010-08-04 12:50     ` Mark Wielaard
2010-08-05 12:28       ` Masami Hiramatsu
2010-08-05 13:37         ` Mark Wielaard
2010-08-06  9:31           ` Masami Hiramatsu
2010-08-06 16:50             ` Frank Ch. Eigler
2010-08-04  9:39   ` Srikar Dronamraju
2010-08-04 13:07     ` Mark Wielaard
2010-08-05 10:34       ` Masami Hiramatsu
2010-08-05 11:03         ` Mark Wielaard
2010-08-05 10:26     ` Masami Hiramatsu
2010-08-04  9:59 ` Andi Kleen
2010-08-05 12:33   ` Masami Hiramatsu
2010-08-04 16:32 ` Frank Ch. Eigler
2010-08-05 11:48   ` Masami Hiramatsu
