kernel summit session on systemtap

public inbox for systemtap@sourceware.org

* kernel summit session on systemtap
@ 2008-09-17 14:42 Frank Ch. Eigler
  2008-09-17 22:14 ` Theodore Tso
  2008-09-18 15:28 ` Theodore Tso
  0 siblings, 2 replies; 14+ messages in thread
From: Frank Ch. Eigler @ 2008-09-17 14:42 UTC
  To: systemtap

Hi -

Yesterday morning, we had a 90-minute session on topics touching
systemtap.  Though there were several serious concerns raised, it went
better than I had expected.

Here are some things we need to work more on:

- Making it dead easy for kernel guys to build and use the thing.
  It's hard to say what problems they run into, since only the rare
  bug report gets sent, but some possibilities could be:
  - be able to work against an un-installed kernel build tree
  - add buildid checking
  - bundle elfutils sources for the distro-challenged :-)
  - other ideas, please!

- It's time to really improve & shrink debuginfo.  Enough said.

- We need to test constantly against linus/linux-next type git trees,
  at least to confirm that the runtime works.  There is a perception
  that it breaks often, and this in turn is driving the impulse to
  pull the runtime into the kernel.  If we can more aggressively
  handle the problem, the impulse would die off.  We could still of
  course push some of the runtime upstream, but that could happen for
  stronger technical reasons rather than kneejerks.  Linus hinted at
  going through akpm since he's such a pushover :-)

- Stability.  Kernel crashes are an instant and long-lasting turn-off,
  even to the point of mean laughter.  We need to urgently and deeply
  stress-test and robustify our kernel-side foundations (kprobes,
  utrace, uprobes, runtime).


Here are some areas I am no longer that worried about:

- The general approach of synthesized modules vs. bytecode
  interpreters.  This dtrace-favouring marketing canard was brought up
  yet again ("systemtap is unstable because ..."), but before I got a
  chance to rebut, Linus himself said that VMs are not a good answer
  either.  With the above "foundation robustness" problem improved,
  there will be evidence that should satisfy more skeptics.

- Markers.  There was a consensus that the kernel needs more of them.
  Well, there was quite some indecision over who should champion them,
  but a tasteful set of some dozens should be uncontroversial.  A good
  start to prime the pump would be some baby kernel-side tool that
  connects markers to an existing tracing channel, perhaps one little
  piece of lttng.

- The tool's generality.  Linus is rightly skeptical of a tool that
  aims too high and turns out to be too hard to use.  (I believe
  "piece of shit" was his shock-value opening comment.  :-) He's also
  annoyed at the continuing proliferation of tracing widgets.  There
  was a short spurt of support for increasing the reach of tools like
  "latencytop" (speculating that it "solves the problems for 80% of
  people"), but then many people spoke of important problems that only
  a broader tool can address.


- FChE

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel summit session on systemtap
  2008-09-17 14:42 kernel summit session on systemtap Frank Ch. Eigler
@ 2008-09-17 22:14 ` Theodore Tso
  2008-09-18  0:51   ` Frank Ch. Eigler
  2008-09-18 15:28 ` Theodore Tso
  1 sibling, 1 reply; 14+ messages in thread
From: Theodore Tso @ 2008-09-17 22:14 UTC
  To: Frank Ch. Eigler; +Cc: systemtap

On Wed, Sep 17, 2008 at 10:41:15AM -0400, Frank Ch. Eigler wrote:
> Here are some things we need to work more on:
> 
> - It's time to really improve & shrink debuginfo.  Enough said.

The more I've played with debuginfo, the more I've been convinced that
at least for me, the costs vastly outweigh the benefits.  It causes
the time to compile the kernel (and kernel developers need to compile
the kernel a lot) to explode, just simply due to disk I/O time; if
/lib is on a separate partition, you may simply not have the space to
store the huge, vastly bloated modules.  From the benefits side, given
GCC's increasingly aggressive optimizations, being able to set
breakpoints at random lines is less important when it (a) often
doesn't work because it's been optimized out, or (b) the symbol you
want to reference isn't easily available.  Case (b) ends up being very
frustrating because you end up getting a highly confusing error
message, such as:

	semantic error: failed to retrieve location attribute for local 'sb'
	(dieoffset: 0x9cf22): identifier '$sb' at ext4-check-desk.stp:3:47

Not something that a system administrator will appreciate, never mind
the kernel developer.  It just ends up leaving the developer and/or
administrator a very bad impression of Systemtap.

How could this be mitigated:

*) Promote the use of Steven Rostedt's streamline_config, telling
	people that if they decide to compile with debuginfo, they
	will very likely ***badly*** regret it unless they use a
	special config file that aggressively restricts their
	configuration in terms of not building modules they don't need
	on that system.
 
*) Maybe for kernel developers there should be some suggested patches
	 that compile the kernel with some amount of optimization
	 suppressed, so that in particular, functions are never
	 inlined, and maybe in an extreme sense, optimizations are
	 disabled altogether --- or at least enough that if someone is
	 going to pay the vast cost of debuginfo, at least they will
	 get something useful out of it by actually being able to set
	 traces at arbitrary line numbers, and will hopefully be able
	 to access variables with much greater probability of success.

	 Yes, this goes against the Systemtap goal of not requiring
	 people to compile special kernels and reboot, but as for the
	 advantage of using debuginfo, namely being able to set
	 tracepoints at arbitrary points: at least in the code I've
	 tried to instrument, I have absolutely no confidence that I
	 can set tracepoints where I want except at the beginning of
	 functions anyway.  So if I'm going to slow down my
	 compile-edit-debug cycle in the kernel by an order of
	 magnitude, say to debug some really hard problem, I want to
	 be able to really, truly, and reliably set tracepoints
	 **anywhere** and be able to usefully probe variables when
	 and where I want.

*) Alternatively, if we are going to take as a given that the only
	kind of probe points that are going to be reliable is the
	beginning or end of functions (and specifically, non-static
	functions), is there some way to generate a restricted set of
	debuginfo that only gives enough information that it is
	possible to decode the types of the function parameters, but
	none of the line number information?  Maybe some way of simply
	running nm on vmlinux, and then creating some kind of magical
	.c file that references all of the functions and forcing a
	single .o with DWARF information with the function and type
	information, and nothing else.  I'm not a tools person, so
	this may be a stupid way of doing it, but the basic idea is
	simply having a highly compressed debuginfo file that only has
	function parameter information, and nothing else, which
	hopefully will only be a megabyte or two instead of hundreds
	and hundreds of megabytes of debuginfo.  And to do this
	without having to write gargantuan .o files in the build
	tree, since that slows down the compile.

	I know that Systemtap can run without debuginfo, but if you
	can't decode the function arguments, at that point I would
	probably use ftrace because it's simpler than Systemtap.
	Systemtap could add a huge amount of value over ftrace, if it
	could decode function parameters without having to pay the
	cost of debuginfo.

	Quite frankly, these days the main reason why I haven't been
	playing with Systemtap much lately is because I'm tired of
	waiting for compiles to complete when compiling with
	debuginfo.  Sure, it's handy for getting line number
	information when debugging oops, but compiling with debuginfo
	is **so** painful that I'd much rather paw through
	disassembled assembly code to figure out where the system died
	when I need to analyze a kernel oops than to wait for a kernel
	compile to finish.  Pawing through assembly code takes much
	less time for me, and is much more efficient, because I'm very
	often recompiling the kernel tree.  (This is a very different
	scenario than when a distribution compiles a kernel once, on a
	build machine, as opposed to multiple times during a
	development cycle.)
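To make the third suggestion above concrete: the idea is to keep, per function, only the parameter names and types, and drop everything else (local variables, lexical blocks, line tables). The following is a minimal Python sketch over a hypothetical, simplified in-memory DIE tree; the `Die` class and the `subset_signatures` helper are illustrative assumptions, not elfutils or any real DWARF library API. A real tool would have to walk the actual `.debug_info` section and also retain the type DIEs that the parameters refer to.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    """Hypothetical, simplified stand-in for a DWARF debugging entry."""
    tag: str                      # e.g. "DW_TAG_subprogram"
    name: str = ""
    type_name: str = ""           # resolved type, flattened to a string here
    children: list = field(default_factory=list)


def subset_signatures(cu_children):
    """Keep only function names and their formal parameters.

    Local variables, lexical blocks, and anything else under a
    subprogram are discarded, as are all non-function top-level DIEs.
    """
    sigs = {}
    for die in cu_children:
        if die.tag != "DW_TAG_subprogram":
            continue
        sigs[die.name] = [
            (child.name, child.type_name)
            for child in die.children
            if child.tag == "DW_TAG_formal_parameter"
        ]
    return sigs


# Illustrative input: one hypothetical function with two parameters and one local.
cu = [
    Die("DW_TAG_variable", "some_global", "int"),
    Die("DW_TAG_subprogram", "example_fn", children=[
        Die("DW_TAG_formal_parameter", "sb", "struct super_block *"),
        Die("DW_TAG_formal_parameter", "silent", "int"),
        Die("DW_TAG_variable", "tmp", "unsigned long"),
        Die("DW_TAG_lexical_block"),
    ]),
]

print(subset_signatures(cu))
# {'example_fn': [('sb', 'struct super_block *'), ('silent', 'int')]}
```

Everything a function-entry probe needs to decode its arguments survives; everything that makes full debuginfo huge does not.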

> - The tool's generality.  Linus is rightly skeptical of a tool that
>   aims too high and turns out to be too hard to use.  (I believe
>   "piece of shit" was his shock-value opening comment.  :-)


Speaking of that.... this isn't as big of a deal for kernel
developers, but if it really is true that Systemtap is aiming to be
used for System Administrators (and I believe that based on the
assumption that debuginfo management would be done by RPM macros in
the distribution packaging, and ignoring the kernel compile-edit-debug
time problem plus some of the ways Systemtap had been marketed at
events such as the Red Hat Summit), then when looking at the Systemtap
vs. Dtrace comparison chart, I have to agree with the DTrace folks;
the Systemtap project is very much being disingenuous about some of
the items on the chart, such as the comparison of speculative tracing.

The comment "(from first principles via auxiliary data and control
structures)", and the related one for thread-local variables "(from
first principles via tid-indexed auxiliary arrays)" is really lame.
Of *course* you can do anything from first principles.  A systemtap
trace is (modulo the time constraint) Turing-equivalent.  That's like
saying there's no need for perl, I can in principle do everything in
assembly language.  You *can*, but you might not want to.
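For readers who haven't seen the "tid-indexed auxiliary array" idiom: per-thread state is kept in a global associative array keyed by thread id, written at function entry and read (and deleted) at function return. A minimal Python sketch of that bookkeeping follows; `on_entry` and `on_return` are hypothetical stand-ins for a SystemTap `probe foo` / `probe foo.return` pair, and the timestamps are passed in explicitly so the example is deterministic.

```python
# Global associative array keyed by thread id: the "tid-indexed
# auxiliary array" that stands in for a true thread-local variable.
entry_time = {}


def on_entry(tid, now):
    """Hypothetical entry-probe handler: remember when this thread entered."""
    entry_time[tid] = now


def on_return(tid, now):
    """Hypothetical return-probe handler: compute this thread's elapsed time.

    The entry is deleted so the array does not grow without bound; a
    return with no recorded entry (tracing started mid-call) yields None.
    """
    start = entry_time.pop(tid, None)
    if start is None:
        return None
    return now - start


# Two threads interleave inside the same function.
on_entry(101, 1000)
on_entry(202, 1005)
print(on_return(202, 1010))  # 5
print(on_return(101, 1030))  # 30
```

The manual keying and cleanup are exactly the boilerplate that a built-in thread-local construct would hide from the script author.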

One HUGE advantage DTrace has over Systemtap is that it has certain constructs,
such as its default report generation, and speculative tracing, which
means you can do things on a single command line, i.e.:

dtrace -n 'syscall::exec*:return { trace(execname); }'

By default dtrace will print a line for each probe that fires, and if
you use the trace command, it will print the contents of the name.

Or take this example:

% dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'

This will automatically print out the number of system calls each
process (printed with pid and execname) made between the time
dtrace was started and when the administrator hit ^C:

 3104 gnome-terminal    2
 3153 gnome-terminal    2
 3098 nautilus          3
 4804 java             10
  599 sshd             24
 8117 acroread         45
28921 dtrace           71
  113 nscd            270
28920 find           3418
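What the `@num[pid, execname] = count()` aggregation plus dtrace's default end-of-run report amount to can be sketched in a few lines of Python. This is a model, not dtrace: the input is a hypothetical list of `(pid, execname)` pairs, one per syscall entry, and the report is sorted by ascending count, as in the output above.

```python
from collections import Counter


def count_by_process(events):
    """Equivalent of @num[pid, execname] = count(): one counter per key."""
    return Counter(events)


def default_report(agg):
    """Print-on-exit report: one line per key, smallest counts first."""
    rows = sorted(agg.items(), key=lambda item: item[1])
    return [f"{pid:>5} {name:<16} {n:>5}" for (pid, name), n in rows]


# Hypothetical event stream: one (pid, execname) tuple per syscall entry.
events = [(599, "sshd")] * 24 + [(3104, "gnome-terminal")] * 2 + [(113, "nscd")] * 270

for line in default_report(count_by_process(events)):
    print(line)
```

The point of the comparison is that dtrace does both the keyed counting and the final report implicitly, so the user writes only the one-liner.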


You can do the same thing in systemtap, but you have to do it as a
full script, and you have to explicitly have a print command in each
probe statement and you have to explicitly dump out the contents of
each associative array.  Dtrace can suppress the automatic output (using
-q), and for any long, sophisticated script, a Dtrace script probably
will do its own explicit output.  However, for a system administrator,
they can copy simple Dtrace one-liners and modify them to their needs
much more easily than what you can do under Systemtap.  Remember, most
system administrators aren't necessarily programmers!

If we are going to let distribution marketing folks claim that
Systemtap is meant for System Administrators, it has to be easy to use,
and not necessarily assume deep programming skills.  (Such as
simulating thread local variables using tid's --- sorry, but that's
just LAME.  :-)

                                                        - Ted

* Re: kernel summit session on systemtap
  2008-09-17 22:14 ` Theodore Tso
@ 2008-09-18  0:51   ` Frank Ch. Eigler
  2008-09-18 15:19     ` Theodore Tso
  0 siblings, 1 reply; 14+ messages in thread
From: Frank Ch. Eigler @ 2008-09-18  0:51 UTC
  To: Theodore Tso; +Cc: systemtap

Hi -

On Wed, Sep 17, 2008 at 06:13:49PM -0400, Theodore Tso wrote:
> [...]
> > - It's time to really improve & shrink debuginfo.  Enough said.
> 
> The more I've played with debuginfo, the more I've been convinced that
> at least for me, the costs vastly outweigh the benefits.  [...]

Right.  We (systemtap and associated tools folks) are working on
- improving quality (benefits) of dwarf
- shrinking dwarf dramatically
- if all else fails, dwarf subsetting

Changing kernel build flags is of course possible, but it is not our
place to mandate that.


> [...] Quite frankly, these days the main reason why I haven't been
> 	playing with Systemtap much lately is because I'm tired of
> 	waiting for compiles to complete when compiling with
> 	debuginfo.  [...]

(By the way, do you build distro-style kernels on your laptop, with
allmodconfig or somesuch, or something more linus-sized?)


> One HUGE advantage DTrace has over Systemtap is that it has certain constructs,
> such as its default report generation, and speculative tracing, which
> means you can do things on a single command line, i.e.:
> 
> dtrace -n 'syscall::exec*:return { trace(execname); }'
> 
> By default dtrace will print a line for each probe that fires, and if
> you use the trace command, it will print the contents of the name.

stap -e 'probe syscall.exec* { log(name." ".execname()) }'


> Or take this example:
> 
> % dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'
> 
> This will automatically print out the number of system calls each
> process (printed with pid and execname) [...]

stap -e 'probe process.syscall { num[pid(),execname()] <<< 1 } global num'


> You can do the same thing in systemtap, but you have to do it as a
> full script, and you have to explicily have a print command [...]

Your information is slightly obsolete.  We just added some such
automation, and can do more.


> (Such as simulating thread local variables using tid's --- sorry,
> but that's just LAME.  :-)

We can bring our old 'thread->FOO' / 'process->FOO' syntax back;
no big deal.


- FChE

* Re: kernel summit session on systemtap
  2008-09-18  0:51   ` Frank Ch. Eigler
@ 2008-09-18 15:19     ` Theodore Tso
  2008-09-26 19:53       ` Frank Ch. Eigler
  0 siblings, 1 reply; 14+ messages in thread
From: Theodore Tso @ 2008-09-18 15:19 UTC
  To: Frank Ch. Eigler; +Cc: systemtap

On Wed, Sep 17, 2008 at 08:49:02PM -0400, Frank Ch. Eigler wrote:
> Right.  We (systemtap and associated tools folks) are working on
> - improving quality (benefits) of dwarf
> - shrinking dwarf dramatically
> - if all else fails, dwarf subsetting

What do you think is the timeline for this happening?  I assume this
requires changes to gcc, right?  So would an estimate of 6-9 months,
minimum, be a fair one --- and that's assuming that a new gcc with
more aggressive optimizations that may or may not break the kernel
will be immediately usable for kernel builds (regardless of whether
the fault lies with the ANSI standards committee, gcc developers'
boneheadedness, or kernel developers' boneheadedness, it's not always
the case that a new gcc is immediately trusted or even suitable for
use with the kernel.)

Given that, it might be a good idea to pursue a dwarf subsetting idea;
otherwise, many other tracing tools may make great strides and
improvements, while this particular shortcoming of systemtap means
that many kernel developers will find other ways of getting the
functionality they need and stop paying attention to Systemtap in
favor of tools that don't require debuginfo information.

As you may have seen by some of the horrible hacks that Steven Rostedt
pursued in order to work around how gcc inserts mcount code for profiling,
or some of the other "interesting" uses/abuses of the compiler toolchain,
there is no shortage of gross build kludges in the kernel.  So
something which somehow manages to trick the current compiler into
emitting DWARF information so that function parameters can be decoded
could provide substantial benefits over simply using a limited set of
markers.

> Changing kernel build flags is of course possible, but it is not our
> place to mandate that.

Well, both of these ideas --- some kind of build-time kludge to create
limited DWARF information quickly using current compiler technology
and a different set of compiler optimization flags to make Systemtap
more useful --- are patches that would need to be applied to the
kernel, yes.  So it's not a question of mandating them, but rather
suggesting them as options that might make the use of Systemtap more
palatable in the short term.

In the past I've submitted patches that gave an option to the kernel's
"make install" to strip out the debuginfo so that the partition
containing /lib wouldn't run out of space.  Right now I manually
install the full set of module files in /usr/lib/debug/lib/modules/... via:

	make INSTALL_MOD_STRIP=1 modules_install
	make INSTALL_MOD_PATH=/usr/lib/debug modules_install

Yeah, I should strip the code/data segments out of what's in
/usr/lib/debug, but as a percentage of the bloat-o-rama which is the
debuginfo information, the savings just hasn't been big enough for me
to find the motivation to hack in the kernel build extension to do
this for me.

So submitting patches to make systemtap more useful isn't something
you can "mandate", but there are some of us who have used Systemtap
enough who would be willing to champion such patches, if we got some
help in crafting them in the first place.

I can implement the patch to reduce kernel optimization levels to make
debuginfo more helpful; but tricking the compiler to quickly generate
a limited set of DWARF information is something I would need help
doing.

> > [...] Quite frankly, these days the main reason why I haven't been
> > 	playing with Systemtap much lately is because I'm tired of
> > 	waiting for compiles to complete when compiling with
> > 	debuginfo.  [...]
> 
> (By the way, do you build distro-style kernels on your laptop, with
> allmodconfig or somesuch, or something more linus-sized?)

I do both.  The distro-style kernels are the ones that I build with
debuginfo information, and it's been useful for playing around with
systemtap, but the moment I need to do any serious development work, I
tend to fall back to a limited subset of compile options, generally
without any modules, and printk debugging.  Once we get a useful
circular buffer, I'd probably start logging to the circular buffer and
use grep as the alternative to systemtap or printk debugging.  As a
result, I've had no motivation to create any tapsets, since at least
for my own personal needs, the costs of creating the debuginfo so that
SystemTap would be useful for my personal needs just far outweigh the
benefits.

> Your information is slightly obsolete.  We just added some such
> automation, and can do more.

Glad to hear it.  I suspect then that this page:

     http://sources.redhat.com/systemtap/wiki/SystemtapDtraceComparison

is also slightly out of date.

							- Ted

* Re: kernel summit session on systemtap
  2008-09-17 14:42 kernel summit session on systemtap Frank Ch. Eigler
  2008-09-17 22:14 ` Theodore Tso
@ 2008-09-18 15:28 ` Theodore Tso
  2008-09-22 21:41   ` Roland McGrath
  1 sibling, 1 reply; 14+ messages in thread
From: Theodore Tso @ 2008-09-18 15:28 UTC
  To: Frank Ch. Eigler; +Cc: systemtap

On Wed, Sep 17, 2008 at 10:41:15AM -0400, Frank Ch. Eigler wrote:
> - Stability.  Kernel crashes are an instant and long-lasting turn-off,
>   even to the point of mean laughter.  We need to urgently and deeply
>   stress-test and robustify our kernel-side foundations (kprobes,
>   utrace, uprobes, runtime).

To clarify for those who weren't there, the reason for the laughter is
that when Arjan was demonstrating the kerneloops.org project, a crash
in utrace was #6 on the list of the most frequently reported kernel
oopses on kerneloops.org in the preceding seven days.  I just
checked, and the crash in utrace now has the distinction of being #3
on the most popular kernel oops for the past seven days.  So the
comments were along the lines of "Fedora merged *what*?"
Unfortunately, utrace making the top-10 hit parade on kerneloops.org
probably didn't help its reputation of being something that should be
merged, at least in the short term.

I assume the problem is already known and fixed, but if not:

	http://www.kerneloops.org/search.php?search=utrace_control

						- Ted

* Re: kernel summit session on systemtap
  2008-09-18 15:28 ` Theodore Tso
@ 2008-09-22 21:41   ` Roland McGrath
  0 siblings, 0 replies; 14+ messages in thread
From: Roland McGrath @ 2008-09-22 21:41 UTC
  To: Theodore Tso; +Cc: Frank Ch. Eigler, systemtap

Just in case anyone actually cares about what the reality is behind the
perception, the "kerneloops #3" issue in fact has zero to do with the
"utrace sucks" question.  (No issue of substance was raised about utrace at
the conference, for either good or ill.)  That quip, and the kerneloops
top-10 list that provoked it, was during a general "kernel stability"
discussion.  The real cause of "kerneloops #3" is another point that was
mentioned in that discussion--kernel hackers only care about bleeding edge
upstream/rawhide.

Of that one, I am indeed thoroughly guilty (like, I think, most people in
the room).  This crash was a trivial one-line bug (typo level of depth),
which I had fixed very shortly after introducing it, and had not been in
the "current" code (i.e. my git tree, or f10/rawhide kernels) for weeks.
It was fixed well before the version of utrace patches posted to LKML this
month, and has nothing whatsoever to do with the stress-test crash cases
being discussed there.  (Those are crashes you have to try hard to get.
They are obviously valid issues to raise about merging the utrace code
upstream, but they are a thousand miles from anything that would ever rise
to the kerneloops top 10 because average users hit them frequently.)

Being behind on at least five other things far more interesting to me, I
had just let "check what is up with the F9 kernel" fall down to the bottom
of my queue and kept it quite out of my mind.  Getting laughed at by a
roomful of people to whom I was not inclined to explain the difference
between "I can't be bothered to maintain backports this month even when
they're crashing" and "I write code that sucks ass" made me spend the 15
minutes during lunch it took to remember/find the silly one-liner I'd
nearly forgotten about, and commit that fix to F-9 kernel cvs right then.

A build with that fix still hasn't made it even to F-9 updates-testing,
about which you'll have to ask cebbert (who was also in the room to feel
that derision about Fedora kernel quality).

Thanks,
Roland

* Re: kernel summit session on systemtap
  2008-09-18 15:19     ` Theodore Tso
@ 2008-09-26 19:53       ` Frank Ch. Eigler
  2008-09-26 22:02         ` Theodore Tso
  0 siblings, 1 reply; 14+ messages in thread
From: Frank Ch. Eigler @ 2008-09-26 19:53 UTC
  To: Theodore Tso; +Cc: systemtap

Hi, Ted -

On Thu, Sep 18, 2008 at 11:18:43AM -0400, Theodore Tso wrote:
> [...]
> > - improving quality (benefits) of dwarf
> > - shrinking dwarf dramatically
> > - if all else fails, dwarf subsetting
> 
> What do you think is the timeline for this happening?  I assume this
> requires changes to gcc, right?  So would an estimate of 6-9 months,
> minimum, be a fair one [...]

For improving debuginfo quality, yeah.  For subsetting or compressing,
we can probably attack the problem with separate postprocessing tools
that could be ready sooner.



> "make install" to strip out the debuginfo so that the partition
> containing /lib wouldn't run out of space.  Right now I manually
> install the full set of module files in /usr/lib/debug/lib/modules/... via:
> 
> 	make INSTALL_MOD_STRIP=1 modules_install
> 	make INSTALL_MOD_PATH=/usr/lib/debug modules_install

That's a clever alternative to using the separate-debuginfo style
stripping.


> > (By the way, do you build distro-style kernels on your laptop, with
> > allmodconfig or somesuch, or something more linus-sized?)
> 
> I do both.  The distro-style kernels are the ones that I build with
> debuginfo information, and it's been useful for playing around with
> systemtap, but the moment I need do any serious development work, I
> tend to fall back to a limited subset of compile options, generally
> without any modules, and printk debugging.  

OK; is there some obstruction in the way of using systemtap on your
'serious development' kernels?


> Once we get a useful circular buffer, I'd probably start logging to
> the circular buffer and use grep as the alternative to systemtap or
> printk debugging.

OK. 


> As a result, I've had no motivation to create any tapsets, since at
> least for my own personal needs, the costs of creating the debuginfo
> so that SystemTap would be useful for my personal needs just far
> outweigh the benefits.

My recollection of the ksummit yak was that the sort of tapset that
kernel people would be willing to help write/maintain consisted of
compiled-in instrumentation like markers, or whatever event layer
comes on top of the new ringbuffer widget.  If that's done right, it
should not require debuginfo for systemtap to hook in.


> > Your information is slightly obsolete.  We just added some such
> > automation, and can do more.
> 
> Glad to hear it.  I suspect then that this page:
>      http://sources.redhat.com/systemtap/wiki/SystemtapDtraceComparison
> is also slightly out of date.

Yeah.


- FChE

* Re: kernel summit session on systemtap
  2008-09-26 19:53       ` Frank Ch. Eigler
@ 2008-09-26 22:02         ` Theodore Tso
  2008-09-26 23:35           ` Roland McGrath
  0 siblings, 1 reply; 14+ messages in thread
From: Theodore Tso @ 2008-09-26 22:02 UTC
  To: Frank Ch. Eigler; +Cc: systemtap

On Fri, Sep 26, 2008 at 03:51:25PM -0400, Frank Ch. Eigler wrote:
> 
> For improving debuginfo quality, yeah.  For subsetting or compressing,
> we can probably attack the problem with separate postprocessing tools
> that could be ready sooner.
> 

I'd recommend making it something that can be done sooner if you really
want at least developers like me to use it when they want fast
compile, edit, debug cycles.  I think the right answer has to be not
postprocessing tools (since gcc still has to write out the gargantuan
.o files, which is most of the problem), but some way of extracting
the function type information in a way where function entry and exits
can decode the function arguments.  That's *the* one thing that
Systemtap has over ftrace; ftrace can only tell us that we've entered
a function, and not what arguments it has.  (Well, I guess the other
advantage Systemtap can have is that if a function entry point is
called a lot, it can be more efficient about only printing information
on some kind of condition, instead of logging output at every single
trace point.  In some cases this will be critical; in other cases,
using "grep" as a postprocessing tool on tracing output can be
sufficient.)
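The efficiency point in the parenthetical above (filter at the probe site rather than log everything and grep afterwards) can be illustrated with a small Python model. `trace_all`, `trace_filtered`, `hot_fn`, and the `(function, inode)` event tuples are hypothetical; the point is only that both approaches select the same records, while the unfiltered approach has to emit and store every event first.

```python
import re


def trace_all(events):
    """ftrace-style: emit one log line per event; filtering happens later."""
    return [f"{fn} inode={ino}" for fn, ino in events]


def grep(lines, pattern):
    """Postprocessing step over the full log."""
    return [line for line in lines if re.search(pattern, line)]


def trace_filtered(events, predicate):
    """Probe-side filtering: only events satisfying the predicate are emitted."""
    return [f"{fn} inode={ino}" for fn, ino in events if predicate(fn, ino)]


# A hot function called 100000 times, of which only one call is interesting.
events = [("hot_fn", ino) for ino in range(100_000)]

full_log = trace_all(events)
wanted = trace_filtered(events, lambda fn, ino: ino == 4242)

print(len(full_log), len(wanted))                # 100000 1
print(grep(full_log, r"inode=4242$") == wanted)  # True
```

Whether the 100000-line intermediate log matters is exactly the "critical in some cases, grep is sufficient in others" trade-off.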

> > > (By the way, do you build distro-style kernels on your laptop, with
> > > allmodconfig or somesuch, or something more linus-sized?)
> > 
> > I do both.  The distro-style kernels are the ones that I build with
> > debuginfo information, and it's been useful for playing around with
> > systemtap, but the moment I need do any serious development work, I
> > tend to fall back to a limited subset of compile options, generally
> > without any modules, and printk debugging.  
> 
> OK; is there some obstruction in the way of using systemtap on your
> 'serious development' kernels?

The main problem is that it takes too long to build the kernels with
-g debugging information.  If on average, the output object files grow
in size by a factor of 5, then the time it takes to build the kernel at
the end of the day tends to grow by somewhere between a factor of 3-5,
since compiles are often write-bound, especially if you are using
ccache.  I just got tired of the increased amount of time to compile
and install debuginfo, especially when more often than not I couldn't
set arbitrary trace points anyway.  Not being able to put them at static
function entry points was a very rude wake-up call, since ext4 has
a lot of static functions.
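The "factor of 5 in output size, factor of 3-5 in build time" estimate follows from a simple model, assuming (a simplification, not a measurement) that only the write-bound portion of the build scales with output size and the rest is unchanged. If a fraction `w` of the baseline build time is spent writing output, growing the output by a factor `g` gives a slowdown of `(1 - w) + w * g`.

```python
def build_slowdown(write_fraction, growth):
    """Relative build time when only the write-bound share scales with output size.

    write_fraction: share of baseline build time spent writing output (0..1)
    growth: factor by which output size grows (e.g. 5 for -g builds)
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    return (1.0 - write_fraction) + write_fraction * growth


for w in (0.5, 0.75, 1.0):
    print(w, build_slowdown(w, 5))  # 3.0, 4.0, 5.0 respectively
```

Under this model, the quoted 3-5x range corresponds to writes accounting for roughly half to all of the baseline build time.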

> My recollection of the ksummit yak was that the sort of tapset that
> kernel people would be willing to help write/maintain consisted of
> compiled-in instrumentation like markers, or whatever event layer
> comes on top of the new ringbuffer widget.  If that's done right, it
> should not require debuginfo for systemtap to hook in.

No, but then in many cases it's not necessary to use systemtap,
either; we can just grep the output out of the circular ring buffer.
The only time we would need systemtap would be (a) when we can't
anticipate in advance when to put in the markers, and (b) where the
amount of tracing information is too great to extract what we need
from the trace buffer, such that putting in a compiled-in conditional
is necessary.  So the real risk for Systemtap as a project is that if
people find they can solve 80-90% of their problems simply by using
ftrace plus markers and grep, there will be much less incentive to
treat Systemtap as a potential solution that needs to be
accommodated moving forward.

The other question is how many tracepoints get added, and how quickly.
If it's only a core set of 30, then the flexibility of being able to
rely on debuginfo becomes much more important, especially if the cost
of debuginfo can be brought down significantly.  If Systemtap had a
very lightweight way of decoding function arguments without having to
build every single .o file with -g, that would be the equivalent of a
very large number of markers, and at the kernel summit, Linus very
much was against dumping a large number of markers into the kernel;
and in particular, he didn't want to put *any* more markers or tracing
facilities in until there were tools simple and easy enough to use
that even kernel developers could use them to take advantage of the
existing markers and tracing facilities.

That being said, it could be that if we can get Google's top-30
tracepoints, and we can figure out some cool ways Systemtap could use
those tracepoints that wouldn't necessarily be as easily replicated
using LTTng plus grep/awk, it might be a good way of convincing people
that it's worthwhile to give Systemtap another try.

						- Ted

* Re: kernel summit session on systemtap
  2008-09-26 22:02         ` Theodore Tso
@ 2008-09-26 23:35           ` Roland McGrath
  0 siblings, 0 replies; 14+ messages in thread
From: Roland McGrath @ 2008-09-26 23:35 UTC
  To: Theodore Tso; +Cc: Frank Ch. Eigler, systemtap

You are the only person I've ever heard raise the issue of kernel
compilation time.  At the kernel summit, nobody else mentioned this
concern to me (not that I heard from everyone), and of those I did
talk to, build times were not a concern at all, only the size of the
final debuginfo that has to be installed or distributed somehow.

Before embarking on any plans motivated by a desperate resistance to
-g, I think we should take clearer stock of where this really lies
in the list of priorities.  The plans we already understand and
have a clear direction on presume -g, and that even a
somewhat slow postprocessing stage is acceptable at least for the
first few cuts.  That approach has manifold benefits, even beyond
just systemtap's concerns, that well motivate putting our limited
hacking resources into it for a variety of long-run payoffs.

Inherently valuable as it is to satisfy Ted's preferences, I fear
descending into a tunnel-vision rat hole of -g avoidance littered
with fresh cans of worms.  If the mythical 80% of real potential
uses by all interested people are well-served by optimizing and
improving the usability of plans we already have, then whole new
swaths of effort guided solely by -g avoidance seem likely to be poor
allocations of our resources, even at the risk of Ted's disenchantment.

Thanks,
Roland

* Re: kernel summit session on systemtap
  2008-09-22  9:18   ` R. J. Moore
@ 2008-09-22 14:12     ` Theodore Tso
  0 siblings, 0 replies; 14+ messages in thread
From: Theodore Tso @ 2008-09-22 14:12 UTC (permalink / raw)
  To: R. J. Moore; +Cc: systemtap

On Mon, Sep 22, 2008 at 10:17:35AM +0100, R. J. Moore wrote:
> Why should you expect this (or kdump) to be solely for kernel  
> developers? It's for debuggers, which may or may not be the same set of
> folks. And those who concern themselves with complex live scenarios are  
> often not the same folks - at least not willingly so. This tired old  
> view that stipulates all that goes on in a kernel is for an exclusive  
> clique of kernel developers is a nonsense.

It is, if the assumption is that kernel developers are the ones who
are going to write and maintain the tapsets...  Who is that going to be?
I've heard many times from many systemtap developers, over many years,
from multiple companies, that they weren't kernel subsystem experts,
but rather tool people, and so they were expecting the kernel
developers to write the tapsets.

I suppose we could wait for the set of system administrators who are
willing to read and paw through kernel sources to write and maintain
tapsets, but the last time I went to LISA, I found that many system
administrators have become so specialized that some sysadmins don't
even write Perl scripts any more; to quote one of them, "I have
students/interns who do that for me."  Does anyone really think that
is a winning strategy?

							- Ted

* Re: kernel summit session on systemtap
  2008-09-18 15:50 ` Theodore Tso
  2008-09-18 16:11   ` Ananth N Mavinakayanahalli
@ 2008-09-22  9:18   ` R. J. Moore
  2008-09-22 14:12     ` Theodore Tso
  1 sibling, 1 reply; 14+ messages in thread
From: R. J. Moore @ 2008-09-22  9:18 UTC (permalink / raw)
  To: Theodore Tso; +Cc: systemtap

Theodore Tso wrote:
> I spent a couple of hours trying to get kdump to work on my
> laptop, and then gave up.  That's another RAS tool which has been
> mostly ignored by the kernel development community, and deployability
> and usability have been among its problems...
>
>   
Never had a problem with kdump, so your comment surprises me. Have you 
documented your issues?

>> I admit that this is extreme debugging,
>> but if SystemTap won't even operate without a ton of extra junk present
>> then I see its application as being very limited indeed.   Not everyone
>> will want to work at the assembler level, but if SystemTap can, then
>> tools can be built that do code analysis to help generate and divine
>> probepoints. Much can be done knowing that nearly 100% of the code we
>> probe is generated by the one tool - gcc. In theory, what is generated
>> is deterministic, or the reverse engineering of it is.
>>     
>
> My guess is that right now, the level 3 debugging experts at places
> like Red Hat, IBM, Novell, etc., are the only people who find
> SystemTap useful.  The quick survey done at the kernel summit pretty
> conclusively showed that most kernel developers haven't tried to use
> it, ....

Why should you expect this (or kdump) to be solely for kernel 
developers? It's for debuggers, which may or may not be the same set of
folks. And those who concern themselves with complex live scenarios are 
often not the same folks - at least not willingly so. This tired old 
view that stipulates all that goes on in a kernel is for an exclusive 
clique of kernel developers is a nonsense.


Richard


-- 
Richard J Moore
Tel: (44) 1962-817072
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

* Re: kernel summit session on systemtap
  2008-09-18 15:50 ` Theodore Tso
@ 2008-09-18 16:11   ` Ananth N Mavinakayanahalli
  2008-09-22  9:18   ` R. J. Moore
  1 sibling, 0 replies; 14+ messages in thread
From: Ananth N Mavinakayanahalli @ 2008-09-18 16:11 UTC (permalink / raw)
  To: Theodore Tso; +Cc: R. J. Moore, systemtap

On Thu, Sep 18, 2008 at 11:46:29AM -0400, Theodore Tso wrote:
> On Thu, Sep 18, 2008 at 09:00:15AM +0100, R. J. Moore wrote:
> > I don't know whether it is possible with the latest code, but for  
> > debugging purposes, I would be happy if SystemTap could operate on  
> > external names and relative addresses - i.e. without the need to have  
> > any symbolic information. 
> 
> You can certainly set tracepoints that way.  I don't know how easy it
> would be to fetch out register information or interpret complex data
> structures living on the stack or in function parameters without debug
> information, though.  Usually if I have to do that level of analysis,
> at least at this point it's faster for me to rebuild without
> debuginfo, drop the use of Systemtap, and retreat to printk
> debugging.

Or you could just use vanilla kprobes. Kprobes allows users to specify
a symbol name string (kp.symbol_name) and an offset (kp.offset), and the
probe is inserted at that exact location. No printks and no requirement
for debuginfo :-)

...
 
> (By the way, is testing of all the tapsets in the tree still
> turned off because so many of them are still broken?  It was a few
> months ago that someone on the list told me that, and it was another
> face-palm moment for me.)

Some probe points in tapsets (e.g., the signal tapset) used function
probes on functions that later got inlined due to compiler
optimizations. Such tests do fail.

The remedy for such cases is to migrate to a kernel-marker-based mechanism.

Ananth

* Re: kernel summit session on systemtap
  2008-09-18  8:01 R. J. Moore
@ 2008-09-18 15:50 ` Theodore Tso
  2008-09-18 16:11   ` Ananth N Mavinakayanahalli
  2008-09-22  9:18   ` R. J. Moore
  0 siblings, 2 replies; 14+ messages in thread
From: Theodore Tso @ 2008-09-18 15:50 UTC (permalink / raw)
  To: R. J. Moore; +Cc: systemtap

On Thu, Sep 18, 2008 at 09:00:15AM +0100, R. J. Moore wrote:
> I don't know whether it is possible with the latest code, but for  
> debugging purposes, I would be happy if SystemTap could operate on  
> external names and relative addresses - i.e. without the need to have  
> any symbolic information. 

You can certainly set tracepoints that way.  I don't know how easy it
would be to fetch out register information or interpret complex data
structures living on the stack or in function parameters without debug
information, though.  Usually if I have to do that level of analysis,
at least at this point it's faster for me to rebuild without
debuginfo, drop the use of Systemtap, and retreat to printk
debugging.

> This is the way I used ancestors of SystemTap
> for shooting very difficult kernel problems in the field. Generally I  
> started with a crashdump. 

I spent a couple of hours trying to get kdump to work on my
laptop, and then gave up.  That's another RAS tool which has been
mostly ignored by the kernel development community, and deployability
and usability have been among its problems...

> I admit that this is extreme debugging,
> but if SystemTap won't even operate without a ton of extra junk present
> then I see its application as being very limited indeed.   Not everyone
> will want to work at the assembler level, but if SystemTap can, then
> tools can be built that do code analysis to help generate and divine
> probepoints. Much can be done knowing that nearly 100% of the code we
> probe is generated by the one tool - gcc. In theory, what is generated
> is deterministic, or the reverse engineering of it is.

My guess is that right now, the level 3 debugging experts at places
like Red Hat, IBM, Novell, etc., are the only people who find
SystemTap useful.  The quick survey done at the kernel summit pretty
conclusively showed that most kernel developers haven't tried to use
it, and of those who tried to use it, a much smaller percentage
succeeded (although keep in mind that Systemtap didn't even compile out
of the box for Debian/Ubuntu distributions until July, and if you are a
kernel developer, you can't depend on an ancient distribution-provided
Systemtap, so that may be responsible for the small number of people
trying to use it).  So it's not ready for kernel developers, and I'm
pretty sure it's not ready for system administrators yet, due to the
lack of tapsets, lack of tapset documentation (especially compared to
what DTrace has), and so on.  (By the way, is testing of all the
tapsets in the tree still turned off because so many of them are still
broken?  It was a few months ago that someone on the list told me
that, and it was another face-palm moment for me.)

But based on all of that, I suspect that Level 3 support folks doing
extreme debugging for enterprise Linux distributions are the users best
suited to SystemTap at this point in time.

    	     	       		       	  - Ted

* Re: kernel summit session on systemtap
@ 2008-09-18  8:01 R. J. Moore
  2008-09-18 15:50 ` Theodore Tso
  0 siblings, 1 reply; 14+ messages in thread
From: R. J. Moore @ 2008-09-18  8:01 UTC (permalink / raw)
  To: systemtap

I don't know whether it is possible with the latest code, but for
debugging purposes, I would be happy if SystemTap could operate on
external names and relative addresses - i.e. without the need to have
any symbolic information. This is the way I used ancestors of SystemTap
for shooting very difficult kernel problems in the field. Generally I
started with a crashdump. I determined that I needed some extra info in
a particular code path, looked at the underlying assembler and plugged
in a probepoint at the required location as a relative address to the
beginning of the load module. I used this technique hundreds and
hundreds of times to shoot those bugs that would only show up in live
environments with complex workload patterns. I admit that this is
extreme debugging, but if SystemTap won't even operate without a ton of
extra junk present then I see its application as being very limited
indeed. Not everyone will want to work at the assembler level, but if
SystemTap can, then tools can be built that do code analysis to help
generate and divine probepoints. Much can be done knowing that nearly
100% of the code we probe is generated by the one tool - gcc. In
theory, what is generated is deterministic, or the reverse engineering
of it is.

We should not ignore SystemTap's efficacy when it is used to complement
core dumps, crash dumps, and kernel and application debuggers.

Richard

Thread overview: 14+ messages
2008-09-17 14:42 kernel summit session on systemtap Frank Ch. Eigler
2008-09-17 22:14 ` Theodore Tso
2008-09-18  0:51   ` Frank Ch. Eigler
2008-09-18 15:19     ` Theodore Tso
2008-09-26 19:53       ` Frank Ch. Eigler
2008-09-26 22:02         ` Theodore Tso
2008-09-26 23:35           ` Roland McGrath
2008-09-18 15:28 ` Theodore Tso
2008-09-22 21:41   ` Roland McGrath
2008-09-18  8:01 R. J. Moore
2008-09-18 15:50 ` Theodore Tso
2008-09-18 16:11   ` Ananth N Mavinakayanahalli
2008-09-22  9:18   ` R. J. Moore
2008-09-22 14:12     ` Theodore Tso
