* Re: kernel summit session on systemtap (14+ messages in thread)
From: R. J. Moore @ 2008-09-18 8:01 UTC
To: systemtap

I don't know whether it is possible with the latest code, but for debugging purposes, I would be happy if SystemTap could operate on external names and relative addresses - i.e. without the need to have any symbolic information. This is the way I used ancestors to systemtap for shooting very difficult kernel problems in the field. Generally I started with a crashdump. Determined that I needed some extra info in a particular code path. Looked at the underlying assembler and plugged in a probepoint at the required location as a relative address to the beginning of the load module. I used this technique 100s and 100s of times to shoot those bugs that would only show in a live environment with complex workload patterns.

I admit that this is extreme debugging, but if SystemTap won't even operate without a ton of extra junk present then I see its application as being very limited indeed. Not everyone will want to work at the assembler level, but if SystemTap can, then tools can be built that do code analysis to help generate and divine probepoints. Much can be done knowing that nearly 100% of the code we probe is generated by the one tool - gcc. In theory, what is generated is deterministic, or the reverse engineering of it is.

We should not ignore SystemTap's efficacy when being used to complement core dumps, crash dumps, kernel and application debuggers.

Richard
--
Richard J Moore Tel: (44) 1962-817072
Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
* Re: kernel summit session on systemtap
From: Theodore Tso @ 2008-09-18 15:50 UTC
To: R. J. Moore; +Cc: systemtap

On Thu, Sep 18, 2008 at 09:00:15AM +0100, R. J. Moore wrote:
> I don't know whether it is possible with the latest code, but for
> debugging purposes, I would be happy if SystemTap could operate on
> external names and relative addresses - i.e. without the need to have
> any symbolic information.

You can certainly set tracepoints that way. I don't know how easy it would be to fetch out register information or interpret complex data structures living on the stack or in function parameters without debug information, though. Usually if I have to do that level of analysis, at least at this point it's faster for me to rebuild without debuginfo information, drop the use of system tap, and retreat to printk debugging.

> This is the way I used ancestors to systemtap
> for shooting very difficult kernel problems in the field. Generally I
> started with a crashdump.

I spent about a couple of hours trying to get kdump to work on my laptop, and then gave up. That's another RAS tool which has mostly been ignored by the kernel development community, and deployability and usability has been one of its problems...

> I admit that this is extreme debugging,
> but if system tap won't even operate without a ton of extra junk present
> then I see its application as being very limited indeed. Not everyone
> will want to work at the assembler level, but if System Tap can, then
> tools can be built that do code analysis to help generate and divine
> probepoints. Much can be done knowing that nearly 100% of the code we
> probe is generated by the one tool - gcc. In theory, what is generated
> is deterministic, or the reverse engineering of it is.
My guess is that right now, the level 3 debugging experts at places like Red Hat, IBM, Novell, etc., are the only people who find SystemTap useful. The quick survey done at the kernel summit pretty conclusively showed that most kernel developers haven't tried to use it, and of those who tried to use it, a much smaller percentage succeeded (although keep in mind that Systemtap didn't even compile out of the box for Debian/Ubuntu distributions until July, and if you are a kernel developer, you can't depend on an ancient distribution-provided Systemtap, so that may be responsible for the small numbers of people trying to use it).

So it's not ready for kernel developers, and I'm pretty sure it's not ready for system administrators yet, due to the lack of tapsets, lack of tapset documentation (especially compared to what Dtrace has), and so on. (By the way, is the testing that makes sure all of the tapsets in the tree work still turned off because so many of them are still broken? It was a few months ago when someone on the list told me that, and that was another face-palm moment for me.)

But based on all of that, I suspect that Level 3 support folks doing extreme debugging for enterprise Linux distributions are the only ones well suited as SystemTap users at this point in time.

- Ted
* Re: kernel summit session on systemtap
From: Ananth N Mavinakayanahalli @ 2008-09-18 16:11 UTC
To: Theodore Tso; +Cc: R. J. Moore, systemtap

On Thu, Sep 18, 2008 at 11:46:29AM -0400, Theodore Tso wrote:
> On Thu, Sep 18, 2008 at 09:00:15AM +0100, R. J. Moore wrote:
> > I don't know whether it is possible with the latest code, but for
> > debugging purposes, I would be happy if SystemTap could operate on
> > external names and relative addresses - i.e. without the need to have
> > any symbolic information.
>
> You can certainly set tracepoints that way. I don't know how easy it
> would be to fetch out register information or interpret complex data
> structures living on the stack or in function parameters without debug
> information, though. Usually if I have to do that level of analysis,
> at least at this point it's faster for me to rebuild without debuginfo
> information, drop the use of system tap, and retreat to printk
> debugging.

Or you could use just vanilla kprobes. Kprobes allows users to specify a symbol name string (kp.symbol_name) and an offset (kp.offset), and the probe is inserted at that exact location. No printks and no requirement for debuginfo :-)

...

> (By the way, is testing to make sure all
> of the tapsets in the tree to make sure they work still turned off
> because so many of them are still broken? That was few months ago
> when someone on the list told me that, but that was another face-palm
> moment for me.)

There are some probe points in tapsets (e.g., the signal tapset) which used certain function probes, and those functions later got inlined due to compiler optimizations. Such tests do fail. A panacea for such cases is to migrate to a kernel marker based mechanism.

Ananth
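As a rough illustration of the symbol-plus-offset addressing described above, the following shell sketch resolves a symbol name and a byte offset to an absolute address. It works on a made-up symbol listing in the `address type name` layout of /proc/kallsyms. The sample addresses, the symbol `do_sample_work`, and the helper `resolve_probe_addr` are all hypothetical; with a real kprobe the kernel performs this lookup itself from kp.symbol_name and kp.offset.

```shell
#!/bin/sh
# Hypothetical symbol listing in the "address type name" layout used by
# /proc/kallsyms. Addresses and names are invented for this sketch.
SYMS='c0100000 T _text
c0123450 T do_sample_work
c0123800 t helper_static'

# resolve_probe_addr SYMBOL OFFSET
# Prints the absolute address of SYMBOL plus OFFSET bytes (hex, no prefix),
# or returns non-zero when SYMBOL is not in the listing.
resolve_probe_addr() {
    base=$(printf '%s\n' "$SYMS" | awk -v s="$1" '$3 == s { print $1; exit }')
    [ -n "$base" ] || return 1
    printf '%x\n' $(( 0x$base + $2 ))
}

resolve_probe_addr do_sample_work 0x1c   # prints c012346c
```

The point of the sketch is only that a probe location can be named without any debuginfo: a symbol table entry and an offset found by reading the disassembly are enough.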
* Re: kernel summit session on systemtap
From: R. J. Moore @ 2008-09-22 9:18 UTC
To: Theodore Tso; +Cc: systemtap

Theodore Tso wrote:
> I spent about a couple of hours trying to get kdump to work on my
> laptop, and then gave up. That's another RAS tool which has mostly
> ignored by the kernel development community, and deployability and
> usability has been one of its problems...
>
>

Never had a problem with kdump, so your comment surprises me. Have you documented your issues?

>> I admit that this is extreme debugging,
>> but if system tap won't even operate without a ton of extra junk present
>> then I see its application as being very limited indeed. Not everyone
>> will want to work at the assembler level, but if System Tap can, then
>> tools can be built that do code analysis to help generate and divine
>> probepoints. Much can be done knowing that nearly 100% of the code we
>> probe is generated by the one tool - gcc. In theory, what is generated
>> is deterministic, or the reverse engineering of it is.
>>
>
> My guess is that right now, the level 3 debugging experts at places
> like Red Hat, IBM, Novell, etc., are the only people who find
> SystemTap useful. The quick survey done at the kernel summit pretty
> conclusively showed that most kernel developers haven't tried to use
> it, ....

Why should you expect this (or kdump) to be solely for kernel developers? It's for debuggers, which may or may not be the same set of folks. And those who concern themselves with complex live scenarios are often not the same folks - at least not willingly so. This tired old view that stipulates all that goes on in a kernel is for an exclusive clique of kernel developers is a nonsense.
Richard
* Re: kernel summit session on systemtap
From: Theodore Tso @ 2008-09-22 14:12 UTC
To: R. J. Moore; +Cc: systemtap

On Mon, Sep 22, 2008 at 10:17:35AM +0100, R. J. Moore wrote:
> Why should you expect this (or kdump) to be solely for kernel
> developers? It's for debuggers, which may or may not be the same set of
> folks. And those who concern themselves with complex live scenarios are
> often not the same folks - at least not willingly so. This tired old
> view that stipulates all that goes on in a kernel is for an exclusive
> clique of kernel developers is a nonsense.

It is if the assumption is that the kernel developers are the ones going to write and maintain the tapsets.... Who is that going to be? I've heard many times from many systemtap developers, over many years, from multiple companies, that they weren't kernel subsystem experts, but rather tool people, and so they were expecting the kernel developers to write the tapsets.

I suppose we could wait for the set of system administrators who are willing to read and paw through kernel sources to write and maintain tapsets, but the last time I went to LISA, I found that many system administrators are specializing so highly that some sysadmins don't even write perl scripts any more; to quote one of them, "I have students/interns who do that for me."

Does anyone really think that is a winning strategy?

- Ted
* kernel summit session on systemtap
From: Frank Ch. Eigler @ 2008-09-17 14:42 UTC
To: systemtap

Hi -

Yesterday morning, we had a 90-minute session on topics touching systemtap. Though there were several serious concerns raised, it went better than I had expected.

Here are some things we need to work more on:

- Making it dead easy for kernel guys to build and use the thing. It's hard to say what problems they run into since only the rare bug report gets sent, but some possibilities could be:
  - be able to work against an un-installed kernel build tree
  - add buildid checking
  - bundle elfutils sources for the distro-challenged :-)
  - other ideas, please!

- It's time to really improve & shrink debuginfo. Enough said.

- We need to test constantly against linus/linux-next type git trees, at least to confirm that the runtime works. There is a perception that it breaks often, and this in turn is driving the impulse to pull the runtime into the kernel. If we can more aggressively handle the problem, the impulse would die off. We could still of course push some of the runtime upstream, but that could happen for stronger technical reasons rather than kneejerks. Linus hinted at going through akpm since he's such a pushover :-)

- Stability. Kernel crashes are an instant and long-lasting turn-off, even to the point of mean laughter. We need to urgently and deeply stress-test and robustify our kernel-side foundations (kprobes, utrace, uprobes, runtime).

Here are some areas I am no longer that worried about:

- The general approach of synthesized modules vs. bytecode interpreters. This dtrace-favouring marketing canard was brought up yet again ("systemtap is unstable because ..."), but before I got a chance to rebut, Linus himself said that VMs are not a good answer either.
  With the above "foundation robustness" problem improved, there will be evidence that should satisfy more skeptics.

- Markers. There was a consensus that the kernel needs more of them. Well, there was quite some indecision over who should champion them, but a tasteful set of some dozens should be uncontroversial. A good start to prime the pump would be some baby kernel-side tool that connects markers to an existing tracing channel, perhaps one little piece of lttng.

- The tool's generality. Linus is rightly skeptical of a tool that aims too high and turns out to be too hard to use. (I believe "piece of shit" was his shock-value opening comment. :-) He's also annoyed at the continuing proliferation of tracing widgets. There was a short spurt of support for increasing the reach of tools like "latencytop" (speculating that it "solves the problems for 80% of people"), but then many people spoke of important problems that only a broader tool can address.

- FChE
* Re: kernel summit session on systemtap
From: Theodore Tso @ 2008-09-17 22:14 UTC
To: Frank Ch. Eigler; +Cc: systemtap

On Wed, Sep 17, 2008 at 10:41:15AM -0400, Frank Ch. Eigler wrote:
> Here are some things we need to work more on:
>
> - It's time to really improve & shrink debuginfo. Enough said.

The more I've played with debuginfo, the more I've been convinced that at least for me, the costs vastly outweigh the benefits. It causes the time to compile the kernel (and kernel developers need to compile the kernel a lot) to explode, just simply due to disk I/O time; if /lib is on a separate partition, you can simply not have the space to store the huge, vastly bloated modules.

From the benefits side, given GCC's increasingly aggressive optimizations, being able to set breakpoints at random lines is less important when it (a) often doesn't work because it's been optimized out, or (b) the symbol you want to reference isn't easily available. Case (b) ends up being very frustrating because you end up getting a highly confusing error message, such as:

    semantic error: failed to retrieve location attribute for local 'sb' (dieoffset: 0x9cf22): identifier '$sb' at ext4-check-desk.stp:3:47

Not something that a system administrator will appreciate, never mind the kernel developer. It just ends up leaving the developer and/or administrator with a very bad impression of Systemtap.

How could this be mitigated:

*) Promote the use of Steven Rostedt's streamline_config, telling people that if they decide to compile with debuginfo, they will very likely ***badly*** regret it unless they use a special config file that aggressively restricts their configuration in terms of not building modules they don't need on that system.
*) Maybe for kernel developers there should be some suggested patches that compile the kernel with some amount of optimization suppressed, so that in particular, functions are never inlined, and maybe in an extreme sense, optimizations are disabled altogether --- or at least enough that if someone is going to pay the vast cost of debuginfo, at least they will get something useful out of it by actually being able to set traces at arbitrary line numbers, and will hopefully be able to access variables with much greater probability of success.

Yes, this goes against the Systemtap goal of not requiring people to compile special kernels and reboot, but if the advantage of using debuginfo is being able to set tracepoints at arbitrary points, then at least for me, in the code I've tried to instrument, I have absolutely no confidence that I can set tracepoints where I want, except at the beginning of functions anyway. So if I'm going to slow down my compile-edit-debug cycle in the kernel by an order of magnitude, say to debug some really hard problem, I want to be able to really, truly and reliably set tracepoints **anywhere** and be able to usefully probe variables when and where I want.

*) Alternatively, if we are going to take as a given that the only kind of probe points that are going to be reliable is the beginning or end of functions (and specifically, non-static functions), is there some way to generate a restricted set of debuginfo that only gives enough information that it is possible to decode the types of the function parameters, but none of the line number information? Maybe some way of simply running nm on vmlinux, and then creating some kind of magic .c file that references all of the functions and forcing a single .o with DWARF information with the function and type information, and nothing else.
I'm not a tools person, so this may be a stupid way of doing it, but the basic idea is simply having a highly compressed debuginfo file that only has function parameter information, and nothing else, which hopefully will only be a megabyte or two instead of hundreds and hundreds of megabytes of debuginfo. And to do this without having to write gargantuan .o files in the build tree, since that slows down the compile.

I know that Systemtap can run without debuginfo, but if you can't decode the function arguments, at that point I would probably use ftrace because it's simpler than Systemtap. Systemtap could add a huge amount of value over ftrace, if it could decode function parameters without having to pay the cost of debuginfo.

Quite frankly, these days the main reason why I haven't been playing with Systemtap much lately is because I'm tired of waiting for compiles to complete when compiling with debuginfo. Sure, it's handy for getting line number information when debugging oops, but compiling with debuginfo is **so** painful that I'd much rather paw through disassembled assembly code to figure out where the system died when I need to analyze a kernel oops than to wait for a kernel compile to finish. Pawing through assembly code takes much less time for me, and is much more efficient, because I'm very often recompiling the kernel tree. (This is a very different scenario than when a distribution compiles a kernel once, on a build machine, as opposed to multiple times during a development cycle.)

> - The tool's generality. Linus is rightly skeptical of a tool that
> aims too high and turns out to be too hard to use. (I believe
> "piece of shit" was his shock-value opening comment. :-)

Speaking of that....
this isn't as big of a deal for kernel developers, but if it really is true that Systemtap is aiming to be used by System Administrators (and I believe that, based on the assumption that debuginfo management would be done by RPM macros in the distribution packaging, and ignoring the kernel compile-edit-debug time problem, plus some of the ways Systemtap had been marketed at events such as the Red Hat Summit), then when looking at the Systemtap vs. Dtrace comparison chart, I have to agree with the DTrace folks; the Systemtap project is very much being disingenuous about some of the items on the chart, such as the comparison of speculative tracing.

The comment "(from first principles via auxiliary data and control structures)", and the related one for thread-local variables "(from first principles via tid-indexed auxiliary arrays)" is really lame. Of *course* you can do anything from first principles. A systemtap trace is (modulo the time constraint) turing equivalent. That's like saying there's no need for perl, I can in principle do everything in assembly language. You *can*, but you might not want to.

One HUGE advantage DTrace has is that it has certain constructs, such as its default report generation, and speculative tracing, which means you can do things on a single command line, i.e.:

    dtrace -n 'syscall::exec*:return { trace(execname); }'

By default dtrace will print a line for each probe that fires, and if you use the trace command, it will print the contents of the name.
Or take this example:

    % dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'

This will automatically print out the number of system calls each process (printed with pid and execname) executed between the time dtrace was started and when the administrator hit ^C:

     3104 gnome-terminal      2
     3153 gnome-terminal      2
     3098 nautilus            3
     4804 java               10
      599 sshd               24
     8117 acroread           45
    28921 dtrace             71
      113 nscd              270
    28920 find             3418

You can do the same thing in systemtap, but you have to do it as a full script, and you have to explicitly have a print command in each probe statement and you have to explicitly dump out the contents of each associative array. Dtrace can suppress the automatic output (using -q), and for any long, sophisticated script, a Dtrace script probably will do its own explicit output. However, system administrators can copy simple Dtrace one-liners and modify them to their needs much more easily than they can under Systemtap.

Remember, most system administrators aren't necessarily programmers! If we are going to let distribution marketing folks claim that Systemtap is meant for System Administrators, it has to be easy to use, and not necessarily assume deep programming skills. (Such as simulating thread local variables using tid's --- sorry, but that's just LAME. :-)

- Ted
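To make the `@num[pid, execname] = count()` aggregation quoted above concrete: it keeps one counter per (pid, execname) pair and prints the resulting table on exit. The shell sketch below reproduces only that counting step over a small invented event log, with one `pid execname` line per system call. The sample data and the `count_by_process` helper are assumptions for illustration; nothing here traces a live system.

```shell
#!/bin/sh
# Invented event log: one "pid execname" line per system call observed.
EVENTS='599 sshd
113 nscd
599 sshd
3098 nautilus
113 nscd
113 nscd'

# count_by_process: read "pid execname" lines on stdin, keep one counter
# per (pid, execname) pair, and print "pid execname count" sorted by count.
count_by_process() {
    awk '{ n[$1 " " $2]++ } END { for (k in n) print k, n[k] }' |
        sort -k3,3n
}

printf '%s\n' "$EVENTS" | count_by_process
```

The explicit `END` block and the sort are exactly the kind of reporting boilerplate that the automatic output saves in the one-liner case.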
* Re: kernel summit session on systemtap
From: Frank Ch. Eigler @ 2008-09-18 0:51 UTC
To: Theodore Tso; +Cc: systemtap

Hi -

On Wed, Sep 17, 2008 at 06:13:49PM -0400, Theodore Tso wrote:
> [...]
> > - It's time to really improve & shrink debuginfo. Enough said.
>
> The more I've played with debuginfo, the more I've been convinced that
> at least for me, the costs vastly outweigh the benefits. [...]

Right. We (systemtap and associated tools folks) are working on
- improving quality (benefits) of dwarf
- shrinking dwarf dramatically
- if all else fails, dwarf subsetting

Changing kernel build flags is of course possible, but it is not our place to mandate that.

> [...] Quite frankly, these days the main reason why I haven't been
> playing with Systemtap much lately is because I'm tired of
> waiting for compiles to complete when compiling with
> debuginfo. [...]

(By the way, do you build distro-style kernels on your laptop, with allmodconfig or somesuch, or something more linus-sized?)

> One HUGE advantage DTrace has is that it has certain constructs,
> such as its default report generation, and speculative tracing, which
> means you can do things on a single command line, i.e.:
>
> dtrace -n 'syscall::exec*:return { trace(execname); }'
>
> By default dtrace will print a line for each probe that fires, and if
> you use the trace command, it will print the contents of the name.

    stap -e 'probe syscall.exec* { log(name." ".execname()) }'

> Or take this example:
>
> % dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'
>
> This will automatically print out the number of system calls each
> process (printed with pid and execname) [...]
    stap -e 'probe process.syscall { num[pid(),execname()] <<< 1 } global num'

> You can do the same thing in systemtap, but you have to do it as a
> full script, and you have to explicitly have a print command [...]

Your information is slightly obsolete. We just added some such automation, and can do more.

> (Such as simulating thread local variables using tid's --- sorry,
> but that's just LAME. :-)

We can bring our old 'thread->FOO' / 'process->FOO' syntax back. No big deal.

- FChE
* Re: kernel summit session on systemtap
From: Theodore Tso @ 2008-09-18 15:19 UTC
To: Frank Ch. Eigler; +Cc: systemtap

On Wed, Sep 17, 2008 at 08:49:02PM -0400, Frank Ch. Eigler wrote:
> Right. We (systemtap and associated tools folks) are working on
> - improving quality (benefits) of dwarf
> - shrinking dwarf dramatically
> - if all else fails, dwarf subsetting

What do you think is the timeline for this happening? I assume this requires changes to gcc, right? So would an estimate of 6-9 months, minimum, be a fair one --- and that's assuming that a new gcc with more aggressive optimizations that may or may not break the kernel will be immediately usable for kernel builds (regardless of whether the fault lies with the ANSI standards committee, gcc developers' boneheadedness, or kernel developers' boneheadedness, it's not always the case that a new gcc is immediately trusted or even suitable for use with the kernel.)

Given that, it might be a good idea to pursue a dwarf subsetting idea; otherwise, many other tracing tools may make great strides and improvements, while this particular shortcoming of systemtap means that many kernel developers will find other ways of getting the functionality they need and stop paying attention to Systemtap in favor of tools that don't require debuginfo information.

As you may have seen by some of the horrible hacks that Steven Rostedt pursued in order to work with how gcc inserts mcount code for profiling, or some of the other "interesting" uses/abuses of the compiler toolchain, there is no shortage of gross build kludges in the kernel. So something which somehow manages to trick the current compiler into emitting DWARF information so that function parameters can be decoded could provide substantial benefits over simply using a limited set of markers.
> Changing kernel build flags is of course possible, but it is not our
> place to mandate that.

Well, both of these ideas --- some kind of build-time kludge to create limited DWARF information quickly using current compiler technology and a different set of compiler optimization flags to make Systemtap more useful --- are patches that would need to be applied to the kernel, yes. So it's not a question of mandating them, but rather suggesting them as options that might make the use of Systemtap more palatable in the short term.

In the past I've submitted patches that gave an option to the kernel's "make install" to strip out the debuginfo so that the partition containing /lib wouldn't run out of space. Right now I manually install the full set of module files in /usr/lib/debug/lib/modules/... via:

    make INSTALL_MOD_STRIP=1 modules_install
    make INSTALL_MOD_PATH=/usr/lib/debug modules_install

Yeah, I should strip the code/data segments out of what's in /usr/lib/debug, but as a percentage of the bloat-o-rama which is the debuginfo information, the savings just haven't been big enough for me to find the motivation to hack in the kernel build extension to do this for me.

So submitting patches to make systemtap more useful isn't something you can "mandate", but there are some of us who have used Systemtap enough who would be willing to champion such patches, if we got some help in crafting them in the first place. I can implement the patch to reduce kernel optimization levels to make debuginfo more helpful; but tricking the compiler to quickly generate a limited set of DWARF information is something I would need help doing.

> > [...] Quite frankly, these days the main reason why I haven't been
> > playing with Systemtap much lately is because I'm tired of
> > waiting for compiles to complete when compiling with
> > debuginfo. [...]
>
> (By the way, do you build distro-style kernels on your laptop, with
> allmodconfig or somesuch, or something more linus-sized?)

I do both.
The distro-style kernels are the ones that I build with debuginfo information, and it's been useful for playing around with systemtap, but the moment I need to do any serious development work, I tend to fall back to a limited subset of compile options, generally without any modules, and printk debugging.

Once we get a useful circular buffer, I'd probably start logging to the circular buffer and use grep as the alternative to systemtap or printk debugging.

As a result, I've had no motivation to create any tapsets, since at least for my own personal needs, the costs of creating the debuginfo so that SystemTap would be useful just far outweigh the benefits.

> Your information is slightly obsolete. We just added some such
> automation, and can do more.

Glad to hear it. I suspect then that this page:

    http://sources.redhat.com/systemtap/wiki/SystemtapDtraceComparison

is also slightly out of date.

- Ted
* Re: kernel summit session on systemtap
From: Frank Ch. Eigler @ 2008-09-26 19:53 UTC
To: Theodore Tso; +Cc: systemtap

Hi, Ted -

On Thu, Sep 18, 2008 at 11:18:43AM -0400, Theodore Tso wrote:
> [...]
> > - improving quality (benefits) of dwarf
> > - shrinking dwarf dramatically
> > - if all else fails, dwarf subsetting
>
> What do you think is the timeline for this happening? I assume this
> requires changes to gcc, right? So would an estimate of 6-9 months,
> minimum, be a fair one [...]

For improving debuginfo quality, yeah. For subsetting or compressing, we can probably attack the problem with separate postprocessing tools that could be ready sooner.

> "make install" to strip out the debuginfo so that the partition
> containing /lib wouldn't run out of space. Right now I manually
> install the full set of module files in /usr/lib/debug/lib/modules/... via:
>
> make INSTALL_MOD_STRIP=1 modules_install
> make INSTALL_MOD_PATH=/usr/lib/debug modules_install

That's a clever alternative to using the separate-debuginfo style stripping.

> > (By the way, do you build distro-style kernels on your laptop, with
> > allmodconfig or somesuch, or something more linus-sized?)
>
> I do both. The distro-style kernels are the ones that I build with
> debuginfo information, and it's been useful for playing around with
> systemtap, but the moment I need do any serious development work, I
> tend to fall back to a limited subset of compile options, generally
> without any modules, and printk debugging.

OK; is there some obstruction in the way of using systemtap on your 'serious development' kernels?

> Once we get a useful circular buffer, I'd probably start logging to
> the circular buffer and use grep as the alternative to systemtap or
> printk debugging.

OK.
> As a result, I've had no motivation to create any tapsets, since at
> least for my own personal needs, the costs of creating the debuginfo
> so that SystemTap would be useful for my personal needs just far
> outweighs the benefits.

My recollection of the ksummit yak was that the sort of tapset that kernel people would be willing to help write/maintain consisted of compiled-in instrumentation like markers, or whatever event layer comes on top of the new ringbuffer widget. If that's done right, it should not require debuginfo for systemtap to hook in.

> > Your information is slightly obsolete. We just added some such
> > automation, and can do more.
>
> Glad to hear it. I suspect then that this page:
> http://sources.redhat.com/systemtap/wiki/SystemtapDtraceComparison
> is also slightly out of date.

Yeah.

- FChE
* Re: kernel summit session on systemtap 2008-09-26 19:53 ` Frank Ch. Eigler @ 2008-09-26 22:02 ` Theodore Tso 2008-09-26 23:35 ` Roland McGrath 0 siblings, 1 reply; 14+ messages in thread From: Theodore Tso @ 2008-09-26 22:02 UTC To: Frank Ch. Eigler; +Cc: systemtap On Fri, Sep 26, 2008 at 03:51:25PM -0400, Frank Ch. Eigler wrote: > > For improving debuginfo quality, yeah. For subsetting or compressing, > we can probably attack the problem with separate postprocessing tools > that could be ready sooner. > I'd recommend something that could be done sooner if you really want at least developers like me to use it when they want fast compile, edit, debug cycles. I think the right answer has to be not postprocessing tools (since gcc still has to write out the gargantuan .o files, which is most of the problem), but some way of extracting the function type information in a way that lets function entry and exit probes decode the function arguments. That's *the* one thing that Systemtap has over ftrace; ftrace can only tell us that we've entered a function, and not what arguments it has. (Well, I guess the other advantage Systemtap can have is that if a function entry point is called a lot, it can be more efficient about only printing information on some kind of condition, instead of logging output at every single trace point. In some cases this will be critical; in other cases, using "grep" as a postprocessing tool on tracing output can be sufficient.) > > > (By the way, do you build distro-style kernels on your laptop, with > > > allmodconfig or somesuch, or something more linus-sized?) > > > > I do both. The distro-style kernels are the ones that I build with > > debuginfo information, and it's been useful for playing around with > > systemtap, but the moment I need to do any serious development work, I > > tend to fall back to a limited subset of compile options, generally > > without any modules, and printk debugging.
> > OK; is there some obstruction in the way of using systemtap on your > 'serious development' kernels? The main problem is that it takes too long to build the kernels with -g debugging information. If, on average, the output object files grow in size by a factor of 5, then the time it takes to build the kernel tends to grow by a factor of somewhere between 3 and 5, since compiles are often write-bound, especially if you are using ccache. I just got tired of the increased amount of time to compile and install debuginfo, especially when more often than not I couldn't set arbitrary trace points anyway. Not being able to put them at static function entry points was a very rude wakeup call, since ext4 has a lot of static functions. > My recollection of the ksummit yak was that the sort of tapset that > kernel people would be willing to help write/maintain consisted of > compiled-in instrumentation like markers, or whatever event layer > comes on top of the new ringbuffer widget. If that's done right, it > should not require debuginfo for systemtap to hook in. No, but then in many cases it's not necessary to use systemtap, either; we can just grep the output out of the circular ring buffer. The only time we would need systemtap would be (a) when we can't anticipate in advance when to put in the markers, and (b) where the amount of tracing information is too large to extract what we need from the trace buffer, so that putting in a compiled-in conditional is necessary. So the real risk for Systemtap as a project is that if people find they can solve 80-90% of their problems simply by using ftrace plus markers and grep, there will be much less incentive to treat Systemtap as a potential solution that needs to be accommodated moving forward. The other question is how many tracepoints get added, and how quickly.
If it's only a core set of 30, then the flexibility of being able to rely on debuginfo becomes much more important, especially if the cost of debuginfo can be brought down significantly. If Systemtap had a very lightweight way of decoding function arguments without having to build every single .o file with -g, that would be the equivalent of a very large number of markers, and at the kernel summit, Linus very much was against dumping a large number of markers into the kernel; and in particular, he didn't want to put *any* more markers or tracing facilities in until there were tools simple and easy enough to use that even kernel developers could use them to take advantage of the existing markers and tracing facilities. That being said, it could be that if we can get Google top-30 tracepoints, and we can figure out some cool ways Systemtap could use those tracepoints that wouldn't necessarily be as easily replicated using LTTng plus grep/awk, it might be a good way of convincing people that it's worthwhile to give Systemtap another try. - Ted
* Re: kernel summit session on systemtap 2008-09-26 22:02 ` Theodore Tso @ 2008-09-26 23:35 ` Roland McGrath 0 siblings, 0 replies; 14+ messages in thread From: Roland McGrath @ 2008-09-26 23:35 UTC To: Theodore Tso; +Cc: Frank Ch. Eigler, systemtap You are the only person I've ever heard raise the issue of kernel compilation time. At the kernel summit, nobody else mentioned this concern to me (not that I heard from everyone), and of those I did talk to, build times were not a concern at all, only the size of the final debuginfo that has to be installed or distributed somehow. Before embarking on any plans motivated by a desperate resistance to -g, I think we should take clearer stock of where this really lies in the list of priorities. The plans we already understand, and have a clear direction to go on, presume -g and that even a somewhat slow postprocessing stage is acceptable, at least for the first few cuts. That approach has manifold benefits, even beyond just systemtap's concerns, that well motivate putting our limited hacking resources into it for a variety of long-run payoffs. Inherently valuable as it is to satisfy Ted's preferences, I fear descending into a tunnel-vision rat hole of -g avoidance littered with fresh cans of worms. If the mythical 80% of real potential uses by all interested people are well-served by optimizing and improving the usability of plans we already have, then whole new swaths of effort guided solely by -g avoidance seem likely to be poor allocations of our resources, even at the risk of Ted's disenchantment. Thanks, Roland
* Re: kernel summit session on systemtap 2008-09-17 14:42 Frank Ch. Eigler 2008-09-17 22:14 ` Theodore Tso @ 2008-09-18 15:28 ` Theodore Tso 2008-09-22 21:41 ` Roland McGrath 1 sibling, 1 reply; 14+ messages in thread From: Theodore Tso @ 2008-09-18 15:28 UTC To: Frank Ch. Eigler; +Cc: systemtap On Wed, Sep 17, 2008 at 10:41:15AM -0400, Frank Ch. Eigler wrote: > - Stability. Kernel crashes are an instant and long-lasting turn-off, > even to the point of mean laughter. We need to urgently and deeply > stress-test and robustify our kernel-side foundations (kprobes, > utrace, uprobes, runtime). To clarify for those who weren't there, the reason for the laughter is that when Arjan was demonstrating the kerneloops.org project, a crash in utrace was #6 on the list of most frequently reported kernel oopses on kerneloops.org in the preceding seven days. I just checked, and the crash in utrace now has the distinction of being #3 on the most popular kernel oops for the past seven days. So the comments were along the lines of "Fedora merged *what*?" Unfortunately, utrace making the top-10 hit parade on kerneloops.org probably didn't help its reputation as something that should be merged, at least in the short term. I assume the problem is already known and fixed, but if not: http://www.kerneloops.org/search.php?search=utrace_control - Ted
* Re: kernel summit session on systemtap 2008-09-18 15:28 ` Theodore Tso @ 2008-09-22 21:41 ` Roland McGrath 0 siblings, 0 replies; 14+ messages in thread From: Roland McGrath @ 2008-09-22 21:41 UTC To: Theodore Tso; +Cc: Frank Ch. Eigler, systemtap Just in case anyone actually cares about what the reality is behind the perception, the "kerneloops #3" issue in fact has zero to do with the "utrace sucks" question. (No issue of substance was raised about utrace at the conference, for either good or ill.) That quip, and the kerneloops top-10 list that provoked it, was during a general "kernel stability" discussion. The real cause of "kerneloops #3" is another point that was mentioned in that discussion--kernel hackers only care about bleeding edge upstream/rawhide. Of that one, I am indeed thoroughly guilty (like, I think, most people in the room). This crash was a trivial one-line bug (typo level of depth), which I had fixed very shortly after introducing it, and had not been in the "current" code (i.e. my git tree, or f10/rawhide kernels) for weeks. It was fixed well before the version of utrace patches posted to LKML this month, and has nothing whatsoever to do with the stress-test crash cases being discussed there. (Those are crashes you have to try hard to get. They are obviously valid issues to raise about merging the utrace code upstream, but they are a thousand miles from anything that would ever rise to the kerneloops top 10 because average users hit them frequently.) Being behind on at least five other things far more interesting to me, I had just let "check what is up with the F9 kernel" fall down to the bottom of my queue and kept it quite out of my mind.
Getting laughed at by a roomful of people to whom I was not inclined to explain the difference between "I can't be bothered to maintain backports this month even when they're crashing" and "I write code that sucks ass" made me spend the 15 minutes during lunch it took to remember/find the silly one-liner I'd nearly forgotten about, and commit that fix to F-9 kernel cvs right then. A build with that fix still hasn't made it even to F-9 updates-testing, about which you'll have to ask cebbert (who was also in the room to feel that derision about Fedora kernel quality). Thanks, Roland
Thread overview: 14+ messages
2008-09-18  8:01 kernel summit session on systemtap R. J. Moore
2008-09-18 15:50 ` Theodore Tso
2008-09-18 16:11   ` Ananth N Mavinakayanahalli
2008-09-22  9:18   ` R. J. Moore
2008-09-22 14:12     ` Theodore Tso
-- strict thread matches above, loose matches on Subject: below --
2008-09-17 14:42 Frank Ch. Eigler
2008-09-17 22:14 ` Theodore Tso
2008-09-18  0:51   ` Frank Ch. Eigler
2008-09-18 15:19     ` Theodore Tso
2008-09-26 19:53       ` Frank Ch. Eigler
2008-09-26 22:02         ` Theodore Tso
2008-09-26 23:35           ` Roland McGrath
2008-09-18 15:28 ` Theodore Tso
2008-09-22 21:41   ` Roland McGrath