Date: Wed, 17 Sep 2008 22:14:00 -0000
From: Theodore Tso
To: "Frank Ch. Eigler"
Cc: systemtap@sources.redhat.com
Subject: Re: kernel summit session on systemtap
Message-ID: <20080917221349.GA939@mit.edu>
References: <20080917144115.GA10231@redhat.com>
In-Reply-To: <20080917144115.GA10231@redhat.com>

On Wed, Sep 17, 2008 at 10:41:15AM -0400, Frank Ch. Eigler wrote:
> Here are some things we need to work more on:
>
> - It's time to really improve & shrink debuginfo.  Enough said.

The more I've played with debuginfo, the more I've been convinced
that, at least for me, the costs vastly outweigh the benefits.
It causes the time to compile the kernel (and kernel developers need
to compile the kernel a lot) to explode, simply due to disk I/O time;
and if /lib is on a separate partition, you may simply not have the
space to store the huge, vastly bloated modules.

On the benefits side, given GCC's increasingly aggressive
optimizations, being able to set breakpoints at random lines is less
important when (a) it often doesn't work because the code has been
optimized out, or (b) the symbol you want to reference isn't easily
available.  Case (b) ends up being very frustrating because you get a
highly confusing error message, such as:

semantic error: failed to retrieve location attribute for local 'sb' (dieoffset: 0x9cf22): identifier '$sb' at ext4-check-desk.stp:3:47

Not something that a system administrator will appreciate, never mind
the kernel developer.  It just ends up leaving the developer and/or
administrator with a very bad impression of Systemtap.

How could this be mitigated:

*) Promote the use of Steven Rostedt's streamline_config, telling
   people that if they decide to compile with debuginfo, they will
   very likely ***badly*** regret it unless they use a special config
   file that aggressively restricts their configuration so that
   modules they don't need on that system are not built.

*) Maybe for kernel developers there should be some suggested patches
   that compile the kernel with some amount of optimization
   suppressed, so that in particular, functions are never inlined, and
   maybe in the extreme case, optimizations are disabled altogether
   --- or at least enough that if someone is going to pay the vast
   cost of debuginfo, at least they will get something useful out of
   it by actually being able to set traces at arbitrary line numbers,
   and will hopefully be able to access variables with much greater
   probability of success.
   Yes, this goes against the Systemtap goal of not requiring people
   to compile special kernels and reboot, but the supposed advantage
   of using debuginfo is being able to set tracepoints at arbitrary
   points, and at least for me, in the code I've tried to instrument,
   I have absolutely no confidence that I can set tracepoints where I
   want except at the beginning of functions anyway.  So if I'm going
   to slow down my compile-edit-debug cycle in the kernel by an order
   of magnitude, say to debug some really hard problem, I want to
   really, truly and reliably be able to set tracepoints **anywhere**
   and to usefully probe variables when and where I want.

*) Alternatively, if we are going to take as a given that the only
   kind of probe points that are going to be reliable are the
   beginnings or ends of functions (and specifically, non-static
   functions), is there some way to generate a restricted set of
   debuginfo that gives only enough information to decode the types of
   the function parameters, but none of the line number information?
   Maybe some way of simply running nm on vmlinux, and then creating
   some kind of magic .c file that references all of the functions and
   forcing a single .o with DWARF information containing the function
   and type information, and nothing else.  I'm not a tools person, so
   this may be a stupid way of doing it, but the basic idea is simply
   having a highly compressed debuginfo file that has only function
   parameter information, and nothing else, which hopefully will be
   only a megabyte or two instead of hundreds and hundreds of
   megabytes of debuginfo.  And to do this without having to write
   gargantuan .o files in the build tree, since that slows down the
   compile.

I know that Systemtap can run without debuginfo, but if you can't
decode the function arguments, at that point I would probably use
ftrace because it's simpler than Systemtap.
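
A minimal sketch of the nm-based idea above, in Python, assuming the
usual three-column `nm` output (address, type letter, name) and
treating type `T` as the non-static functions of interest.  The names
`text_symbols`, `stub_source`, and `stub_refs` are hypothetical, and the
generated stub assumes that prototypes come from kernel headers included
ahead of it.  It only produces the .c text; whether the compiler then
emits compact parameter-type DWARF for functions that are merely
referenced is the open question and is not settled here.

```python
def text_symbols(nm_output):
    """Return sorted names of global text symbols (type 'T') from nm output.

    Assumption: each defined symbol is printed as "address type name".
    Note that 'T' can also cover non-function labels in the text section.
    """
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address column, so they have two fields.
        if len(parts) == 3 and parts[1] == "T":
            names.add(parts[2])
    return sorted(names)


def stub_source(names):
    """Build the text of a C file that takes the address of each function.

    Assumption: prototypes for every name are provided by headers that the
    caller includes ahead of this text; none are generated here.
    """
    lines = ["/* Generated stub: references each non-static function. */",
             "void *const stub_refs[] = {"]
    lines += ["    (void *)&%s," % name for name in names]
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical nm output; a real run would read `nm vmlinux` instead.
    sample = (
        "ffffffff811a0000 T vfs_read\n"
        "ffffffff811a0100 t local_helper\n"
        "ffffffff81c00000 D jiffies\n"
    )
    print(stub_source(text_symbols(sample)), end="")
```

Only the `T` entries survive the filter, which matches the restriction
to non-static functions in the proposal; static functions (`t`) and data
symbols are dropped.
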
Systemtap could add a huge amount of value over ftrace, if it could
decode function parameters without having to pay the cost of
debuginfo.

Quite frankly, the main reason why I haven't been playing with
Systemtap much lately is that I'm tired of waiting for compiles to
complete when compiling with debuginfo.  Sure, it's handy for getting
line number information when debugging an oops, but compiling with
debuginfo is **so** painful that I'd much rather paw through
disassembled assembly code to figure out where the system died when I
need to analyze a kernel oops than wait for a kernel compile to
finish.  Pawing through assembly code takes much less time for me, and
is much more efficient, because I'm very often recompiling the kernel
tree.  (This is a very different scenario than when a distribution
compiles a kernel once, on a build machine, as opposed to multiple
times during a development cycle.)

> - The tool's generality.  Linus is rightly skeptical of a tool that
>   aims too high and turns out to be too hard to use.  (I believe
>   "piece of shit" was his shock-value opening comment. :-)

Speaking of that.... this isn't as big of a deal for kernel
developers, but if it really is true that Systemtap is aiming to be
used by System Administrators (and I believe that it is, based on the
assumption that debuginfo management would be done by RPM macros in
the distribution packaging while ignoring the kernel
compile-edit-debug time problem, plus some of the ways Systemtap has
been marketed at events such as the Red Hat Summit), then when looking
at the Systemtap vs. DTrace comparison chart, I have to agree with the
DTrace folks; the Systemtap project is very much being disingenuous
about some of the items on the chart, such as the comparison of
speculative tracing.  The comment "(from first principles via
auxiliary data and control structures)", and the related one for
thread-local variables "(from first principles via tid-indexed
auxiliary arrays)" are really lame.
Of *course* you can do anything from first principles.  A systemtap
trace is (modulo the time constraint) Turing equivalent.  That's like
saying there's no need for perl, I can in principle do everything in
assembly language.  You *can*, but you might not want to.

One HUGE advantage DTrace has over Systemtap is that it has certain
constructs, such as its default report generation and speculative
tracing, which mean you can do things on a single command line, e.g.:

    dtrace -n 'syscall::exec*:return { trace(execname); }'

By default dtrace will print a line for each probe that fires, and if
you use the trace() action, it will also print the value of its
argument (here, execname).  Or take this example:

    % dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'

This will automatically print out the number of system calls each
process (identified by pid and execname) executed between the time
dtrace was started and when the administrator hit ^C:

     3104 gnome-terminal        2
     3153 gnome-terminal        2
     3098 nautilus              3
     4804 java                 10
      599 sshd                 24
     8117 acroread             45
    28921 dtrace               71
      113 nscd                270
    28920 find               3418

You can do the same thing in systemtap, but you have to do it as a
full script, you have to explicitly have a print command in each probe
statement, and you have to explicitly dump out the contents of each
associative array.  DTrace can suppress the automatic output (using
-q), and any long, sophisticated DTrace script will probably do its
own explicit output.  However, a system administrator can copy simple
DTrace one-liners and modify them to their needs much more easily than
is possible under Systemtap.

Remember, most system administrators aren't necessarily programmers!
If we are going to let distribution marketing folks claim that
Systemtap is meant for System Administrators, it has to be easy to
use, and not necessarily assume deep programming skills.  (Such as
simulating thread local variables using tid's --- sorry, but that's
just LAME.  :-)

						- Ted
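
For concreteness, the explicit steps that the dtrace aggregation
one-liner performs implicitly (keep a map keyed by pid and execname,
bump it on every event, dump it at exit) can be sketched in Python.
This is an illustrative model only, not DTrace or Systemtap code:
`count_syscalls`, `report`, and the sample events are hypothetical, and
the ascending sort simply mirrors the order of the sample output in the
message.

```python
from collections import Counter


def count_syscalls(events):
    """Count events per (pid, execname) key, like @num[pid, execname] = count().

    `events` is an iterable of (pid, execname) pairs, one per system call
    entry; in a real tracer these would come from the probe firing.
    """
    num = Counter()
    for pid, execname in events:
        num[(pid, execname)] += 1
    return num


def report(num):
    """Format the counts one key per line, smallest count first."""
    rows = sorted(num.items(), key=lambda item: item[1])
    return "\n".join(
        "%8d %-16s %8d" % (pid, execname, count)
        for (pid, execname), count in rows
    )


if __name__ == "__main__":
    # Hypothetical event stream standing in for syscall entry probes.
    events = [(599, "sshd")] * 3 + [(3098, "nautilus")] + [(4804, "java")] * 2
    print(report(count_syscalls(events)))
```

Everything in `report` and in the explicit loop is what a full script
has to spell out by hand, and what the one-liner gets for free.
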