From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 31877 invoked by alias); 12 Jan 2009 19:52:40 -0000
Received: (qmail 31870 invoked by uid 22791); 12 Jan 2009 19:52:39 -0000
X-SWARE-Spam-Status: No, hits=-0.0 required=5.0 tests=AWL,BAYES_50,SPF_HELO_PASS,SPF_SOFTFAIL
X-Spam-Check-By: sourceware.org
Received: from THUNK.ORG (HELO thunker.thunk.org) (69.25.196.29) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 12 Jan 2009 19:52:03 +0000
Received: from root (helo=closure.thunk.org) by thunker.thunk.org with local-esmtp (Exim 4.50 #1 (Debian)) id 1LMSpM-00080k-JS; Mon, 12 Jan 2009 14:52:00 -0500
Received: from tytso by closure.thunk.org with local (Exim 4.69) (envelope-from ) id 1LMSpM-00031j-0C; Mon, 12 Jan 2009 14:52:00 -0500
Date: Mon, 12 Jan 2009 19:52:00 -0000
From: Theodore Tso
To: Jason Baron
Cc: Masami Hiramatsu, Roland McGrath, "Frank Ch. Eigler", systemtap@sources.redhat.com
Bcc: tytso@mit.edu
Subject: Re: Discussion at Linux Foundation Japan Symposium
Message-ID: <20090112195159.GJ21793@mit.edu>
References: <20081221003831.GG24081@redhat.com> <20081222181921.GH23723@mit.edu> <20081222203747.GA4195@redhat.com> <20081223211306.67D29FC3B7@magilla.sf.frob.com> <20081223223217.GW23723@mit.edu> <49518856.80907@redhat.com> <20090110024810.GL23869@mit.edu> <20090112190401.GE3107@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090112190401.GE3107@redhat.com>
User-Agent: Mutt/1.5.17+20080114 (2008-01-14)
X-SA-Exim-Connect-IP:
X-SA-Exim-Mail-From: tytso@mit.edu
X-SA-Exim-Scanned: No (on thunker.thunk.org); SAEximRunCond expanded to false
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2009-q1/txt/msg00102.txt.bz2

On Mon, Jan 12, 2009 at 02:04:01PM -0500, Jason Baron wrote:
>
> We have been actively looking at and adding
> tracepoints to the lttng
> kernel tree via the ltt-dev list to support Systemtap. These tracepoints are
> being added at "key" kernel points in the fs, vm, scheduler, and other
> subsystems. Unfortunately, we just realized that these tracepoints are not
> going to be proposed for a merge until lttng is proposed for merge. Systemtap
> can not be held up by this.

Huh?  Last I checked Systemtap didn't support tracepoints at all.  Did
I miss something?  And what do you mean by "adding tracepoints to
support Systemtap"?  Do you mean that these would help you write
better tapsets, for when Systemtap could support tracepoints?

Also, the tracepoints won't necessarily be held up for merge until
lttng is proposed for merge.  What is necessary is a way to access
those tracepoints without needing some big, hairy, complex userspace
package (whether it is called Lttng or Systemtap), since said packages
are often written with massive distro-dependencies, or are written in
C++ so kernel developers have a hard time customizing/fixing them to
meet their needs, and so on.

So what Linus Torvalds and other senior kernel developers proposed at
the Kernel Summit was a simple debugfs/proc interface which would
allow individual activation of a tracepoint/marker, and which would
dump out the data collected by that marker as a simple text file
accessed via a pseudo-filesystem.  This would be the "in-mainline
user" of the markers/tracepoints, and would guarantee that tracepoints
could be made *useful* by kernel developers using grep, awk, and perl
on that text file.  Simple filtering for bandwidth reasons might be
done via debugfs knobs, only for the 99.9% common cases.

> Therefore, I was thinking of proposing 100+ tracepoints that are
> currently in the lttng tree (and not upstream, but many have already
> been reviewed upstream), on lkml.

Linus has basically said at the Kernel Summit that he was not going to
accept new markers until there was a way to make sure that they could
actually be made *useful* for real, live kernel maintainers, via this
simple text interface.  There were some questions about whether text
or a compressed binary would be used to ship the log to userspace, but
in the latter case, a simple .c file shipped with the kernel sources
in the examples directory must be all that would be necessary to
generate the text stream that would then be processed via grep/awk ---
not a massive out-of-tree C++ program.

I was able to sneak in some markers for ext4, but that's primarily
because it was maintainer's discretion, and ext4 isn't in widespread
use and is in a late-development stage, so Linus doesn't pay as close
attention.  :-)  However, if you are going to try to get 100+
tracepoints into the core kernel, that *will* draw notice, and the
first question people will ask is "what's the in-tree consumer of
tracepoints".

I pinged Steven Rostedt at Red Hat, and he indicated that this was
still on his todo list.  So my recommendation to you would be to reach
out to Steven Rostedt, and see if you can help with trying to get the
"simple text output" enhancements to ftrace completed so it can get
merged into mainline.  There has already been general approval of that
game plan at the Kernel Summit, so this is basically a question of
"Show Me The Code".  Once this is done, getting the tracepoints you
want into the kernel should be relatively straightforward.

> If we also propose Systemtap specific set of
> markers to interface, with these tracepoints, then Systemtap will work out of
> the box with no debuginfo, no gcc changes, and be effective immediately to
> filter ext4 debug information.

That assumes that SystemTap can access tracepoints, but I assume
that's a Small Matter of Programming.
:-)

> Longer term, we can look at merging markers into tracepoints, having
> Systemtap directly interface with tracepoints, and merging
> utrace/probes. This proposal makes Systemtap immediately more useful on
> upstream kernels, while longer term issues are addressed. thoughts?

Markers are probably just going to disappear.  Most of the markers
that were in the core kernel have already disappeared; all that's left
is a handful of markers in arch-specific code, the KVM subsystem, and
the ext4 subsystem.

I don't think you'll be able to get the tracepoints into the main
kernel until we get a way to access tracepoints directly via debugfs
and to get exported output files via either a simple .c file accessing
a binary log output, or simply using "cat /debugfs/..../foo.txt", but
I suspect that can happen very quickly.

I'll admit that once this is done, I may end up using the plain text
output mechanism far more often than SystemTap, since it will be more
convenient than creating stap scripts like this:

probe kernel.mark("ext4_sync_file")
{
	t = gettimeofday_ms();
	printf("%d.%d:ext4_sync_file: dev %s datasync %d ino %d parent %d\n",
	       t / 1000, t % 1000, $arg1, $arg2, $arg3, $arg4)
}

... but maybe that's OK; the SystemTap developers will probably get
far fewer annoying complaints from kernel developers about how
SystemTap doesn't work with 2.6.29-rc1, or how "xmlto pdf" or elfutils
doesn't work exactly the same (or not at all) on non-RedHat
distributions, etc.

						- Ted