Subject: Re: user-space probes -- plan B from outer space
From: Jim Keniston
To: "Frank Ch. Eigler"
Cc: SystemTAP
Date: Tue, 06 Jun 2006 21:50:00 -0000
Message-Id: <1149630600.2852.41.camel@dyn9047018079.beaverton.ibm.com>

On Tue, 2006-06-06 at 12:07, Frank Ch.
Eigler wrote:
> Hi -
>
> Here is an outline of how systemtap might support user-space probes,
> even in the absence of kernel-based user kprobes. This is a "plan B"
> only, a desperate stopgap until lkml sees the light. Maybe "plan Z"
> is more appropriate, considering the limitations I'm about to outline.

I'm just now prototyping something very much like what you've
described. See below for more info.

>
> The idea is to support limited systemtap scripts that refer only to
> user-space probe targets such as existing processes. These scripts
> would be translated to a user-space probe program instead of a kernel
> probe module.

I was thinking a user-mode (instrumentation) program + a kernel module
that defines handlers that could be invoked from the instrumentation
program. The latter (which requires kernel enhancements) is necessary
only for convenient & efficient coordination of user-space and
kernel-space instrumentation. (But that's what we're after, right?)

>
> Probes would be specified with a probe point syntax such as:
>
>   user.process(233).statement(0xfeedface)
>   user("fche").process("/bin/vi").function("*init*")
>
> Instead of kprobes of a probe module, this probe program would use
> ptrace to insert breakpoints into any target processes,

Got that running, although the API needs to be generalized.

> perhaps using
> code from RDA or GDB. Given the process-id or process name, systemtap
> should be able to locate the necessary debugging information at
> translation time. When probes are hit, the probe process would run
> the compiled probe handlers in much the same way as now. Access to
> $target vars should be possible. The runtime code would have to have
> a new variant to use some user-level facility (plain pipes?) to
> communicate with the front-end.

Comm with front end not undertaken yet.

>
>
> Q: Wouldn't this be slow?
> A: Oh yes, quite. Several ptrace context-switch round-trips per
> probe hit.
> Lots more if we want to pull out target-side
> state like $variables or stack backtraces.

Yes, pretty slow. In my prototype, my user-mode handler just increments
a counter. On my Pentium M, overhead per probepoint hit is ~14.2 usec,
compared with 1.03 usec for the uprobes version last posted to LKML.
For comparison, using "gdb -batch" to do the same thing cost 111 usec
per hit, and tracing one syscall with strace cost ~10 usec per hit.
(Of course, strace can be more efficient than ad hoc probing because
ptrace has special support for syscall tracing; and a C-code handler
can do all sorts of things that a gdb command-script can't.)

>
> Q: What about concurrency?
> A: You mean like probes concurrently hit in several target processes,
> like SMP kprobes? If there was any indication that this was
> worthwhile, then we could make the systemtap-generated probe
> process be multi-threaded (one probe thread per target thread).

Yes. I haven't taken that on.

>
> Q: Any other limitations?
> A: Because of ptrace, any process can be supervised by only one
> process at a time. So if you run systemtap on a user process,
> you won't be able to run gdb or another systemtap session on it.

Yes.

>
> Q: What about probing the kernel and user space together?
> A: Maybe this scheme would work if kernel-space systemtap probes
> run concurrently, and arrange to share systemtap globals with
> userspace somehow (mmap?). Shared variables like this would
> likely cause many more locking timeouts (=> skipped probes)
> than now. There are also additional security concerns.

My proposed approach to user/kernel data sharing is a new system call
or ptrace request that just passes a pid, a handler ID, and a pointer
to an area in user space that the handler (installed via a kernel
module) can read and/or write. Again, there are security concerns.
But a Bad Guy would have to have the help of somebody who has
permission to install the module.
If all kernel/user comm were initiated by the instrumentation program,
the kernel handler could sleep as needed.

>
> Q: What about probing shared libraries?
> A: Because of the way ptrace works, we'd have to turn these into
> process-level probes, including probes that just sit around
> monitoring the threads and all their children to dlopen/mmap
> the named libraries.

Yes.

>
> Q: Is it worth it to try? Is there a better way?
> A: You tell me.

There are certainly ways that perform better and lack some of these
limitations. Selling them on LKML is another matter. An "incremental
approach" might enhance ptrace to reduce probepoint overhead -- e.g.,
let the kernel handle single-stepping and continuing all in one ptrace
call.

>
>
> - FChE

Jim
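The thread discusses a probe program that would "use ptrace to insert breakpoints into any target processes." The sketch below shows what that involves at the lowest level: planting and removing a software breakpoint with `PTRACE_PEEKTEXT`/`PTRACE_POKETEXT`. It makes several assumptions. It targets x86 (the one-byte `int3` opcode, 0xCC), assumes a little-endian word layout, and assumes the tracee is already attached and stopped. The helper names (`patch_int3`, `restore_byte`, `insert_breakpoint`, `remove_breakpoint`) are hypothetical, and this is not the code from Jim's prototype.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Assumption: x86 target, whose one-byte breakpoint instruction is int3. */
#define INT3_OPCODE 0xCCUL

/* Overwrite the lowest-addressed byte of a text word (little-endian) with int3. */
long patch_int3(long word)
{
    return (long)(((unsigned long)word & ~0xFFUL) | INT3_OPCODE);
}

/* Put a saved original byte back into the lowest-addressed byte of a word. */
long restore_byte(long word, unsigned char orig)
{
    return (long)(((unsigned long)word & ~0xFFUL) | orig);
}

/* Hypothetical helper: plant a breakpoint at addr in an attached, stopped
 * tracee, saving the displaced byte in *saved. Returns 0 on success, -1 on error. */
int insert_breakpoint(pid_t pid, uintptr_t addr, unsigned char *saved)
{
    errno = 0; /* PEEKTEXT may legitimately return -1, so check errno too */
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *saved = (unsigned char)(word & 0xFF);
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patch_int3(word)) == -1)
        return -1;
    return 0;
}

/* Hypothetical helper: undo insert_breakpoint() at addr. */
int remove_breakpoint(pid_t pid, uintptr_t addr, unsigned char saved)
{
    errno = 0;
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)restore_byte(word, saved)) == -1)
        return -1;
    return 0;
}
```

This also shows where the per-hit cost comes from. When the tracee hits the breakpoint, it stops with SIGTRAP and its program counter just past the `int3`. A tracer like this would then restore the original byte, move the PC back one byte (an architecture-specific register write, omitted here), `PTRACE_SINGLESTEP` over the original instruction, re-insert the breakpoint, and `PTRACE_CONT`. Each of those steps is a separate ptrace call and context switch. They are the "several ptrace context-switch round-trips per probe hit" Frank mentions, and they are what letting the kernel single-step and continue in one ptrace call would collapse.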