From: Mathieu Desnoyers
To: Masami Hiramatsu
Cc: Frederic Weisbecker, Ingo Molnar, Ananth N Mavinakayanahalli, lkml,
    Jim Keniston, Srikar Dronamraju, Christoph Hellwig, Steven Rostedt,
    "H. Peter Anvin", Anders Kaseorg, Tim Abbott, Andi Kleen, Jason Baron,
    systemtap, DLE
Date: Fri, 22 Jan 2010 19:08:00 -0000
Subject: Re: [PATCH -tip v8 0/9] kprobes: Kprobes jump optimization support
Message-ID: <20100122185816.GB25202@Krystal>
In-Reply-To: <20100122185450.9022.87506.stgit@dhcp-100-2-132.bos.redhat.com>

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Hi,
>
> Here is the patchset of the kprobes jump optimization v8
> (a.k.a. Djprobe). This version just moves onto
> 2.6.33-rc4-tip. Ingo, I assume it's a good time to
> push this code onto the -tip tree (maybe a development branch?),
> since people can test it with perf-probe.
>
> I've decided to make a separate series of patches for
> jump optimization with text_poke_smp(), which is
> 'officially' supported on Intel's processors.
> So, this version of the patches is just updated against
> the latest tip/master; no other updates are included.
>
> I know that the int3-bypassing method (text_poke_fixup())
> is currently unofficially believed to be safe. But we
> need to get more official answers from x86 vendors.
> Moreover, we need to tweak entry_*.S to prevent
> recursive NMIs, because an int3 inside an NMI handler will
> lift NMI blocking. I'd like to push it after this
> series of patches is merged.
>
> Anyway, thanks Mathieu and Peter, for helping me to
> implement it and organizing discussion points about
> int3-bypass XMC!
>
> These patches can be applied on the latest -tip.
>
> Changes in v8:
> - Update patches against the latest tip/master.
> - Drop text_poke_fixup() related patches.
> - Update benchmark results and add jprobes and kprobe(post-handler)
>   results.
>
> And the kprobe stress test didn't find any regressions from kprobes
> under kvm/x86.
>
> TODO:
> - Support NMI-safe int3-bypassing text_poke.

Please have a look at:

"x86 NMI-safe INT3 and Page Fault"
http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-lttng.git;a=commit;h=90516e3c718e0502f6f2eb616fad4447645ca47d

and

"x86_64 page fault NMI-safe"
http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-lttng.git;a=commit;h=ad1bf11a68c35a44edd8d686a0842896f408e17c

That turns this TODO into the "done" section ;) I've been using these
patches in the lttng tree for 1-2 years.

Thanks,

Mathieu

> - Support preemptive kernel (by stack unwinding and checking address).
>
>
> Jump Optimized Kprobes
> ======================
> o Concept
>  Kprobes uses the int3 breakpoint instruction on x86 to insert
> probes into a running kernel. Jump optimization allows kprobes to replace
> the breakpoint with a jump instruction, reducing probing overhead
> drastically.
>
> o Performance
>  An optimized kprobe is 5 times faster than a kprobe.
>
>  Optimizing probes improves their performance. Usually, a kprobe hit takes
> 0.5 to 1.0 microseconds to process. On the other hand, a jump optimized
> probe hit takes less than 0.1 microseconds (the actual number depends on
> the processor). Here are sample overheads.
>
> Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
> (without debugging options, with text_poke_smp patch, 2.6.33-rc4-tip+)
>
>                       x86-32  x86-64
> kprobe:               0.80us  0.99us
> kprobe+booster:       0.33us  0.43us
> kprobe+optimized:     0.05us  0.06us
> kprobe(post-handler): 0.81us  1.00us
>
> kretprobe :           1.10us  1.24us
> kretprobe+booster:    0.61us  0.68us
> kretprobe+optimized:  0.33us  0.30us
>
> jprobe:               1.37us  1.67us
> jprobe+booster:       0.80us  1.10us
>
> (booster skips single-stepping, kprobe with post handler
>  isn't boosted/optimized, and jprobe isn't optimized.)
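
For reference, the speedups implied by the kprobe rows of the table above
can be worked out with a few lines of C. This is only arithmetic on the
quoted numbers (converted to nanoseconds), not a measurement; the names
overhead_ns, speedup_x10 and print_speedups are made up for this sketch.

```c
#include <stdio.h>

/* Per-hit overheads copied from the quoted table, in nanoseconds. */
struct overhead_ns {
    const char *name;
    unsigned x86_32;
    unsigned x86_64;
};

static const struct overhead_ns kprobe_rows[] = {
    { "kprobe",           800, 990 },
    { "kprobe+booster",   330, 430 },
    { "kprobe+optimized",  50,  60 },
};

/* Speedup of fast_ns over slow_ns in tenths, truncated (160 means 16.0x). */
unsigned speedup_x10(unsigned slow_ns, unsigned fast_ns)
{
    return slow_ns * 10 / fast_ns;
}

/* Print how much faster the optimized row is than each other kprobe row. */
void print_speedups(void)
{
    const struct overhead_ns *opt = &kprobe_rows[2];
    size_t i;

    for (i = 0; i < 2; i++) {
        unsigned s32 = speedup_x10(kprobe_rows[i].x86_32, opt->x86_32);
        unsigned s64 = speedup_x10(kprobe_rows[i].x86_64, opt->x86_64);

        printf("%-16s x86-32: %u.%ux  x86-64: %u.%ux\n",
               kprobe_rows[i].name, s32 / 10, s32 % 10, s64 / 10, s64 % 10);
    }
}
```

Calling print_speedups() lists the ratio of each non-optimized kprobe row
to the optimized one on both architectures.
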
>
> Note that jump optimization also consumes more memory, but not much.
> It uses only ~200 bytes per probe, so even if you use ~10,000 probes,
> it consumes just a few MB.
>
>
> o Usage
>  Set CONFIG_OPTPROBES=y when building a kernel, then all *probes will be
> optimized if possible.
>
>  Kprobes decodes the probed function and checks whether the target
> instructions can be optimized (replaced with a jump) safely. If they
> can't be, Kprobes just doesn't optimize the probe.
>
>
> o Optimization
>  Before preparing optimization, Kprobes inserts the original (user-defined)
> kprobe at the specified address. So, even if the kprobe cannot
> be optimized, it just works as a normal kprobe.
>
> - Safety check
>  First, Kprobes gets the address of the probed function and checks that the
> optimized region, which will be replaced by a jump instruction, does NOT
> straddle the function boundary, because if the optimized region reaches the
> next function, its caller causes unexpected results.
>  Next, Kprobes decodes the whole body of the probed function and checks
> that there is NO indirect jump, NO instruction which will cause an
> exception (checked via exception_tables; such an instruction jumps to
> fixup code, and the fixup code jumps back into the same function body)
> and NO near jump which jumps into the optimized region (except to the
> 1st byte of the jump), because if some jump instruction jumps
> into the middle of another instruction, it causes unexpected results too.
>  Kprobes also measures the length of the instructions which will be
> replaced by a jump instruction. Because a jump instruction is longer than
> 1 byte, it may replace multiple instructions, so Kprobes checks whether
> those instructions can be executed out-of-line.
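
The safety check described above can be sketched in user-space C roughly
as follows. This is a simplified illustration, not the kernel code: the
struct insn_info, JUMP_LEN and optimized_region_len() are hypothetical
names, the function is assumed to be already decoded into an array, and
the exception_tables lookup and the "can be executed out-of-line" test
are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* x86 near jmp rel32: 1 opcode byte + 4 displacement bytes. */
#define JUMP_LEN 5

/* Hypothetical, simplified view of one decoded instruction. */
struct insn_info {
    size_t offset;          /* byte offset from the function start */
    size_t length;          /* instruction length in bytes */
    bool is_indirect_jump;  /* e.g. jmp *%rax */
    bool has_jump_target;   /* direct jump with a known target */
    size_t jump_target;     /* target offset, valid if has_jump_target */
};

/*
 * Returns the number of bytes of whole instructions that a jump placed at
 * probe_off would replace (always >= JUMP_LEN), or 0 if this sketch
 * considers the probe not optimizable.
 */
size_t optimized_region_len(const struct insn_info *insns, size_t n,
                            size_t func_size, size_t probe_off)
{
    size_t i, start = n, len = 0;

    assert(insns != NULL);

    /* The jump itself must not straddle the function boundary. */
    if (probe_off + JUMP_LEN > func_size)
        return 0;

    for (i = 0; i < n; i++) {
        /* No indirect jump anywhere in the function body. */
        if (insns[i].is_indirect_jump)
            return 0;
        /* No jump into the jump's bytes, except to its 1st byte. */
        if (insns[i].has_jump_target &&
            insns[i].jump_target > probe_off &&
            insns[i].jump_target < probe_off + JUMP_LEN)
            return 0;
        if (insns[i].offset == probe_off)
            start = i;
    }

    /* The probe address must be an instruction boundary. */
    if (start == n)
        return 0;

    /* Measure the whole instructions covered by the jump. */
    for (i = start; i < n && len < JUMP_LEN; i++)
        len += insns[i].length;

    if (len < JUMP_LEN || probe_off + len > func_size)
        return 0;
    return len;
}
```

For a 12-byte function with instructions of 1, 2, 3, 2 and 4 bytes, a
probe at offset 0 would replace the first three instructions (6 bytes),
provided no jump in the function targets offsets 1 to 4.
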
>
> - Preparing detour code
>  Then, Kprobes prepares a "detour" buffer, which contains exception
> emulating code (push/pop registers, call handler), copied instructions
> (Kprobes copies the instructions which will be replaced by a jump to the
> detour buffer), and a jump which jumps back to the original execution path.
>
> - Pre-optimization
>  After preparing the detour code, Kprobes enqueues the kprobe to the
> optimizing list and kicks the kprobe-optimizer workqueue to optimize it.
> To wait for other probes to be optimized, the kprobe-optimizer delays
> its work.
>  When the optimized-kprobe is hit before optimization, its handler
> changes the IP (instruction pointer) to the copied code and exits. So, those
> copied instructions are executed in the detour buffer.
>
> - Optimization
>  The kprobe-optimizer doesn't start replacing instructions right away; it
> waits for synchronize_sched() for safety, because some processors may have
> been interrupted in the middle of the instruction series (2nd or Nth
> instruction) which will be replaced by a jump instruction(*).
>  As you know, synchronize_sched() can ensure that all interrupts which
> were running when synchronize_sched() was called have finished, only if
> CONFIG_PREEMPT=n. So, this version supports only kernels with
> CONFIG_PREEMPT=n.(**)
>  After that, the kprobe-optimizer calls stop_machine() to replace the
> probed instructions with a jump instruction by using text_poke_smp().
>
> - Unoptimization
>  When a kprobe is unregistered, disabled, or blocked by another kprobe,
> an optimized-kprobe will be unoptimized. If the kprobe-optimizer has not
> run yet, the kprobe is just dequeued from the optimized list. If the
> optimization has already been done, the jump is replaced with an int3
> breakpoint and the original code by using text_poke_smp().
>
> (*)Please imagine that the 2nd instruction is interrupted and the
> optimizer replaces the 2nd instruction with the jump *address*
> while the interrupt handler is running. When the interrupt
> returns to the original address, there are no valid instructions
> and it causes unexpected results.
>
> (**)This optimization-safety checking may be replaced with the stop-machine
> method which ksplice uses, to support CONFIG_PREEMPT=y kernels.
>
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (9):
>       kprobes: Add documents of jump optimization
>       kprobes/x86: Support kprobes jump optimization on x86
>       x86: Add text_poke_smp for SMP cross modifying code
>       kprobes/x86: Cleanup save/restore registers
>       kprobes/x86: Boost probes when reentering
>       kprobes: Jump optimization sysctl interface
>       kprobes: Introduce kprobes jump optimization
>       kprobes: Introduce generic insn_slot framework
>       kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
>
>
>  Documentation/kprobes.txt          |  191 ++++++++++-
>  arch/Kconfig                       |   13 +
>  arch/x86/Kconfig                   |    1
>  arch/x86/include/asm/alternative.h |    4
>  arch/x86/include/asm/kprobes.h     |   31 ++
>  arch/x86/kernel/alternative.c      |   60 +++
>  arch/x86/kernel/kprobes.c          |  596 ++++++++++++++++++++++++++++------
>  include/linux/kprobes.h            |   44 +++
>  kernel/kprobes.c                   |  626 +++++++++++++++++++++++++++++++-----
>  kernel/sysctl.c                    |   12 +
>  10 files changed, 1373 insertions(+), 205 deletions(-)
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America), Inc.
> Software Solutions Division
>
> e-mail: mhiramat@redhat.com
>

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68