From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 12879 invoked by alias); 14 Mar 2014 13:11:31 -0000
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: systemtap-owner@sourceware.org
Received: (qmail 12867 invoked by uid 89); 14 Mar 2014 13:11:29 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS,UNSUBSCRIBE_BODY autolearn=no version=3.3.2
X-HELO: mail7.hitachi.co.jp
Received: from mail7.hitachi.co.jp (HELO mail7.hitachi.co.jp) (133.145.228.42)
    by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 14 Mar 2014 13:11:27 +0000
Received: from mlsv5.hitachi.co.jp (unknown [133.144.234.166])
    by mail7.hitachi.co.jp (Postfix) with ESMTP id 3319E37AC5; Fri, 14 Mar 2014 22:11:25 +0900 (JST)
Received: from mfilter03.hitachi.co.jp
    by mlsv5.hitachi.co.jp (8.13.1/8.13.1) id s2EDBPaY001487; Fri, 14 Mar 2014 22:11:25 +0900
Received: from vshuts01.hitachi.co.jp (vshuts01.hitachi.co.jp [10.201.6.83])
    by mfilter03.hitachi.co.jp (Switch-3.3.4/Switch-3.3.4) with ESMTP id s2EDBNAD013636; Fri, 14 Mar 2014 22:11:24 +0900
Received: from gxml20a.ad.clb.hitachi.co.jp (unknown [158.213.157.160])
    by vshuts01.hitachi.co.jp (Postfix) with ESMTP id 4494B2F003A; Fri, 14 Mar 2014 22:11:23 +0900 (JST)
Received: from [10.198.220.44]
    by gxml20a.ad.clb.hitachi.co.jp (Switch-3.1.10/Switch-3.1.9) id 62ED0BMWF0000A8B8; Fri, 14 Mar 2014 22:11:22 +0900
Message-ID: <5322FFF7.2080606@hitachi.com>
Date: Fri, 14 Mar 2014 13:11:00 -0000
From: Masami Hiramatsu
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120614 Thunderbird/13.0.1
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org, Ingo Molnar
Cc: Andi Kleen, Ananth N Mavinakayanahalli, Sandeepa Prabhu, Frederic Weisbecker, x86@kernel.org, Steven Rostedt, fche@redhat.com, mingo@redhat.com,
    systemtap@sourceware.org, "H. Peter Anvin", Thomas Gleixner
Subject: Re: [PATCH -tip v8 00/26] kprobes: introduce NOKPROBE_SYMBOL, bugfixes and scalbility efforts
References: <20140305115843.22766.8355.stgit@ltc230.yrl.intra.hitachi.co.jp>
In-Reply-To: <20140305115843.22766.8355.stgit@ltc230.yrl.intra.hitachi.co.jp>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2014-q1/txt/msg00283.txt.bz2

Ping? :)

(2014/03/05 20:58), Masami Hiramatsu wrote:
> Hi,
> Here is version 8 of the NOKPROBE_SYMBOL series.
>
> This version only changes the kprobe_table hash size enlargement:
> the table now grows to 512 entries instead of 4096. It also
> decouples the kprobes hash size from the kretprobes one, since
> the two have different hash bases.
>
>
> Changes
> =======
> In this series, I updated 1 patch:
>  - Enlarge the kprobes hash table size to 512 instead of
>    4096.
>  I also evaluated the improvement again with the new hash size.
>
> Blacklist improvements
> ======================
> Currently, kprobes uses the __kprobes annotation and an internal
> symbol-name-based blacklist to prohibit probing on some functions,
> because probing those functions may cause infinite recursion
> through int3/debug exceptions.
> However, the current mechanisms have some problems, especially from
> the viewpoint of code maintenance:
>  - __kprobes is easily misread as meaning that the function is
>    used by kprobes, although it just means "no kprobe
>    on it".
>  - __kprobes moves functions to a different section,
>    which is not good for cache optimization.
>  - the symbol-name-based solution is fragile, since
>    a symbol name can easily be changed without
>    anyone noticing.
>  - it doesn't support functions in modules at all.
>
> Thus, I decided to introduce a new NOKPROBE_SYMBOL macro for building
> an integrated kprobe blacklist.
>
> The new macro stores the addresses of the given symbols in the
> _kprobe_blacklist section, and the blacklist is initialized from that
> address list at boot time.
> The same applies to modules: when a module is loaded, kprobes
> automatically finds the blacklisted symbols in the module's
> _kprobe_blacklist section.
> This series also replaces all __kprobes annotations in x86 and generic
> code with NOKPROBE_SYMBOL().
>
> The new blacklist still supports old-style __kprobes by decoding
> .kprobes.text if it exists, because __kprobes is still used in
> arch-dependent code other than x86.
>
> Scalability effort
> ==================
> This series fixes not only "qualitative" bugs that can crash the
> kernel but also a "quantitative" issue with massive numbers of
> kprobes. Thus we can now do a stress test, putting kprobes on all
> (non-blacklisted) kernel functions and enabling all of them.
> To set kprobes on all kernel functions, run the script below.
>   ----
>   #!/bin/sh
>   TRACE_DIR=/sys/kernel/debug/tracing/
>   # Clear any existing kprobe events
>   echo > $TRACE_DIR/kprobe_events
>   # Define one probe per text symbol; dots are stripped from event names
>   grep -iw t /proc/kallsyms | tr -d . | \
>     awk 'BEGIN{i=0};{print("p:"$3"_"i, "0x"$1); i++}' | \
>     while read -r l; do echo "$l" >> $TRACE_DIR/kprobe_events ; done
>   ----
> Since it doesn't check the blacklist at all, you'll see many write
> errors, but that is not a problem :).
>
> Note that the kprobe tracer still has a performance issue
> if you trace all functions. Since a few ftrace functions are called
> inside the kprobe tracer even when tracing is shut off (tracing_on
> = 0), enabling kprobe events on those functions has a bad
> performance impact (it is safe, but you'll see the system slow down
> and no events recorded, because they are just ignored).
> To find those functions, you can use the third column of
> (debugfs)/tracing/kprobe_profile as below, which tells you the number
> of miss-hits (ignored hits) for each event. Events with a small
> number in the 2nd column and a large number in the 3rd column
> may cause the slowdown.
>   ----
>   # sort -rnk 3 (debugfs)/tracing/kprobe_profile | head
>   ftrace_cmp_recs_4907                   264950231  33648874543
>   ring_buffer_lock_reserve_5087                  0   4802719935
>   trace_buffer_lock_reserve_5199                 0   4385319303
>   trace_event_buffer_lock_reserve_5200           0   4379968153
>   ftrace_location_range_4918              18944015   2407616669
>   bsearch_17098                           18979815   2407579741
>   ftrace_location_4972                    18927061   2406723128
>   ftrace_int3_handler_1211                18926980   2406303531
>   poke_int3_handler_199                   18448012   1403516611
>   inat_get_opcode_attribute_16941                0     12715314
>   ----
>
> I recommend enabling events on such functions after all other
> events are enabled, which minimizes their performance impact.
>
> To enable kprobes on all kernel functions, run the script below.
>   ----
>   #!/bin/sh
>   TRACE_DIR=/sys/kernel/debug/tracing
>   echo "Disable tracing to remove tracing overhead"
>   echo 0 > $TRACE_DIR/tracing_on
>
>   BADS="ftrace_cmp_recs ring_buffer_lock_reserve trace_buffer_lock_reserve trace_event_buffer_lock_reserve ftrace_location_range bsearch ftrace_location ftrace_int3_handler poke_int3_handler inat_get_opcode_attribute"
>   # Build ls --hide options so the bad symbols are skipped in the first pass
>   HIDES=
>   for i in $BADS; do HIDES=$HIDES" --hide=$i*"; done
>
>   SDATE=`date +%s`
>   echo "Enabling trace events: start at $SDATE"
>
>   cd $TRACE_DIR/events/kprobes/
>   # First pass: everything except the bad symbols
>   for i in `ls $HIDES` ; do echo 1 > $i/enable; done
>   # Second pass: the bad symbols, enabled last
>   for j in $BADS; do for i in `ls -d $j*`;do echo 1 > $i/enable; done; done
>
>   EDATE=`date +%s`
>   TIME=`expr $EDATE - $SDATE`
>   echo "Elapsed time: $TIME"
>   ----
> Note: systemtap users may not need to consider the bad symbols
> above, since systemtap has its own logic to avoid probing itself.
>
> Result
> ======
> The events on the bad symbols above were also enabled, after all
> other events. It took 2254 sec (without any intervals) to enable
> 37222 probes.
> At that point, perf top showed the result below:
>   ----
>   Samples: 10K of event 'cycles', Event count (approx.): 270565996
>   +  16.39%  [kernel]  [k] native_load_idt
>   +  11.17%  [kernel]  [k] int3
>   -   7.91%  [kernel]  [k] 0x00007fffa018e8e0
>      - 0xffffffffa018d8e0
>           59.09% trace_event_buffer_lock_reserve
>              kprobe_trace_func
>              kprobe_dispatcher
>         + 40.45% trace_event_buffer_lock_reserve
>   ----
> 0x00007fffa018e8e0 may be the trampoline buffer of an optimized
> probe on trace_event_buffer_lock_reserve. native_load_idt and int3
> are also called from normal kprobes.
> This means that, at least in my environment, kprobes now passes the
> stress test, and even with probes on all available functions
> the system slows down by only about 50%.
>
> Changes from v7:
>  - [24/26] Enlarge hash table to 512 instead of 4096.
>  - Re-evaluate the performance improvements.
>
> Changes from v6:
>  - Updated patches on the latest -tip.
>  - [1/26] Add patch: Fix page-fault handling logic on x86 kprobes
>  - [2/26] Add patch: Allow to handle reentered kprobe on singlestepping
>  - [9/26] Add new patch: Call exception_enter after kprobes handled
>  - [12/26] Allow probing fetch functions in trace_uprobe.c.
>  - [24/26] Add new patch: Enlarge kprobes hash table size
>  - [25/26] Add new patch: Kprobe cache for frequently accessed kprobes
>  - [26/26] Add new patch: Skip Ftrace hlist check with ftrace-based kprobe
>
> Changes from v5:
>  - [2/22] Introduce nokprobe_inline macro
>  - [6/22] Prohibit probing on memset/memcpy
>  - [11/22] Allow probing on text_poke/hw_breakpoint
>  - [12/22] Use nokprobe_inline macro instead of __always_inline
>  - [14/22] Ditto.
>  - [21/22] Remove preempt disable/enable from kprobes/x86
>  - [22/22] Add emergency int3 recovery code
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (26):
>       [BUGFIX]kprobes/x86: Fix page-fault handling logic
>       kprobes/x86: Allow to handle reentered kprobe on singlestepping
>       kprobes: Prohibit probing on .entry.text code
>       kprobes: Introduce NOKPROBE_SYMBOL() macro for blacklist
>       [BUGFIX] kprobes/x86: Prohibit probing on debug_stack_*
>       [BUGFIX] x86: Prohibit probing on native_set_debugreg/load_idt
>       [BUGFIX] x86: Prohibit probing on thunk functions and restore
>       kprobes/x86: Call exception handlers directly from do_int3/do_debug
>       x86: Call exception_enter after kprobes handled
>       kprobes/x86: Allow probe on some kprobe preparation functions
>       kprobes: Allow probe on some kprobe functions
>       ftrace/*probes: Allow probing on some functions
>       x86: Allow kprobes on text_poke/hw_breakpoint
>       x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation
>       kprobes: Use NOKPROBE_SYMBOL macro instead of __kprobes
>       ftrace/kprobes: Use NOKPROBE_SYMBOL macro in ftrace
>       notifier: Use NOKPROBE_SYMBOL macro in notifier
>       sched: Use NOKPROBE_SYMBOL macro in sched
>       kprobes: Show blacklist entries via debugfs
>       kprobes: Support blacklist functions in module
>       kprobes: Use NOKPROBE_SYMBOL() in sample modules
>       kprobes/x86: Use kprobe_blacklist for .kprobes.text and .entry.text
>       kprobes/x86: Remove unneeded preempt_disable/enable in interrupt handlers
>       kprobes: Enlarge hash table to 512 entries
>       kprobes: Introduce kprobe cache to reduce cache misshits
>       ftrace: Introduce FTRACE_OPS_FL_SELF_FILTER for ftrace-kprobe
>
>
>  Documentation/kprobes.txt                |   24 +
>  arch/Kconfig                             |   10
>  arch/x86/include/asm/asm.h               |    7
>  arch/x86/include/asm/kprobes.h           |    2
>  arch/x86/include/asm/traps.h             |    2
>  arch/x86/kernel/alternative.c            |    3
>  arch/x86/kernel/apic/hw_nmi.c            |    3
>  arch/x86/kernel/cpu/common.c             |    4
>  arch/x86/kernel/cpu/perf_event.c         |    3
>  arch/x86/kernel/cpu/perf_event_amd_ibs.c |    3
>  arch/x86/kernel/dumpstack.c              |    9
>  arch/x86/kernel/entry_32.S               |   33 --
>  arch/x86/kernel/entry_64.S               |   20 -
>  arch/x86/kernel/hw_breakpoint.c          |    5
>  arch/x86/kernel/kprobes/core.c           |  162 ++++----
>  arch/x86/kernel/kprobes/ftrace.c         |   19 +
>  arch/x86/kernel/kprobes/opt.c            |   32 +-
>  arch/x86/kernel/kvm.c                    |    4
>  arch/x86/kernel/nmi.c                    |   18 +
>  arch/x86/kernel/paravirt.c               |    6
>  arch/x86/kernel/traps.c                  |   35 +-
>  arch/x86/lib/thunk_32.S                  |    3
>  arch/x86/lib/thunk_64.S                  |    3
>  arch/x86/mm/fault.c                      |   28 +
>  include/asm-generic/vmlinux.lds.h        |    9
>  include/linux/compiler.h                 |    2
>  include/linux/ftrace.h                   |    3
>  include/linux/kprobes.h                  |   23 +
>  include/linux/module.h                   |    5
>  kernel/kprobes.c                         |  607 +++++++++++++++++++++---------
>  kernel/module.c                          |    6
>  kernel/notifier.c                        |   22 +
>  kernel/sched/core.c                      |    7
>  kernel/trace/ftrace.c                    |    3
>  kernel/trace/trace_event_perf.c          |    5
>  kernel/trace/trace_kprobe.c              |   66 ++-
>  kernel/trace/trace_probe.c               |   65 ++-
>  kernel/trace/trace_probe.h               |   15 -
>  kernel/trace/trace_uprobe.c              |   20 -
>  samples/kprobes/jprobe_example.c         |    1
>  samples/kprobes/kprobe_example.c         |    3
>  samples/kprobes/kretprobe_example.c      |    2
>  42 files changed, 824 insertions(+), 478 deletions(-)
>
> --
> Masami HIRAMATSU
> IT Management Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu.pt@hitachi.com
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
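
A minimal sketch, not part of the series, of the kprobe_profile check
described in the quoted cover letter: it lists events whose miss-hit
count (3rd column) is at least "ratio" times their hit count (2nd
column). The function name find_noisy_probes and the default ratio of
100 are hypothetical choices for illustration only, and the
three-column input format is assumed from the sample output quoted
above.

```shell
#!/bin/sh
# Sketch only: find_noisy_probes and the default ratio are hypothetical,
# not something kprobes or ftrace provides.
# Assumed input format, one event per line: <event> <hits> <miss-hits>
find_noisy_probes() {
    # $1: path to a kprobe_profile-style file
    # $2: minimum miss-hits/hits ratio (defaults to 100)
    awk -v ratio="${2:-100}" '
        # Print events that were missed at least once and whose
        # miss-hit count is at least ratio times the hit count.
        NF >= 3 && $3 > 0 && $3 >= $2 * ratio { print $1 }
    ' "$1"
}
```

It could be run as
"find_noisy_probes /sys/kernel/debug/tracing/kprobe_profile 100",
assuming debugfs is mounted at /sys/kernel/debug as in the quoted
scripts. The events it prints are candidates to enable last.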