* How to get the detailed kernel stack trace if specified fuction takes too long to finish? @ 2014-03-28 4:22 liang xie 2014-03-28 16:56 ` Frank Ch. Eigler 2014-03-28 17:06 ` Josh Stone 0 siblings, 2 replies; 8+ messages in thread From: liang xie @ 2014-03-28 4:22 UTC (permalink / raw) To: systemtap Hi all, Probably it's a newbie question, to be honest, i am new to SystemTap, currently i want to capture the kernel stack trace if "sys_write" runs too long, if that i could know where it sucks. My OS:2.6.32-220.el6.x86_64 # stap -V Systemtap translator/driver (version 1.6/0.152 non-git sources) My tried script: global i=0 global tb=0 global te=0 probe kernel.function("sys_write") { tb=gettimeofday_us(); } probe kernel.function("sys_write").return { te=gettimeofday_us(); if(te-tb > 100000) { i++; printf("time_us: %d\n", te-tb); print_backtrace(); if(i > 15) { exit(); } } } # stap -x <pid> xxx.stp time_us: 226795 Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] time_us: 101400 Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] time_us: 527853 Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] time_us: 133948 Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] It doesn't show my "expected" kernel stack trace,e.g. file system or block layer related stack trace. How to archive that? Any pointers will be highly appreciated! Best, Liang ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish? 2014-03-28 4:22 How to get the detailed kernel stack trace if specified fuction takes too long to finish? liang xie @ 2014-03-28 16:56 ` Frank Ch. Eigler 2014-03-28 17:06 ` Josh Stone 1 sibling, 0 replies; 8+ messages in thread From: Frank Ch. Eigler @ 2014-03-28 16:56 UTC (permalink / raw) To: liang xie; +Cc: systemtap Hi - xieliang007 wrote: > Probably it's a newbie question, to be honest, i am new to SystemTap, > currently i want to capture the kernel stack trace if "sys_write" runs > too long, if that i could know where it sucks. OK. > [...] probe kernel.function("sys_write").return [...] > print_backtrace(); > Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] > Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] > > It doesn't show my "expected" kernel stack trace,e.g. file system or > block layer related stack trace. The expectation is mistaken. By the time that sys_write returns, all that kernel activity is in the past, and will have left no trace on the stack. > How to archive that? Any pointers will be highly appreciated! Not easy. One way could be to sample the kernel backtrace (from probe timer.profile) while a sys_write is in effect. Attributing the backtraces to the particular thread suffering from a slow sys_write is going to be tricky, but maybe this could work: global backtraces% global active probe syscall.write { active[tid()]=1 } probe timer.profile { if (! (tid() in active)) next; backtraces[tid(), backtrace()] <<< 1 } probe syscall.write.return { delete active[tid()] elapsed = gettimeofday_ms() - @entry(gettimeofday_ms()) if (elapsed <= 4) next; // oops, leaves behind backtraces[] records printf("%s[%d] sleepy for %d ms:\n", execname(), tid(), elapsed) foreach ([t,bt] in backtraces- limit /* top */ 10) if (t == tid()) { printf("\n%d hits:\n", @count(backtraces[t,bt])) print_stack(bt) } } # stap --all-modules FOO.stp - FChE ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish? 2014-03-28 4:22 How to get the detailed kernel stack trace if specified fuction takes too long to finish? liang xie 2014-03-28 16:56 ` Frank Ch. Eigler @ 2014-03-28 17:06 ` Josh Stone 2014-04-02 10:25 ` liang xie 1 sibling, 1 reply; 8+ messages in thread From: Josh Stone @ 2014-03-28 17:06 UTC (permalink / raw) To: systemtap On 03/27/2014 09:22 PM, liang xie wrote: > Hi all, > > Probably it's a newbie question, to be honest, i am new to SystemTap, No problem, welcome! > currently i want to capture the kernel stack trace if "sys_write" runs > too long, if that i could know where it sucks. > > My OS:2.6.32-220.el6.x86_64 > # stap -V > Systemtap translator/driver (version 1.6/0.152 non-git sources) > > My tried script: > global i=0 > global tb=0 > global te=0 > probe kernel.function("sys_write") { > tb=gettimeofday_us(); > } > probe kernel.function("sys_write").return { > te=gettimeofday_us(); > if(te-tb > 100000) { > i++; > printf("time_us: %d\n", te-tb); > print_backtrace(); > if(i > 15) { > exit(); > } > } > } > > # stap -x <pid> xxx.stp > time_us: 226795 > Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] > Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] > time_us: 101400 > Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] > Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] > time_us: 527853 > Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] > Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] > time_us: 133948 > Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] > Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] > > It doesn't show my "expected" kernel stack trace,e.g. file system or > block layer related stack trace. How to archive that? Any pointers > will be highly appreciated! Right, since you waited until return to print_backtrace, it will only be from the end of sys_write. (Though it appears to be the beginning, sys_write+0x0, because of the way a kretprobe captures returns.) Since you want info from deeper kernel internals, you'll need additional probes. One option is to use the para-callgraph example script, where sys_write is your first argument to start a trace, and the second argument gives a broad subsystem to dig through. Like: $ stap para-callgraph.stp 'kernel.function("sys_write")' \ 'kernel.function("*@fs/*")' This script will show you timestamps, but you'll get all output whether it's slow or not. Maybe you can modify it to filter some of the information. Another option is to use timer.profile, and then you can watch for your long waits. Something like: global start_time probe syscall.write { start_time[tid()] = gettimeofday_us() } probe syscall.write.return { delete start_time[tid()] } probe timer.profile { t = tid() if (t in start_time && gettimeofday_us() - start_time[t] > 100000) { printf("long sys_write in tid %d\n", t) print_backtrace() delete start_time[t] } } Even this assumes that the long wait is busy on the cpu, but I think most likely it's an IO wait, which means you'll be scheduled off while it completes. So then instead of timer.profile, or in addition, you could use a scheduler tracepoint. Like: global start_time probe syscall.write { start_time[tid()] = gettimeofday_us() } probe syscall.write.return { delete start_time[tid()] } probe kernel.trace("sched_switch") { t = task_tid($next) if (t in start_time && gettimeofday_us() - start_time[t] > 100000) { printf("long sys_write in tid %d\n", t) print_backtrace() delete start_time[t] } } So as your thread is scheduled back onto the cpu, if it has been a long time, the backtrace will tell you what path it took to get scheduled off in the first place. Hope that's enough to give you some direction... Josh ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish? 2014-03-28 17:06 ` Josh Stone @ 2014-04-02 10:25 ` liang xie 2014-04-02 16:32 ` Josh Stone 0 siblings, 1 reply; 8+ messages in thread From: liang xie @ 2014-04-02 10:25 UTC (permalink / raw) To: Josh Stone; +Cc: systemtap I tried the last scheduler script, the output just like this: # stap --all-modules y.stp WARNING: missing unwind/symbol data for module 'uprobes' long sys_write in tid 10162 Systemtap probe: kernel.trace("sched_switch") long sys_write in tid 11510 Systemtap probe: kernel.trace("sched_switch") long sys_write in tid 11288 Systemtap probe: kernel.trace("sched_switch") I could not figure out why "print_backtrace()" didn't output any log:) On Sat, Mar 29, 2014 at 12:30 AM, Josh Stone <jistone@redhat.com> wrote: > On 03/27/2014 09:22 PM, liang xie wrote: >> Hi all, >> >> Probably it's a newbie question, to be honest, i am new to SystemTap, > > No problem, welcome! > >> currently i want to capture the kernel stack trace if "sys_write" runs >> too long, if that i could know where it sucks. >> >> My OS:2.6.32-220.el6.x86_64 >> # stap -V >> Systemtap translator/driver (version 1.6/0.152 non-git sources) >> >> My tried script: >> global i=0 >> global tb=0 >> global te=0 >> probe kernel.function("sys_write") { >> tb=gettimeofday_us(); >> } >> probe kernel.function("sys_write").return { >> te=gettimeofday_us(); >> if(te-tb > 100000) { >> i++; >> printf("time_us: %d\n", te-tb); >> print_backtrace(); >> if(i > 15) { >> exit(); >> } >> } >> } >> >> # stap -x <pid> xxx.stp >> time_us: 226795 >> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] >> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] >> time_us: 101400 >> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] >> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] >> time_us: 527853 >> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] >> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] >> time_us: 133948 >> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel] >> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel] >> >> It doesn't show my "expected" kernel stack trace,e.g. file system or >> block layer related stack trace. How to archive that? Any pointers >> will be highly appreciated! > > Right, since you waited until return to print_backtrace, it will only be > from the end of sys_write. (Though it appears to be the beginning, > sys_write+0x0, because of the way a kretprobe captures returns.) > > Since you want info from deeper kernel internals, you'll need additional > probes. One option is to use the para-callgraph example script, where > sys_write is your first argument to start a trace, and the second > argument gives a broad subsystem to dig through. Like: > > $ stap para-callgraph.stp 'kernel.function("sys_write")' \ > 'kernel.function("*@fs/*")' > > This script will show you timestamps, but you'll get all output whether > it's slow or not. Maybe you can modify it to filter some of the > information. > > Another option is to use timer.profile, and then you can watch for your > long waits. Something like: > > global start_time > probe syscall.write { start_time[tid()] = gettimeofday_us() } > probe syscall.write.return { delete start_time[tid()] } > probe timer.profile { > t = tid() > if (t in start_time && gettimeofday_us() - start_time[t] > 100000) > { > printf("long sys_write in tid %d\n", t) > print_backtrace() > delete start_time[t] > } > } > > Even this assumes that the long wait is busy on the cpu, but I think > most likely it's an IO wait, which means you'll be scheduled off while > it completes. So then instead of timer.profile, or in addition, you > could use a scheduler tracepoint. Like: > > global start_time > probe syscall.write { start_time[tid()] = gettimeofday_us() } > probe syscall.write.return { delete start_time[tid()] } > probe kernel.trace("sched_switch") { > t = task_tid($next) > if (t in start_time && gettimeofday_us() - start_time[t] > 100000) > { > printf("long sys_write in tid %d\n", t) > print_backtrace() > delete start_time[t] > } > } > > So as your thread is scheduled back onto the cpu, if it has been a long > time, the backtrace will tell you what path it took to get scheduled off > in the first place. > > > Hope that's enough to give you some direction... > > Josh ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish? 2014-04-02 10:25 ` liang xie @ 2014-04-02 16:32 ` Josh Stone 2014-04-03 2:47 ` liang xie 0 siblings, 1 reply; 8+ messages in thread From: Josh Stone @ 2014-04-02 16:32 UTC (permalink / raw) To: liang xie; +Cc: systemtap On 04/02/2014 03:25 AM, liang xie wrote: > I tried the last scheduler script, the output just like this: > # stap --all-modules y.stp > WARNING: missing unwind/symbol data for module 'uprobes' > long sys_write in tid 10162 > Systemtap probe: kernel.trace("sched_switch") > long sys_write in tid 11510 > Systemtap probe: kernel.trace("sched_switch") > long sys_write in tid 11288 > Systemtap probe: kernel.trace("sched_switch") > > I could not figure out why "print_backtrace()" didn't output any log:) Ah, this "Systemtap probe:" message is fairly old. >>> My OS:2.6.32-220.el6.x86_64 >>> # stap -V >>> Systemtap translator/driver (version 1.6/0.152 non-git sources) Is it possible for you to update your systemtap rpm? 1.6 was released with RHEL 6.2 (presumably CentOS 6.2 as well), and we've since released systemtap-1.7 in 6.3, 1.8 in 6.4, and 2.3 in 6.5. I believe at least systemtap-1.8 will work better for you here. Josh ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish? 2014-04-02 16:32 ` Josh Stone @ 2014-04-03 2:47 ` liang xie 2014-04-03 16:24 ` Josh Stone 0 siblings, 1 reply; 8+ messages in thread From: liang xie @ 2014-04-03 2:47 UTC (permalink / raw) To: Josh Stone; +Cc: systemtap Yeh, systemtap-1.8 works for me, thanks! now i can get strace trace like this with the above scheduler script: long sys_write in tid 11892 0xffffffff814ed0a8 : thread_return+0x6d6/0x77e [kernel] (inexact) 0xffffffff812c4a41 : intel_idle+0xc1/0x170 [kernel] (inexact) 0xffffffff81097c6d : sched_clock_cpu+0xcd/0x110 [kernel] (inexact) 0xffffffff81009e3e : cpu_idle+0xee/0x110 [kernel] (inexact) 0xffffffff814e5f23 : start_secondary+0x202/0x245 [kernel] (inexact) long sys_write in tid 16716 0xffffffff814ed0a8 : thread_return+0x6d6/0x77e [kernel] (inexact) 0xffffffff812c4a41 : intel_idle+0xc1/0x170 [kernel] (inexact) 0xffffffff81097c6d : sched_clock_cpu+0xcd/0x110 [kernel] (inexact) 0xffffffff81009e3e : cpu_idle+0xee/0x110 [kernel] (inexact) 0xffffffff814e5f23 : start_secondary+0x202/0x245 [kernel] (inexact) long sys_write in tid 16716 0xffffffff814ed0a8 : thread_return+0x6d6/0x77e [kernel] (inexact) 0xffffffff812c4a41 : intel_idle+0xc1/0x170 [kernel] (inexact) 0xffffffff81097c6d : sched_clock_cpu+0xcd/0x110 [kernel] (inexact) 0xffffffff81009e3e : cpu_idle+0xee/0x110 [kernel] (inexact) 0xffffffff814e5f23 : start_secondary+0x202/0x245 [kernel] (inexact) long sys_write in tid 17191 0xffffffff814ed0a8 : thread_return+0x6d6/0x77e [kernel] (inexact) 0xffffffff812c4a41 : intel_idle+0xc1/0x170 [kernel] (inexact) 0xffffffff81097c6d : sched_clock_cpu+0xcd/0x110 [kernel] (inexact) 0xffffffff81009e3e : cpu_idle+0xee/0x110 [kernel] (inexact) 0xffffffff814e5f23 : start_secondary+0x202/0x245 [kernel] (inexact) But still no interesting filesystem related stack trace be found On Thu, Apr 3, 2014 at 12:32 AM, Josh Stone <jistone@redhat.com> wrote: > On 04/02/2014 03:25 AM, liang xie wrote: >> I tried the last scheduler script, the output just like this: >> # stap --all-modules y.stp >> WARNING: missing unwind/symbol data for module 'uprobes' >> long sys_write in tid 10162 >> Systemtap probe: kernel.trace("sched_switch") >> long sys_write in tid 11510 >> Systemtap probe: kernel.trace("sched_switch") >> long sys_write in tid 11288 >> Systemtap probe: kernel.trace("sched_switch") >> >> I could not figure out why "print_backtrace()" didn't output any log:) > > Ah, this "Systemtap probe:" message is fairly old. > >>>> My OS:2.6.32-220.el6.x86_64 >>>> # stap -V >>>> Systemtap translator/driver (version 1.6/0.152 non-git sources) > > Is it possible for you to update your systemtap rpm? 1.6 was released > with RHEL 6.2 (presumably CentOS 6.2 as well), and we've since released > systemtap-1.7 in 6.3, 1.8 in 6.4, and 2.3 in 6.5. I believe at least > systemtap-1.8 will work better for you here. > > Josh ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish? 2014-04-03 2:47 ` liang xie @ 2014-04-03 16:24 ` Josh Stone 2014-04-08 2:15 ` liang xie 0 siblings, 1 reply; 8+ messages in thread From: Josh Stone @ 2014-04-03 16:24 UTC (permalink / raw) To: liang xie; +Cc: systemtap On 04/02/2014 07:47 PM, liang xie wrote: > Yeh, systemtap-1.8 works for me, thanks! now i can get strace trace > like this with the above scheduler script: > long sys_write in tid 11892 > 0xffffffff814ed0a8 : thread_return+0x6d6/0x77e [kernel] (inexact) > 0xffffffff812c4a41 : intel_idle+0xc1/0x170 [kernel] (inexact) > 0xffffffff81097c6d : sched_clock_cpu+0xcd/0x110 [kernel] (inexact) > 0xffffffff81009e3e : cpu_idle+0xee/0x110 [kernel] (inexact) > 0xffffffff814e5f23 : start_secondary+0x202/0x245 [kernel] (inexact) [...] > But still no interesting filesystem related stack trace be found Oh, I see, I made a mistake here: > probe kernel.trace("sched_switch") { > t = task_tid($next) The tracepoint will run when we are *about* to switch to $next, but haven't yet, so the backtrace is for the previous task. That appears to be the idle task, so at least it seems clear you're not cpu-bound. We have a 'scheduler.cpu_on' tapset which may work better. That puts a kprobe on "finish_task_switch", so it should be the right context for a backtrace. Like: global start_time probe syscall.write { start_time[tid()] = gettimeofday_us() } probe syscall.write.return { delete start_time[tid()] } probe scheduler.cpu_on { t = tid() if (t in start_time && gettimeofday_us() - start_time[t] > 100000) { printf("long sys_write in tid %d\n", t) print_backtrace() delete start_time[t] } } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish? 2014-04-03 16:24 ` Josh Stone @ 2014-04-08 2:15 ` liang xie 0 siblings, 0 replies; 8+ messages in thread From: liang xie @ 2014-04-08 2:15 UTC (permalink / raw) To: Josh Stone; +Cc: systemtap Hi Josh, Just report back, the last one is pretty helpful and works for my situation, coooool:) Thanks a lot! On Fri, Apr 4, 2014 at 12:24 AM, Josh Stone <jistone@redhat.com> wrote: > On 04/02/2014 07:47 PM, liang xie wrote: >> Yeh, systemtap-1.8 works for me, thanks! now i can get strace trace >> like this with the above scheduler script: >> long sys_write in tid 11892 >> 0xffffffff814ed0a8 : thread_return+0x6d6/0x77e [kernel] (inexact) >> 0xffffffff812c4a41 : intel_idle+0xc1/0x170 [kernel] (inexact) >> 0xffffffff81097c6d : sched_clock_cpu+0xcd/0x110 [kernel] (inexact) >> 0xffffffff81009e3e : cpu_idle+0xee/0x110 [kernel] (inexact) >> 0xffffffff814e5f23 : start_secondary+0x202/0x245 [kernel] (inexact) > [...] >> But still no interesting filesystem related stack trace be found > > Oh, I see, I made a mistake here: > >> probe kernel.trace("sched_switch") { >> t = task_tid($next) > > The tracepoint will run when we are *about* to switch to $next, but > haven't yet, so the backtrace is for the previous task. That appears to > be the idle task, so at least it seems clear you're not cpu-bound. > > We have a 'scheduler.cpu_on' tapset which may work better. That puts a > kprobe on "finish_task_switch", so it should be the right context for a > backtrace. Like: > > global start_time > probe syscall.write { start_time[tid()] = gettimeofday_us() } > probe syscall.write.return { delete start_time[tid()] } > probe scheduler.cpu_on { > t = tid() > if (t in start_time && gettimeofday_us() - start_time[t] > 100000) > { > printf("long sys_write in tid %d\n", t) > print_backtrace() > delete start_time[t] > } > } ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-04-08 2:15 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-03-28 4:22 How to get the detailed kernel stack trace if specified fuction takes too long to finish? liang xie 2014-03-28 16:56 ` Frank Ch. Eigler 2014-03-28 17:06 ` Josh Stone 2014-04-02 10:25 ` liang xie 2014-04-02 16:32 ` Josh Stone 2014-04-03 2:47 ` liang xie 2014-04-03 16:24 ` Josh Stone 2014-04-08 2:15 ` liang xie
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).