Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish?

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

From: liang xie <xieliang007@gmail.com>
To: Josh Stone <jistone@redhat.com>
Cc: systemtap@sourceware.org
Subject: Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish?
Date: Wed, 02 Apr 2014 10:25:00 -0000	[thread overview]
Message-ID: <CADu=CFrnaoiWcD8Jt8uFd1Q087Kyp28gKukKH_JD7craJeF3xw@mail.gmail.com> (raw)
In-Reply-To: <5335A3A6.8010300@redhat.com>

I tried the last scheduler script, the output just like this:
# stap --all-modules y.stp
WARNING: missing unwind/symbol data for module 'uprobes'
long sys_write in tid 10162
Systemtap probe: kernel.trace("sched_switch")
long sys_write in tid 11510
Systemtap probe: kernel.trace("sched_switch")
long sys_write in tid 11288
Systemtap probe: kernel.trace("sched_switch")

I could not figure out why "print_backtrace()" didn't output any log:)

On Sat, Mar 29, 2014 at 12:30 AM, Josh Stone <jistone@redhat.com> wrote:
> On 03/27/2014 09:22 PM, liang xie wrote:
>> Hi all,
>>
>> Probably it's a newbie question, to be honest, i am new to SystemTap,
>
> No problem, welcome!
>
>> currently i want to capture the kernel stack trace if "sys_write" runs
>> too long, if that i could know where it sucks.
>>
>> My OS:2.6.32-220.el6.x86_64
>> # stap -V
>> Systemtap translator/driver (version 1.6/0.152 non-git sources)
>>
>> My tried script:
>> global i=0
>> global tb=0
>> global te=0
>> probe kernel.function("sys_write") {
>>   tb=gettimeofday_us();
>> }
>> probe kernel.function("sys_write").return {
>>   te=gettimeofday_us();
>>   if(te-tb > 100000) {
>>    i++;
>>    printf("time_us: %d\n", te-tb);
>>    print_backtrace();
>>    if(i > 15) {
>>     exit();
>>    }
>>   }
>> }
>>
>> # stap -x <pid> xxx.stp
>> time_us: 226795
>> Returning from:  0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to  :  0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>> time_us: 101400
>> Returning from:  0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to  :  0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>> time_us: 527853
>> Returning from:  0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to  :  0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>> time_us: 133948
>> Returning from:  0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to  :  0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>>
>> It doesn't show my "expected" kernel stack trace,e.g. file system or
>> block layer related stack trace. How to archive that? Any pointers
>> will be highly appreciated!
>
> Right, since you waited until return to print_backtrace, it will only be
> from the end of sys_write.  (Though it appears to be the beginning,
> sys_write+0x0, because of the way a kretprobe captures returns.)
>
> Since you want info from deeper kernel internals, you'll need additional
> probes.  One option is to use the para-callgraph example script, where
> sys_write is your first argument to start a trace, and the second
> argument gives a broad subsystem to dig through.  Like:
>
> $ stap para-callgraph.stp 'kernel.function("sys_write")' \
>         'kernel.function("*@fs/*")'
>
> This script will show you timestamps, but you'll get all output whether
> it's slow or not.  Maybe you can modify it to filter some of the
> information.
>
> Another option is to use timer.profile, and then you can watch for your
> long waits.  Something like:
>
> global start_time
> probe syscall.write { start_time[tid()] = gettimeofday_us() }
> probe syscall.write.return { delete start_time[tid()] }
> probe timer.profile {
>   t = tid()
>   if (t in start_time && gettimeofday_us() - start_time[t] > 100000)
>   {
>     printf("long sys_write in tid %d\n", t)
>     print_backtrace()
>     delete start_time[t]
>   }
> }
>
> Even this assumes that the long wait is busy on the cpu, but I think
> most likely it's an IO wait, which means you'll be scheduled off while
> it completes.  So then instead of timer.profile, or in addition, you
> could use a scheduler tracepoint.  Like:
>
> global start_time
> probe syscall.write { start_time[tid()] = gettimeofday_us() }
> probe syscall.write.return { delete start_time[tid()] }
> probe kernel.trace("sched_switch") {
>   t = task_tid($next)
>   if (t in start_time && gettimeofday_us() - start_time[t] > 100000)
>   {
>     printf("long sys_write in tid %d\n", t)
>     print_backtrace()
>     delete start_time[t]
>   }
> }
>
> So as your thread is scheduled back onto the cpu, if it has been a long
> time, the backtrace will tell you what path it took to get scheduled off
> in the first place.
>
>
> Hope that's enough to give you some direction...
>
> Josh

next prev parent reply	other threads:[~2014-04-02 10:25 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-28  4:22 liang xie
2014-03-28 16:56 ` Frank Ch. Eigler
2014-03-28 17:06 ` Josh Stone
2014-04-02 10:25   ` liang xie [this message]
2014-04-02 16:32     ` Josh Stone
2014-04-03  2:47       ` liang xie
2014-04-03 16:24         ` Josh Stone
2014-04-08  2:15           ` liang xie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADu=CFrnaoiWcD8Jt8uFd1Q087Kyp28gKukKH_JD7craJeF3xw@mail.gmail.com' \
    --to=xieliang007@gmail.com \
    --cc=jistone@redhat.com \
    --cc=systemtap@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).