* Some newbie questions
@ 2016-08-07  8:08 Avi Kivity
  2016-08-07  9:16 ` Avi Kivity
  2016-08-08 14:58 ` David Smith
  0 siblings, 2 replies; 32+ messages in thread

From: Avi Kivity @ 2016-08-07 8:08 UTC (permalink / raw)
  To: systemtap

Hello,

A few newbie questions.

1. Static library support

I am writing a static library, and would like to ship some scripts with
the library for its consumers.  The library is named "seastar" and one
of the consumers is named "scylla".

What should I write as the process name in the script?  As the writer
of the library, I don't know the name of the process that will be
consuming the library.  The consumer shouldn't have to write those
scripts.  I could play sed games to adjust the scripts, but it's very
sad to have to do this.

2. Inlined functions

From the manual pages, it seems that inlined functions can be probed
(minus the .return probe), but I just get an error:

  semantic error: while resolving probe point: identifier 'process' at
  script/home/avi/seastar/debug/task-latency.stap:3:7
          source: probe process("scylla").function("reactor::run_tasks()") {
                        ^

  semantic error: no match (similar functions:
  _ZN7reactor14run_exit_tasksEv, statfs, dup, mkdir, ntohs)

I will note that "mkdir" does not sound very similar to
"reactor::run_tasks()" (but I am not a native speaker).

3. Process CPU timers

(more of a feature request)

I'm trying to find causes of latency in my program.  To do that, I'm
running a periodic timer and checking whether a function takes more
time than some threshold.

Ideally, I'd be able to arm the timer on the function entry point and
disarm it on exit, rather than have it run continuously; this would
need to be a per-thread cpu-time timer (e.g. CLOCK_THREAD_CPUTIME_ID).

Here's my current script for reference ("running" and "start_time"
need to become maps for it to be thread-safe):

  #!/usr/bin/stap

  global start_time
  global running

  probe begin {
      running = 0
  }

  probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
  {
      start_time = gettimeofday_us()
      running = 1
  }

  probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
  {
      running = 0
  }

  probe timer.ms(10) {
      now = gettimeofday_us()
      if (running && (now - start_time) > 30000) {
          printf("detected tasks running for >30ms\n")
          print_usyms(ubacktrace())
      }
  }

I'd appreciate any tips as to whether there's a better way to do this.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-07  8:08 Some newbie questions Avi Kivity
@ 2016-08-07  9:16 ` Avi Kivity
  2016-08-08 14:58 ` David Smith
  1 sibling, 0 replies; 32+ messages in thread

From: Avi Kivity @ 2016-08-07 9:16 UTC (permalink / raw)
  To: systemtap

On 08/07/2016 11:07 AM, Avi Kivity wrote:
> Hello,

Hi, n00b,

> A few newbie questions.
>
> 1. Static library support
>
> I am writing a static library, and would like to ship some scripts
> with the library for its consumers.  The library is named "seastar"
> and one of the consumers is named "scylla".
>
> What should I write as the process name in the script?  As the writer
> of the library, I don't know the name of the process that will be
> consuming the library.  The consumer shouldn't have to write those
> scripts.  I could play sed games to adjust the scripts, but it's very
> sad to have to do this.

You can use "@1" to paste a script parameter into the script.

> 2. Inlined functions
>
> From the manual pages, it seems that inlined functions can be probed
> (minus the .return probe), but I just get an error:
>
>   semantic error: while resolving probe point: identifier 'process' at
>   script/home/avi/seastar/debug/task-latency.stap:3:7
>           source: probe process("scylla").function("reactor::run_tasks()") {
>                         ^
>
>   semantic error: no match (similar functions:
>   _ZN7reactor14run_exit_tasksEv, statfs, dup, mkdir, ntohs)
>
> I will note that "mkdir" does not sound very similar to
> "reactor::run_tasks()" (but I am not a native speaker).
>
> 3. Process CPU timers
>
> (more of a feature request)
>
> I'm trying to find causes of latency in my program.  To do that, I'm
> running a periodic timer and checking whether a function takes more
> time than some threshold.
>
> Ideally, I'd be able to arm the timer on the function entry point and
> disarm it on exit, rather than have it run continuously; this would
> need to be a per-thread cpu-time timer (e.g. CLOCK_THREAD_CPUTIME_ID).
>
> Here's my current script for reference ("running" and "start_time"
> need to become maps for it to be thread-safe):
>
>   #!/usr/bin/stap
>
>   global start_time
>   global running
>
>   probe begin {
>       running = 0
>   }
>
>   probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
>   {
>       start_time = gettimeofday_us()
>       running = 1
>   }
>
>   probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
>   {
>       running = 0
>   }
>
>   probe timer.ms(10) {

You should use timer.profile instead.

>       now = gettimeofday_us()
>       if (running && (now - start_time) > 30000) {

And a second parameter here.

>           printf("detected tasks running for >30ms\n")
>           print_usyms(ubacktrace())
>       }
>   }
>
> I'd appreciate any tips as to whether there's a better way to do this.

^ permalink raw reply	[flat|nested] 32+ messages in thread
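The "@1" substitution mentioned above can be sketched like this (illustrative, untested; the marker name follows the scripts in this thread, and @1 expands to the first command-line argument given to stap):

```systemtap
#!/usr/bin/stap
# Invoke as:  stap task-latency.stp /path/to/consumer-binary
# The library's script no longer hard-codes the consuming process:
probe process(@1).mark("reactor_run_tasks_start") {
    printf("run_tasks entered in %s (tid %d)\n", execname(), tid())
}
```

Note that @1 pastes the argument as a string literal, while $1 would paste it unquoted; for a process path, @1 is the right form.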
* Re: Some newbie questions
  2016-08-07  8:08 Some newbie questions Avi Kivity
  2016-08-07  9:16 ` Avi Kivity
@ 2016-08-08 14:58 ` David Smith
  2016-08-09  6:51   ` Avi Kivity
  1 sibling, 1 reply; 32+ messages in thread

From: David Smith @ 2016-08-08 14:58 UTC (permalink / raw)
  To: Avi Kivity, systemtap

On 08/07/2016 03:07 AM, Avi Kivity wrote:
> Hello,
>
> A few newbie questions.
>
> 1. Static library support
>
> I am writing a static library, and would like to ship some scripts with
> the library for its consumers.  The library is named "seastar" and one
> of the consumers is named "scylla".
>
> What should I write as the process name in the script?  As the writer of
> the library, I don't know the name of the process that will be consuming
> the library.  The consumer shouldn't have to write those scripts.  I
> could play sed games to adjust the scripts, but it's very sad to have to
> do this.

The tricky part here is that your library is static.  With a shared
library, you could do something like:

  probe process.library("libseastar.so").function("FOO") { }

which would get probe hits for any process that uses that shared
library.  For a statically linked library, you can't do that.  Off the
top of my head, I don't see any solution other than your '@1' parameter
idea.

> 2. Inlined functions
>
> From the manual pages, it seems that inlined functions can be probed
> (minus the .return probe), but I just get an error:
>
>   semantic error: while resolving probe point: identifier 'process' at
>   script/home/avi/seastar/debug/task-latency.stap:3:7
>           source: probe process("scylla").function("reactor::run_tasks()") {
>                         ^
>
>   semantic error: no match (similar functions:
>   _ZN7reactor14run_exit_tasksEv, statfs, dup, mkdir, ntohs)

We're at the mercy of the compiler and the quality of the debuginfo it
produces here when looking for inlined functions.

If you want to investigate further here, you'll need to do the following:

  # eu-readelf -N --debug-dump=info scylla > scylla.log

In scylla.log, look and see if you can find a subprogram with the name
of the function you are interested in.

> I will note that "mkdir" does not sound very similar to
> "reactor::run_tasks()" (but I am not a native speaker).

Actually, to my native speaker's eyes, none of the suggested functions
look that close.  The way the Levenshtein suggestion feature works is
that we report the top 5 scoring functions.  I'd bet in this case the
scores were all bad, but these were the top 5 of the scores.

> 3. Process CPU timers
>
> (more of a feature request)
>
> I'm trying to find causes of latency in my program.  To do that, I'm
> running a periodic timer and checking whether a function takes more time
> than some threshold.
>
> Ideally, I'd be able to arm the timer on the function entry point and
> disarm it on exit, rather than have it run continuously; this would need
> to be a per-thread cpu-time timer (e.g. CLOCK_THREAD_CPUTIME_ID).
>
> Here's my current script for reference ("running" and "start_time" need
> to become maps for it to be thread-safe):
>
>   #!/usr/bin/stap
>
>   global start_time
>   global running
>
>   probe begin {
>       running = 0
>   }
>
>   probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
>   {
>       start_time = gettimeofday_us()
>       running = 1
>   }
>
>   probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
>   {
>       running = 0
>   }
>
>   probe timer.ms(10) {
>       now = gettimeofday_us()
>       if (running && (now - start_time) > 30000) {
>           printf("detected tasks running for >30ms\n")
>           print_usyms(ubacktrace())
>       }
>   }
>
> I'd appreciate any tips as to whether there's a better way to do this.

The above isn't really going to work well, for several reasons:

1) You've only got one global 'start_time' and one global 'running'
variable.  If scylla is multithreaded or more than one instance is
running, that isn't going to work.

To fix this, you'd want to convert them to arrays and index them by
thread ids, like this:

  start_time[tid()] = gettimeofday_us()

2) Your ubacktrace() call in the timer function isn't going to report
what you want.  At that point, you aren't running in the context of the
process, you are running in the context of whatever kernel thread is
running the timer.  I'm not really sure what a user backtrace will
report at that point, but it certainly won't be useful.

Here's an (untested) version of what would work a bit better:

====
global start_time

probe
process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
{
    start_time[tid()] = gettimeofday_us()
}

probe
process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
{
    now = gettimeofday_us()
    start = start_time[tid()]
    delete start_time[tid()]
    if ((now - start) > 30000) {
        printf("detected tasks running for >30ms (%d)\n", (now - start))
        print_usyms(ubacktrace())
    }
}
====

If you like, you could add a timer probe to the above to report an
ongoing problem in the current process (but without printing a
backtrace).

--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

^ permalink raw reply	[flat|nested] 32+ messages in thread
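The optional timer probe mentioned above might look like this (untested sketch; it reuses the start_time array and the 30ms threshold from the script in this message, and deliberately avoids ubacktrace() since the probe fires outside the process's context):

```systemtap
# Periodically report threads whose current task has exceeded the
# threshold, without attempting a user backtrace from timer context.
probe timer.ms(100) {
    now = gettimeofday_us()
    foreach (t in start_time) {
        if ((now - start_time[t]) > 30000)
            printf("tid %d still running a task after %d us\n",
                   t, now - start_time[t])
    }
}
```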
* Re: Some newbie questions
  2016-08-08 14:58 ` David Smith
@ 2016-08-09  6:51   ` Avi Kivity
  2016-08-09 13:54     ` Mark Wielaard
  2016-08-09 15:17     ` David Smith
  0 siblings, 2 replies; 32+ messages in thread

From: Avi Kivity @ 2016-08-09 6:51 UTC (permalink / raw)
  To: David Smith, systemtap

On 08/08/2016 05:58 PM, David Smith wrote:
>
>> 2. Inlined functions
>>
>> From the manual pages, it seems that inlined functions can be probed
>> (minus the .return probe), but I just get an error:
>>
>>   semantic error: while resolving probe point: identifier 'process' at
>>   script/home/avi/seastar/debug/task-latency.stap:3:7
>>           source: probe process("scylla").function("reactor::run_tasks()") {
>>                         ^
>>
>>   semantic error: no match (similar functions:
>>   _ZN7reactor14run_exit_tasksEv, statfs, dup, mkdir, ntohs)
> We're at the mercy of the compiler and the quality of the debuginfo it
> produces here when looking for inlined functions.
>
> If you want to investigate further here, you'll need to do the following:
>
>   # eu-readelf -N --debug-dump=info scylla > scylla.log
>
> In scylla.log, look and see if you can find a subprogram with the name
> of the function you are interested in.

 [1417f4]  subprogram
           external         (flag_present) Yes
           name             (strp) "run_tasks"
           decl_file        (data1) 5
           decl_line        (data2) 812
           linkage_name     (strp) "_ZN7reactor9run_tasksER15circular_bufferISt10unique_ptrI4taskSt14default_deleteIS2_EESaIS5_EE"
           declaration      (flag_present) Yes
           object_pointer   (ref4) [141808]
           sibling          (ref4) [141813]
 [141808]  formal_parameter
           type             (ref4) [1423bd]
           artificial       (flag_present) Yes
 [14180d]  formal_parameter
           type             (ref4) [14a8a8]

The debugger has no trouble setting breakpoints on the function or
showing it in backtraces.  In any case I sprinkled some static probes
there.

>> 3. Process CPU timers
>>
>> (more of a feature request)
>>
>> I'm trying to find causes of latency in my program.  To do that, I'm
>> running a periodic timer and checking whether a function takes more time
>> than some threshold.
>>
>> Ideally, I'd be able to arm the timer on the function entry point and
>> disarm it on exit, rather than have it run continuously; this would need
>> to be a per-thread cpu-time timer (e.g. CLOCK_THREAD_CPUTIME_ID).
>>
>> Here's my current script for reference ("running" and "start_time" need
>> to become maps for it to be thread-safe):
>>
>>   #!/usr/bin/stap
>>
>>   global start_time
>>   global running
>>
>>   probe begin {
>>       running = 0
>>   }
>>
>>   probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
>>   {
>>       start_time = gettimeofday_us()
>>       running = 1
>>   }
>>
>>   probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
>>   {
>>       running = 0
>>   }
>>
>>   probe timer.ms(10) {
>>       now = gettimeofday_us()
>>       if (running && (now - start_time) > 30000) {
>>           printf("detected tasks running for >30ms\n")
>>           print_usyms(ubacktrace())
>>       }
>>   }
>>
>> I'd appreciate any tips as to whether there's a better way to do this.
> The above isn't really going to work well, for several reasons:
>
> 1) You've only got one global 'start_time' and one global 'running'
> variable.  If scylla is multithreaded or more than one instance is
> running, that isn't going to work.
>
> To fix this, you'd want to convert them to arrays and index them by
> thread ids, like this:
>
>   start_time[tid()] = gettimeofday_us()

Yes, I found that out and my latest version works like this, but I ran
into severe scaling issues.

I also switched to timer.profile.  What's its resolution on normal
machines?

> 2) Your ubacktrace() call in the timer function isn't going to report
> what you want.  At that point, you aren't running in the context of the
> process, you are running in the context of whatever kernel thread is
> running the timer.  I'm not really sure what a user backtrace will
> report at that point, but it certainly won't be useful.

Actually I got good backtraces when I instrumented a task to spin for
100ms.  I also got some bad traces, but did not investigate more.

> Here's an (untested) version of what would work a bit better:
>
> ====
> global start_time
>
> probe
> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start")
> {
>     start_time[tid()] = gettimeofday_us()
> }
>
> probe
> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end")
> {
>     now = gettimeofday_us()
>     start = start_time[tid()]
>     delete start_time[tid()]
>     if ((now - start) > 30000) {
>         printf("detected tasks running for >30ms (%d)\n", (now - start))
>         print_usyms(ubacktrace())

Here the backtrace is useless; I know exactly what it looks like at this
point, since it's two frames away from main().

To give some context, this is a run-to-completion scheduler.  I have
tasks pushed into a queue from various sources (including other tasks),
and this bit is the part of the main loop that dequeues a task, runs it,
and repeats.  Some tasks take too long, and I want to identify which
ones.  I could print the last task's vtable, but that is much less
useful than a full backtrace at the time the task is detected as taking
too long.

>     }
> }
> ====
>
> If you like, you could add a timer probe to the above to report an
> ongoing problem in the current process (but without printing a
> backtrace).

Unfortunately I already know there is a problem and need to pinpoint it,
which is where the backtrace is so useful.

I can fall back to instrumenting the code with a timer (from
timer_create(3)), but I'd much rather have the flexibility of
non-intrusive instrumentation.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-09  6:51 ` Avi Kivity
@ 2016-08-09 13:54   ` Mark Wielaard
  2016-08-09 14:11     ` Avi Kivity
  1 sibling, 1 reply; 32+ messages in thread

From: Mark Wielaard @ 2016-08-09 13:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: David Smith, systemtap

On Tue, Aug 09, 2016 at 09:51:22AM +0300, Avi Kivity wrote:
> On 08/08/2016 05:58 PM, David Smith wrote:
> > 2) Your ubacktrace() call in the timer function isn't going to report
> > what you want.  At that point, you aren't running in the context of the
> > process, you are running in the context of whatever kernel thread is
> > running the timer.  I'm not really sure what a user backtrace will
> > report at that point, but it certainly won't be useful.
>
> Actually I got good backtraces when I instrumented a task to spin for
> 100ms.  I also got some bad traces, but did not investigate more.

stap does indeed try to make this work even from a kernel context.
See runtime/stack.c:

  /** Gets user space registers when available, also sets context
   * full_uregs_p if appropriate.  Should be used instead of accessing
   * context uregs field directly when (full) uregs are needed from
   * kernel context. */
  static struct pt_regs *_stp_get_uregs(struct context *c)

But it might indeed depend on architecture and timer probe used.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-09 13:54 ` Mark Wielaard
@ 2016-08-09 14:11   ` Avi Kivity
  2016-08-10 15:33     ` Frank Ch. Eigler
  0 siblings, 1 reply; 32+ messages in thread

From: Avi Kivity @ 2016-08-09 14:11 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: David Smith, systemtap

On 08/09/2016 02:25 PM, Mark Wielaard wrote:
> On Tue, Aug 09, 2016 at 09:51:22AM +0300, Avi Kivity wrote:
>> On 08/08/2016 05:58 PM, David Smith wrote:
>>> 2) Your ubacktrace() call in the timer function isn't going to report
>>> what you want.  At that point, you aren't running in the context of the
>>> process, you are running in the context of whatever kernel thread is
>>> running the timer.  I'm not really sure what a user backtrace will
>>> report at that point, but it certainly won't be useful.
>> Actually I got good backtraces when I instrumented a task to spin for
>> 100ms.  I also got some bad traces, but did not investigate more.
> stap does indeed try to make this work even from a kernel context.
> See runtime/stack.c:
>
>   /** Gets user space registers when available, also sets context
>    * full_uregs_p if appropriate.  Should be used instead of accessing
>    * context uregs field directly when (full) uregs are needed from
>    * kernel context. */
>   static struct pt_regs *_stp_get_uregs(struct context *c)
>
> But it might indeed depend on architecture and timer probe used.

I note that pid()/tid() also kept working.  A happy coincidence, maybe
(the thread was not switched out).

Maybe a per-thread timer can be added.  I'm surprised this didn't come
up before.  Trying to find out why a function which is usually fast is
occasionally slow seems to be a good fit for systemtap.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-09 14:11 ` Avi Kivity
@ 2016-08-10 15:33   ` Frank Ch. Eigler
  2016-08-10 15:39     ` Avi Kivity
  2016-08-10 15:40     ` Avi Kivity
  0 siblings, 2 replies; 32+ messages in thread

From: Frank Ch. Eigler @ 2016-08-10 15:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Mark Wielaard, David Smith, systemtap

Avi Kivity <avi@scylladb.com> writes:

> [...]
> I'm surprised this didn't come up before.  Trying to find out why a
> function which is usually fast is occasionally slow seems to be a good
> fit for systemtap.

Like this?

  https://sourceware.org/systemtap/examples/#profiling/fntimes.stp

Note though that such analysis probably cannot be performed based only
upon PC samples - or even backtrace samples.  We seem to require
trapping individual function entry/exit events.

- FChE

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-10 15:33 ` Frank Ch. Eigler
@ 2016-08-10 15:39   ` Avi Kivity
  1 sibling, 0 replies; 32+ messages in thread

From: Avi Kivity @ 2016-08-10 15:39 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap

On 08/10/2016 06:32 PM, Frank Ch. Eigler wrote:
> Avi Kivity <avi@scylladb.com> writes:
>
>> [...]
>> I'm surprised this didn't come up before.  Trying to find out why a
>> function which is usually fast is occasionally slow seems to be a good
>> fit for systemtap.
> Like this?
> https://sourceware.org/systemtap/examples/#profiling/fntimes.stp

Yes.  The problem is that if the function is called often (with a usual
short running time), then systemtap will eat all of the cpu time
spinning on an internal lock.

> Note though that such analysis probably cannot be performed based only
> upon PC samples - or even backtrace samples.  We seem to require
> trapping individual function entry/exit events.

That's why I tried systemtap.  It worked well on my desktop, but very
badly in production.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-10 15:33 ` Frank Ch. Eigler
  2016-08-10 15:39   ` Avi Kivity
@ 2016-08-10 15:40   ` Avi Kivity
  2016-08-10 16:47     ` Frank Ch. Eigler
  1 sibling, 1 reply; 32+ messages in thread

From: Avi Kivity @ 2016-08-10 15:40 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap

On 08/10/2016 06:32 PM, Frank Ch. Eigler wrote:
> Avi Kivity <avi@scylladb.com> writes:
>
>> [...]
>> I'm surprised this didn't come up before.  Trying to find out why a
>> function which is usually fast is occasionally slow seems to be a good
>> fit for systemtap.
> Like this?
> https://sourceware.org/systemtap/examples/#profiling/fntimes.stp

Yes.  The problem is that if the function is called often (with a usual
short running time), then systemtap will eat all of the cpu time
spinning on an internal lock.

> Note though that such analysis probably cannot be performed based only
> upon PC samples - or even backtrace samples.  We seem to require
> trapping individual function entry/exit events.

That's why I tried systemtap.  It worked well on my desktop, but very
badly in production.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-10 15:40 ` Avi Kivity
@ 2016-08-10 16:47   ` Frank Ch. Eigler
  2016-08-11  7:51     ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread

From: Frank Ch. Eigler @ 2016-08-10 16:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Mark Wielaard, David Smith, systemtap

Hi -

On Wed, Aug 10, 2016 at 06:40:02PM +0300, Avi Kivity wrote:
> [...]
> Yes.  The problem is that if the function is called often (with a usual
> short running time), then systemtap will eat all of the cpu time
> spinning on an internal lock.

Well, not just that ... trapping each function entry/exit has
unavoidable kernel uprobes context-switchy-type overheads.  Your perf
report may well be misattributing the cost.  (Have you tried a stap
script that merely traps all the same function calls, and has empty
probe handlers?)

> >Note though that such analysis probably cannot be performed based only
> >upon PC samples - or even backtrace samples.  We seem to require
> >trapping individual function entry/exit events.
>
> That's why I tried systemtap.  It worked well on my desktop, but very
> badly in production.

It may be worth experimenting with "stap --runtime=dyninst" if your
function analysis were restricted to basic Cish userspace that dyninst
can handle.

- FChE

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-10 16:47 ` Frank Ch. Eigler
@ 2016-08-11  7:51   ` Avi Kivity
  2016-08-11  8:13     ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread

From: Avi Kivity @ 2016-08-11 7:51 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap

On 08/10/2016 07:47 PM, Frank Ch. Eigler wrote:
> Hi -
>
> On Wed, Aug 10, 2016 at 06:40:02PM +0300, Avi Kivity wrote:
>> [...]
>> Yes.  The problem is that if the function is called often (with a
>> usual short running time), then systemtap will eat all of the cpu
>> time spinning on an internal lock.
> Well, not just that ... trapping each function entry/exit has
> unavoidable kernel uprobes context-switchy-type overheads.

Like you say, those are unavoidable.  But at least those costs are
handled by scalable resources.

> Your perf report may well be misattributing the cost.

I think it's unlikely.  When perf says __raw_spin_lock is guilty, it
usually is.

> (Have you tried a stap script that merely traps all the same function
> calls, and has empty probe handlers?)

I can try it.

>>> Note though that such analysis probably cannot be performed based
>>> only upon PC samples - or even backtrace samples.  We seem to
>>> require trapping individual function entry/exit events.
>> That's why I tried systemtap.  It worked well on my desktop, but very
>> badly in production.
> It may be worth experimenting with "stap --runtime=dyninst" if your
> function analysis were restricted to basic Cish userspace that dyninst
> can handle.

Will timer.profile work with dyninst?

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-11  7:51 ` Avi Kivity
@ 2016-08-11  8:13   ` Avi Kivity
  2016-08-11 16:18     ` Frank Ch. Eigler
  0 siblings, 1 reply; 32+ messages in thread

From: Avi Kivity @ 2016-08-11 8:13 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap

On 08/11/2016 10:51 AM, Avi Kivity wrote:
> On 08/10/2016 07:47 PM, Frank Ch. Eigler wrote:
>> Hi -
>>
>> On Wed, Aug 10, 2016 at 06:40:02PM +0300, Avi Kivity wrote:
>>> [...]
>>> Yes.  The problem is that if the function is called often (with a
>>> usual short running time), then systemtap will eat all of the cpu
>>> time spinning on an internal lock.
>> Well, not just that ... trapping each function entry/exit has
>> unavoidable kernel uprobes context-switchy-type overheads.
>
> Like you say, those are unavoidable.  But at least those costs are
> handled by scalable resources.
>
>> Your perf report may well be misattributing the cost.
>
> I think it's unlikely.  When perf says __raw_spin_lock is guilty, it
> usually is.
>
>> (Have you tried a stap script that merely traps all the same function
>> calls, and has empty probe handlers?)
>
> I can try it.

It was actually pretty bad with empty handlers:

  #
     66.25%  scylla  [kernel.kallsyms]  [k] _raw_spin_lock
             |
             ---_raw_spin_lock
                |
                |--49.95%-- 0x62ab
                |          syscall_trace_leave
                |          int_check_syscall_exit_work
                |          |
                |          |--99.08%-- epoll_pwait
                |          |          _ZN21reactor_backend_epoll16wait_and_processEiPK10__sigset_t
                |          |          _ZNSt17_Function_handlerIFbvEZN7reactor3runEvEUlvE5_E9_M_invokeERKSt9_Any_data
                |          |          _ZN7reactor3runEv
                |          |          |
                |          |          |--91.80%-- _ZZN3smp9configureEN5boost15program_options13variables_mapEENKUlvE1_clEv.constprop.2783
                |          |          |          _Z19dpdk_thread_adaptorPv
                |          |          |          eal_thread_loop
                |          |          |          start_thread
                |          |          |          __clone
                |          |          |
                |          |           --8.20%-- _ZN12app_template14run_deprecatedEiPPcOSt8functionIFvvEE
                |          |                     |
                |          |                      --100.00%-- main
                |          |                                  __libc_start_main
                |          |                                  _start
                |          |
                |           --0.92%-- [...]
                |
                |--49.46%-- 0x619b
                |          syscall_trace_enter_phase2
                |          tracesys_phase2
                |          |
                |          |--98.82%-- epoll_pwait
                |          |          _ZN21reactor_backend_epoll16wait_and_processEiPK10__sigset_t
                |          |          _ZNSt17_Function_handlerIFbvEZN7reactor3runEvEUlvE5_E9_M_invokeERKSt9_Any_data
                |          |          _ZN7reactor3runEv
                |          |          |
                |          |          |--91.94%-- _ZZN3smp9configureEN5boost15program_options13variables_mapEENKUlvE1_clEv.constprop.2783
                |          |          |          _Z19dpdk_thread_adaptorPv
                |          |          |          eal_thread_loop
                |          |          |          start_thread
                |          |          |          __clone
                |          |          |
                |          |           --8.06%-- _ZN12app_template14run_deprecatedEiPPcOSt8functionIFvvEE
                |          |                     |
                |          |                      --100.00%-- main
                |          |                                  __libc_start_main
                |          |                                  _start
                |          |
                |           --1.18%-- [...]
                |
                 --0.58%-- [...]

I don't have any system call probes.  Just two empty static probes, and
a timer.profile handler.

>>>> Note though that such analysis probably cannot be performed based
>>>> only upon PC samples - or even backtrace samples.  We seem to
>>>> require trapping individual function entry/exit events.
>>> That's why I tried systemtap.  It worked well on my desktop, but very
>>> badly in production.
>> It may be worth experimenting with "stap --runtime=dyninst" if your
>> function analysis were restricted to basic Cish userspace that dyninst
>> can handle.
>
> Will timer.profile work with dyninst?

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-11  8:13 ` Avi Kivity
@ 2016-08-11 16:18   ` Frank Ch. Eigler
  2016-08-17 15:15     ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread

From: Frank Ch. Eigler @ 2016-08-11 16:18 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Mark Wielaard, David Smith, systemtap

Hi -

avi wrote:

> [...]
>>> (Have you tried a stap script that merely traps all the same
>>> function calls, and has empty probe handlers?)
>> I can try it.
>
> It was actually pretty bad with empty handlers:
>
>   #
>      66.25%  scylla  [kernel.kallsyms]  [k] _raw_spin_lock
>              |
>              ---_raw_spin_lock
>                 |
>                 |--49.95%-- 0x62ab
>                 |          syscall_trace_leave
> [...]
>                 |--49.46%-- 0x619b
>                 |          syscall_trace_enter_phase2
> [...]
>
> I don't have any system call probes.  Just two empty static probes, and
> a timer.profile handler.

It would sure be nice if perf gave you an explanation of those 0x62ab
bits.  Maybe use dwarf unwind based callgraph collection?  The stap
runtime does sometimes create probes for internal purposes, such as
trapping exec/mmap operations (so probes on new programs can be
activated).  It is conceivable that this is the source of the spinlock
activity, but such a high rate doesn't make sense.

> Will timer.profile work with dyninst?

Profile timers, not yet, but normal periodic ones like timer.hz(997)
yes.  Unfortunately backtracing is also not yet implemented there.

- FChE

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions
  2016-08-11 16:18 ` Frank Ch. Eigler
@ 2016-08-17 15:15   ` Avi Kivity
  2016-08-17 18:06     ` David Smith
  2016-08-17 18:10     ` Frank Ch. Eigler
  0 siblings, 2 replies; 32+ messages in thread

From: Avi Kivity @ 2016-08-17 15:15 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap

On 08/11/2016 07:18 PM, Frank Ch. Eigler wrote:
> Hi -
>
> avi wrote:
>
>> [...]
>>>> (Have you tried a stap script that merely traps all the same
>>>> function calls, and has empty probe handlers?)
>>> I can try it.
>> It was actually pretty bad with empty handlers:
>>
>>   #
>>      66.25%  scylla  [kernel.kallsyms]  [k] _raw_spin_lock
>>              |
>>              ---_raw_spin_lock
>>                 |
>>                 |--49.95%-- 0x62ab
>>                 |          syscall_trace_leave
>> [...]
>>                 |--49.46%-- 0x619b
>>                 |          syscall_trace_enter_phase2
>> [...]
>>
>> I don't have any system call probes.  Just two empty static probes, and
>> a timer.profile handler.
> It would sure be nice if perf gave you an explanation of those 0x62ab
> bits.  Maybe use dwarf unwind based callgraph collection?  The stap
> runtime does sometimes create probes for internal purposes, such as
> trapping exec/mmap operations (so probes on new programs can be
> activated).  It is conceivable that this is the source of the spinlock
> activity, but such a high rate doesn't make sense.

I replicated on a more modern machine (also wider: 2s24c48t).  I had to
--suppress-time-limits or stap would exit almost immediately.  My load
indeed calls epoll_wait() at a high rate, with zero timeout.  But now
it's completely dominated by tracing overhead.

  #!/usr/bin/stap

  # usage: task_latency.stap process_name latency_threshold_ms

  global start_time

  probe process(@1).mark("reactor_run_tasks_single_start") {
      start_time[pid(), tid()] = gettimeofday_us()
  }

  probe process(@1).mark("reactor_run_tasks_single_end") {
      delete start_time[pid(), tid()]
  }

  probe timer.profile {
      if ([pid(), tid()] in start_time) {
          now = gettimeofday_us()
          start = start_time[pid(), tid()]
          if ((now - start) > $2 * 1000) {
              printf("detected tasks running for >%sms\n", @2)
              print_usyms(ubacktrace())
          }
      }
  }

  #
  # Total Lost Samples: 357
  #
  # Samples: 308K of event 'cycles:ppp'
  # Event count (approx.): 222398334161
  #
  # Overhead  Command  Shared Object      Symbol
  # ........  .......  .................  ..............................
  #
     50.66%  scylla  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
             |
             ---queued_spin_lock_slowpath
                |
                |--49.99%--_raw_spin_lock
                |          |
                |          |--48.48%--task_utrace_struct
                |          |          |
                |          |          |--25.21%--utrace_report_syscall_entry
                |          |          |          syscall_trace_enter_phase2
                |          |          |          syscall_trace_enter
                |          |          |          do_syscall_64
                |          |          |          return_from_SYSCALL_64
                |          |          |          |
                |          |          |           --23.47%--epoll_pwait
                |          |          |                     reactor_backend_epoll::wait_and_process
                |          |          |                     |
                |          |          |                     |--15.91%--std::_Function_handler<bool (), reactor::run()::{lambda()#7}>::_M_invoke
                |          |          |                     |          reactor::run
                |          |          |                     |          smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                |          |          |                     |          posix_thread::start_routine
                |          |          |                     |          start_thread
                |          |          |                     |          __clone
                |          |          |                     |
                |          |          |                      --7.56%--std::_Function_handler<bool (), reactor::run()::{lambda()#8}>::_M_invoke
                |          |          |                                logalloc::tracker::impl::compact_on_idle
                |          |          |                                reactor::run
                |          |          |                                smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                |          |          |                                posix_thread::start_routine
                |          |          |                                start_thread
                |          |          |                                __clone
                |          |          |
                |          |           --23.27%--utrace_report_syscall_exit
                |          |                     syscall_slow_exit_work
                |          |                     do_syscall_64
                |          |                     return_from_SYSCALL_64
                |          |                     |
                |          |                      --21.92%--epoll_pwait
                |          |                                reactor_backend_epoll::wait_and_process
                |          |                                |
                |          |                                |--14.83%--std::_Function_handler<bool (), reactor::run()::{lambda()#7}>::_M_invoke
                |          |                                |          reactor::run
                |          |                                |          smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                |          |                                |          posix_thread::start_routine
                |          |                                |          start_thread
                |          |                                |          __clone
                |          |                                |
                |          |                                 --7.09%--std::_Function_handler<bool (), reactor::run()::{lambda()#8}>::_M_invoke
                |          |                                           logalloc::tracker::impl::compact_on_idle
                |          |                                           reactor::run
                |          |                                           smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                |          |                                           posix_thread::start_routine
                |          |                                           start_thread
                |          |                                           __clone
                |          |
                |           --1.27%--__lock_task_sighand
                |                     |
                |                      --0.67%--send_sigqueue
                |                                posix_timer_event
                |                                cpu_timer_fire
                |                                run_posix_cpu_timers
                |                                update_process_times
                |                                tick_sched_handle.isra.16
                |                                tick_sched_timer
                |                                __hrtimer_run_queues
                |                                hrtimer_interrupt
                :

>> Will timer.profile work with dyninst?
> Profile timers, not yet, but normal periodic ones like timer.hz(997)
> yes.  Unfortunately backtracing is also not yet implemented there.

Well, it's not very useful to me then.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-17 15:15 ` Avi Kivity @ 2016-08-17 18:06 ` David Smith 2016-08-17 18:22 ` Avi Kivity 2016-08-17 18:10 ` Frank Ch. Eigler 1 sibling, 1 reply; 32+ messages in thread From: David Smith @ 2016-08-17 18:06 UTC (permalink / raw) To: Avi Kivity, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/17/2016 10:15 AM, Avi Kivity wrote: ... stuff deleted ... > My load indeed calls epoll_wait() at a high rate, with zero timeout. > But now it's completely dominated by tracing overhead. I'm not sure how much it will help, but one easy fix would be to replace all your '[pid(), tid()]' array indexing with just '[tid()]' array indexing. Try changing that and see if the tracing overhead goes down a bit. -- David Smith dsmith@redhat.com Red Hat http://www.redhat.com 256.217.0141 (direct) 256.837.0057 (fax) ^ permalink raw reply [flat|nested] 32+ messages in thread
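[Editorial note: applied to the script earlier in the thread, David's suggestion would look like the sketch below. Since kernel task IDs are unique system-wide, the pid() key is redundant and dropping it avoids computing and hashing the extra key on every probe hit. The mark names and @1/$2 arguments are taken verbatim from the thread.]

#!/usr/bin/stap
# usage: task_latency.stap process_name latency_threshold_ms

global start_time

probe process(@1).mark("reactor_run_tasks_single_start") {
	# tid() alone is a sufficient key; pid() adds no information
	start_time[tid()] = gettimeofday_us()
}

probe process(@1).mark("reactor_run_tasks_single_end") {
	delete start_time[tid()]
}

probe timer.profile {
	if (tid() in start_time) {
		now = gettimeofday_us()
		if ((now - start_time[tid()]) > $2 * 1000) {
			printf("detected tasks running for >%sms\n", @2)
			print_usyms(ubacktrace())
		}
	}
}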
* Re: Some newbie questions 2016-08-17 18:06 ` David Smith @ 2016-08-17 18:22 ` Avi Kivity 0 siblings, 0 replies; 32+ messages in thread From: Avi Kivity @ 2016-08-17 18:22 UTC (permalink / raw) To: David Smith, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/17/2016 09:06 PM, David Smith wrote: > On 08/17/2016 10:15 AM, Avi Kivity wrote: > > ... stuff deleted ... > >> My load indeed calls epoll_wait() at a high rate, with zero timeout. >> But now it's completely dominated by tracing overhead. > I'm not sure how much it will help, but one easy fix would be to replace > all your '[pid(), tid()]' array indexing with just '[tid()]' array indexing. > > Try changing that and see if the tracing overhead goes down a bit. > Yeah, I forgot to do that. But it's really unlikely to help, all the overhead is coming from syscall tracing, which I didn't ask for. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-17 15:15 ` Avi Kivity 2016-08-17 18:06 ` David Smith @ 2016-08-17 18:10 ` Frank Ch. Eigler 2016-08-17 18:30 ` Avi Kivity 1 sibling, 1 reply; 32+ messages in thread From: Frank Ch. Eigler @ 2016-08-17 18:10 UTC (permalink / raw) To: Avi Kivity; +Cc: Mark Wielaard, David Smith, systemtap avi wrote: > [...] > My load indeed calls epoll_wait() at a high rate, with zero timeout. OK. That's an unusual and cpu-saturating method. Without systemtap active, the code must be doing millions of syscalls per second. Are you sure there's no better way? > But now it's completely dominated by tracing overhead. > [...] > | |--48.48%--task_utrace_struct > | | |--25.21%--utrace_report_syscall_entry > | | | syscall_trace_enter_phase2 > | | | syscall_trace_enter > | | | do_syscall_64 > | | | return_from_SYSCALL_64 > | | | | > | | | --23.47%--epoll_pwait > [...] OK, this implicates the systemtap runtime utrace-emulation code that looks for mmap/exec syscalls. Perhaps there could be a mode (a -D setting?) for selecting kprobes rather than syscall-tracepoints to catch those events. - FChE ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-17 18:10 ` Frank Ch. Eigler @ 2016-08-17 18:30 ` Avi Kivity 2016-08-17 18:35 ` Frank Ch. Eigler 0 siblings, 1 reply; 32+ messages in thread From: Avi Kivity @ 2016-08-17 18:30 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap On 08/17/2016 09:10 PM, Frank Ch. Eigler wrote: > avi wrote: > >> [...] >> My load indeed calls epoll_wait() at a high rate, with zero timeout. > OK. That's an unusual and cpu-saturating method. Without systemtap > active, the code must be doing millions of syscalls per second. Are > you sure there's no better way? Yes. It's not very common, but some high throughput or latency sensitive applications will poll. In my case it's both high throughput and latency sensitive. In any case, asking users to do fewer syscalls because it hurts systemtap will not work. > >> But now it's completely dominated by tracing overhead. >> [...] >> | |--48.48%--task_utrace_struct >> | | |--25.21%--utrace_report_syscall_entry >> | | | syscall_trace_enter_phase2 >> | | | syscall_trace_enter >> | | | do_syscall_64 >> | | | return_from_SYSCALL_64 >> | | | | >> | | | --23.47%--epoll_pwait >> [...] > OK, this implicates the systemtap runtime utrace-emulation code that Why is it emulating? Is there something missing in Kconfig? This is a 4.6 Fedora kernel, so I expect the kitchen sink has been compiled in. > looks for mmap/exec syscalls. Perhaps there could be a mode (a -D > setting?) for selecting kprobes rather than syscall-tracepoints to > catch those events. > Why not trace just mmap and exec syscalls, rather than everything? Surely there's a static tracepoint for those. I'd say defaulting to taking a spinlock on every syscall is going to cause anyone who's using more than a laptop a lot of pain. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-17 18:30 ` Avi Kivity @ 2016-08-17 18:35 ` Frank Ch. Eigler 2016-08-17 18:42 ` Avi Kivity 2016-08-25 16:21 ` Avi Kivity 0 siblings, 2 replies; 32+ messages in thread From: Frank Ch. Eigler @ 2016-08-17 18:35 UTC (permalink / raw) To: Avi Kivity; +Cc: Mark Wielaard, David Smith, systemtap Hi - On Wed, Aug 17, 2016 at 09:30:38PM +0300, Avi Kivity wrote: > [...] > >>My load indeed calls epoll_wait() at a high rate, with zero timeout. > >[...] > Yes. It's not very common, but some high throughput or latency > sensitive applications will poll. In my case it's both high throughput > and latency sensitive. Interesting. I assume you've measured that. > [...] > >looks for mmap/exec syscalls. Perhaps there could be a mode (a -D > >setting?) for selecting kprobes rather than syscall-tracepoints to > >catch those events. > > Why not trace just mmap and exec syscalls, rather than everything? > Surely there's a static tracepoint for those. Heh, in this case it's "surely not". :-) There exists only a multiplexed system-call-entry and a system-call-exit pair of tracepoints. > I'd say defaulting to taking a spinlock on every syscall is going to > cause anyone who's using more than a laptop a lot of pain. There may well exist optimization opportunities that would benefit syscall-saturated applications such as yours. - FChE ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-17 18:35 ` Frank Ch. Eigler @ 2016-08-17 18:42 ` Avi Kivity 2016-08-25 16:21 ` Avi Kivity 1 sibling, 0 replies; 32+ messages in thread From: Avi Kivity @ 2016-08-17 18:42 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap On 08/17/2016 09:35 PM, Frank Ch. Eigler wrote: > Hi - > > On Wed, Aug 17, 2016 at 09:30:38PM +0300, Avi Kivity wrote: >> [...] >>>> My load indeed calls epoll_wait() at a high rate, with zero timeout. >>> [...] >> Yes. It's not very common, but some high throughput or latency >> sensitive applications will poll. In my case it's both high throughput >> and latency sensitive. > Interesting. I assume you've measured that. Yes. It's also well known in some circles. If you're interested, some links: http://www.seastar-project.org/ http://www.scylladb.com/technology/architecture/ http://dpdk.org/ > >> [...] >>> looks for mmap/exec syscalls. Perhaps there could be a mode (a -D >>> setting?) for selecting kprobes rather than syscall-tracepoints to >>> catch those events. >> Why not trace just mmap and exec syscalls, rather than everything? >> Surely there's a static tracepoint for those. > Heh, in this case it's "surely not". :-) There exists only a > multiplexed system-call-entry and a system-call-exit pair of > tracepoints. Well, static tracepoints are a patch away, no? Another alternative is to eliminate that spinlock, perhaps until after you've verified it's mmap or exec. Can you now attach an ebpf program to that tracepoint? Maybe that's unneeded since you're loading native code into the kernel. > > >> I'd say defaulting to taking a spinlock on every syscall is going to >> cause anyone who's using more than a laptop a lot of pain. > There may well exist optimization opportunities that would benefit > syscall-saturated applications such as yours. It's quite possible. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-17 18:35 ` Frank Ch. Eigler 2016-08-17 18:42 ` Avi Kivity @ 2016-08-25 16:21 ` Avi Kivity 2016-08-25 17:37 ` David Smith 1 sibling, 1 reply; 32+ messages in thread From: Avi Kivity @ 2016-08-25 16:21 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Mark Wielaard, David Smith, systemtap On 08/17/2016 09:35 PM, Frank Ch. Eigler wrote: > Hi - > > On Wed, Aug 17, 2016 at 09:30:38PM +0300, Avi Kivity wrote: >> [...] >>>> My load indeed calls epoll_wait() at a high rate, with zero timeout. >>> [...] >> Yes. It's not very common, but some high throughput or latency >> sensitive applications will poll. In my case it's both high throughput >> and latency sensitive. > Interesting. I assume you've measured that. > > >> [...] >>> looks for mmap/exec syscalls. Perhaps there could be a mode (a -D >>> setting?) for selecting kprobes rather than syscall-tracepoints to >>> catch those events. >> Why not trace just mmap and exec syscalls, rather than everything? >> Surely there's a static tracepoint for those. > Heh, in this case it's "surely not". :-) There exists only a > multiplexed system-call-entry and a system-call-exit pair of > tracepoints. > Hi, Should I wait for ongoing improvement here, or shall I look elsewhere for my tracing needs? It would be a pity (for me) if I have to find another solution, because systemtap has all the expressiveness and integration I need. But it has a dramatic impact on my application runtime. I was able to extract some useful data with perf probe/perf record, but as soon as I need to qualify a probe point with runtime information, perf falls short. Avi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-25 16:21 ` Avi Kivity @ 2016-08-25 17:37 ` David Smith 2016-08-26 18:39 ` Avi Kivity 0 siblings, 1 reply; 32+ messages in thread From: David Smith @ 2016-08-25 17:37 UTC (permalink / raw) To: Avi Kivity, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/25/2016 11:21 AM, Avi Kivity wrote: > Hi, > > Should I wait for ongoing improvement here, or shall I look elsewhere > for my tracing needs? > > It would be a pity (for me) if I have to find another solution, because > systemtap has all the expressiveness and integration I need. But it has > a dramatic impact on my application runtime. > > I was able to extract some useful data with perf probe/perf record, but > as soon as I need to qualify a probe point with runtime information, > perf falls short. As Frank mentioned in a previous email, it might be possible for us to switch to using straight kprobes instead of syscall tracing to handle mmap tracing. In your use case of calling epoll_wait() lots of times per second, that might be a *big* win. I'll see what can be done to add that feature. -- David Smith dsmith@redhat.com Red Hat http://www.redhat.com 256.217.0141 (direct) 256.837.0057 (fax) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-25 17:37 ` David Smith @ 2016-08-26 18:39 ` Avi Kivity 2016-08-26 20:08 ` David Smith 0 siblings, 1 reply; 32+ messages in thread From: Avi Kivity @ 2016-08-26 18:39 UTC (permalink / raw) To: David Smith, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/25/2016 08:37 PM, David Smith wrote: > On 08/25/2016 11:21 AM, Avi Kivity wrote: >> Hi, >> >> Should I wait for ongoing improvement here, or shall I look elsewhere >> for my tracing needs? >> >> It would be a pity (for me) if I have to find another solution, because >> systemtap has all the expressiveness and integration I need. But it has >> a dramatic impact on my application runtime. >> >> I was able to extract some useful data with perf probe/perf record, but >> as soon as I need to qualify a probe point with runtime information, >> perf falls short. > As Frank mentioned in a previous email, it might be possible for us to > switch to using straight kprobes instead of syscall tracing to handle > mmap tracing. In your use case of calling epoll_wait() lots of times per > second, that might be a *big* win. > > I'll see what can be done to add that feature. > Thanks a lot. I'll be happy to test patches. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-26 18:39 ` Avi Kivity @ 2016-08-26 20:08 ` David Smith 2016-08-26 21:21 ` Josh Stone 2016-08-28 13:30 ` Avi Kivity 0 siblings, 2 replies; 32+ messages in thread From: David Smith @ 2016-08-26 20:08 UTC (permalink / raw) To: Avi Kivity, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap [-- Attachment #1: Type: text/plain, Size: 1344 bytes --] On 08/26/2016 01:39 PM, Avi Kivity wrote: > On 08/25/2016 08:37 PM, David Smith wrote: >> On 08/25/2016 11:21 AM, Avi Kivity wrote: >>> Hi, >>> >>> Should I wait for ongoing improvement here, or shall I look elsewhere >>> for my tracing needs? >>> >>> It would be a pity (for me) if I have to find another solution, because >>> systemtap has all the expressiveness and integration I need. But it has >>> a dramatic impact on my application runtime. >>> >>> I was able to extract some useful data with perf probe/perf record, but >>> as soon as I need to qualify a probe point with runtime information, >>> perf falls short. >> As Frank mentioned in a previous email, it might be possible for us to >> switch to using straight kprobes instead of syscall tracing to handle >> mmap tracing. In your use case of calling epoll_wait() lots of times per >> second, that might be a *big* win. >> >> I'll see what can be done to add that feature. >> > > Thanks a lot. I'll be happy to test patches. OK, since you asked... Here's a patch I'm testing that tries to do prefiltering when a syscall occurs, so we don't have to take that lock. Please rebuild with it, and let me know if it (a) works, and (b) has lower overhead in your situation. 
-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

[-- Attachment #2: prefilter.patch --]
[-- Type: text/x-patch, Size: 4003 bytes --]

diff --git a/runtime/stp_utrace.c b/runtime/stp_utrace.c
index bb2d663..263b302 100644
--- a/runtime/stp_utrace.c
+++ b/runtime/stp_utrace.c
@@ -26,6 +26,7 @@
 #include <trace/events/syscalls.h>
 #include "stp_task_work.c"
 #include "linux/stp_tracepoint.h"
+#include "syscall.h"

 #include "stp_helper_lock.h"

@@ -116,6 +117,8 @@ static void utrace_report_exec(void *cb_data __attribute__ ((unused)),
 #define __UTRACE_REGISTERED	1
 static atomic_t utrace_state = ATOMIC_INIT(__UTRACE_UNREGISTERED);

+static int __stp_utrace_only_vma_syscall_tracing = 0;
+
 // If wake_up_state() is exported, use it.
 #if defined(STAPCONF_WAKE_UP_STATE_EXPORTED)
 #define stp_wake_up_state wake_up_state
@@ -2064,6 +2067,23 @@ static void utrace_report_syscall_entry(void *cb_data __attribute__ ((unused)),
 	if (atomic_read(&utrace_state) != __UTRACE_REGISTERED)
 		return;

+	/* If we're only doing syscall tracing for VMA tracking, then
+	 * let's prefilter the syscall numbers that we need before
+	 * calling the handlers.
+	 *
+	 * This allows us to avoid finding the utrace struct, which
+	 * avoids some locking. */
+	if (__stp_utrace_only_vma_syscall_tracing) {
+		long syscall_no = _stp_syscall_get_nr(task, regs);
+		if (syscall_no != MMAP_SYSCALL_NO(task)
+		    && syscall_no != MMAP2_SYSCALL_NO(task)
+		    && syscall_no != MPROTECT_SYSCALL_NO(task)
+		    && syscall_no != MUNMAP_SYSCALL_NO(task))
+			return;
+	}
+
+	/* Grab the utrace struct for this task. */
 	utrace = task_utrace_struct(task);
 	/* FIXME: Is this 100% correct? */
@@ -2103,6 +2123,23 @@ static void utrace_report_syscall_exit(void *cb_data __attribute__ ((unused)),
 	if (atomic_read(&utrace_state) != __UTRACE_REGISTERED)
 		return;

+	/* If we're only doing syscall tracing for VMA tracking, then
+	 * let's prefilter the syscall numbers that we need before
+	 * calling the handlers.
+	 *
+	 * This allows us to avoid finding the utrace struct, which
+	 * avoids some locking. */
+	if (__stp_utrace_only_vma_syscall_tracing) {
+		long syscall_no = _stp_syscall_get_nr(task, regs);
+		if (syscall_no != MMAP_SYSCALL_NO(task)
+		    && syscall_no != MMAP2_SYSCALL_NO(task)
+		    && syscall_no != MPROTECT_SYSCALL_NO(task)
+		    && syscall_no != MUNMAP_SYSCALL_NO(task))
+			return;
+	}
+
+	/* Grab the utrace struct for this task. */
 	utrace = task_utrace_struct(task);
 	/* FIXME: Is this 100% correct? */
@@ -2469,4 +2506,10 @@ static void utrace_report_work(struct task_work *work)
 	stp_task_work_func_done();
 }

+/* If this is called, we're *only* doing vma syscall tracing, no other
+ * syscall tracing is being done. */
+static void stap_utrace_only_vma_syscall_tracing(void)
+{
+	__stp_utrace_only_vma_syscall_tracing = 1;
+}
 #endif	/* _STP_UTRACE_C */
diff --git a/runtime/stp_utrace.h b/runtime/stp_utrace.h
index 9f162bb..367109c 100644
--- a/runtime/stp_utrace.h
+++ b/runtime/stp_utrace.h
@@ -329,4 +329,8 @@ static inline enum utrace_resume_action utrace_resume_action(u32 action)
 	return action & UTRACE_RESUME_MASK;
 }

+/* If this is called, we're *only* doing vma syscall tracing, no other
+ * syscall tracing is being done. */
+static void stap_utrace_only_vma_syscall_tracing(void);
+
 #endif	/* _STP_UTRACE_H */
diff --git a/tapset-utrace.cxx b/tapset-utrace.cxx
index 8580800..4321053 100644
--- a/tapset-utrace.cxx
+++ b/tapset-utrace.cxx
@@ -1208,6 +1208,12 @@ utrace_derived_probe_group::emit_module_linux_init (systemtap_session& s)
     return;

   s.op->newline() << "/* ---- utrace probes ---- */";
+  if (!flags_seen[UDPF_SYSCALL] && !flags_seen[UDPF_SYSCALL_RETURN])
+    {
+      s.op->newline() << "#if !defined(CONFIG_UTRACE)";
+      s.op->newline() << "stap_utrace_only_vma_syscall_tracing();";
+      s.op->newline() << "#endif";
+    }
   s.op->newline() << "for (i=0; i<ARRAY_SIZE(stap_utrace_probes); i++) {";
   s.op->newline(1) << "struct stap_utrace_probe *p = &stap_utrace_probes[i];";
   s.op->newline() << "probe_point = p->probe->pp;"; // for error messages

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-26 20:08 ` David Smith @ 2016-08-26 21:21 ` Josh Stone 2016-08-28 13:30 ` Avi Kivity 1 sibling, 0 replies; 32+ messages in thread From: Josh Stone @ 2016-08-26 21:21 UTC (permalink / raw) To: systemtap, David Smith On 08/26/2016 01:07 PM, David Smith wrote: > --- a/tapset-utrace.cxx > +++ b/tapset-utrace.cxx > @@ -1208,6 +1208,12 @@ utrace_derived_probe_group::emit_module_linux_init (systemtap_session& s) > return; > > s.op->newline() << "/* ---- utrace probes ---- */"; > + if (!flags_seen[UDPF_SYSCALL] && !flags_seen[UDPF_SYSCALL_RETURN]) > + { > + s.op->newline() << "#if !defined(CONFIG_UTRACE)"; > + s.op->newline() << "stap_utrace_only_vma_syscall_tracing();"; > + s.op->newline() << "#endif"; > + } Will this be emitted if there are no utrace probes at all? I think not, so perhaps the default should be flipped. Assume filtered syscalls by default, and let this code turn on the full mode if process.syscall is in use. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-26 20:08 ` David Smith 2016-08-26 21:21 ` Josh Stone @ 2016-08-28 13:30 ` Avi Kivity 2016-08-28 13:35 ` Avi Kivity 2016-08-29 18:26 ` David Smith 1 sibling, 2 replies; 32+ messages in thread From: Avi Kivity @ 2016-08-28 13:30 UTC (permalink / raw) To: David Smith, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/26/2016 11:07 PM, David Smith wrote: > On 08/26/2016 01:39 PM, Avi Kivity wrote: >> On 08/25/2016 08:37 PM, David Smith wrote: >>> On 08/25/2016 11:21 AM, Avi Kivity wrote: >>>> Hi, >>>> >>>> Should I wait for ongoing improvement here, or shall I look elsewhere >>>> for my tracing needs? >>>> >>>> It would be a pity (for me) if I have to find another solution, because >>>> systemtap has all the expressiveness and integration I need. But it has >>>> a dramatic impact on my application runtime. >>>> >>>> I was able to extract some useful data with perf probe/perf record, but >>>> as soon as I need to qualify a probe point with runtime information, >>>> perf falls short. >>> As Frank mentioned in a previous email, it might be possible for us to >>> switch to using straight kprobes instead of syscall tracing to handle >>> mmap tracing. In your use case of calling epoll_wait() lots of times per >>> second, that might be a *big* win. >>> >>> I'll see what can be done to add that feature. >>> >> Thanks a lot. I'll be happy to test patches. > OK, since you asked... > > Here's a patch I'm testing that tries to do prefiltering when a syscall > occurs, so we don't have to take that lock. > > Please rebuild with it, and let me know if it (a) works, and (b) has > lower overhead in your situation. > With an unloaded system, systemtap almost vanishes from the profile. This is on a 2s24c48t system, running epoll_pwait() and polling on user memory locations in a tight loop. 
When I load my system, I start to see contention on systemtap locks:

     9.26%  scylla  [kernel.kallsyms]  [k] delay_tsc
            |
            ---delay_tsc
               |
               --9.25%--__const_udelay
                         stp_lock_probe.constprop.104
                         |
                         |--4.94%--probe_2819
                         |          stapiu_probe_prehandler
                         |          uprobe_notify_resume
                         |          exit_to_usermode_loop
                         |          prepare_exit_to_usermode
                         |          retint_user
                         |          reactor::run_tasks
                         |          reactor::run
                         |          |
                         |          --4.94%--smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                         |                    posix_thread::start_routine
                         |                    start_thread
                         |                    __clone
                         |
                         --4.30%--probe_2822
                                   stapiu_probe_prehandler
                                   uprobe_notify_resume
                                   exit_to_usermode_loop
                                   prepare_exit_to_usermode
                                   retint_user
                                   reactor::run_tasks
                                   reactor::run
                                   |
                                   --4.30%--smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                                             posix_thread::start_routine
                                             start_thread
                                             __clone

     3.75%  scylla  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
            |
            ---queued_spin_lock_slowpath
               |
               --3.74%--_raw_spin_lock
                         |
                         --3.63%--task_utrace_struct
                                   |
                                   |--2.49%--utrace_report_syscall_entry
                                   |          syscall_trace_enter_phase2
                                   |          syscall_trace_enter
                                   |          do_syscall_64
                                   |          return_from_SYSCALL_64
                                   |          |
                                   |          --2.46%--epoll_pwait
                                   |                    reactor_backend_epoll::wait_and_process
                                   |                    |
                                   |                    |--1.65%--std::_Function_handler<bool (), reactor::run()::{lambda()#7}>::_M_invoke
                                   |                    |          reactor::run
                                   |                    |          |
                                   |                    |          --1.55%--smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                                   |                    |                    posix_thread::start_routine
                                   |                    |                    start_thread
                                   |                    |                    __clone
                                   |                    |
                                   |                    --0.81%--std::_Function_handler<bool (), reactor::run()::{lambda()#8}>::_M_invoke
                                   |                              logalloc::tracker::impl::compact_on_idle
                                   |                              reactor::run
                                   |                              |
                                   |                              --0.76%--smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                                   |                                        posix_thread::start_routine
                                   |                                        start_thread
                                   |                                        __clone
                                   |
                                   --1.14%--utrace_report_syscall_exit
                                             syscall_slow_exit_work
                                             do_syscall_64
                                             return_from_SYSCALL_64
                                             |
                                             --1.12%--epoll_pwait
                                                       reactor_backend_epoll::wait_and_process
                                                       |
                                                       --0.75%--std::_Function_handler<bool (), reactor::run()::{lambda()#7}>::_M_invoke
                                                                 reactor::run
                                                                 |
                                                                 --0.69%--smp::configure(boost::program_options::variables_map)::{lambda()#3}::operator()
                                                                           posix_thread::start_routine
                                                                           start_thread
                                                                           __clone

I may be able to move the probe points to a less frequently accessed point (at some loss in accuracy), so this patch ought to get me started. Thanks a lot!

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-28 13:30 ` Avi Kivity @ 2016-08-28 13:35 ` Avi Kivity 2016-08-28 13:45 ` Avi Kivity 2016-08-29 18:26 ` David Smith 1 sibling, 1 reply; 32+ messages in thread From: Avi Kivity @ 2016-08-28 13:35 UTC (permalink / raw) To: David Smith, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/28/2016 04:29 PM, Avi Kivity wrote: > > > I may be able to move the probe points to a less frequently accessed > point (at some loss in accuracy), so this patch ought to get me > started. Thanks a lot! > Alas, even with a light load I see delay_tsc jump to the #1 spot in the profile, so without a thread local variable, I can't use systemtap. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-28 13:35 ` Avi Kivity @ 2016-08-28 13:45 ` Avi Kivity 0 siblings, 0 replies; 32+ messages in thread From: Avi Kivity @ 2016-08-28 13:45 UTC (permalink / raw) To: David Smith, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/28/2016 04:34 PM, Avi Kivity wrote: > > > On 08/28/2016 04:29 PM, Avi Kivity wrote: >> >> >> I may be able to move the probe points to a less frequently accessed >> point (at some loss in accuracy), so this patch ought to get me >> started. Thanks a lot! >> > > Alas, even with a light load I see delay_tsc jump to the #1 spot in > the profile, so without a thread local variable, I can't use systemtap. In addition, my server crashed. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-28 13:30 ` Avi Kivity 2016-08-28 13:35 ` Avi Kivity @ 2016-08-29 18:26 ` David Smith 2016-08-31 16:44 ` Avi Kivity 1 sibling, 1 reply; 32+ messages in thread From: David Smith @ 2016-08-29 18:26 UTC (permalink / raw) To: Avi Kivity, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap [-- Attachment #1: Type: text/plain, Size: 1906 bytes --] On 08/28/2016 08:29 AM, Avi Kivity wrote: > > > On 08/26/2016 11:07 PM, David Smith wrote: >> On 08/26/2016 01:39 PM, Avi Kivity wrote: >>> On 08/25/2016 08:37 PM, David Smith wrote: >>>> On 08/25/2016 11:21 AM, Avi Kivity wrote: >>>>> Hi, >>>>> >>>>> Should I wait for ongoing improvement here, or shall I look elsewhere >>>>> for my tracing needs? >>>>> >>>>> It would be a pity (for me) if I have to find another solution, >>>>> because >>>>> systemtap has all the expressiveness and integration I need. But >>>>> it has >>>>> a dramatic impact on my application runtime. >>>>> >>>>> I was able to extract some useful data with perf probe/perf record, >>>>> but >>>>> as soon as I need to qualify a probe point with runtime information, >>>>> perf falls short. >>>> As Frank mentioned in a previous email, it might be possible for us to >>>> switch to using straight kprobes instead of syscall tracing to handle >>>> mmap tracing. In your use case of calling epoll_wait() lots of times >>>> per >>>> second, that might be a *big* win. >>>> >>>> I'll see what can be done to add that feature. >>>> >>> Thanks a lot. I'll be happy to test patches. >> OK, since you asked... >> >> Here's a patch I'm testing that tries to do prefiltering when a syscall >> occurs, so we don't have to take that lock. >> >> Please rebuild with it, and let me know if it (a) works, and (b) has >> lower overhead in your situation. >> > > With an unloaded system, systemtap almost vanishes from the profile. > This is on a 2s24c48t system, running epoll_pwait() and polling on user > memory locations in a tight loop. ... stuff deleted ... 
After looking into it, Josh was right in thinking I had the test backwards. Can you try the following patch (after reverting the last patch) and let us know what you find?

-- 
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)

[-- Attachment #2: prefilter2.patch --]
[-- Type: text/x-patch, Size: 3855 bytes --]

diff --git a/runtime/stp_utrace.c b/runtime/stp_utrace.c
index bb2d663..d2d5b70 100644
--- a/runtime/stp_utrace.c
+++ b/runtime/stp_utrace.c
@@ -26,6 +26,7 @@
 #include <trace/events/syscalls.h>
 #include "stp_task_work.c"
 #include "linux/stp_tracepoint.h"
+#include "syscall.h"

 #include "stp_helper_lock.h"

@@ -116,6 +117,8 @@ static void utrace_report_exec(void *cb_data __attribute__ ((unused)),
 #define __UTRACE_REGISTERED	1
 static atomic_t utrace_state = ATOMIC_INIT(__UTRACE_UNREGISTERED);

+static int __stp_utrace_syscall_tracing = 0;
+
 // If wake_up_state() is exported, use it.
 #if defined(STAPCONF_WAKE_UP_STATE_EXPORTED)
 #define stp_wake_up_state wake_up_state
@@ -2064,6 +2067,23 @@ static void utrace_report_syscall_entry(void *cb_data __attribute__ ((unused)),
 	if (atomic_read(&utrace_state) != __UTRACE_REGISTERED)
 		return;

+	/* If we're only doing syscall tracing for VMA tracking, then
+	 * let's prefilter the syscall numbers that we need before
+	 * calling the handlers.
+	 *
+	 * This allows us to avoid finding the utrace struct, which
+	 * avoids some locking. */
+	if (!__stp_utrace_syscall_tracing) {
+		long syscall_no = _stp_syscall_get_nr(task, regs);
+		if (syscall_no != MMAP_SYSCALL_NO(task)
+		    && syscall_no != MMAP2_SYSCALL_NO(task)
+		    && syscall_no != MPROTECT_SYSCALL_NO(task)
+		    && syscall_no != MUNMAP_SYSCALL_NO(task))
+			return;
+	}
+
+	/* Grab the utrace struct for this task. */
 	utrace = task_utrace_struct(task);
 	/* FIXME: Is this 100% correct? */
@@ -2103,6 +2123,23 @@ static void utrace_report_syscall_exit(void *cb_data __attribute__ ((unused)),
 	if (atomic_read(&utrace_state) != __UTRACE_REGISTERED)
 		return;

+	/* If we're only doing syscall tracing for VMA tracking, then
+	 * let's prefilter the syscall numbers that we need before
+	 * calling the handlers.
+	 *
+	 * This allows us to avoid finding the utrace struct, which
+	 * avoids some locking. */
+	if (!__stp_utrace_syscall_tracing) {
+		long syscall_no = _stp_syscall_get_nr(task, regs);
+		if (syscall_no != MMAP_SYSCALL_NO(task)
+		    && syscall_no != MMAP2_SYSCALL_NO(task)
+		    && syscall_no != MPROTECT_SYSCALL_NO(task)
+		    && syscall_no != MUNMAP_SYSCALL_NO(task))
+			return;
+	}
+
+	/* Grab the utrace struct for this task. */
 	utrace = task_utrace_struct(task);
 	/* FIXME: Is this 100% correct? */
@@ -2469,4 +2506,9 @@ static void utrace_report_work(struct task_work *work)
 	stp_task_work_func_done();
 }

+/* If this is called, we're doing utrace-based syscall tracing. */
+static void stap_utrace_syscall_tracing(void)
+{
+	__stp_utrace_syscall_tracing = 1;
+}
 #endif	/* _STP_UTRACE_C */
diff --git a/runtime/stp_utrace.h b/runtime/stp_utrace.h
index 9f162bb..34968b1 100644
--- a/runtime/stp_utrace.h
+++ b/runtime/stp_utrace.h
@@ -329,4 +329,7 @@ static inline enum utrace_resume_action utrace_resume_action(u32 action)
 	return action & UTRACE_RESUME_MASK;
 }

+/* If this is called, we're doing utrace-based syscall tracing. */
+static void stap_utrace_syscall_tracing(void);
+
 #endif	/* _STP_UTRACE_H */
diff --git a/tapset-utrace.cxx b/tapset-utrace.cxx
index 8580800..154c335 100644
--- a/tapset-utrace.cxx
+++ b/tapset-utrace.cxx
@@ -1208,6 +1208,12 @@ utrace_derived_probe_group::emit_module_linux_init (systemtap_session& s)
     return;

   s.op->newline() << "/* ---- utrace probes ---- */";
+  if (flags_seen[UDPF_SYSCALL] || flags_seen[UDPF_SYSCALL_RETURN])
+    {
+      s.op->newline() << "#if !defined(CONFIG_UTRACE)";
+      s.op->newline() << "stap_utrace_syscall_tracing();";
+      s.op->newline() << "#endif";
+    }
   s.op->newline() << "for (i=0; i<ARRAY_SIZE(stap_utrace_probes); i++) {";
   s.op->newline(1) << "struct stap_utrace_probe *p = &stap_utrace_probes[i];";
   s.op->newline() << "probe_point = p->probe->pp;"; // for error messages

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-29 18:26 ` David Smith @ 2016-08-31 16:44 ` Avi Kivity 0 siblings, 0 replies; 32+ messages in thread From: Avi Kivity @ 2016-08-31 16:44 UTC (permalink / raw) To: David Smith, Frank Ch. Eigler; +Cc: Mark Wielaard, systemtap On 08/29/2016 09:26 PM, David Smith wrote: > On 08/28/2016 08:29 AM, Avi Kivity wrote: >> >> On 08/26/2016 11:07 PM, David Smith wrote: >>> On 08/26/2016 01:39 PM, Avi Kivity wrote: >>>> On 08/25/2016 08:37 PM, David Smith wrote: >>>>> On 08/25/2016 11:21 AM, Avi Kivity wrote: >>>>>> Hi, >>>>>> >>>>>> Should I wait for ongoing improvement here, or shall I look elsewhere >>>>>> for my tracing needs? >>>>>> >>>>>> It would be a pity (for me) if I have to find another solution, >>>>>> because >>>>>> systemtap has all the expressiveness and integration I need. But >>>>>> it has >>>>>> a dramatic impact on my application runtime. >>>>>> >>>>>> I was able to extract some useful data with perf probe/perf record, >>>>>> but >>>>>> as soon as I need to qualify a probe point with runtime information, >>>>>> perf falls short. >>>>> As Frank mentioned in a previous email, it might be possible for us to >>>>> switch to using straight kprobes instead of syscall tracing to handle >>>>> mmap tracing. In your use case of calling epoll_wait() lots of times >>>>> per >>>>> second, that might be a *big* win. >>>>> >>>>> I'll see what can be done to add that feature. >>>>> >>>> Thanks a lot. I'll be happy to test patches. >>> OK, since you asked... >>> >>> Here's a patch I'm testing that tries to do prefiltering when a syscall >>> occurs, so we don't have to take that lock. >>> >>> Please rebuild with it, and let me know if it (a) works, and (b) has >>> lower overhead in your situation. >>> >> With an unloaded system, systemtap almost vanishes from the profile. >> This is on a 2s24c48t system, running epoll_pwait() and polling on user >> memory locations in a tight loop. > ... stuff deleted ... 
> > After looking into it, Josh was right in thinking I had the test > backwards. Can you try the following patch (after reverting the last > patch) and let us know what you find? > With the new patch, I get ERROR: probe overhead exceeded threshold ERROR: probe overhead exceeded threshold WARNING: Number of errors: 2, skipped probes: 0 WARNING: /home/avi/systemtap.f23/bin/staprun exited with status: 1 Pass 5: run failed. [man error::pass5] with -g --suppress-time-limits, I get the usual spinlock badness. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Some newbie questions 2016-08-09 6:51 ` Avi Kivity 2016-08-09 13:54 ` Mark Wielaard @ 2016-08-09 15:17 ` David Smith 2016-08-09 15:28 ` Avi Kivity 1 sibling, 1 reply; 32+ messages in thread From: David Smith @ 2016-08-09 15:17 UTC (permalink / raw) To: Avi Kivity, systemtap On 08/09/2016 01:51 AM, Avi Kivity wrote: > > > On 08/08/2016 05:58 PM, David Smith wrote: >> >>> 2. Inlined functions >>> >>> From the manual pages, it seems that inlined functions can be probed >>> (minus the .return probe), but I just get an error: >>> >>> semantic error: while resolving probe point: identifier 'process' at >>> script/home/avi/seastar/debug/task-latency.stap:3:7 >>> source: probe >>> process("scylla").function("reactor::run_tasks()") { >>> ^ >>> >>> semantic error: no match (similar functions: >>> _ZN7reactor14run_exit_tasksEv, statfs, dup, mkdir, ntohs) >> We're at the mercy of the compiler and the quality of the debuginfo it >> produces here when looking for inlined functions. >> >> If you want to investigate further here, you'll need to do the following: >> >> # eu-readelf -N --debug-dump=info scylla > scylla.log >> >> In scylla.log, look and see if you can find a subprogram with the name >> of the function you are interested in. > > [1417f4] subprogram > external (flag_present) Yes > name (strp) "run_tasks" > decl_file (data1) 5 > decl_line (data2) 812 > linkage_name (strp) > "_ZN7reactor9run_tasksER15circular_bufferISt10unique_ptrI4taskSt14default_deleteIS2_EESaIS5_EE" > > declaration (flag_present) Yes > object_pointer (ref4) [141808] > sibling (ref4) [141813] > [141808] formal_parameter > type (ref4) [1423bd] > artificial (flag_present) Yes > [14180d] formal_parameter > type (ref4) [14a8a8] According to the above, that function isn't inlined. 
You might just try the following to see what functions stap can see: # stap -l 'process("scylla").function("*")' | grep run_tasks > The debugger has no trouble setting breakpoints on the function or > showing it in backtraces. In any case I sprinkled some static probes > there. > >>> 3. Process CPU timers >>> >>> (more of a feature request) >>> >>> I'm trying to find causes of latency in my program. To do that, I'm >>> running a periodic timer and checking whether a function takes more time >>> than some threshold. >>> >>> Ideally, I'd be able to arm the timer on the function entry point and >>> disarm it on exit, rather than have it run continuously; this would need >>> to be a per-thread cpu-time timer (e.g. CLOCK_THREAD_CPUTIME_ID)/ >>> >>> Here's my current script for reference ("running" and "start_time" need >>> to become maps for it to be thread-safe): >>> >>> #!/usr/bin/stap >>> >>> global start_time >>> global running >>> >>> probe begin { >>> running = 0 >>> } >>> >>> probe >>> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start") >>> >>> { >>> start_time = gettimeofday_us() >>> running = 1 >>> } >>> >>> probe >>> process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end") >>> >>> { >>> running = 0 >>> } >>> >>> probe timer.ms(10) { >>> now = gettimeofday_us() >>> if (running && (now - start_time) > 30000) { >>> printf("detected tasks running for >30ms\n") >>> print_usyms(ubacktrace()) >>> } >>> } >>> >>> I'd appreciate any tips as to whether there's a better way to do this. >> The above isn't really going to work well, for several reasons: >> >> 1) You've only got one global 'start_time' and 'running' variables. If >> scylla is multithreaded or more than one instance is running, that isn't >> going to work. 
>> >> To fix this, you'd want to do convert them to arrays and index them by >> thread ids, like this: >> >> start_time[tid()] = gettimeofday_us() > > Yes, I found that out and my latest version works like this, but I ran > into severe scaling issues. > > I also switched to timer.profile. What's its resolution on normal > machines? timer.profile probe resolution is CONFIG_HZ. Check your kernel config and see "man stapprobes" for more details. -- David Smith dsmith@redhat.com Red Hat http://www.redhat.com 256.217.0141 (direct) 256.837.0057 (fax) ^ permalink raw reply [flat|nested] 32+ messages in thread
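Putting the `tid()`-indexing suggestion together with the switch to `timer.profile` discussed above, a thread-safe variant of the original script might look like the following. This is an untested sketch; the binary path and marker names are the ones from the original script, and the 30 ms threshold is unchanged:

```
#!/usr/bin/stap

# Per-thread state, indexed by tid(), so concurrent reactor threads
# don't clobber each other's bookkeeping.
global start_time
global running

probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_start") {
    start_time[tid()] = gettimeofday_us()
    running[tid()] = 1
}

probe process("/home/avi/urchin/build/release/scylla").mark("reactor_run_tasks_end") {
    delete running[tid()]
}

probe timer.profile {
    t = tid()
    if (running[t] && (gettimeofday_us() - start_time[t]) > 30000) {
        printf("tid %d: tasks running for >30ms\n", t)
        print_usyms(ubacktrace())
    }
}
```

One property worth noting: `timer.profile` fires in the context of whatever task each CPU is running, so `tid()` and `ubacktrace()` refer to the interrupted thread, which is what makes the backtrace meaningful here; a plain `timer.ms()` probe runs without any useful user context.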
* Re: Some newbie questions 2016-08-09 15:17 ` David Smith @ 2016-08-09 15:28 ` Avi Kivity 0 siblings, 0 replies; 32+ messages in thread From: Avi Kivity @ 2016-08-09 15:28 UTC (permalink / raw) To: David Smith, systemtap On 08/09/2016 06:17 PM, David Smith wrote: > On 08/09/2016 01:51 AM, Avi Kivity wrote: >> >> On 08/08/2016 05:58 PM, David Smith wrote: >>>> 2. Inlined functions >>>> >>>> From the manual pages, it seems that inlined functions can be probed >>>> (minus the .return probe), but I just get an error: >>>> >>>> semantic error: while resolving probe point: identifier 'process' at >>>> script/home/avi/seastar/debug/task-latency.stap:3:7 >>>> source: probe >>>> process("scylla").function("reactor::run_tasks()") { >>>> ^ >>>> >>>> semantic error: no match (similar functions: >>>> _ZN7reactor14run_exit_tasksEv, statfs, dup, mkdir, ntohs) >>> We're at the mercy of the compiler and the quality of the debuginfo it >>> produces here when looking for inlined functions. >>> >>> If you want to investigate further here, you'll need to do the following: >>> >>> # eu-readelf -N --debug-dump=info scylla > scylla.log >>> >>> In scylla.log, look and see if you can find a subprogram with the name >>> of the function you are interested in. >> [1417f4] subprogram >> external (flag_present) Yes >> name (strp) "run_tasks" >> decl_file (data1) 5 >> decl_line (data2) 812 >> linkage_name (strp) >> "_ZN7reactor9run_tasksER15circular_bufferISt10unique_ptrI4taskSt14default_deleteIS2_EESaIS5_EE" >> >> declaration (flag_present) Yes >> object_pointer (ref4) [141808] >> sibling (ref4) [141813] >> [141808] formal_parameter >> type (ref4) [1423bd] >> artificial (flag_present) Yes >> [14180d] formal_parameter >> type (ref4) [14a8a8] > According to the above, that function isn't inlined. 
You might just try > the following to see what functions stap can see: > > # stap -l 'process("scylla").function("*")' | grep run_tasks process("/usr/bin/scylla").function("run_tasks@core/reactor.cc:1362") Strangely not qualified by "reactor::". >> I also switched to timer.profile. What's its resolution on normal >> machines? > timer.profile probe resolution is CONFIG_HZ. Check your kernel config > and see "man stapprobes" for more details. > Thanks. ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2016-08-31 16:44 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-07 8:08 Some newbie questions Avi Kivity 2016-08-07 9:16 ` Avi Kivity 2016-08-08 14:58 ` David Smith 2016-08-09 6:51 ` Avi Kivity 2016-08-09 13:54 ` Mark Wielaard 2016-08-09 14:11 ` Avi Kivity 2016-08-10 15:33 ` Frank Ch. Eigler 2016-08-10 15:39 ` Avi Kivity 2016-08-10 15:40 ` Avi Kivity 2016-08-10 16:47 ` Frank Ch. Eigler 2016-08-11 7:51 ` Avi Kivity 2016-08-11 8:13 ` Avi Kivity 2016-08-11 16:18 ` Frank Ch. Eigler 2016-08-17 15:15 ` Avi Kivity 2016-08-17 18:06 ` David Smith 2016-08-17 18:22 ` Avi Kivity 2016-08-17 18:10 ` Frank Ch. Eigler 2016-08-17 18:30 ` Avi Kivity 2016-08-17 18:35 ` Frank Ch. Eigler 2016-08-17 18:42 ` Avi Kivity 2016-08-25 16:21 ` Avi Kivity 2016-08-25 17:37 ` David Smith 2016-08-26 18:39 ` Avi Kivity 2016-08-26 20:08 ` David Smith 2016-08-26 21:21 ` Josh Stone 2016-08-28 13:30 ` Avi Kivity 2016-08-28 13:35 ` Avi Kivity 2016-08-28 13:45 ` Avi Kivity 2016-08-29 18:26 ` David Smith 2016-08-31 16:44 ` Avi Kivity 2016-08-09 15:17 ` David Smith 2016-08-09 15:28 ` Avi Kivity