* kernel function probe overhead
From: Aubrey Li @ 2016-11-25 16:07 UTC
To: systemtap

Hi list,

I'm trying to profile some Linux kernel subsystems with systemtap, so I
want to know the overhead of the various kinds of systemtap probes.
Basically I sampled two kernel function probes:

probe kernel.function("arch_cpu_idle_enter").call
probe kernel.function("local_touch_nmi").call

========
A snippet of the objdump of the code:

ffffffff81035740 <arch_cpu_idle_enter>:
ffffffff81035740:  e8 4b 76 55 00   callq  ffffffff8158cd90 <__fentry__>
ffffffff81035745:  e8 f6 96 ff ff   callq  ffffffff8102ee40 <local_touch_nmi>
========

Because local irq is disabled, I expected the duration between these two
probes to be essentially ZERO. However, when I run the attached stp
script, here is what I get:

$ stap test.stp
Press ^C to stop
Total time: 10106 milliseconds
t_end @count=2908 @min=676 @max=88152 @sum=15005802 @avg=5160

The duration varies from 676 to 88152. I understand that every tracing
mechanism has overhead, and 676 ns is acceptable, but 88152 ns is not -
and I have no idea why the range is so big. Please let me know if I can
provide more information on my side.

Thanks,
-Aubrey

$ stap -V
Systemtap translator/driver (version 3.1/0.167, commit release-3.0-273-g7d27ff5084d6)
Copyright (C) 2005-2016 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 2.6.18 ... 4.8
enabled features: NLS

$ uname -a
Linux aubrey-hp 4.8.10+ #5 SMP Fri Nov 25 02:11:32 CST 2016 x86_64 GNU/Linux

[-- Attachment #2: test.stp --]

#!/usr/local/bin/stap

global init_time
global t_start
global t_end

probe begin {
    printf("Press ^C to stop\n")
    init_time = gettimeofday_ms()
}

probe kernel.function("arch_cpu_idle_enter").call {
    t_start[cpu()] = gettimeofday_ns()
}

probe kernel.function("local_touch_nmi").call {
    if (t_start[cpu()]) {
        t_end <<< gettimeofday_ns() - t_start[cpu()]
        t_start[cpu()] = 0
    }
}

probe timer.s(10) {
    exit()
}

probe end {
    printf("Total time: %d milliseconds\n", gettimeofday_ms() - init_time)
}
* Re: kernel function probe overhead
From: Frank Ch. Eigler @ 2016-11-25 17:22 UTC
To: Aubrey Li; +Cc: systemtap

aubreylee wrote:

> [...]
> Basically I sampled two kernel function probes:
>
> probe kernel.function("arch_cpu_idle_enter").call
> probe kernel.function("local_touch_nmi").call

... where one directly calls the other. Good test!

> Because local irq is disabled, I expected the duration between these two
> probes to be essentially ZERO.

Yes, ideally.

> However, when I run the attached stp script, here is what I get:
> [...]
> Total time: 10106 milliseconds
> t_end @count=2908 @min=676 @max=88152 @sum=15005802 @avg=5160
>
> The duration varies from 676 to 88152. I understand that every tracing
> mechanism has overhead, and 676 ns is acceptable, but 88152 ns is not.
>
> probe kernel.function("arch_cpu_idle_enter").call
>     t_start[cpu()] = gettimeofday_ns()
> probe kernel.function("local_touch_nmi").call
>     t_end <<< gettimeofday_ns() - t_start[cpu()]

A couple of things come to mind.

The gettimeofday_ns() function uses a software-estimated real-time
clock, which may not be steady enough for fine-grained measurements like
this. Consider using get_cycles() or the functions in the
[man tapset::timestamp_monotonic] tapset.

Another possibility is jitter, such as caching effects within the kernel
due to activity elsewhere. The average is pretty low. Consider printing
a histogram of t_end, or (if using git systemtap) its @variance.

Another possibility is contention over the t_start[] variable, if
multiple processors are running this region of code and so trigger the
systemtap probes concurrently. systemtap automatically locks such global
variables, especially if they contain non-aggregate values like the
scalar timestamps.

Consider using "stap -t FOO.stp" to have the tool print internal
estimates of its own probes' runtimes. That would include lock
contention, but exclude kernel-side overheads like the code involved in
the kprobes machinery.

- FChE
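For concreteness, a minimal variant of test.stp along those lines could
look like the following sketch. It is only a sketch: it swaps in
local_clock_ns() from the timestamp_monotonic tapset and prints a log2
histogram of t_end at exit, and leaves everything else unchanged from
the original script.

#!/usr/local/bin/stap
# Sketch: same measurement as test.stp, but on a monotonic clock,
# with a histogram of the samples printed at exit.

global t_start
global t_end

probe kernel.function("arch_cpu_idle_enter").call {
    t_start[cpu()] = local_clock_ns()       # monotonic per-CPU timestamp
}

probe kernel.function("local_touch_nmi").call {
    if (t_start[cpu()]) {
        t_end <<< local_clock_ns() - t_start[cpu()]
        t_start[cpu()] = 0
    }
}

probe timer.s(10) { exit() }

probe end {
    print(@hist_log(t_end))                 # log2 histogram of the durations
}

The histogram output from @hist_log() is what the follow-up below
reports for each of the three timestamp functions.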
* Re: kernel function probe overhead
From: Aubrey Li @ 2016-11-26 6:18 UTC
To: Frank Ch. Eigler; +Cc: systemtap

On Sat, Nov 26, 2016 at 1:22 AM, Frank Ch. Eigler <fche@redhat.com> wrote:

Thanks for your reply, Frank.

> A couple of things come to mind. The gettimeofday_ns() function uses
> a software-estimated real-time clock, which may not be steady enough
> for fine-grained measurements like this.

Timestamp function: gettimeofday_ns()
value  |-------------------------------------------------- count
    32 |                                                       0
    64 |                                                       0
   128 |                                                       6
   256 |@                                                     62
   512 |@@@@@@@@@@                                           434
  1024 |@@@@@@@@@@@                                          487
  2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  2014
  4096 |                                                      35
  8192 |@                                                     44
 16384 |                                                      28
 32768 |                                                      20
 65536 |                                                       0
131072 |                                                       0

Total time: 10096 milliseconds

> Consider using get_cycles()

Timestamp function: get_cycles()
value  |-------------------------------------------------- count
    64 |                                                       0
   128 |                                                       0
   256 |                                                       9
   512 |@@@                                                   91
  1024 |@@@@@@@@@@                                           313
  2048 |@@@@@@@@@@@@@@                                       420
  4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1495
  8192 |@@@@@@@@@@@                                          350
 16384 |@@                                                    83
 32768 |                                                      29
 65536 |@                                                     31
131072 |                                                       1
262144 |                                                       0
524288 |                                                       0

> or the functions in the [man tapset::timestamp_monotonic] tapset.

Timestamp function: local_clock_ns()
value  |-------------------------------------------------- count
    32 |                                                       0
    64 |                                                       0
   128 |                                                       6
   256 |@                                                     57
   512 |@@@@@@@@@@                                           377
  1024 |@@@@@@@@@@@                                          426
  2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1792
  4096 |                                                      29
  8192 |@                                                     50
 16384 |                                                      23
 32768 |                                                      23
 65536 |                                                       0
131072 |                                                       0

> Another possibility is jitter, such as caching effects within the kernel
> due to activity elsewhere.

I doubt caching effects cause such a big variance.

> The average is pretty low. Consider printing a histogram of t_end, or
> (if using git systemtap) its @variance.
>
> Another possibility is contention over the t_start[] variable, if
> multiple processors are running this region of code and so trigger the
> systemtap probes concurrently. systemtap automatically locks such global
> variables, especially if they contain non-aggregate values like the
> scalar timestamps.

Do we have an approach to probe the function on CPU0 only?

> Consider using "stap -t FOO.stp" to have the tool print internal
> estimates of its own probes' runtimes. That would include lock
> contention, but exclude kernel-side overheads like the code involved in
> the kprobes machinery.

The "-t" option output:
----- probe hit report:
begin, (test2.stp:8:1), hits: 1, cycles: 3280min/3280avg/3280max, from: begin, index: 0
kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call, (test2.stp:13:1), hits: 2801, cycles: 560min/34290avg/230611max, from: kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call from: kernel.function("arch_cpu_idle_enter").call, index: 2
kernel.function("local_touch_nmi@arch/x86/kernel/nmi.c:561").call, (test2.stp:18:1), hits: 2801, cycles: 396min/5118avg/160096max, from: kernel.function("local_touch_nmi@arch/x86/kernel/nmi.c:561").call from: kernel.function("local_touch_nmi").call, index: 3
timer.s(10), (test2.stp:26:1), hits: 1, cycles: 126258min/126258avg/126258max, from: timer.s(10), index: 4
end, (test2.stp:32:1), hits: 1, cycles: 6240min/6240avg/6240max, from: end, index: 5
----- refresh report:

Thanks,
-Aubrey
* Re: kernel function probe overhead
From: Frank Ch. Eigler @ 2016-11-28 19:19 UTC
To: Aubrey Li; +Cc: systemtap

aubreylee wrote:

>> A couple of things come to mind. The gettimeofday_ns() function uses
>> a software-estimated real-time clock, which may not be steady enough
>> for fine-grained measurements like this.
>
> Timestamp function: gettimeofday_ns()
> [...]
> Timestamp function: get_cycles()
> [...]
> Timestamp function: local_clock_ns()
> [...]

Thanks for testing! The distributions of those three seem remarkably
similar - a nice central peak, and a second small peak at 4-8x the
first. It suggests we can probably rule out clock problems.

>> The average is pretty low. Consider printing a histogram of t_end, or
>> (if using git systemtap) its @variance.
>>
>> Another possibility is contention over the t_start[] variable, if
>> multiple processors are running this region of code and so trigger the
>> systemtap probes concurrently. systemtap automatically locks such global
>> variables, especially if they contain non-aggregate values like the
>> scalar timestamps.
>
> Do we have an approach to probe the function on CPU0 only?

Not directly; the kernel does not give us such control. The probe could
include a

    if (cpu() != 0) next;

at the top of the probe handler body. That way locks are still taken,
but released quickly on the other CPUs.

Another solution would be to have per-thread global variables. Since
the kernel doesn't help us out with this, we'd have to emulate it
somehow, and for this to be worthwhile, it'd have to work faster than
ordinary locked/shared array[tid()] accesses.

>> Consider using "stap -t FOO.stp" to have the tool print internal
>> estimates of its own probes' runtimes. That would include lock
>> contention, but exclude kernel-side overheads like the code involved in
>> the kprobes machinery.
>
> The "-t" option output:

Ah, I note that we don't track occurrences of lock contention, even
though we could. Opened https://sourceware.org/bugzilla/show_bug.cgi?id=20879

> [...]
> kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call,
> (test2.stp:13:1), hits: 2801, cycles: 560min/34290avg/230611max, from:
> [...]
> kernel.function("local_touch_nmi@arch/x86/kernel/nmi.c:561").call,
> (test2.stp:18:1), hits: 2801, cycles: 396min/5118avg/160096max, from:
> [...]

That inconsistency suggests dramatic contention, caching, or some other
source of jitter. Having the variance figure in there would be useful,
though. Would anyone like to give that a try? (translate.cxx line
2283ish and the runtime/linux/common_session_state.h _stp_stat_init
area.)

- FChE
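Applied to test.stp, that guard would look something like the following
sketch. Only the two function probes change; the globals and the other
probes stay exactly as posted at the top of the thread.

probe kernel.function("arch_cpu_idle_enter").call {
    if (cpu() != 0) next;           # bail out early on every CPU except 0
    t_start[cpu()] = gettimeofday_ns()
}

probe kernel.function("local_touch_nmi").call {
    if (cpu() != 0) next;
    if (t_start[cpu()]) {
        t_end <<< gettimeofday_ns() - t_start[cpu()]
        t_start[cpu()] = 0
    }
}

Note that the probes still fire, and briefly take the lock, on the other
CPUs; the guard only filters which samples are recorded.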
* Re: kernel function probe overhead
From: Aubrey Li @ 2016-11-29 3:31 UTC
To: Frank Ch. Eigler; +Cc: systemtap

On Tue, Nov 29, 2016 at 3:19 AM, Frank Ch. Eigler <fche@redhat.com> wrote:

>> Do we have an approach to probe the function on CPU0 only?
>
> Not directly; the kernel does not give us such control. The probe could
> include a
>
>     if (cpu() != 0) next;
>
> at the top of the probe handler body. That way locks are still taken,
> but released quickly on the other CPUs.

Thanks for the suggestion.

Original result:
t_end @count=4800 @min=740 @max=300281 @sum=90589877 @avg=18872

Adding "if (cpu() != 0) next;" in the probe handler:
t_end @count=161 @min=653 @max=114856 @sum=1780522 @avg=11059

And if I add "maxcpus=1" to the kernel cmdline:
t_end @count=1190 @min=573 @max=14529 @sum=1406437 @avg=1181

The last one is much better, so it looks like we have a scalability
opportunity here.

> Another solution would be to have per-thread global variables. Since
> the kernel doesn't help us out with this, we'd have to emulate it
> somehow, and for this to be worthwhile, it'd have to work faster than
> ordinary locked/shared array[tid()] accesses.

Is this a suggestion for me? How would I do this?

>> The "-t" option output:
>
> Ah, I note that we don't track occurrences of lock contention, even
> though we could. Opened https://sourceware.org/bugzilla/show_bug.cgi?id=20879

Okay, thanks, I'll keep an eye on this ticket.

> That inconsistency suggests dramatic contention, caching, or some other
> source of jitter. Having the variance figure in there would be useful,
> though. Would anyone like to give that a try? (translate.cxx line
> 2283ish and the runtime/linux/common_session_state.h _stp_stat_init
> area.)
> If there is a patch delivered against git head, I can apply it and paste the output here. Thanks, -Aubrey > - FChE ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: kernel function probe overhead
From: Josh Stone @ 2016-11-29 19:21 UTC
To: Aubrey Li, Frank Ch. Eigler; +Cc: systemtap

On 11/28/2016 07:30 PM, Aubrey Li wrote:
>> Another solution would be to have per-thread global variables. Since
>> the kernel doesn't help us out with this, we'd have to emulate it
>> somehow, and for this to be worthwhile, it'd have to work faster than
>> ordinary locked/shared array[tid()] accesses.
>
> Is this a suggestion for me? How would I do this?

There are relevant RFEs for stap already:

per-cpu:    https://sourceware.org/bugzilla/show_bug.cgi?id=10795
per-thread: https://sourceware.org/bugzilla/show_bug.cgi?id=10796

The latter was closed on the hunch that it wouldn't help much, but that
could still be explored if someone is interested.
* Re: kernel function probe overhead
From: Cody Santing @ 2016-12-01 18:47 UTC
To: Aubrey Li, Frank Ch. Eigler, Josh Stone; +Cc: systemtap

Hello Aubrey,

>> Ah, I note that we don't track occurrences of lock contention, even
>> though we could. Opened https://sourceware.org/bugzilla/show_bug.cgi?id=20879
>
> Okay, thanks, I'll keep an eye on this ticket.

I have yet to update the ticket, but I have added tracking of lock
contention. The number of lock contention occurrences on global
variables will be printed following the probe hit report from the '-t'
option.

>> That inconsistency suggests dramatic contention, caching, or some other
>> source of jitter. Having the variance figure in there would be useful,
>> though. Would anyone like to give that a try? (translate.cxx line
>> 2283ish and the runtime/linux/common_session_state.h _stp_stat_init
>> area.)

I have also added variance to the '-t' option. It immediately follows
the min/avg/max of each probe.

> If there is a patch delivered against git head, I can apply it and
> paste the output here.

You should find the patch in git head now. Give it a shot; if you have
any feedback I would like to hear it.

Thank you,
Cody Santing
[parent not found: <CAGPKeUKm9EZZxL=MHZX9_0N5SUYX291mJMTCExYSspRceJgrxg@mail.gmail.com>]
* Fwd: kernel function probe overhead
From: Cody Santing @ 2016-12-12 17:07 UTC
To: systemtap

---------- Forwarded message ----------
From: Aubrey Li <aubreylee@gmail.com>
Date: Thu, Dec 8, 2016 at 8:44 AM
Subject: Re: kernel function probe overhead
To: Cody Santing <csanting@redhat.com>

On Fri, Dec 2, 2016 at 2:47 AM, Cody Santing <csanting@redhat.com> wrote:
>
> I have yet to update the ticket, but I have added tracking of lock
> contention. The number of lock contention occurrences on global
> variables will be printed following the probe hit report from the '-t'
> option.
>
> I have also added variance to the '-t' option. It immediately follows
> the min/avg/max of each probe.
>
> You should find the patch in git head now. Give it a shot; if you have
> any feedback I would like to hear it.

Thanks Cody. Here is what I got:

============================================================
$ sudo stap -t test.stp
-----------------------------------------
Timestamp function: gettimeofday_ns()
Total time: 10092 milliseconds
t_end @count=2962 @min=268 @max=81878 @sum=10843684 @avg=3660
----- probe hit report:
begin, (./test.stp:8:1), hits: 1, cycles: 3752min/3752avg/3752max, variance: 0, from: begin, index: 0
kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call, (./test.stp:14:1), hits: 2962, cycles: 400min/30951avg/233043max, variance: 824817112, from: kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call from: kernel.function("arch_cpu_idle_enter").call, index: 2

May I know what the variance value means here?

kernel.function("local_touch_nmi@arch/x86/kernel/nmi.c:561").call, (./test.stp:19:1), hits: 2962, cycles: 288min/4483avg/177541max, variance: 36900201, from: kernel.function("local_touch_nmi@arch/x86/kernel/nmi.c:561").call from: kernel.function("local_touch_nmi").call, index: 3
timer.s(10), (./test.stp:27:1), hits: 1, cycles: 6974min/6974avg/6974max, variance: 0, from: timer.s(10), index: 4
end, (./test.stp:32:1), hits: 1, cycles: 6928min/6928avg/6928max, variance: 0, from: end, index: 5
end, (./test.stp:1:1), hits: 1, cycles: 9956min/9956avg/9956max, variance: 0, from: end, index: 6
----- refresh report:
'__global_t_start' lock contention occurred 2853 times
============================================================

Would it be nice to report the time cost of lock contention, too?

Anyway, instead of just reporting lock contention, it would be nice if
we could reduce it. :)

Thanks,
-Aubrey
* Re: Fwd: kernel function probe overhead
From: Frank Ch. Eigler @ 2016-12-12 17:43 UTC
To: Aubrey Li, Cody Santing; +Cc: systemtap

aubreylee wrote:

> Thanks Cody. Here is what I got:
>
> ============================================================
> $ sudo stap -t test.stp
> -----------------------------------------
> [...]
> kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call,
> (./test.stp:14:1), hits: 2962, cycles: 400min/30951avg/233043max,
> variance: 824817112, from: [...]
>
> May I know what the variance value means here?

This is normal statistical variance, i.e., the square of the standard
deviation. It means that the bulk of the probe execution times were
within sqrt(824817112), about 28720 cycles, of the mean, if the
distribution were roughly normal.

> ----- refresh report:
> '__global_t_start' lock contention occurred 2853 times
>
> Would it be nice to report the time cost of lock contention, too?

That would be roughly #-contentions * TRYLOCKDELAY (default 10us), so
about 29 ms here.

> Anyway, instead of just reporting lock contention, it would be nice if
> we could reduce it. :)

Sure, but a good first step is to know the magnitude of the problem.

- FChE
end of thread, other threads: [~2016-12-12 17:43 UTC | newest]

Thread overview: 9+ messages
2016-11-25 16:07 kernel function probe overhead Aubrey Li
2016-11-25 17:22 ` Frank Ch. Eigler
2016-11-26  6:18   ` Aubrey Li
2016-11-28 19:19     ` Frank Ch. Eigler
2016-11-29  3:31       ` Aubrey Li
2016-11-29 19:21         ` Josh Stone
2016-12-01 18:47         ` Cody Santing
     [not found]           ` <CAGPKeUKm9EZZxL=MHZX9_0N5SUYX291mJMTCExYSspRceJgrxg@mail.gmail.com>
2016-12-12 17:07             ` Fwd: kernel function probe overhead Cody Santing
2016-12-12 17:43               ` Frank Ch. Eigler