static user probe performance

public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed

* static user probe performance
@ 2010-03-25 18:34 Stan Cox
  2010-04-02 20:53 ` Stan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Stan Cox @ 2010-03-25 18:34 UTC (permalink / raw)
  To: systemtap

We wanted to get an idea of the performance of user markers with
respect to various improvements that have been made.  After starting
with the tcl testsuite I switched to a benchmark program because it is
easier to generate reproducible results.  The benchmark is a program
that runs: towers of Hanoi, eight queens, integer multiplication,
quicksort, bubblesort, fft.  Probes are placed in high traffic areas
of each benchmark and are hit in the 100,000+ range.  For each possibility the 
benchmark is run twice not using stap, then a stap module is built, then the 
module is run. The statistics are from /usr/bin/time

##### no volatile probe arguments (requires gcc-4.4.3-10)
7.05user 0.11system 0:07.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19700minor)pagefaults 0swaps
7.06user 0.11system 0:07.18elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19699minor)pagefaults 0swaps
28.01user 496.38system 31:23.32elapsed 27%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+21912minor)pagefaults 0swaps

##### volatile probe arguments
0.23user 0.00system 0:07.20elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19700minor)pagefaults 0swaps
0.23user 0.00system 0:07.19elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19700minor)pagefaults 0swaps
79.37user 1349.53system 31:15.61elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+21910minor)pagefaults 0swaps

##### no volatile probe arguments and no probe semaphores
0.23user 0.00system 0:07.33elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19699minor)pagefaults 0swaps
0.24user 0.00system 0:07.35elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19701minor)pagefaults 0swaps
113.04user 1760.64system 31:15.29elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+21909minor)pagefaults 0swaps

##### volatile probe arguments and no probe semaphores
7.38user 0.11system 0:07.50elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19699minor)pagefaults 0swaps
7.38user 0.11system 0:07.50elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19699minor)pagefaults 0swaps
104.12user 1777.06system 31:22.16elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+21911minor)pagefaults 0swaps

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: static user probe performance
  2010-03-25 18:34 static user probe performance Stan Cox
@ 2010-04-02 20:53 ` Stan Cox
  2010-04-05 15:21   ` Frank Ch. Eigler
  0 siblings, 1 reply; 7+ messages in thread
From: Stan Cox @ 2010-04-02 20:53 UTC (permalink / raw)
  To: systemtap

Output massaged with an awk script:

not run with stap
user time
       no volatile no semaphore 1.07
       no volatile semaphore 1.11
       volatile semaphore 1.16
       volatile no semaphore 1.18
system time
       volatile no semaphore 0.03
       no volatile semaphore 0.03
       volatile semaphore 0.04
       no volatile no semaphore 0.05
elapsed time
       volatile no semaphore 38.18
       no volatile no semaphore 48.02
       volatile semaphore 51.66
       no volatile semaphore 60.23

* volatile refers to: do the user probe arguments use volatile
   semaphore refers to: are the user probes guarded by a semaphore

run with stap
user time
       no volatile no semaphore 3.49
       no volatile semaphore 3.56
       volatile semaphore 3.62
       volatile no semaphore 3.7
system time
       volatile semaphore 58.34
       no volatile semaphore 58.51
       volatile no semaphore 58.68
       no volatile no semaphore 58.72
elapsed time
       volatile semaphore 1858.71
       no volatile no semaphore 1865.68
       volatile no semaphore 1873.07
       no volatile semaphore 1878.09


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: static user probe performance
  2010-04-02 20:53 ` Stan Cox
@ 2010-04-05 15:21   ` Frank Ch. Eigler
  2010-04-07 14:26     ` Stan Cox
  2010-04-07 14:43     ` Frank Ch. Eigler
  0 siblings, 2 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2010-04-05 15:21 UTC (permalink / raw)
  To: Stan Cox; +Cc: systemtap


Stan Cox <scox@redhat.com> writes:

> [...]
> not run with stap
> [...]
> elapsed time
>       volatile no semaphore 38.18
>       no volatile no semaphore 48.02
>       volatile semaphore 51.66
>       no volatile semaphore 60.23

I don't understand these numbers.  The "no volatile" version is
consistently slower than "volatile".  How can that be?


> run with stap
> [...]
> elapsed time
>       volatile semaphore 1858.71
>       no volatile no semaphore 1865.68
>       volatile no semaphore 1873.07
>       no volatile semaphore 1878.09

So for this particular microbenchmark, systemtap probing imposes a 30x
slowdown, yikes.  How many probe hits does this represent?

- FChE

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: static user probe performance
  2010-04-05 15:21   ` Frank Ch. Eigler
@ 2010-04-07 14:26     ` Stan Cox
  2010-04-07 14:43     ` Frank Ch. Eigler
  1 sibling, 0 replies; 7+ messages in thread
From: Stan Cox @ 2010-04-07 14:26 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On 04/05/2010 11:21 AM, Frank Ch. Eigler wrote:
> How many probe hits does this represent?

Oh, quite a few:

process_hits @count=0xc038fc @min=0x1 @max=0x1 @sum=0xc038fc @avg=0x1
tower_hits @count=0x7cfe0c @min=0x1 @max=0x1 @sum=0x7cfe0c @avg=0x1
try_hits @count=0x14e2ae0 @min=0x1 @max=0x1 @sum=0x14e2ae0 @avg=0x1
innerproduct_hits @count=0x1e84800 @min=0x1 @max=0x1 @sum=0x1e84800 @avg=0x1
trial_hits @count=0xf4a10 @min=0x1 @max=0x1 @sum=0xf4a10 @avg=0x1
quicksort_hits @count=0x82553c @min=0x1 @max=0x1 @sum=0x82553c @avg=0x1
bubble_hits @count=0x3b7c458 @min=0x1 @max=0x1 @sum=0x3b7c458 @avg=0x1
insert_hits @count=0x21b1bb8 @min=0x1 @max=0x1 @sum=0x21b1bb8 @avg=0x1
fft_hits @count=0x9c4000 @min=0x1 @max=0x1 @sum=0x9c4000 @avg=0x1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: static user probe performance
  2010-04-05 15:21   ` Frank Ch. Eigler
  2010-04-07 14:26     ` Stan Cox
@ 2010-04-07 14:43     ` Frank Ch. Eigler
  2010-04-07 18:10       ` Josh Stone
  2010-04-07 19:10       ` Jim Keniston
  1 sibling, 2 replies; 7+ messages in thread
From: Frank Ch. Eigler @ 2010-04-07 14:43 UTC (permalink / raw)
  To: systemtap

>> run with stap
>> [...]
>> elapsed time
>>       volatile semaphore 1858.71
>>       no volatile no semaphore 1865.68
>>       volatile no semaphore 1873.07
>>       no volatile semaphore 1878.09
>
> So for this particular microbenchmark, systemtap probing imposes a 30x
> slowdown, yikes.  How many probe hits does this represent?

From scox's later numbers, that's 192 million hits, for about 9
us/hit, which seems rather high.  (In this microbenchmark, it seems
that the instrumentation is taking an impressive 98% of the cpu time,
which ought to have triggered the overload protection.  I assume there
was an -DSTP_NO_OVERLOAD in effect.)

- FChE

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: static user probe performance
  2010-04-07 14:43     ` Frank Ch. Eigler
@ 2010-04-07 18:10       ` Josh Stone
  2010-04-07 19:10       ` Jim Keniston
  1 sibling, 0 replies; 7+ messages in thread
From: Josh Stone @ 2010-04-07 18:10 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On 04/07/2010 07:43 AM, Frank Ch. Eigler wrote:
> (In this microbenchmark, it seems that the instrumentation is taking
> an impressive 98% of the cpu time, which ought to have triggered the
> overload protection.  I assume there was an -DSTP_NO_OVERLOAD in
> effect.)

Well, the overload protection can only account for cpu time in our own
handler, and it's unfortunately blind to the overhead of the probe
mechanism itself.  The int3 trap and all the uprobes code around it
(including the single-step) are outside the overload measurement.

Perhaps we should pad the measurements with approximate cycle counts of
a probe hit?  Of course, people already complain when overloading kicks
in, and this would make it even more of a nanny...

Josh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: static user probe performance
  2010-04-07 14:43     ` Frank Ch. Eigler
  2010-04-07 18:10       ` Josh Stone
@ 2010-04-07 19:10       ` Jim Keniston
  1 sibling, 0 replies; 7+ messages in thread
From: Jim Keniston @ 2010-04-07 19:10 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Wed, 2010-04-07 at 10:43 -0400, Frank Ch. Eigler wrote:
> >> run with stap
> >> [...]
> >> elapsed time
> >>       volatile semaphore 1858.71
> >>       no volatile no semaphore 1865.68
> >>       volatile no semaphore 1873.07
> >>       no volatile semaphore 1878.09
> >
> > So for this particular microbenchmark, systemtap probing imposes a 30x
> > slowdown, yikes.  How many probe hits does this represent?
> 
> From scox's later numbers, that's 192 million hits, for about 9
> us/hit, which seems rather high.

Yes, it is high.  Are you probing a multithreaded app?  We have a known
performance problem there:
http://sourceware.org/bugzilla/show_bug.cgi?id=5660

Jim

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-04-07 19:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-25 18:34 static user probe performance Stan Cox
2010-04-02 20:53 ` Stan Cox
2010-04-05 15:21   ` Frank Ch. Eigler
2010-04-07 14:26     ` Stan Cox
2010-04-07 14:43     ` Frank Ch. Eigler
2010-04-07 18:10       ` Josh Stone
2010-04-07 19:10       ` Jim Keniston

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).