public inbox for systemtap@sourceware.org
* Re: Measure the Accept Queueing Time
       [not found] <43E28A07.1040604@tamu.edu>
@ 2006-02-04  0:13 ` Frank Ch. Eigler
       [not found]   ` <43E90B4F.7040407@us.ibm.com>
  2006-02-14  4:55   ` Peter Bach
  0 siblings, 2 replies; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-04  0:13 UTC (permalink / raw)
  To: Benjamin Chu; +Cc: linux-kernel, systemtap


Benjamin Chu <benchu@tamu.edu> writes:

> [...]  All I want is to measure the amount of time when a open
> request is in the accept queue. [...]
> p.s. My Linux Kernel Version is 2.4.25

Dude, you made me spend several hours playing with systemtap to solve
this simple-sounding problem.  :-) 

In the end, I have a script (attached below) that works on one
particular kernel build (a beta RHEL4 kernel, which has the systemtap
prerequisite kprobes but uses the same networking structures).
Unfortunately, it also demonstrated some of the work we still have to
do in systemtap land to make it work better.

The good news: here is the output of the script running on a vmware
instance that has only a single socket-listener program, briefly
tickled by hand and then hammered by "nc" connections in a loop.  The
lines came out every ten seconds (as requested by the script, see
below) and report the number of successful accept()s plus the average
number of microseconds each open_request* spent in the accept_queue.

% socket -l -s NNNN &
[1] 25384
% stap -g acceptdelay.stp
[1139011591] socket(25384) count=1 avg=1487us
[1139011601] socket(25384) count=9 avg=560us
[1139011611] socket(25384) count=6 avg=915us
[1139011621] socket(25384) count=747 avg=604us
[1139011631] socket(25384) count=1280 avg=558us
[1139011641] socket(25384) count=1147 avg=547us
[1139011645] socket(25384) count=25 avg=471us

Is that the sort of data you were hoping to see?

The systemtap translator unfortunately has problems identifying the
ideal probe insertion points from source code coordinates or function
names.  The interception of inlined functions is weak.  At present, we
also don't have (or use) the benefit of hooks inserted into the kernel
just for us, which would make such probes simple and precise.  But all
these problems are being worked on.

So, as a stop-gap for this warts-and-all demonstration, the script
just uses hard-coded PC addresses.  Please look beyond that and use
your imagination!

- FChE


#! stap -g

# This embedded-C function is needed to extract a tcp_opt field
# from a pointer cast to generic struct sock*.  Later, systemtap
# will offer first class syntax and protected operation for this.
%{
#include <net/tcp.h>
%}
function tcp_aq_head:long (sock:long) 
%{
  struct tcp_opt *tp = tcp_sk ((struct sock*) (uintptr_t) THIS->sock);
  THIS->__retvalue = (int64_t) (uintptr_t) tp->accept_queue;
%}


global open_request_arrivals # indexed by open_request pointer
global open_request_delays # stats, indexed by pid and execname

probe # a spot in the open_request creation path
# XXX: the following commented-out probe points should also work
# kernel.inline("tcp_acceptq_queue")
# kernel.function("tcp_v4_conn_request")
# kernel.inline("tcp_openreq_init")
  kernel.statement("*@net/ipv4/tcp_ipv4.c:1472") # after tcp_openreq_init()
{
  open_request_arrivals[$req] = gettimeofday_us()
}


# One could also probe at entry of tcp_accept itself, to track
# whether an accept() syscall was blocked by absence of pending
# open_requests.


probe # a spot in tcp_accept, after pending open_request found
# kernel.statement("*@net/ipv4/tcp_ipv4.c:1922")
  kernel.statement (0xc029eec8) # near req = tp->accept_queue
{
  req = tcp_aq_head ($sk) # tcp_sk(sk)->accept_queue; $req should work too
  then = open_request_arrivals[req]
  if (then) { 
    delete open_request_arrivals[req] # save memory
    now = gettimeofday_us()
    open_request_delays[pid(),execname()] <<< now-then 
  }
}


probe timer.ms(10000), end # periodically and at session shutdown
{
  now=gettimeofday_s()
  foreach ([pid+,execname] in open_request_delays) # sort by pid
    printf("[%d] %s(%d) count=%d avg=%dus\n", now, execname, pid,
           @count(open_request_delays[pid,execname]),
           @avg(open_request_delays[pid,execname]))
  delete open_request_delays
}


# Some preprocessor magic to prevent someone from running this
# demonstration script (with its hard-coded probe addresses)
# on another kernel build
%( kernel_vr == "2.6.9-30.ELsmp" %? %( arch == "i686" %? /* OK */
                                                      %: BADARCH %)
                                 %: BADVERSION %)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
       [not found]   ` <43E90B4F.7040407@us.ibm.com>
@ 2006-02-07 21:26     ` Frank Ch. Eigler
  0 siblings, 0 replies; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-07 21:26 UTC (permalink / raw)
  To: Vara Prasad; +Cc: systemtap

Hi -

varap wrote:

> [...]  Looks like you had certain difficulties in coming up with the
> script due to the current infrastructure of SystemTAP.  [...]

I thought those parts were self-explanatory.  None of those
kernel.statement(0xaddr) probes should have been necessary; the dwarf
system should be able to resolve them symbolically, just like the
other (commented-out) probe points in the script.

I also marked a point where systemtap should eventually be able to
extract a particular target-side value that is defined via two
partially aliased structures.  A new feature (pointer-type downcasting
in $-expressions) is needed for that.
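To illustrate, with such a feature the embedded-C tcp_aq_head() helper
in the script could shrink to a plain $-expression.  The syntax below
is hypothetical - it is only meant to show the shape of the feature:

# hypothetical downcast syntax; the translator does not accept this today
probe kernel.statement("*@net/ipv4/tcp_ipv4.c:1922")
{
  # view the generic struct sock* as the tcp-specific structure
  # and read accept_queue directly, with no embedded C:
  req = @cast($sk, "tcp_opt")->accept_queue
  then = open_request_arrivals[req]
  if (then) {
    delete open_request_arrivals[req]
    open_request_delays[pid(),execname()] <<< gettimeofday_us() - then
  }
}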

- FChE

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-04  0:13 ` Measure the Accept Queueing Time Frank Ch. Eigler
       [not found]   ` <43E90B4F.7040407@us.ibm.com>
@ 2006-02-14  4:55   ` Peter Bach
  2006-02-14 22:28     ` Frank Ch. Eigler
  2006-02-17 10:46     ` Andi Kleen
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Bach @ 2006-02-14  4:55 UTC (permalink / raw)
  Cc: systemtap


As a performance person, this is exactly what I would want to do with 
systemtap, with a few minor tweaks.

Most everything inside the kernel is a queue or system of queues, but
as Frank noted, the trick is to find the proper probe points to
measure them.  For queues, the only three useful measurements are
average service time, average wait time, and average queue length.
Developers mostly give you a measurement of throughput (the number of
completions of foo per second), but little to no insight into the
other three values.
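
For reference, assuming a steady state, the standard operational laws
tie those three quantities to the throughput that does get reported:

  L_q = \lambda W_q   (Little's law: mean queue length = throughput x mean wait time)
  U   = \lambda S     (utilization law: utilization = throughput x mean service time)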

The most useful lightweight way to measure these values was detailed
by Adrian Cockcroft about a decade ago, when he described the Solaris
disk driver's "extended disk statistics", which separated waiting time
from service time.  Up until then, all *nix versions had smeared these
two values together, which was of limited value when working with
high-end storage arrays.

See the article below for details; the technique can be applied to
most of the queues in a computer system.  I do not know whether there
is anything systemtap can do to make the collection or calculation of
these values easier.  Other queues you could look at are threading and
concurrency within multi-threaded apps, where internal application
queues are exposed to the outside world.

Regards,
Peter

How do disks really work?
Unix Insider 6/1/96
Figure out your disks' behavior
Adrian Cockcroft
http://storage.itworld.com/4650/UIR960601perf/pfindex.html

Frank Ch. Eigler wrote:

>Benjamin Chu <benchu@tamu.edu> writes:
>
>  
>
>>[...]  All I want is to measure the amount of time when a open
>>request is in the accept queue. [...]
>>p.s. My Linux Kernel Version is 2.4.25
>>    
>>
>
>Dude, you made me spend several hours playing with systemtap to solve
>this simple-sounding problem.  :-) 
>
>At the end, I have a script (attached below) that works on one
>particular kernel build (a beta RHEL4 kernel, which has the systemtap
>prerequisite kprobes, but uses the same networking structure).
>Unfortunately, it also demonstrated some of the work we have to do in
>systemtap land to make it work better.
>
>The good news: here is the output of the script running on a vmware
>instance that has only single socket listener program, being briefly
>tickled by hand, then hammered by "nc" connections in a loop.  The
>lines came out every ten seconds (as requested by the script, see
>below), and report on the number of successful accept()s, plus the
>number of microseconds that each open_request* was in the
>accept_queue.
>
>% socket -l -s NNNN &
>[1] 25384
>% stap -g acceptdelay.stp
>[1139011591] socket(25384) count=1 avg=1487us
>[1139011601] socket(25384) count=9 avg=560us
>[1139011611] socket(25384) count=6 avg=915us
>[1139011621] socket(25384) count=747 avg=604us
>[1139011631] socket(25384) count=1280 avg=558us
>[1139011641] socket(25384) count=1147 avg=547us
>[1139011645] socket(25384) count=25 avg=471us
>
>Is that the sort of data you were hoping to see?
>
>The systemtap translator unfortunately has problems identifying the
>most ideal probe insertion points, based on source code coordinates or
>function names.  The interception of inline functions is weak.  At the
>present, we also don't have/use the benefit of hooks inserted into the
>kernel just for us, which would make probes simple and precise.  But
>all these problems are being worked on.
>
>So, as a stop-gap for this warts-and-all demonstration, the script
>just uses hard-coded PC addresses.  Please look beyond that and use
>your imagination!
>
>- FChE
>  
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-14  4:55   ` Peter Bach
@ 2006-02-14 22:28     ` Frank Ch. Eigler
  2006-02-17  8:22       ` Ken Robson
  2006-02-17 10:46     ` Andi Kleen
  1 sibling, 1 reply; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-14 22:28 UTC (permalink / raw)
  To: Peter Bach; +Cc: systemtap


peterzbach wrote:

> [...]
> Most everything inside the kernel is a queue or system of queues, but
> as Frank noted, the trick is to find the proper probe points to
> measure them.

> As for queues, the only three useful measurements are average
> service time, average wait time, and average queue length. [...]
> The most useful way to measure these values in a lightweight way was
> detailed by Adrian Cockcroft about a decade ago [...]

Thanks for the reference.  I put a simplified first sketch of this
into a new "queue_stats.stp" tapset.  It comes with an accompanying
pass-5 test case / demo that simulates six concurrent threads modeling
traffic to two separate queues.

% stap ../tests/testsuite/systemtap.samples/queue_demo.stp
block-read: 9 ops/s, 1.191 qlen, 207754 await, 81453 svctm, 74% wait, 51% util
block-write: 8 ops/s, 0.785 qlen, 224397 await, 132798 svctm, 56% wait, 72% util
block-read: 9 ops/s, 0.882 qlen, 215635 await, 122062 svctm, 63% wait, 75% util
block-write: 8 ops/s, 1.010 qlen, 237468 await, 119600 svctm, 68% wait, 67% util
block-read: 10 ops/s, 0.741 qlen, 158623 await, 88998 svctm, 54% wait, 66% util
block-write: 9 ops/s, 0.997 qlen, 188266 await, 88399 svctm, 76% wait, 68% util


The new code provides these generic calculations only.  The client
code must still plop probes into the appropriate place (say, in the
block I/O stack, or the scheduler's run queue, or ...?) and call the
provided queue state-change tracking functions.
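A client script would look roughly like the following sketch.  The
probe points are placeholders, and the function names are only
indicative - check queue_stats.stp itself for the real interface:

# ENQUEUE_FN / START_FN / DONE_FN are placeholders for wherever the
# queue of interest changes state; "myq" names the queue being tracked.
probe kernel.function("ENQUEUE_FN") { qs_wait("myq") }   # request joins the queue
probe kernel.function("START_FN")   { qs_run("myq") }    # request begins service
probe kernel.function("DONE_FN")    { qs_done("myq") }   # request completes
probe timer.ms(10000)               { qs_report("myq") } # assumed periodic summary helper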


- FChE

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Measure the Accept Queueing Time
  2006-02-14 22:28     ` Frank Ch. Eigler
@ 2006-02-17  8:22       ` Ken Robson
  2006-02-17 14:32         ` Frank Ch. Eigler
  0 siblings, 1 reply; 17+ messages in thread
From: Ken Robson @ 2006-02-17  8:22 UTC (permalink / raw)
  To: 'Frank Ch. Eigler'; +Cc: systemtap, 'Peter Bach'

Whilst I agree that the execution path dissection is useful, I do not
agree that averages on their own are entirely useful; to put them into
context you need some indication of the standard deviation during the
sample period and a median calculation to give some indication of
outliers.

There have been some hints (unless I missed something more concrete)
about aggregations.  The statistical functions that seem useful but do
not aggregate well are mode, median, and standard deviation - each of
these would seem to require retaining the entire data set until the
function is applied, which reduces some of the value of the
aggregations, particularly on long runs.

What built-in data 'visualisation' functions do people envisage and how
efficient (in terms of shipping data to user space or retaining data in the
kernel) do people think they will be, or have I misunderstood the issue
altogether?

-----Original Message-----
From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org]
On Behalf Of Frank Ch. Eigler
Sent: 14 February 2006 22:28
To: Peter Bach
Cc: systemtap@sources.redhat.com
Subject: Re: Measure the Accept Queueing Time


peterzbach wrote:

> [...]
> Most everything inside the kernel is a queue or system of queues, but
> as Frank noted, the trick is to find the proper probe points to
> measure them.

> As for queues, the only three useful measurements are average
> service time, average wait time, and average queue length. [...]
> The most useful way to measure these values in a lightweight way was
> detailed by Adrian Cockcroft about a decade ago [...]

Thanks for the reference.  I put a simplified first sketch of this
into a new "queue_stats.stp" tapset.  It comes with an accompanying
pass-5 test case / demo that simulates six concurrent threads modeling
traffic to two separate queues.

% stap ../tests/testsuite/systemtap.samples/queue_demo.stp
block-read: 9 ops/s, 1.191 qlen, 207754 await, 81453 svctm, 74% wait, 51% util
block-write: 8 ops/s, 0.785 qlen, 224397 await, 132798 svctm, 56% wait, 72% util
block-read: 9 ops/s, 0.882 qlen, 215635 await, 122062 svctm, 63% wait, 75% util
block-write: 8 ops/s, 1.010 qlen, 237468 await, 119600 svctm, 68% wait, 67% util
block-read: 10 ops/s, 0.741 qlen, 158623 await, 88998 svctm, 54% wait, 66% util
block-write: 9 ops/s, 0.997 qlen, 188266 await, 88399 svctm, 76% wait, 68% util


The new code provides these generic calculations only.  The client
code must still plop probes into the appropriate place (say, in the
block I/O stack, or the scheduler's run queue, or ...?) and call the
provided queue state-change tracking functions.


- FChE


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-14  4:55   ` Peter Bach
  2006-02-14 22:28     ` Frank Ch. Eigler
@ 2006-02-17 10:46     ` Andi Kleen
  2006-02-17 14:36       ` Frank Ch. Eigler
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2006-02-17 10:46 UTC (permalink / raw)
  To: Peter Bach; +Cc: systemtap

Peter Bach <peterzbach@comcast.net> writes:
 
> Most everything inside the kernel is a queue or system of queues, but
> as Frank noted, the trick is to find the proper probe points to
> measure them.


I tried to use systemtap for measuring times in the past, but the
biggest problem was that there's no good way to share a variable
between two probes (except for using globals, which is racy).
So you can't nicely say

probe A
             measure time A

probe B
             measure time B
             data <<< (B-A)

Sometimes it's possible to stuff A and B into an array indexed by the
unique object that the code being measured is working on, but in some
cases that's not possible because there is no such object.

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-17  8:22       ` Ken Robson
@ 2006-02-17 14:32         ` Frank Ch. Eigler
  0 siblings, 0 replies; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-17 14:32 UTC (permalink / raw)
  To: Ken Robson; +Cc: systemtap, 'Peter Bach'

Hi, Ken -

On Fri, Feb 17, 2006 at 08:22:35AM -0000, Ken Robson wrote:

> [...]  There have been some hints (unless I missed something more
> concrete) about aggregations. Statistical functions that seem useful
> that do not aggregate well seem to be mode, median and standard
> deviation - each of these would seem to require that you retain the
> entire data set [...]

I am not a numerical methods guy, but I gather that at least some of
these functions *can* be aggregated incrementally, approximately if
unavoidable.  There exist incremental (or "streaming" or "on-line")
algorithms for calculating variance, standard deviation, and medians
that do not require storing the whole data set.
This paper seems to be a nice introduction:
http://citeseer.ist.psu.edu/muthukrishnan03data.html
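
As a trivial illustration of the idea (not part of any tapset), a
variance estimate can be tracked with three running integer counters
and no stored samples:

global vn, vs, vq                # count, sum, sum of squares
function var_record (x:long) { vn++; vs += x; vq += x*x }
function var_approx:long ()      # integer-truncated variance; beware overflow
{
  return vn ? (vn*vq - vs*vs) / (vn*vn) : 0
}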

> What built-in data 'visualisation' functions do people envisage and
> how efficient (in terms of shipping data to user space or retaining
> data in the kernel) do people think they will be [...]

As much data reduction (filtering, aggregation) as possible should
likely be done in kernel space, where its cost in time and space is
small.  As fast as bulk kernel-to-user data transport may be, it can
still cost orders of magnitude more to encode/buffer/context-switch/decode/filter
than to do the computation right at the point of origin.

As far as graphical visualization goes, that would need to go into
user space, interfacing with plotting/waveform/trace browsing tools,
but I don't think that's what you meant.

- FChE

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-17 10:46     ` Andi Kleen
@ 2006-02-17 14:36       ` Frank Ch. Eigler
  2006-02-17 15:01         ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-17 14:36 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Peter Bach, systemtap


Andi Kleen <ak@suse.de> writes:

> [...]  Sometimes it's possible to stuff A and B into an array
> indexed with the unique object the to be measured code is working
> on, but in some cases that's also not possible if there isn't such
> an object.

But the two probes must have some connection in order for that time
difference to make sense.  tid() ?
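
Something like the following sketch, with A and B as placeholders for
the two probe points being paired up:

global t0, deltas
probe A {                            # placeholder probe point
  t0[tid()] = gettimeofday_us()
}
probe B {                            # placeholder probe point
  if ([tid()] in t0) {
    deltas <<< gettimeofday_us() - t0[tid()]
    delete t0[tid()]
  }
}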

- FChE

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-17 14:36       ` Frank Ch. Eigler
@ 2006-02-17 15:01         ` Andi Kleen
       [not found]           ` <20060217161538.GF3170@redhat.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2006-02-17 15:01 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Peter Bach, systemtap

On Friday 17 February 2006 15:36, Frank Ch. Eigler wrote:
> 
> Andi Kleen <ak@suse.de> writes:
> 
> > [...]  Sometimes it's possible to stuff A and B into an array
> > indexed with the unique object the to be measured code is working
> > on, but in some cases that's also not possible if there isn't such
> > an object.
> 
> But the two probes must have some connection in order for that time
> difference to make sense.  tid() ?

Doesn't work for nested interrupts. 

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
       [not found]           ` <20060217161538.GF3170@redhat.com>
@ 2006-02-17 16:27             ` Andi Kleen
  2006-02-17 16:59               ` Frank Ch. Eigler
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2006-02-17 16:27 UTC (permalink / raw)
  To: systemtap

On Friday 17 February 2006 17:15, Frank Ch. Eigler wrote:
> Hi -
> 
> ak wrote:
> > [...]
> > > But the two probes must have some connection in order for that time
> > > difference to make sense.  tid() ?
> > 
> > Doesn't work for nested interrupts. 
> 
> Could you outline a problematic scenario more fully?


interrupt 1
calls function to be measured
startprobe 
... function runs....
  interrupt 2 from a different device 
      calls function to be measured again
      startprobe nested
      ... function runs nested ... 
      endprobe nested
  interrupt 2 ends
... function still runs ...
endprobe

x86-64 has a per-CPU interrupt nesting count, but even with that and
using (tid, count) as an index there are some problems.

I hit this when trying to measure dma_map_sg()

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-17 16:27             ` Andi Kleen
@ 2006-02-17 16:59               ` Frank Ch. Eigler
  2006-02-17 17:24                 ` Andi Kleen
  0 siblings, 1 reply; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-17 16:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: systemtap


ak wrote:

> interrupt 1
> calls function to be measured
> startprobe 
> ... function runs....
>   interrupt 2 from a different device 
>       calls function to be measured again
>       startprobe nested
>       ... function runs nested ... 
>       endprobe nested
>   interrupt 2 ends
> ... function still runs ...
> endprobe
> [...]

If this is happening on the same CPU, then it really shouldn't be
happening at all.  On one hand, at probe start time, systemtap checks
for reentrancy and aborts any nested probes during context allocation,
so the nested probe handler should never even get to start.  On the
other hand, the probe entry boilerplate should explicitly disable
interrupts/preemption while a probe handler is running, to prevent
"interrupt 2" from actually dispatching.  We may be missing some code
for that (bug #2293).

- FChE

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-17 16:59               ` Frank Ch. Eigler
@ 2006-02-17 17:24                 ` Andi Kleen
  2006-02-17 17:47                   ` Frank Ch. Eigler
  0 siblings, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2006-02-17 17:24 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Friday 17 February 2006 17:59, Frank Ch. Eigler wrote:
> 
> ak wrote:
> 
> > interrupt 1
> > calls function to be measured
> > startprobe 
> > ... function runs....
> >   interrupt 2 from a different device 
> >       calls function to be measured again
> >       startprobe nested
> >       ... function runs nested ... 
> >       endprobe nested
> >   interrupt 2 ends
> > ... function still runs ...
> > endprobe
> > [...]
> 
> If this is happening on the same CPU, then it really shouldn't be
> happening at all.  On one hand, at probe start time, systemtap checks
> for reentrancy and aborts any nested probes during context allocation,

The probes themselves are not nested; just the function being measured is.

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-17 17:24                 ` Andi Kleen
@ 2006-02-17 17:47                   ` Frank Ch. Eigler
  0 siblings, 0 replies; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-17 17:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: systemtap

Hi -

ak wrote:

> The probes itself are not nested. Just the to be measured function is.

Aha.  So, rewording (just for my own clarity), this is the sequence:

entry probe A
entry probe A'
return probe A'
return probe A

And you want to measure the time between each matching entry/return
pair: A-A and A'-A'.

It seems you could do this if you invent a unique ID for each A or A'
entry, for example a per-CPU counter, and then use that as an index:

global nesting   # current nesting depth, indexed by cpu
global entrytime # entry timestamps, indexed by [cpu, nesting level]
global delays    # aggregate of measured durations

probe A {
  c = cpu(); n = nesting[c]++; entrytime[c,n] = gettimeofday_us()
}
probe A.return {
  c = cpu(); n = --nesting[c]; then = entrytime[c,n]; now = gettimeofday_us()
  delays <<< now - then; delete entrytime[c,n]  # record and free the slot
}


- FChE

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
  2006-02-17 21:59 peterzbach
@ 2006-02-18  1:30 ` Frank Ch. Eigler
  0 siblings, 0 replies; 17+ messages in thread
From: Frank Ch. Eigler @ 2006-02-18  1:30 UTC (permalink / raw)
  To: peterzbach; +Cc: systemtap

Hi -

On Fri, Feb 17, 2006 at 09:59:04PM +0000, peterzbach@comcast.net wrote:

> The method I had pointed to in Adrian's paper does not require that
> you match up the entry and exit probes to get accurate response time
> measurements. [...]

Yes, but that method only yields accurate *averages*, not an exact
time for a specific occurrence.

- FChE

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Measure the Accept Queueing Time
@ 2006-02-17 22:00 peterzbach
  0 siblings, 0 replies; 17+ messages in thread
From: peterzbach @ 2006-02-17 22:00 UTC (permalink / raw)
  To: ken, 'Frank Ch. Eigler'; +Cc: Ken Robson, systemtap

And of course, if the workload is bursty in any way, you need the discrete measurements to do a reasonable analysis of the system's performance.

 -------------- Original message ----------------------
From: peterzbach@comcast.net
> 
> For 80% of the performance problem cases in a production system, you just need 
> to identify the device or queue with the longest service times to give a first 
> order approximation of where you need to improve the system. The service and 
> queue times you get from a steady state system are accurate enough to use in any 
> modeling effort to predict future performance.
> 
> Yes, for engineering level benchmarks you may need to collect the individual 
> response time and look at boxplots of the response times, but that is only 
> needed in a smaller subset of the cases. As someone who has done both type of 
> performance measurements and tuning, getting the first part correctly has a much 
> bigger impact than delaying or destabilizing the product to do the second.
> 
> Regards,
> Peter
> 
>  -------------- Original message ----------------------
> From: "Ken Robson" <ken@robson.net>
> > Whilst I agree that the execution path dissection is useful, I do not agree
> > that averages are entirely useful, to put them into context you need some
> > indication of standard deviation during the sample period and a median
> > calculation to give some indication of outliers.
> > 
> > There have been some hints (unless I missed something more concrete) about
> > aggregations. Statistical functions that seem useful that do not aggregate
> > well seem to be mode, median and standard deviation - each of these would
> > seem to require that you retain the entire data set until the function is
> > applied which seems to reduce some of the value of the aggregations,
> > particularly when doing long runs.
> > 
> > What built-in data 'visualisation' functions do people envisage and how
> > efficient (in terms of shipping data to user space or retaining data in the
> > kernel) do people think they will be, or have I misunderstood the issue
> > altogether?
> > 
> > -----Original Message-----
> > From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org]
> > On Behalf Of Frank Ch. Eigler
> > Sent: 14 February 2006 22:28
> > To: Peter Bach
> > Cc: systemtap@sources.redhat.com
> > Subject: Re: Measure the Accept Queueing Time
> > 
> > 
> > peterzbach wrote:
> > 
> > > [...]
> > > Most everything inside the kernel is a queue or system of queues, but
> > > as Frank noted, the trick is to find the proper probe points to
> > > measure them.
> > 
> > > As for queues, the only three useful measurements are average
> > > service time, average wait time, and average queue length. [...]
> > > The most useful way to measure these values in a lightweight way was
> > > detailed by Adrian Cockcroft about a decade ago [...]
> > 
> > Thanks for the reference.  I put a simplified first sketch of this
> > into a new "queue_stats.stp" tapset.  It comes with an accompanying
> > pass-5 test case / demo that simulates six concurrent threads modeling
> > traffic to two separate queues.
> > 
> > % stap ../tests/testsuite/systemtap.samples/queue_demo.stp
> > block-read: 9 ops/s, 1.191 qlen, 207754 await, 81453 svctm, 74% wait, 51% util
> > block-write: 8 ops/s, 0.785 qlen, 224397 await, 132798 svctm, 56% wait, 72% util
> > block-read: 9 ops/s, 0.882 qlen, 215635 await, 122062 svctm, 63% wait, 75% util
> > block-write: 8 ops/s, 1.010 qlen, 237468 await, 119600 svctm, 68% wait, 67% util
> > block-read: 10 ops/s, 0.741 qlen, 158623 await, 88998 svctm, 54% wait, 66% util
> > block-write: 9 ops/s, 0.997 qlen, 188266 await, 88399 svctm, 76% wait, 68% util
> > 
> > 
> > The new code provides these generic calculations only.  The client
> > code must still plop probes into the appropriate place (say, in the
> > block I/O stack, or the scheduler's run queue, or ...?) and call the
> > provided queue state-change tracking functions.
> > 
> > 
> > - FChE
> > 
> > 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Measure the Accept Queueing Time
@ 2006-02-17 21:59 peterzbach
  2006-02-18  1:30 ` Frank Ch. Eigler
  0 siblings, 1 reply; 17+ messages in thread
From: peterzbach @ 2006-02-17 21:59 UTC (permalink / raw)
  To: Frank Ch. Eigler, Andi Kleen; +Cc: systemtap

The method I had pointed to in Adrian's paper does not require that you match up the entry and exit probes to get accurate response-time measurements.

You only need to make the queue-entry and queue-exit bookkeeping atomic, so there will be some synchronization issues on a multi-CPU system. But you only compare the entry and exit times with the time the queue state was last changed, and add or subtract the corresponding queue-length-times-time product. You are essentially precalculating the area under the curve in a running fashion, which gives reasonably accurate results when you look at the overall queue stats every few seconds or minutes.
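
In script form the bookkeeping amounts to something like this (the
names are illustrative only):

global qlen, qlen_dt_sum, last_change  # current length, running qlen*dt sum, last event time

function queue_changed (delta:long) {  # delta = +1 on enqueue, -1 on dequeue
  now = gettimeofday_us()
  qlen_dt_sum += qlen * (now - last_change)  # area under the queue-length curve
  qlen += delta
  last_change = now
}
# average queue length over a reporting interval = qlen_dt_sum / interval_us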

regards,
Peter


 -------------- Original message ----------------------
From: "Frank Ch. Eigler" <fche@redhat.com>
> Hi -
> 
> ak wrote:
> 
> > The probes itself are not nested. Just the to be measured function is.
> 
> Aha.  So, rewording (just for my own clarity), this is the sequence:
> 
> entry probe A
> entry probe A'
> return probe A'
> return probe A
> 
> And you want to measure time between A-A and A'-A'.
> 
> It seems you could do this if you invent a unique-ID for each A or A'
> entry, like using a per-CPU counter, then using that as an index:
> 
> global nesting # [cpu]
> global entrytime # [cpu,nesting#]
> 
> probe A {
>   c = cpu(); n = nesting[c]++; entrytime[c,n] = gettimeofday_us()
> }
> probe A.return {
>   c = cpu(); n = --nesting[c]; then = entrytime[c,n]; now = gettimeofday_us()
> }
> 
> 
> - FChE


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Measure the Accept Queueing Time
@ 2006-02-17 21:53 peterzbach
  0 siblings, 0 replies; 17+ messages in thread
From: peterzbach @ 2006-02-17 21:53 UTC (permalink / raw)
  To: ken, 'Frank Ch. Eigler'; +Cc: Ken Robson, systemtap


For 80% of the performance-problem cases in a production system, you just need to identify the device or queue with the longest service times to get a first-order approximation of where you need to improve the system. The service and queue times you get from a steady-state system are accurate enough to use in any modeling effort to predict future performance.

Yes, for engineering-level benchmarks you may need to collect the individual response times and look at boxplots of them, but that is only needed in a smaller subset of cases. As someone who has done both types of performance measurement and tuning, getting the first part right has a much bigger impact than delaying or destabilizing the product to do the second.

Regards,
Peter

 -------------- Original message ----------------------
From: "Ken Robson" <ken@robson.net>
> Whilst I agree that the execution path dissection is useful, I do not agree
> that averages are entirely useful, to put them into context you need some
> indication of standard deviation during the sample period and a median
> calculation to give some indication of outliers.
> 
> There have been some hints (unless I missed something more concrete) about
> aggregations. Statistical functions that seem useful that do not aggregate
> well seem to be mode, median and standard deviation - each of these would
> seem to require that you retain the entire data set until the function is
> applied which seems to reduce some of the value of the aggregations,
> particularly when doing long runs.
> 
> What built-in data 'visualisation' functions do people envisage and how
> efficient (in terms of shipping data to user space or retaining data in the
> kernel) do people think they will be, or have I misunderstood the issue
> altogether?
> 
> -----Original Message-----
> From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org]
> On Behalf Of Frank Ch. Eigler
> Sent: 14 February 2006 22:28
> To: Peter Bach
> Cc: systemtap@sources.redhat.com
> Subject: Re: Measure the Accept Queueing Time
> 
> 
> peterzbach wrote:
> 
> > [...]
> > Most everything inside the kernel is a queue or system of queues, but
> > as Frank noted, the trick is to find the proper probe points to
> > measure them.
> 
> > As for queues, the only three useful measurements are average
> > service time, average wait time, and average queue length. [...]
> > The most useful way to measure these values in a lightweight way was
> > detailed by Adrian Cockcroft about a decade ago [...]
> 
> Thanks for the reference.  I put a simplified first sketch of this
> into a new "queue_stats.stp" tapset.  It comes with an accompanying
> pass-5 test case / demo that simulates six concurrent threads modeling
> traffic to two separate queues.
> 
> % stap ../tests/testsuite/systemtap.samples/queue_demo.stp
> block-read: 9 ops/s, 1.191 qlen, 207754 await, 81453 svctm, 74% wait, 51% util
> block-write: 8 ops/s, 0.785 qlen, 224397 await, 132798 svctm, 56% wait, 72% util
> block-read: 9 ops/s, 0.882 qlen, 215635 await, 122062 svctm, 63% wait, 75% util
> block-write: 8 ops/s, 1.010 qlen, 237468 await, 119600 svctm, 68% wait, 67% util
> block-read: 10 ops/s, 0.741 qlen, 158623 await, 88998 svctm, 54% wait, 66% util
> block-write: 9 ops/s, 0.997 qlen, 188266 await, 88399 svctm, 76% wait, 68% util
> 
> 
> The new code provides these generic calculations only.  The client
> code must still plop probes into the appropriate place (say, in the
> block I/O stack, or the scheduler's run queue, or ...?) and call the
> provided queue state-change tracking functions.
> 
> 
> - FChE
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2006-02-18  1:30 UTC | newest]

Thread overview: 17+ messages
     [not found] <43E28A07.1040604@tamu.edu>
2006-02-04  0:13 ` Measure the Accept Queueing Time Frank Ch. Eigler
     [not found]   ` <43E90B4F.7040407@us.ibm.com>
2006-02-07 21:26     ` Frank Ch. Eigler
2006-02-14  4:55   ` Peter Bach
2006-02-14 22:28     ` Frank Ch. Eigler
2006-02-17  8:22       ` Ken Robson
2006-02-17 14:32         ` Frank Ch. Eigler
2006-02-17 10:46     ` Andi Kleen
2006-02-17 14:36       ` Frank Ch. Eigler
2006-02-17 15:01         ` Andi Kleen
     [not found]           ` <20060217161538.GF3170@redhat.com>
2006-02-17 16:27             ` Andi Kleen
2006-02-17 16:59               ` Frank Ch. Eigler
2006-02-17 17:24                 ` Andi Kleen
2006-02-17 17:47                   ` Frank Ch. Eigler
2006-02-17 21:53 peterzbach
2006-02-17 21:59 peterzbach
2006-02-18  1:30 ` Frank Ch. Eigler
2006-02-17 22:00 peterzbach
