public inbox for systemtap@sourceware.org
* statistics with intermediate results
@ 2006-01-12  0:07 Martin Peschke
  2006-01-12  1:54 ` James Dickens
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Martin Peschke @ 2006-01-12  0:07 UTC (permalink / raw)
  To: systemtap

Hi,

another question of mine:

If I want to provide latencies then I need to measure two times,
send time and receive time. I can calculate a latency
when I know both times, which requires the first time to be
kept somewhere until I have measured the second time.

The problem is where to put the first timestamp. It would
be per request. But when I use dynamic instrumentation, e.g.
systemtap, then I can't put some spare bytes in a
per request data structure to store intermediate results.

I guess, one could report all events, like send time, receive
time and so on, through systemtap and defer all processing to
a user land script. That's the Linux Kernel Event Trace Tool
approach:
http://sourceware.org/ml/systemtap/2005-q4/msg00458.html

From a performance point of view, I am not sure it is the
fastest way of getting latencies, because it involves huge
amounts of data being generated by probes and being
reported through relayfs, while we can't use the benefits
of immediate data reduction as provided by systemtap's statistics.

I am wondering whether dynamic instrumentation is the answer
to this kind of measurement requirement.

Thanks in advance for your thoughts.

Martin


* Re: statistics with intermediate results
  2006-01-12  0:07 statistics with intermediate results Martin Peschke
@ 2006-01-12  1:54 ` James Dickens
  2006-01-12 12:35   ` Martin Peschke
  2006-01-12  4:07 ` Frank Ch. Eigler
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: James Dickens @ 2006-01-12  1:54 UTC (permalink / raw)
  To: Martin Peschke; +Cc: systemtap

On 1/11/06, Martin Peschke <mp3@de.ibm.com> wrote:
> Hi,
>
> another question of mine:
>
> If I want to provide latencies then I need to measure two times,
> send time and receive time. I can calculate a latency
> when I know both times, which requires the first time to be
> kept somewhere until I have measured the second time.
>

And really, you don't need to keep all the results; basically you could
just store min, max, and mean or median and get the information you
would need for most tasks.


> The problem is where to put the first timestamp. It would
> be per request. But when I use dynamic instrumentation, e.g.
> systemtap, then I can't put some spare bytes in a
> per request data structure to store intermediate results.
>
> I guess, one could report all events, like send time, receive
> time and so on, through systemtap and defer all processing to
> a user land script. That's the Linux Kernel Event Trace Tool
> approach:

You can look at DTrace as an example: it has aggregations that store
events like this and give you the ability to print them. You can also
quantize the results.



> http://sourceware.org/ml/systemtap/2005-q4/msg00458.html
>
> From a performance point of view, I am not sure it is the
> fastest way of getting latencies, because it involves huge
> amounts of data being generated by probes and being
> reported through relayfs, while we can't use the benefits
> of immediate data reduction as provided by systemtap's statistics.
>
Aggregations are what is needed, because you really don't need to store
all the data, just the best, worst and average cases.

> I am wondering whether dynamic instrumentation is the answer
> to this kind of measurement requirements.
>
> Thanks in advance for your thoughts.
>
> Martin
>


* Re: statistics with intermediate results
  2006-01-12  0:07 statistics with intermediate results Martin Peschke
  2006-01-12  1:54 ` James Dickens
@ 2006-01-12  4:07 ` Frank Ch. Eigler
  2006-01-12 12:24   ` Martin Peschke
  2006-01-12 14:50 ` Jose R. Santos
  2006-01-12 16:43 ` William Cohen
  3 siblings, 1 reply; 9+ messages in thread
From: Frank Ch. Eigler @ 2006-01-12  4:07 UTC (permalink / raw)
  To: Martin Peschke; +Cc: systemtap

Martin Peschke <mp3@de.ibm.com> writes:

> [...]
> The problem is where to put the first timestamp. It would
> be per request. But when I use dynamic instrumentation, e.g.
> systemtap, then I can't put some spare bytes in a
> per request data structure to store intermediate results.

I don't understand what is blocking you.  There is no "per request
data structure" in systemtap - spare or otherwise.  You copy values
out of kernel side with the $target variables, and correlate them on
the script side.

You can declare and use as many script-side arrays as you see fit, and
index them as you see fit.  As long as you can recompute the same
index tuple (a pid, request pointer address, and/or whatever) at the
probe points that correspond to the beginning and the end of a
computation, just use the array to store the temporaries ("start
time").

Once you have a real result ("elapsed time") you want to store, put
that in a new array, which can be one that carries statistical values.
Use the "<<<" accumulation operator to add values, and the @avg etc.
operators to read results.


> [...] I guess, one could report all events, like send time, receive
> time and so on, through systemtap and defer all processing to a user
> land script. That's the Linux Kernel Event Trace Tool approach:
> [...]

It is a possible way, but not generally necessary for systemtap.


- FChE
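
The two-array pattern described above can be modelled in user space. This is only a sketch in Python, not SystemTap script: the names `Aggregate`, `start_time`, `on_send` and `on_receive` are hypothetical, `add()` merely stands in for the "<<<" operator, and the timestamps are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aggregate:
    """Running statistics; add() stands in for the "<<<" operator."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def add(self, value: int) -> None:
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    def avg(self) -> float:
        return self.total / self.count


start_time = {}        # temporaries, keyed by (pid, request address)
latency = Aggregate()  # reduced results only; no per-event records are kept


def on_send(pid: int, req_addr: int, now_us: int) -> None:
    # "Begin" probe: remember when this request started.
    start_time[(pid, req_addr)] = now_us


def on_receive(pid: int, req_addr: int, now_us: int) -> None:
    # "End" probe: recompute the same key, fold the elapsed time into the
    # aggregate, and drop the temporary.
    began = start_time.pop((pid, req_addr), None)
    if began is not None:
        latency.add(now_us - began)


on_send(42, 0xC0FFEE00, 1_000)
on_send(42, 0xC0FFEE40, 1_200)
on_receive(42, 0xC0FFEE00, 1_900)
on_receive(42, 0xC0FFEE40, 2_500)
print(latency.count, latency.minimum, latency.maximum, latency.avg())
```

Only four numbers survive per aggregate, which is the immediate data reduction the original question asks about.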


* Re: statistics with intermediate results
  2006-01-12  4:07 ` Frank Ch. Eigler
@ 2006-01-12 12:24   ` Martin Peschke
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Peschke @ 2006-01-12 12:24 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

Frank Ch. Eigler wrote:
> Martin Peschke <mp3@de.ibm.com> writes:
>>But when I use dynamic instrumentation, e.g.
>>systemtap, then I can't put some spare bytes in a
>>per request data structure to store intermediate results.
> 
> I don't understand what is blocking you.  There is no "per request
> data structure" in systemtap - spare or otherwise.

Sorry, I wasn't clear. I mean that I can't enhance kernel
data structures later on.

I could do so prior to kernel build in preparation of a tapset
that would make use of these spare bytes for temporaries, though.
I guess, this kind of access to temporaries would be fastest,
while it preserves most advantages of dynamic instrumentation.

> You can declare and use as many script-side arrays as you see fit, and
> index them as you see fit.  As long as you can recompute the same
> index tuple (a pid, request pointer address, and/or whatever) at the
> probe points that correspond to the beginning and the end of a
> computation, just use the array to store the temporaries ("start
> time").

Sounds feasible. I will give it a try. Thanks.

Martin


* Re: statistics with intermediate results
  2006-01-12  1:54 ` James Dickens
@ 2006-01-12 12:35   ` Martin Peschke
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Peschke @ 2006-01-12 12:35 UTC (permalink / raw)
  To: James Dickens; +Cc: systemtap

James Dickens wrote:
> On 1/11/06, Martin Peschke <mp3@de.ibm.com> wrote:
>>If I want to provide latencies then I need to measure two times,
>>send time and receive time. I can calculate a latency
>>when I know both times, which requires the first time to be
>>kept somewhere until I have measured the second time.
> 
> And really, you don't need to keep all the results; basically you could
> just store min, max, and mean or median and get the information you
> would need for most tasks.

You might be right regarding min/max/avg being sufficient for
some cases. However, I think histograms can be useful for other
cases. Latency histograms might show several peaks, with
one or more of them being unexpected and worth a closer look.

Martin
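
To illustrate the point about several peaks, here is a small Python sketch that sorts latencies into power-of-two buckets. The sample values are invented and the helper names `log2_bucket` and `histogram` are hypothetical; this is a model of the idea, not SystemTap's own histogram output.

```python
from collections import Counter


def log2_bucket(value_us: int) -> int:
    """Return the lower bound of the power-of-two bucket holding value_us."""
    if value_us <= 0:
        return 0
    return 1 << (value_us.bit_length() - 1)


def histogram(latencies_us):
    """Count how many samples fall into each power-of-two bucket."""
    return Counter(log2_bucket(v) for v in latencies_us)


# Made-up sample: most requests take about 100 us, a few take about 8 ms.
samples = [90, 100, 110, 120, 130, 105, 95, 8000, 8200, 9000]
hist = histogram(samples)

for bucket in sorted(hist):
    print(f"{bucket:>6} us | {'#' * hist[bucket]} {hist[bucket]}")

print("mean:", sum(samples) / len(samples), "us")
```

With this data the mean comes out at 2595 us, a value near which no request actually completed, while the buckets show one cluster around 64-128 us and a second one around 4-8 ms.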


* Re: statistics with intermediate results
  2006-01-12  0:07 statistics with intermediate results Martin Peschke
  2006-01-12  1:54 ` James Dickens
  2006-01-12  4:07 ` Frank Ch. Eigler
@ 2006-01-12 14:50 ` Jose R. Santos
  2006-01-12 16:43 ` William Cohen
  3 siblings, 0 replies; 9+ messages in thread
From: Jose R. Santos @ 2006-01-12 14:50 UTC (permalink / raw)
  To: Martin Peschke; +Cc: systemtap

Martin Peschke wrote:

>I guess, one could report all events, like send time, receive
>time and so on, through systemtap and defer all processing to
>a user land script. That's the Linux Kernel Event Trace Tool
>approach:
>http://sourceware.org/ml/systemtap/2005-q4/msg00458.html
>
>From a performance point of view, I am not sure it is the
>fastest way of getting latencies, because it involves huge
>amounts of data being generated by probes and being
>reported through relayfs, while we can't use the benefits
>of immediate data reduction as provided by systemtap's statistics.
>

One of the things that we are doing with the Kernel Event Trace Tool is
adding the capability for users to add their own trace hooks.  One can
choose to probe a single point in the kernel instead of doing a full
trace.  In the end, though, it really depends on which has the greater
overhead: doing aggregation in the systemtap script or printing every
single event to userspace.  It's obvious who wins here.

One key advantage of having a trace is that it allows you to run once 
and analyze in many different ways.  Like you said, histograms can be 
very useful.

Good Luck

-JRS
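
The "run once, analyze in many ways" advantage can be sketched as a user-land post-processing script. The record layout `(timestamp_us, event, request_id, device)` and all values below are assumptions made for illustration; they are not the trace tool's actual format.

```python
from collections import defaultdict

# Hypothetical trace records: (timestamp_us, event, request_id, device)
trace = [
    (1000, "send", 1, "sda"),
    (1100, "send", 2, "sdb"),
    (1400, "recv", 1, "sda"),
    (2100, "recv", 2, "sdb"),
    (2200, "send", 3, "sda"),
    (2500, "recv", 3, "sda"),
]


def latencies(records):
    """Pair send/recv events by request id; yield (device, latency_us)."""
    pending = {}
    for ts, event, req, dev in records:
        if event == "send":
            pending[req] = ts
        elif event == "recv" and req in pending:
            yield dev, ts - pending.pop(req)


pairs = list(latencies(trace))

# Analysis 1: overall average latency.
overall_avg = sum(lat for _, lat in pairs) / len(pairs)

# Analysis 2: worst case per device, computed from the same trace.
worst = defaultdict(int)
for dev, lat in pairs:
    worst[dev] = max(worst[dev], lat)

print(pairs, overall_avg, dict(worst))
```

Both analyses read the same recorded events, so a new question does not require a new measurement run; the price is that every event has to be transported to user space first.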


* Re: statistics with intermediate results
  2006-01-12  0:07 statistics with intermediate results Martin Peschke
                   ` (2 preceding siblings ...)
  2006-01-12 14:50 ` Jose R. Santos
@ 2006-01-12 16:43 ` William Cohen
  2006-01-12 17:13   ` Martin Peschke
  3 siblings, 1 reply; 9+ messages in thread
From: William Cohen @ 2006-01-12 16:43 UTC (permalink / raw)
  To: Martin Peschke; +Cc: systemtap

Martin Peschke wrote:
> Hi,
> 
> another question of mine:
> 
> If I want to provide latencies then I need to measure two times,
> send time and receive time. I can calculate a latency
> when I know both times, which requires the first time to be
> kept somewhere until I have measured the second time.
> 
> The problem is where to put the first timestamp. It would
> be per request. But when I use dynamic instrumentation, e.g.
> systemtap, then I can't put some spare bytes in a
> per request data structure to store intermediate results.
> 
> I guess, one could report all events, like send time, receive
> time and so on, through systemtap and defer all processing to
> a user land script. That's the Linux Kernel Event Trace Tool
> approach:
> http://sourceware.org/ml/systemtap/2005-q4/msg00458.html
> 
> From a performance point of view, I am not sure it is the
> fastest way of getting latencies, because it involves huge
> amounts of data being generated by probes and being
> reported through relayfs, while we can't use the benefits
> of immediate data reduction as provided by systemtap's statistics.
> 
> I am wondering whether dynamic instrumentation is the answer
> to this kind of measurement requirements.
> 
> Thanks in advance for your thoughts.
> 
> Martin

Associative arrays can be used for this purpose. Use the pointer to the 
data structure as a key for the associative array. Store start time in 
the associative array. Then when the data structure is encountered for 
the completion operation fetch the time from the associative array and 
compute the elapsed time.

How many outstanding operations are there going to be at any given time?

-Will
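
The question about outstanding operations matters because the table only has to hold requests that are in flight, provided each entry is deleted at completion. The Python model below is a sketch under that assumption; `OutstandingTable`, the pointer values and the limit of 32 requests in flight are all made up, and nothing here describes the SystemTap runtime itself.

```python
class OutstandingTable:
    """Start times keyed by request pointer, with a high-water mark."""

    def __init__(self):
        self.start = {}
        self.peak = 0  # largest number of entries held at once

    def begin(self, ptr: int, now_us: int) -> None:
        self.start[ptr] = now_us
        self.peak = max(self.peak, len(self.start))

    def complete(self, ptr: int, now_us: int):
        """Return the elapsed time, or None if the start was never seen."""
        began = self.start.pop(ptr, None)  # delete so the key can be reused
        return None if began is None else now_us - began


table = OutstandingTable()
elapsed = []

# 100 requests in total, but never more than 32 in flight at once.
for i in range(100):
    clock = i * 10
    if i >= 32:
        oldest = 0x1000 + 64 * (i - 32)
        elapsed.append(table.complete(oldest, clock))
    table.begin(0x1000 + 64 * i, clock)

print(table.peak, len(table.start), len(elapsed))
```

In this model the table never grows beyond the number of in-flight requests, however many requests pass through in total.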


* Re: statistics with intermediate results
  2006-01-12 16:43 ` William Cohen
@ 2006-01-12 17:13   ` Martin Peschke
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Peschke @ 2006-01-12 17:13 UTC (permalink / raw)
  To: William Cohen; +Cc: systemtap

William Cohen wrote:
> Associative arrays can be used for this purpose. Use the pointer to the 
> data structure as a key for the associative array. Store start time in 
> the associative array. Then when the data structure is encountered for 
> the completion operation fetch the time from the associative array and 
> compute the elapsed time.
> 
> How many outstanding operations are there going to be at any given time?

It depends...

For SCSI, there are certain limits for tagged command queueing.
Devices may impose limits, adapters may impose limits, other layers
and subsystems in Linux might impose limits (blocklayer?
SCSI mid layer?).

For the IBM zSeries FCP adapter driver there used to be
(rather arbitrary) limits of 32 concurrent commands per LUN
and 4096 concurrent commands per adapter. Experience shows that
when running some I/O stress workload or benchmark, we manage
to hit these limits easily.

In short, I would expect to see hundreds or maybe even thousands
of outstanding operations at any given time for systems
like database servers, that is, for systems that are likely candidates
for a performance analysis.

I am a little concerned that searching huge systemtap arrays for
each request could be too expensive. But I don't know much about
the bowels of the systemtap runtime.

Martin


* RE: statistics with intermediate results
@ 2006-01-13  6:34 Mao, Bibo
  0 siblings, 0 replies; 9+ messages in thread
From: Mao, Bibo @ 2006-01-13  6:34 UTC (permalink / raw)
  To: Martin Peschke, William Cohen; +Cc: systemtap

Currently systemtap is suitable for short-term performance statistics. Sometimes users mainly want to collect raw statistical data with systemtap and do not need to analyze the data in real time. For long-running measurements, I think it needs a ping-pong buffer: switch buffers when relaying the data to user space through relayfs, and keep tracing on.

bibo,mao
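
A ping-pong (double) buffer of the kind suggested above can be modelled as follows. This is a single-threaded Python sketch with hypothetical names (`PingPongBuffer`, `write`, `swap`); a real implementation would need synchronization between the probe side and the reader, and it does not represent relayfs's actual interface.

```python
class PingPongBuffer:
    """Two fixed-size buffers: one is filled while the other is drained."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffers = ([], [])
        self.active = 0  # index of the buffer currently being written

    def is_full(self) -> bool:
        return len(self.buffers[self.active]) >= self.capacity

    def write(self, record) -> bool:
        """Append a record to the active buffer; False if it is full."""
        if self.is_full():
            return False
        self.buffers[self.active].append(record)
        return True

    def swap(self):
        """Redirect writers to the other buffer and hand back the old one's records."""
        full = self.buffers[self.active]
        self.active = 1 - self.active
        drained = list(full)
        full.clear()
        return drained


pp = PingPongBuffer(capacity=4)
collected = []

for i in range(10):
    if pp.is_full():
        collected.extend(pp.swap())  # relay the full buffer to "user space"
    pp.write(i)                      # tracing continues in the other buffer

collected.extend(pp.swap())          # final flush
print(collected)
```

Tracing never has to pause while a buffer is being relayed, as long as the reader drains one buffer before the other fills up.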
>-----Original Message-----
>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org]
>On Behalf Of Martin Peschke
>Sent: January 13, 2006 1:12
>To: William Cohen
>Cc: systemtap@sources.redhat.com
>Subject: Re: statistics with intermediate results
>
>William Cohen wrote:
>> Associative arrays can be used for this purpose. Use the pointer to the
>> data structure as a key for the associative array. Store start time in
>> the associative array. Then when the data structure is encountered for
>> the completion operation fetch the time from the associative array and
>> compute the elapsed time.
>>
>> How many outstanding operations are there going to be at any given time?
>
>It depends...
>
>For SCSI, there are certain limits for tagged command queueing.
>Devices may impose limits, adapters may impose limits, other layers
>and subsystems in Linux might impose limits (blocklayer?
>SCSI mid layer?).
>
>For the IBM zSeries FCP adapter driver there used to be
>(rather arbitrary) limits of 32 concurrent commands per LUN
>and 4096 concurrent commands per adapter. Experience shows that
>when running some I/O stress workload or benchmark, we manage
>to hit these limits easily.
>
>In short, I would expect to see hundreds or maybe even thousands
>of outstanding operations at any given time for systems
>like database servers, that is, for systems that are likely candidates
>for a performance analysis.
>
>I am a little concerned that searching huge systemtap arrays for
>each request could be too expensive. But I don't know much about
>the bowels of the systemtap runtime.
>
>Martin

