* Evaluating SystemTap for Network Response Times
@ 2006-01-31 16:52 Nathan DeBardeleben
2006-01-31 17:21 ` Frank Ch. Eigler
2006-01-31 17:30 ` Hien Nguyen
0 siblings, 2 replies; 13+ messages in thread
From: Nathan DeBardeleben @ 2006-01-31 16:52 UTC (permalink / raw)
To: systemtap
I'm looking at using SystemTap and/or [dj/k]probes to time network
operations inside the kernel. Specifically, we want to time the point
when a socket send operation leaves user space, entering kernel space,
down to the point where the kernel says "it's done, sent". Obviously
this may involve fragmentation of packets so we'd need some way to keep
track that these N fragments came from this initial operation, and as
they close up, we know we're not completely done until all fragments are
done.
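As a rough illustration of the record keeping described above, here is a sketch in plain Python (not SystemTap script). The class name, method names, and the idea of an explicit operation id are all hypothetical; the sketch only shows that an operation counts as finished when its last fragment completes.

```python
class SendTracker:
    """Track send operations that are split into several fragments."""

    def __init__(self):
        # op_id -> [start_time, fragments_outstanding]
        self.pending = {}

    def start(self, op_id, now, n_fragments):
        # Called when the send enters the kernel.
        self.pending[op_id] = [now, n_fragments]

    def fragment_done(self, op_id, now):
        """Return the elapsed time once the last fragment completes, else None."""
        entry = self.pending[op_id]
        entry[1] -= 1
        if entry[1] > 0:
            return None              # still waiting on other fragments
        del self.pending[op_id]      # operation is fully sent
        return now - entry[0]
```

The elapsed value returned for the final fragment is what would be fed into whatever statistics are kept for outlier detection.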
Initially this looks just like the kind of thing I could do with
SystemTap but I worry that the scripting language will be too
restrictive to allow me to allocate these types of data structures to do
record keeping. When it comes down to it - I want to observe a system
and recognize outliers ("hey, this operation took 20 times longer than
the rest") through statistical means.
I was hoping I could get some feedback from the SystemTap
users/developers as to whether (1) this seems feasible, (2) SystemTap
seems like the appropriate tool, and (3) perhaps if anyone is aware of
similar projects.
I will be experimenting with this in a parallel computing environment
and with single system image tools such as bproc and the brand new
XCPU. I hope I can add some value to the SystemTap community by testing
it out in these environments. If this first step goes well, I will be
looking at using SystemTap for monitoring parallel file systems and
studying potential performance bottlenecks.
Thanks for your time.
--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@lanl.gov
---------------------------------------------------------------------
* Re: Evaluating SystemTap for Network Response Times
2006-01-31 16:52 Evaluating SystemTap for Network Response Times Nathan DeBardeleben
@ 2006-01-31 17:21 ` Frank Ch. Eigler
2006-01-31 18:29 ` Nathan DeBardeleben
2006-01-31 17:30 ` Hien Nguyen
1 sibling, 1 reply; 13+ messages in thread
From: Frank Ch. Eigler @ 2006-01-31 17:21 UTC (permalink / raw)
To: Nathan DeBardeleben; +Cc: systemtap
Nathan DeBardeleben <ndebard@lanl.gov> writes:
> [...] Specifically, we want to time the point
> when a socket send operation leaves user space, entering kernel space,
> down to the point where the kernel says "it's done, sent". [...]
>
> Initially this looks just like the kind of thing I could do with
> SystemTap but I worry that the scripting language will be too
> restrictive to allow me to allocate these types of data structures
> to do record keeping.
I hope it is exactly this kind of complex instrumentation with which
systemtap could show its prowess. I would like to help you make it
work.
> When it comes down to it - I want to observe a system and recognize
> outliers ("hey, this operation took 20 times longer than the rest")
> through statistical means.
Expressing that condition should be no problem at all. If, for example,
you elect to use a statistics value to store elapsed times:
    times <<< time /* or an array indexed however necessary */
then a probe can compare the current average to a new value like this:
    if (@avg(times) > EXPR) { /* process further */ }
Over time, I foresee the variety of statistical calculations growing
to include goodies like standard deviations, random sampling, and
whatever else can be efficiently computed per-CPU and then aggregated
across CPUs.
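A minimal sketch of that comparison logic, modeled in plain Python rather than SystemTap script. The `RunningStats` class merely stands in for a statistics value (count and average only); the factor of 20 is taken from the example in the question, and the warm-up count is an assumption added so that the first few samples are not compared against a meaningless average.

```python
class RunningStats:
    """Tiny stand-in for a statistics value: tracks count and average only."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def add(self, value):
        # Plays the role of `times <<< value`.
        self.count += 1
        self.total += value

    def avg(self):
        # Plays the role of `@avg(times)`.
        return self.total / self.count


def find_outliers(samples, factor=20, warmup=5):
    """Return samples that exceed `factor` times the average seen so far."""
    stats = RunningStats()
    outliers = []
    for elapsed in samples:
        # Compare against the average of earlier samples, then record.
        if stats.count >= warmup and elapsed > factor * stats.avg():
            outliers.append(elapsed)
        stats.add(elapsed)
    return outliers
```

For example, `find_outliers([10, 12, 11, 9, 10, 300, 11])` flags only the 300 sample, since the average of the first five samples is 10.4.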
> [...] I hope I can add some value to the SystemTap community by
> testing it out in these environments. If this first step goes well,
> I will be looking at using SystemTap for monitoring parallel file
> systems and studying potential performance bottlenecks.
That all sounds great.
- FChE
* Re: Evaluating SystemTap for Network Response Times
2006-01-31 16:52 Evaluating SystemTap for Network Response Times Nathan DeBardeleben
2006-01-31 17:21 ` Frank Ch. Eigler
@ 2006-01-31 17:30 ` Hien Nguyen
2006-02-06 22:55 ` Nathan DeBardeleben
1 sibling, 1 reply; 13+ messages in thread
From: Hien Nguyen @ 2006-01-31 17:30 UTC (permalink / raw)
To: Nathan DeBardeleben; +Cc: systemtap
Hi Nathan,
I think what you are trying to achieve could be done with systemtap. I
wrote a small script to monitor the tcp traffic a while back (see URL below)
http://sourceware.org/ml/systemtap/2005-q4/msg00302.html
What type of record keeping do you have in mind? I am sure that someone
on the systemtap team wants to hear it.
Thanks, Hien.
* Re: Evaluating SystemTap for Network Response Times
2006-01-31 17:21 ` Frank Ch. Eigler
@ 2006-01-31 18:29 ` Nathan DeBardeleben
0 siblings, 0 replies; 13+ messages in thread
From: Nathan DeBardeleben @ 2006-01-31 18:29 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: systemtap
Frank Ch. Eigler wrote:
> Nathan DeBardeleben <ndebard@lanl.gov> writes:
>
>
>> [...] Specifically, we want to time the point
>> when a socket send operation leaves user space, entering kernel space,
>> down to the point where the kernel says "it's done, sent". [...]
>>
>> Initially this looks just like the kind of thing I could do with
>> SystemTap but I worry that the scripting language will be too
>> restrictive to allow me to allocate these types of data structures
>> to do record keeping.
>>
>
> I hope it is exactly this kind of complex instrumentation with which
> systemtap could show its prowess. I would like to help you make it
> work.
>
So should I take this as a volunteering of your assistance in answering
questions? :) We are heavily involved in dynamic application
instrumentation (dyninst, Open|SpeedShop, apps/tools like those) but
we're starting to see a major need to go a step deeper into looking at
the kernel. We're in the position to really test out SystemTap/kprobes
on a variety of different architectures and stress loads and even (if
necessary/accepted) to help extend and fix SystemTap.
I've read all the online documentation and written some simple test
taps, like the examples on the web.
Before I start tracing the 2.6.x kernel for the places we want to
instrument for this network study, I wondered: have you guys already
started a "tapset" for this issue?
The other question I had was if SystemTap intends to look into djprobes
in the future. I've read it mentioned in bits and pieces on your
mailing list but haven't yet pieced together a 'big picture' of the
SystemTap opinion and/or plans for djprobes. From our studies, it's a
lot lighter weight than kprobes and we're hoping to go with something
like SystemTap so that in the future you guys might implement a way for
users to choose which probe they want at insert time.
Take care.
-- Nathan
* Re: Evaluating SystemTap for Network Response Times
2006-01-31 17:30 ` Hien Nguyen
@ 2006-02-06 22:55 ` Nathan DeBardeleben
2006-02-07 0:17 ` Hien Nguyen
0 siblings, 1 reply; 13+ messages in thread
From: Nathan DeBardeleben @ 2006-02-06 22:55 UTC (permalink / raw)
To: Hien Nguyen; +Cc: systemtap
Hien Nguyen wrote:
> Hi Nathan,
>
> I think what you are trying to achieve could be done with systemtap. I
> wrote a small script to monitor the tcp traffic a while back (see URL
> below)
> http://sourceware.org/ml/systemtap/2005-q4/msg00302.html
>
I'm a complete newbie to systemtap, so please explain why when I try and
run the example on the link above that you sent me I get this:
> [root@kraken1 systemtap]# stap -v tcp_mon.stp
> Created temporary directory "/tmp/stapM6VSNS"
> parse error: embedded code in unprivileged script
> saw: embedded-code at tcp_mon.stp:70:1
> 1 parse error(s).
> Searched
> '/usr/share/systemtap/tapset/2.6.14-1.1656_FC4smp/x86_64/*.stp', match
> count 0
> Searched '/usr/share/systemtap/tapset/2.6.14-1.1656_FC4smp/*.stp',
> match count 0
> Searched '/usr/share/systemtap/tapset/2.6.14/x86_64/*.stp', match count 0
> Searched '/usr/share/systemtap/tapset/2.6.14/*.stp', match count 1
> Searched '/usr/share/systemtap/tapset/2.6/x86_64/*.stp', match count 0
> Searched '/usr/share/systemtap/tapset/2.6/*.stp', match count 0
> Searched '/usr/share/systemtap/tapset/x86_64/*.stp', match count 0
> Searched '/usr/share/systemtap/tapset/*.stp', match count 8
> Pass 1: parsed user script and 9 library script(s).
> Pass 1: parse failed.
> Running rm -rf /tmp/stapM6VSNS
> [root@kraken1 systemtap]#
Also you say to copy tapset.stp to a directory you create in that post -
where do I get tapset.stp?
Sorry for the beginner question :)
-- Nathan
* Re: Evaluating SystemTap for Network Response Times
2006-02-06 22:55 ` Nathan DeBardeleben
@ 2006-02-07 0:17 ` Hien Nguyen
2006-02-08 16:23 ` Nathan DeBardeleben
0 siblings, 1 reply; 13+ messages in thread
From: Hien Nguyen @ 2006-02-07 0:17 UTC (permalink / raw)
To: Nathan DeBardeleben; +Cc: systemtap
Hi Nathan,
There are actually two separate files:
1. tcp_mon.stp
2. tcp_tapset/tapset.stp
I have included a tar file containing both files in this mail for your
convenience. Create a tmp directory, cd into it, untar the file, and run
(as root):
    stap -I./tcp_tapset tcp_mon.stp
Thanks, Hien.
[-- Attachment #2: tcp_mon.tar.gz --]
[-- Type: application/x-gzip, Size: 937 bytes --]
* Re: Evaluating SystemTap for Network Response Times
2006-02-07 0:17 ` Hien Nguyen
@ 2006-02-08 16:23 ` Nathan DeBardeleben
2006-02-08 17:25 ` Hien Nguyen
2006-02-08 17:58 ` Frank Ch. Eigler
0 siblings, 2 replies; 13+ messages in thread
From: Nathan DeBardeleben @ 2006-02-08 16:23 UTC (permalink / raw)
To: Hien Nguyen; +Cc: systemtap
I appreciate all the help getting my feet off the ground. What you sent
me works perfectly, however, I want to understand it a bit more (of course).
I guess I have some basic questions about ST. For instance, I tried
running the 'top2.stp' David Sperry included in the list and that gives
me errors about resolving kernel.syscall.*. I'm guessing it has
something to do with a syscall tapset, but where do I go about getting
those? And furthermore, how do I know I need them?
What really is a 'tapset'? Is it just a collection of useful functions
that users might want to call that you put into a nice location so as to
keep from copying the code raw into each script that needs it? Or is it
something more?
And when I look at David's output it doesn't look like he pointed at an
include directory. Just really confused.
I know, lots of questions and I'm sure questions that would be in a nice
document if this weren't a project under such heavy development.
Thanks all. I apologize for the newbie questions, I'll get there soon. :)
-- Nathan
* Re: Evaluating SystemTap for Network Response Times
2006-02-08 16:23 ` Nathan DeBardeleben
@ 2006-02-08 17:25 ` Hien Nguyen
2006-02-08 17:58 ` Frank Ch. Eigler
1 sibling, 0 replies; 13+ messages in thread
From: Hien Nguyen @ 2006-02-08 17:25 UTC (permalink / raw)
To: Nathan DeBardeleben; +Cc: systemtap
Nathan DeBardeleben wrote:
> I appreciate all the help getting my feet off the ground. What you
> sent me works perfectly, however, I want to understand it a bit more
> (of course).
I am glad that I could help.
> I guess I have some basic questions about ST. For instance, I tried
> running the 'top2.stp' David Sperry included in the list and that
> gives me errors about resolving kernel.syscall.*. I'm guessing it has
> something to do with a syscall tapset, but where do I go about getting
> those? And furthermore, how do I know I need them?
Yes, David's script uses the system calls tapset. You should find it
under /usr/share/systemtap/tapset or /usr/local/share/systemtap/tapset
(depending on how you configured the systemtap build).
If you don't find syscalls.stp under those directories, try copying
syscalls.stp from
/usr/local/share/systemtap/tapset/2.6.9-24-ELsmp to
/usr/local/share/systemtap/tapset
Notice the several directories named after kernel versions: the syscall
tapsets were tested for those kernels, and if you are running one of
them, the systemtap command will find syscalls.stp. In your case, your
system probably runs a different kernel. So copy syscalls.stp to the
root level of the tapset directory and systemtap will find it. Sorry
for the confusion; we are currently working on a new system calls
tapset, which will be much cleaner.
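The per-kernel search order visible in the `stap -v` output earlier in this thread can be modeled roughly as follows. This is a Python sketch reconstructed only from those "Searched ..." log lines, not from the systemtap source; the function name is hypothetical, and the way the kernel release string is split is an assumption that fits the two releases shown in the thread.

```python
def tapset_search_dirs(base, release, arch):
    """List the directories searched under one tapset base, most specific first."""
    version = release.split("-")[0]                  # "2.6.14-1.1656_FC4smp" -> "2.6.14"
    major_minor = ".".join(version.split(".")[:2])   # "2.6.14" -> "2.6"
    dirs = []
    for level in (release, version, major_minor):
        dirs.append(f"{base}/{level}/{arch}")        # arch-specific directory first
        dirs.append(f"{base}/{level}")
    dirs.append(f"{base}/{arch}")
    dirs.append(base)                                # root level of the tapset directory
    return dirs


if __name__ == "__main__":
    for d in tapset_search_dirs("/usr/share/systemtap/tapset",
                                "2.6.14-1.1656_FC4smp", "x86_64"):
        print(d + "/*.stp")
```

A file placed at the root level of the tapset directory is matched by the last entry regardless of the running kernel, which is why copying syscalls.stp there makes it visible.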
>
> What really is a 'tapset'? Is it just a collection of useful
> functions that users might want to call that you put into a nice
> location so as to keep from copying the code raw into each script that
> needs it? Or is it something more?
A tapset can be a collection of probe points or a collection of useful
functions. For example, the system calls tapset is a collection of probe
points for all system calls; it also exports data that may be of interest
to the user, such as the name, function arguments, trace string, etc.
>
> And when I look at David's output it doesn't look like he pointed at
> an include directory. Just really confused.
When you include a directory with the -I option, you tell systemtap to
look for tapsets in that directory first and then in
/usr/local/share/systemtap/tapset.
>
> I know, lots of questions and I'm sure questions that would be in a
> nice document if this weren't a project under such heavy development.
>
> Thanks all. I apologize for the newbie questions, I'll get there
> soon. :)
I hope this helps.
* Re: Evaluating SystemTap for Network Response Times
2006-02-08 16:23 ` Nathan DeBardeleben
2006-02-08 17:25 ` Hien Nguyen
@ 2006-02-08 17:58 ` Frank Ch. Eigler
1 sibling, 0 replies; 13+ messages in thread
From: Frank Ch. Eigler @ 2006-02-08 17:58 UTC (permalink / raw)
To: Nathan DeBardeleben; +Cc: systemtap
ndebard wrote:
> I guess I have some basic questions about ST. For instance, I tried
> running the 'top2.stp' David Sperry included in the list and that
> gives me errors about resolving kernel.syscall.*. [...]
The "kernel.syscall.*" probes are being slowly deprecated, replaced by
the "syscall.*" names. They are defined in tapset/**/syscall*.stp
files, and should be installed near the stap binary.
> What really is a 'tapset'? Is it just a collection of useful
> functions [...]
Yes, basically that: just a script in a search path. It can define
auxiliary functions, globals, probe aliases: stuff to express
commonality or abstraction. Also, a tapset is given "guru-mode"
privileges automatically. (See the stap man page.)
> And when I look at David's output it doesn't look like he pointed at
> an include directory. Just really confused.
The "-I" flag is used to extend the default tapset search path. It is
analogous to the C -I flag, but does not actually accept C headers.
- FChE
* Re: Evaluating SystemTap for Network Response Times
@ 2006-02-07 9:54 David A Sperry
0 siblings, 0 replies; 13+ messages in thread
From: David A Sperry @ 2006-02-07 9:54 UTC (permalink / raw)
To: systemtap
Hi Nathan,
I'm also a newbie and finally got this to work. I see you are using
an SMP kernel. So am I. I ran into a problem compiling and loading the
kernel module. I followed the install instructions from the Sept 2005
"Instrumenting the Linux Kernel with SystemTap" and used yum to install the
kernel-devel package. This would have worked fine on the uniprocessor
kernel. To get things to work on my 2-way SMP machine I had to install the
SMP kernel packages. You may have to do the same. I brute-forced the
install with "yum install kernel-smp*". If you can't get this example to
work, you may want to check the version of the kernel you are running
against the kernel development package you have installed.
I was able to untar Hien's example and run it without any problem. Here is
the verbose output
[root@ibm systap]# stap -I tcp_tapset tcp_mon.stp -v
Created temporary directory "/tmp/stapMLX13C"
Searched '/usr/share/systemtap/tapset/2.6.15-1.1909_FC5smp/i686/*.stp',
match count 0
Searched '/usr/share/systemtap/tapset/2.6.15-1.1909_FC5smp/*.stp', match
count 0
Searched '/usr/share/systemtap/tapset/2.6.15/i686/*.stp', match count 0
Searched '/usr/share/systemtap/tapset/2.6.15/*.stp', match count 0
Searched '/usr/share/systemtap/tapset/2.6/i686/*.stp', match count 0
Searched '/usr/share/systemtap/tapset/2.6/*.stp', match count 0
Searched '/usr/share/systemtap/tapset/i686/*.stp', match count 0
Searched '/usr/share/systemtap/tapset/*.stp', match count 8
Searched 'tcp_tapset/2.6.15-1.1909_FC5smp/i686/*.stp', match count 0
Searched 'tcp_tapset/2.6.15-1.1909_FC5smp/*.stp', match count 0
Searched 'tcp_tapset/2.6.15/i686/*.stp', match count 0
Searched 'tcp_tapset/2.6.15/*.stp', match count 0
Searched 'tcp_tapset/2.6/i686/*.stp', match count 0
Searched 'tcp_tapset/2.6/*.stp', match count 0
Searched 'tcp_tapset/i686/*.stp', match count 0
Searched 'tcp_tapset/*.stp', match count 1
Pass 1: parsed user script and 10 library script(s).
parsed 'tcp_sendmsg' -> func 'tcp_sendmsg'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [c0100000-c04a2a9c, bias 0]
pattern 'tcp_sendmsg' matches function 'tcp_sendmsg'
selected function tcp_sendmsg
finding prologue for 'tcp_sendmsg' entrypc=0xc02c2d2e highpc=0xc02c36b1
finding location for local 'sk' near address c02c2d39, module bias 0
finding location for local 'sk' near address c02c2d39, module bias 0
finding location for local 'sk' near address c02c2d39, module bias 0
finding location for local 'sk' near address c02c2d39, module bias 0
pattern 'kernel' matches module 'kernel'
parsed 'tcp_sendmsg' -> func 'tcp_sendmsg'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [c0100000-c04a2a9c, bias 0]
pattern 'tcp_sendmsg' matches function 'tcp_sendmsg'
selected function tcp_sendmsg
finding prologue for 'tcp_sendmsg' entrypc=0xc02c2d2e highpc=0xc02c36b1
pattern 'kernel' matches module 'kernel'
parsed 'tcp_recvmsg' -> func 'tcp_recvmsg'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [c0100000-c04a2a9c, bias 0]
pattern 'tcp_recvmsg' matches function 'tcp_recvmsg'
selected function tcp_recvmsg
finding prologue for 'tcp_recvmsg' entrypc=0xc02c3819 highpc=0xc02c3f44
pattern 'kernel' matches module 'kernel'
parsed 'do_exit' -> func 'do_exit'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [c0100000-c04a2a9c, bias 0]
pattern 'do_exit' matches function 'do_exit'
selected function do_exit
finding prologue for 'do_exit' entrypc=0xc0125f4a highpc=0xc0126646
pattern 'kernel' matches module 'kernel'
parsed 'tcp_close_state' -> func 'tcp_close_state'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [c0100000-c04a2a9c, bias 0]
pattern 'tcp_close_state' matches function 'tcp_close_state'
selected function tcp_close_state
finding prologue for 'tcp_close_state' entrypc=0xc02c1fdc highpc=0xc02c20d8
pattern 'kernel' matches module 'kernel'
parsed 'tcp_disconnect' -> func 'tcp_disconnect'
pattern 'kernel' matches module 'kernel'
focused on module 'kernel' = [c0100000-c04a2a9c, bias 0]
pattern 'tcp_disconnect' matches function 'tcp_disconnect'
selected function tcp_disconnect
finding prologue for 'tcp_disconnect' entrypc=0xc02c3f44 highpc=0xc02c4316
pattern 'kernel' matches module 'kernel'
Pass 2: analyzed user script. 7 probe(s), 1f function(s), 6 global(s).
Running grep " [tT] " /proc/kallsyms | sort -k 1,8 -s -o
/tmp/stapMLX13C/symbols.sorted
Pass 3: translated to C into "/tmp/stapMLX13C/stap_3315.c"
Running make -C "/lib/modules/2.6.15-1.1909_FC5smp/build"
M="/tmp/stapMLX13C" modules
make: Entering directory `/usr/src/kernels/2.6.15-1.1909_FC5-smp-i686'
CC [M] /tmp/stapMLX13C/stap_3315.o
Building modules, stage 2.
MODPOST
CC /tmp/stapMLX13C/stap_3315.mod.o
LD [M] /tmp/stapMLX13C/stap_3315.ko
make: Leaving directory `/usr/src/kernels/2.6.15-1.1909_FC5-smp-i686'
Pass 4: compiled into "stap_3315.ko"
Running sudo /usr/libexec/systemtap/stpd -r -d 3315
/tmp/stapMLX13C/stap_3315.ko
UID   PID   SIZE  NAME  PORT  SOURCE IP
0     2749  106   smbd  445   192.168.1.121
Running rm -rf /tmp/stapMLX13C
-Dave
(See attached file: tcp_mon.tar.gz)
[-- Attachment #2: tcp_mon.tar.gz --]
[-- Type: application/octet-stream, Size: 937 bytes --]
* Re: Evaluating SystemTap for Network Response Times
2006-01-31 18:44 ` Nathan DeBardeleben
@ 2006-02-02 15:24 ` Frank Ch. Eigler
0 siblings, 0 replies; 13+ messages in thread
From: Frank Ch. Eigler @ 2006-02-02 15:24 UTC (permalink / raw)
To: Nathan DeBardeleben; +Cc: systemtap
Nathan DeBardeleben <ndebard@lanl.gov> writes:
> [...] We probably wouldn't be willing to do any more static kernel
> instrumentation. [...]
Understood.
> [...] Believe it or not - we find kernel bugs semi-regularly, and
> they can waste tons of our time trying to track them down. [...]
Actually, it's not hard to believe. Back when I worked on a system
that was a heavy user of the kernel (a big database server), one of
its architects mentioned that of all the bugs they found, roughly 25%
were in the OS kernel. I asked what fraction of time was spent in the
kernel, and it also happened to be around 25%. We both then realized
that this might not be a coincidence.
- FChE
* Re: Evaluating SystemTap for Network Response Times
2006-01-31 18:29 Frank Ch. Eigler
@ 2006-01-31 18:44 ` Nathan DeBardeleben
2006-02-02 15:24 ` Frank Ch. Eigler
0 siblings, 1 reply; 13+ messages in thread
From: Nathan DeBardeleben @ 2006-01-31 18:44 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: systemtap
Frank Ch. Eigler wrote:
> Would you folks use static instrumentation too, if it were available
> for systemtap? That is, would you be willing to insert macro calls
> into your kernel sources, which would get roughly djprobes-level
> performance for enabled probes, and a slight slowdown for disabled
> ones?
> (http://sourceware.org/ml/systemtap/2005-q4/msg00415.html)
>
>
We probably wouldn't be willing to do any more static kernel
instrumentation. A number of our projects here began as kernel patches
and are moving away from that approach, if only because tracking the
kernel has been far too time-consuming. That probably wouldn't be the
case with this style of static instrumentation, but there are other
issues. In particular, one of our focuses is to attach to a running
machine after we start to observe problems, probe into the kernel,
figure out what's going on, and then detach ourselves.
This whole idea of attach / detach is really at the heart of it.
For instance, we've had problems before with certain network card
chipsets: the beta cards we tested worked great under heavy load, but
the chipset in the official versions had been slightly changed. We would
see that if we really hammered these cards in a parallel machine,
slamming the network as we often do, one of the cards would
randomly time out, the driver would reset it, and it would continue.
They FUNCTIONED but their performance would periodically drop to about
(seriously) 20,000 times slower for 1 network operation - then it would
fix itself and move on. If you then say we've got a couple hundred to
thousands of these cards in a large machine, the probability of this
happening obviously goes up and we're in for a lot more trouble. Code
appears to start slowing down, and people try and figure out where the
problems are.
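A rough way to put numbers on the scaling argument above: if each card independently misbehaves with some small probability in a given interval, the chance that at least one card in the machine does so is 1 - (1 - p)^N. The sketch below is only an illustration; the per-card probability 0.001 and the independence assumption are hypothetical and not measured values from this thread.

```python
def prob_any_failure(p_single, n_cards):
    """Probability that at least one of n_cards independent cards
    misbehaves, given per-card probability p_single (an assumption)."""
    return 1.0 - (1.0 - p_single) ** n_cards


if __name__ == "__main__":
    # Hypothetical per-card probability; not taken from real hardware data.
    p = 0.001
    for n in (1, 200, 2000):
        print(n, "cards ->", round(prob_any_failure(p, n), 3))
```

Even with a very small per-card probability, the machine-wide probability climbs quickly once there are hundreds or thousands of cards.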
Point being - if we had the kind of instrumentation I want to build
with SystemTap, we'd see outlying socket operations and could correlate
them with our application's problems. We could then know a
lot more about what's going on with our system - from start to finish.
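As a minimal sketch of what "seeing outlying socket operations" could mean in post-processing, assume per-operation latencies have already been collected (for example, from probe output) into a list. The function name, the microsecond unit, and the "20 times the median" threshold are illustrative assumptions, not part of any SystemTap tapset.

```python
from statistics import median


def find_outliers(latencies_us, factor=20.0):
    """Return (index, latency) pairs whose latency exceeds
    factor * median of all samples. The threshold rule is an assumption."""
    if not latencies_us:
        return []
    threshold = factor * median(latencies_us)
    return [(i, t) for i, t in enumerate(latencies_us) if t > threshold]


if __name__ == "__main__":
    # Hypothetical samples: one operation is about 20,000x slower than the rest.
    samples = [100, 110, 95, 105, 2_000_000, 98, 102]
    print(find_outliers(samples))
```

Using the median as the baseline keeps a single extreme sample from inflating the threshold, which a mean-based rule would allow.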
Also - with a similar network card problem, we often see what appears
to be a hung system. We dynamically attach to a running app and find
it's sitting in a call waiting on a network operation to complete. We
then wonder - is the kernel waiting? Is the kernel stuck? Is our
network dead? So many questions and we hope that with some careful
probing we can hook into this stuck application and really zero down on
where the problem is. Believe it or not - we find kernel bugs
semi-regularly, and they can waste tons of our time trying to track them
down.
Sorry for the rambling - trying to paint a picture for you of where
we're coming from and where we're hoping SystemTap can help us.
I haven't had time yet to start digesting the scripts that have been
linked to me this morning so I'll need to do that before I can really
determine what further assistance / direction we need.
I really appreciate everyone's time. Take care.
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@lanl.gov
---------------------------------------------------------------------
* Re: Evaluating SystemTap for Network Response Times
@ 2006-01-31 18:29 Frank Ch. Eigler
2006-01-31 18:44 ` Nathan DeBardeleben
0 siblings, 1 reply; 13+ messages in thread
From: Frank Ch. Eigler @ 2006-01-31 18:29 UTC (permalink / raw)
To: systemtap
Hi -
> >I hope it is exactly this kind of complex instrumentation with which
> >systemtap could show its prowess. I would like to help you make it
> >work.
> >
> So should I take this as a volunteering of your assistance in answering
> questions? :)
Absolutely.
> [...] Before I start tracing the 2.6.x kernel for the places we want
> to instrument for this network study, I wondered - have you guys
> already started a "tapset" for this issue?
In connection with their binary trace prototype, the Hitachi folks
have posted some tapset scripts that define network layer probes.
(http://sourceware.org/ml/systemtap/2005-q4/msg00446.html). But we
don't have enough to even call it a start yet.
> The other question I had was if SystemTap intends to look into
> djprobes in the future. [...]
We would like Hitachi to finish a djprobes implementation that
performs applicability tests in the kernel, as a hidden part of the
kprobes api. In this case, it'd be a kprobes-layer optimization that
systemtap need not be aware of. Alternately, if they write user-level
analysis algorithms, we can put those into the translator and bypass
kprobes outright for applicable probe points.
> [...] From our studies, it's a lot lighter weight than kprobes and
> we're hoping to go with something like SystemTap so that in the
> future you guys might implement a way for users to choose which
> probe they want at insert time.
Would you folks use static instrumentation too, if it were available
for systemtap? That is, would you be willing to insert macro calls
into your kernel sources, which would get roughly djprobes-level
performance for enabled probes, and a slight slowdown for disabled
ones?
(http://sourceware.org/ml/systemtap/2005-q4/msg00415.html)
- FChE
Thread overview: 13+ messages
2006-01-31 16:52 Evaluating SystemTap for Network Response Times Nathan DeBardeleben
2006-01-31 17:21 ` Frank Ch. Eigler
2006-01-31 18:29 ` Nathan DeBardeleben
2006-01-31 17:30 ` Hien Nguyen
2006-02-06 22:55 ` Nathan DeBardeleben
2006-02-07 0:17 ` Hien Nguyen
2006-02-08 16:23 ` Nathan DeBardeleben
2006-02-08 17:25 ` Hien Nguyen
2006-02-08 17:58 ` Frank Ch. Eigler
2006-01-31 18:29 Frank Ch. Eigler
2006-01-31 18:44 ` Nathan DeBardeleben
2006-02-02 15:24 ` Frank Ch. Eigler
2006-02-07 9:54 David A Sperry