* get the mapping of <key, syscallname>
@ 2006-05-11 2:05 Li Guanglei
2006-05-11 18:14 ` Martin Hunt
2006-05-12 15:08 ` Frank Ch. Eigler
0 siblings, 2 replies; 10+ messages in thread
From: Li Guanglei @ 2006-05-11 2:05 UTC (permalink / raw)
To: systemtap
Hi,
One of my tapsets will log the syscall name got from
CONTEXT->probe_point. But string printing is costly so I want to log
the key of syscall name, and later the post-processing app in user
space will figure out the syscall name represented by corresponding keys.
But this requires to generate a mapping of pairs of <key,
syscallname> at probe begin. The CONTEXT->probe_point is stored in
dwarf_kprobe_probe_??_location_names, so I can use the address of each
syscall name as the key and in the probe handler I can simply log the
pointer instead of a string. But since there are many probe points
defined beside syscall in my tapsets, I can't figure out which
dwarf_kprobe_probe_.._location_names variable the syscall name is stored.
Do you have any ideas about this? Thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-11 2:05 get the mapping of <key, syscallname> Li Guanglei
@ 2006-05-11 18:14 ` Martin Hunt
2006-05-12 15:08 ` Frank Ch. Eigler
1 sibling, 0 replies; 10+ messages in thread
From: Martin Hunt @ 2006-05-11 18:14 UTC (permalink / raw)
To: Li Guanglei; +Cc: systemtap
On Thu, 2006-05-11 at 10:04 +0800, Li Guanglei wrote:
> Hi,
> One of my tapsets will log the syscall name got from
> CONTEXT->probe_point. But string printing is costly so I want to log
> the key of syscall name, and later the post-processing app in user
> space will figure out the syscall name represented by corresponding keys.
string printing is more expensive than it should be because the
translator emits unoptimized code that copies them too often. That will
probably be fixed in the future. Still, I wouldn't call it "costly".
Writing a binary key instead will save you little for what may be a
temporary advantage. I advise waiting on this.
Martin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-11 2:05 get the mapping of <key, syscallname> Li Guanglei
2006-05-11 18:14 ` Martin Hunt
@ 2006-05-12 15:08 ` Frank Ch. Eigler
2006-05-12 16:06 ` Jose R. Santos
1 sibling, 1 reply; 10+ messages in thread
From: Frank Ch. Eigler @ 2006-05-12 15:08 UTC (permalink / raw)
To: Li Guanglei; +Cc: systemtap
Li Guanglei <guanglei@cn.ibm.com> writes:
> One of my tapsets will log the syscall name got from
> CONTEXT->probe_point. [...]
Why not use the "name" variable set by the syscalls tapset?
> [...] I want to log the key of syscall name, and later the
> post-processing app in user space will figure out the syscall name
> represented by corresponding keys. [...]
As does Martin, I wonder if this is a good place to spend your
optimization energies. Anyway, one possibility is to have the
syscalls tapset also define a system call number variable.
- FChE
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-12 15:08 ` Frank Ch. Eigler
@ 2006-05-12 16:06 ` Jose R. Santos
2006-05-12 16:12 ` Frank Ch. Eigler
0 siblings, 1 reply; 10+ messages in thread
From: Jose R. Santos @ 2006-05-12 16:06 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: Li Guanglei, systemtap
Frank Ch. Eigler wrote:
>Li Guanglei <guanglei@cn.ibm.com> writes:
>
>> One of my tapsets will log the syscall name got from
>> CONTEXT->probe_point. [...]
>
>Why not use the "name" variable set by the syscalls tapset?
>
>
LKET does not currently use the syscall tapset because it doesnt use the
extra information that this tapset collects from each systemcall. In
order to improve performance, we keep the system call trace as simple as
possible.
>> [...] I want to log the key of syscall name, and later the
>> post-processing app in user space will figure out the syscall name
>> represented by corresponding keys. [...]
>
>As does Martin, I wonder if this is a good place to spend your
>optimization energies. Anyway, one possibility is to have the
>syscalls tapset also define a system call number variable.
>
>
Its not just the string printing that's costly, its also searching the
system call name with in CONTEXT->probe_point. Avoiding searching and
printing of the syscall string can provide between 3-6% performance
improvements on SPECjbb results.
>- FChE
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-12 16:06 ` Jose R. Santos
@ 2006-05-12 16:12 ` Frank Ch. Eigler
2006-05-12 16:40 ` Jose R. Santos
2006-05-13 7:18 ` Li Guanglei
0 siblings, 2 replies; 10+ messages in thread
From: Frank Ch. Eigler @ 2006-05-12 16:12 UTC (permalink / raw)
To: Jose R. Santos; +Cc: systemtap
Hi -
> > Why not use the "name" variable set by the syscalls tapset?
>
> LKET does not currently use the syscall tapset because it doesnt use the
> extra information that this tapset collects from each systemcall. [...]
But, due to the optimization pass in the translator, only that portion
of the information actually *used* by your script is actually
collected. Please try it out (check out stap -p2 or -p3 output to
confirm).
> [...] Its not just the string printing that's costly, its also
> searching the system call name with in CONTEXT->probe_point. [...]
Rolling your own code to manually scan through "probe_point" is not
really necessary or appropriate. There is a "probefunc()" function
you can use for now.
- FChE
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-12 16:12 ` Frank Ch. Eigler
@ 2006-05-12 16:40 ` Jose R. Santos
2006-05-13 7:18 ` Li Guanglei
1 sibling, 0 replies; 10+ messages in thread
From: Jose R. Santos @ 2006-05-12 16:40 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: systemtap
Frank Ch. Eigler wrote:
>Hi -
>
>> > Why not use the "name" variable set by the syscalls tapset?
>>
>> LKET does not currently use the syscall tapset because it doesnt use the
>> extra information that this tapset collects from each systemcall. [...]
>
>But, due to the optimization pass in the translator, only that portion
>of the information actually *used* by your script is actually
>collected. Please try it out (check out stap -p2 or -p3 output to
>confirm).
>
>
I seem to remember that there was another reason why Li Guanglei did not
use the systemcall tapset but I cant remember why. If the performance
is good, then adding the systemcall id number variable will solve this
problem. We'll, try it out.
>> [...] Its not just the string printing that's costly, its also
>> searching the system call name with in CONTEXT->probe_point. [...]
>
>Rolling your own code to manually scan through "probe_point" is not
>really necessary or appropriate. There is a "probefunc()" function
>you can use for now.
>
>
I agree. Thats why I would like to see LKET as part of the SystemTap
tapset in order to have more people reviewing some of the stupid design
decisions we make. :)
>- FChE
>
>
-JRS
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-12 16:12 ` Frank Ch. Eigler
2006-05-12 16:40 ` Jose R. Santos
@ 2006-05-13 7:18 ` Li Guanglei
2006-05-14 20:26 ` Frank Ch. Eigler
2006-05-15 16:39 ` Hien Nguyen
1 sibling, 2 replies; 10+ messages in thread
From: Li Guanglei @ 2006-05-13 7:18 UTC (permalink / raw)
To: Frank Ch. Eigler; +Cc: Jose R. Santos, systemtap
Frank Ch. Eigler ??:
> Hi -
>
>>> Why not use the "name" variable set by the syscalls tapset?
>> LKET does not currently use the syscall tapset because it doesnt use the
>> extra information that this tapset collects from each systemcall. [...]
>
> But, due to the optimization pass in the translator, only that portion
> of the information actually *used* by your script is actually
> collected. Please try it out (check out stap -p2 or -p3 output to
> confirm).
>
I tried and found it works really well. It compiles away all the
unnecessary codes.
But it still has one issue. e.g, I tried to use:
probe syscall.* { ... } to trace all system calls on ppc64. But it
failed since sys64_time(defined in SystemTap's ppc64/syscall.stp) is
removed from 2.6.16.5/ppc64. I think this issues has been addressed
in bug 2645.
If this bug #2645 is resolved , then I think I can use the syscall.stp
tapsets shipped with SystemTap. Another advantage of using it is that
I can simply use the $name instead of getting it using lines of codes.
But one disadvantage is that if using "syscall.*", then every syscall
will use a different function to be its probe handler. It causes the
generated much larger and compiling time much longer compared with
current LKET's syscall tapsets:
/usr/local/share/systemtap/tapset> ll -h /tmp/stap6HKAzF/*
127 May 13 07:36 /tmp/stap6HKAzF/Makefile
2.0M May 13 07:36 /tmp/stap6HKAzF/stap_5910.c
4.1M May 13 07:37 /tmp/stap6HKAzF/stap_5910.ko
2.8K May 13 07:37 /tmp/stap6HKAzF/stap_5910.mod.c
60K May 13 07:37 /tmp/stap6HKAzF/stap_5910.mod.o
4.0M May 13 07:37 /tmp/stap6HKAzF/stap_5910.o
712K May 13 07:36 /tmp/stap6HKAzF/stap-symbols.h
518K May 13 07:36 /tmp/stap6HKAzF/symbols.sorted
>> [...] Its not just the string printing that's costly, its also
>> searching the system call name with in CONTEXT->probe_point. [...]
>
> Rolling your own code to manually scan through "probe_point" is not
> really necessary or appropriate. There is a "probefunc()" function
> you can use for now.
>
I once used the code of probefunc() inside log_syscall_extra()
function. But during the performance analysis of LKET, I found it's a
little expensive considered that log_syscall_extra() will be called
very frequently. So I change to use these ugly manual written codes
for a better performance.
I think use SystemTap as a trace tool cares much about the performance
considered that it will be used in a heavy loaded system and a slight
performance differece will add up to a big one. But probefunc() is not
a problem if SystemTap is used as a diagnosis purpose.
Below is the email I sent out internally on this. I used specjbb as
the probed application.
>>> FYI:
As discovered in the former analysis of LKET, strpbrk() is one of
the most costly operations. It is a kernel function and is called by
strsep() to get the current syscall name.
I use another way to get the syscall name:
ptr = CONTEXT->probe_point;
while(*ptr++ != '"');
do {
buffer[i++] = *ptr++;
} while(*ptr != '@');
buffer[i] = '\0';
And I did some testings to compare the new method and the old one. I
also use get_cycles() to get the cycles consumed by these two method.
================= old method using strsep() ================
OProfile:
samples % app name symbol name
4239923 67.2099 java (no symbols)
207747 3.2931 lglzy ._stp_vsnprintf
109242 1.7317 vmlinux-2.6.16.5.debug .strpbrk
58739 0.9311 libc-2.3.4.so memset
58344 0.9249 lglzy .dwarf_kprobe_1_enter
56300 0.8924 libjvm.so fastLocalMark
55565 0.8808 libjvm.so cacheAllocWithInitialization
54983 0.8716 vmlinux-2.6.16.5.debug restore
54123 0.8579 vmlinux-2.6.16.5.debug .kernel_unknown_bucket
50853 0.8061 libjitc.so sd9
get_cycles(): ~165
================= new method ======================
OProfile:
samples % app name symbol name
4373512 69.2051 java (no symbols)
211307 3.3437 lglzy ._stp_vsnprintf
59605 0.9432 lglzy .function_log_syscall_tracedata
58020 0.9181 libc-2.3.4.so memset
57804 0.9147 vmlinux-2.6.16.5.debug restore
56514 0.8943 libjvm.so fastLocalMark
55085 0.8716 libjvm.so cacheAllocWithInitialization
53948 0.8537 lglzy .dwarf_kprobe_1_enter
50175 0.7940 libjitc.so sd9
~
get_cycles(): ~40
================= SUMMARY ====================
oprofile shows that the total samples in lglzy & vmlinux decrease
about 2% using the new method.
And get_cycles() shows that the total cycles using the new method is
~40, much less than ~165 with the old method.
I've committed this change into TLA
> - FChE
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-13 7:18 ` Li Guanglei
@ 2006-05-14 20:26 ` Frank Ch. Eigler
2006-05-15 16:39 ` Hien Nguyen
1 sibling, 0 replies; 10+ messages in thread
From: Frank Ch. Eigler @ 2006-05-14 20:26 UTC (permalink / raw)
To: Li Guanglei; +Cc: systemtap
Hi -
guanglei wrote:
> [...]
> But one disadvantage is that if using "syscall.*", then every syscall
> will use a different function to be its probe handler.
There are two separate opportunities for improvement here, and both of
them are going to get some attention.
First, a lot of essentially duplicated registration, unregistration,
and entry routines are being generated. These can be collapsed into
one triplet of functions per type (one for all kprobes, one for all
kretprobes, etc.). This will remove hundreds of functions of the
"syscall.*"-originated C code, and should make a drastic difference as
to performance.
Second, probe handlers constructed from distinct probe/alias bodies
may be collapsed, if they turn out to be equivalent after
optimization. One might think of it as a generalization of graydon's
"flavour" calculations in tapsets.cxx.
Together, these two tapset-level optimizations should bring the
"abstraction penalty" way down.
> [Regarding parsing CONTEXT->probe_point: ...] As discovered in the
> former analysis of LKET, strpbrk() is one of the most costly
> operations. It is a kernel function and is called by strsep() to get
> the current syscall name. I use another way to get the syscall
> name: [...]
It may be worth addressing this a totally different way too. The
translator could provide another variable, a sibling of
CONTEXT->probe_point. It would contains the equivalent of
probefunc(), at least for those probes that have a "synchronous"
binding to a PC address, which kprobes-based ones do. It could be
statically initialized the same way as probe_point, and the
probefunc() tapset function could be extended to look at that field
first. No strpbrk or iteration at all then.
- FChE
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: get the mapping of <key, syscallname>
2006-05-13 7:18 ` Li Guanglei
2006-05-14 20:26 ` Frank Ch. Eigler
@ 2006-05-15 16:39 ` Hien Nguyen
1 sibling, 0 replies; 10+ messages in thread
From: Hien Nguyen @ 2006-05-15 16:39 UTC (permalink / raw)
To: Li Guanglei; +Cc: Frank Ch. Eigler, Jose R. Santos, systemtap
On Sat, 2006-05-13 at 15:18 +0800, Li Guanglei wrote:
>
> But it still has one issue. e.g, I tried to use:
> probe syscall.* { ... } to trace all system calls on ppc64. But it
> failed since sys64_time(defined in SystemTap's ppc64/syscall.stp) is
> removed from 2.6.16.5/ppc64. I think this issues has been addressed
> in bug 2645.
For the time being, I would use the kernel_v preprocessor. I fact there
are a few of them in the ppc64/syscall.stp.
>
> If this bug #2645 is resolved , then I think I can use the syscall.stp
> tapsets shipped with SystemTap. Another advantage of using it is that
> I can simply use the $name instead of getting it using lines of codes.
Another advantage is when more people start using syscalls tapset we
will get more feedback (bug reports).
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: get the mapping of <key, syscallname>
@ 2006-05-15 17:41 Stone, Joshua I
0 siblings, 0 replies; 10+ messages in thread
From: Stone, Joshua I @ 2006-05-15 17:41 UTC (permalink / raw)
To: Frank Ch. Eigler, Li Guanglei; +Cc: systemtap
On Sunday, May 14, 2006 1:25 PM, Frank Ch. Eigler wrote:
> Second, probe handlers constructed from distinct probe/alias bodies
> may be collapsed, if they turn out to be equivalent after
> optimization. One might think of it as a generalization of graydon's
> "flavour" calculations in tapsets.cxx.
Of course, in the syscall.* case, simply using 'name' will make every
handler appear unique...
Josh
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-05-15 17:41 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-11 2:05 get the mapping of <key, syscallname> Li Guanglei
2006-05-11 18:14 ` Martin Hunt
2006-05-12 15:08 ` Frank Ch. Eigler
2006-05-12 16:06 ` Jose R. Santos
2006-05-12 16:12 ` Frank Ch. Eigler
2006-05-12 16:40 ` Jose R. Santos
2006-05-13 7:18 ` Li Guanglei
2006-05-14 20:26 ` Frank Ch. Eigler
2006-05-15 16:39 ` Hien Nguyen
2006-05-15 17:41 Stone, Joshua I
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).