* return probe not executed on SMP system
@ 2005-11-03 14:01 Guang Lei Li
2005-11-03 16:58 ` Hien Nguyen
2005-11-04 1:11 ` Hien Nguyen
0 siblings, 2 replies; 8+ messages in thread
From: Guang Lei Li @ 2005-11-03 14:01 UTC (permalink / raw)
To: systemtap
Hi,
I met some difficulties when dealing with the return probe on a
multi-processor system(Power5 System, 4 CPU).
This is the stap script I used:
global counter
function info()
%{
struct task_struct *cur = current;
_stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid,
cur->thread_info->cpu);
%}
probe kernel.function("sys_read")
{
if(pid() == target())
{
counter--
info()
log("pid:".string(pid())." target:".string(target())."entry")
}
}
probe kernel.function("sys_read").return
{
if(pid() == target())
{
counter++
info()
log("pid:".string(pid())." target:".string(target())."return")
}
}
probe begin
{
counter=100
}
probe end
{
log("counter: ".string(counter))
}
then I run:
stap -g a.stp -c "ls > a"
The output:
root:/root/temp>stap -g b.stp -c "ls > a"
|3713|3713|3|0|pid:3713 target:3713entry
|3713|3713|3|0|pid:3713 target:3713entry
|3713|3713|3|0|pid:3713 target:3713entry
|3713|3713|3|0|pid:3713 target:3713entry
|3713|3713|3|0|pid:3713 target:3713entry
counter: 95
It seemed that the return probe didn't work for me.
I tried the same script on a uni-processor x86 system, it worked fine.
And I also tried to write a simple c application which will open a file,
and read some data from this file. I run it:
stap -g b.stp -c "./a.out"
It gave the output like:
...
|3881|3881|0|0|pid:3881 target:3881entry
|3881|3881|0|0|pid:3881 target:3881entry
|3881|3881|0|0|pid:3881 target:3881entry
|3881|3881|0|0|pid:3881 target:3881return
|3881|3881|0|0|pid:3881 target:3881entry
|3881|3881|0|0|pid:3881 target:3881return
|3881|3881|0|0|pid:3881 target:3881entry
....
|3881|3881|3|0|pid:3881 target:3881entry
|3881|3881|3|0|pid:3881 target:3881return
counter: 33
You can see that there are still some return probes not be executed at
all(if all are executed, the counter should be 100).
Could anybody give me a hint about this problem?
Best Regards,
Li Guanglei
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: return probe not executed on SMP system
2005-11-03 14:01 return probe not executed on SMP system Guang Lei Li
@ 2005-11-03 16:58 ` Hien Nguyen
2005-11-04 1:11 ` Hien Nguyen
1 sibling, 0 replies; 8+ messages in thread
From: Hien Nguyen @ 2005-11-03 16:58 UTC (permalink / raw)
To: Guang Lei Li; +Cc: systemtap
You probably run out out kretprobe instances. When the return probe runs
out kretprobe instances (maxactive), it just skips and increase an
internal counter "nmissed".
Systemtap does not have the interface to change kretprobe->maxactive (in
this case increase the maxactive would help) or examine kretprobe->nmissed.
Hien.
Guang Lei Li wrote:
>Hi,
>
> I met some difficulties when dealing with the return probe on a
>multi-processor system(Power5 System, 4 CPU).
>
> This is the stap script I used:
>
>global counter
>
>function info()
>%{
> struct task_struct *cur = current;
> _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid,
>cur->thread_info->cpu);
>%}
>
>probe kernel.function("sys_read")
>{
> if(pid() == target())
> {
> counter--
> info()
> log("pid:".string(pid())." target:".string(target())."entry")
> }
>}
>
>probe kernel.function("sys_read").return
>{
> if(pid() == target())
> {
> counter++
> info()
> log("pid:".string(pid())." target:".string(target())."return")
> }
>}
>
>probe begin
>{
> counter=100
>}
>
>probe end
>{
> log("counter: ".string(counter))
>}
>
>then I run:
> stap -g a.stp -c "ls > a"
>
>The output:
>
>root:/root/temp>stap -g b.stp -c "ls > a"
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>counter: 95
>
> It seemed that the return probe didn't work for me.
> I tried the same script on a uni-processor x86 system, it worked fine.
>
> And I also tried to write a simple c application which will open a file,
>and read some data from this file. I run it:
> stap -g b.stp -c "./a.out"
> It gave the output like:
>
>...
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>....
>
>|3881|3881|3|0|pid:3881 target:3881entry
>
>|3881|3881|3|0|pid:3881 target:3881return
>counter: 33
>
> You can see that there are still some return probes not be executed at
>all(if all are executed, the counter should be 100).
>
> Could anybody give me a hint about this problem?
>
>Best Regards,
>
>Li Guanglei
>
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: return probe not executed on SMP system
2005-11-03 14:01 return probe not executed on SMP system Guang Lei Li
2005-11-03 16:58 ` Hien Nguyen
@ 2005-11-04 1:11 ` Hien Nguyen
2005-11-04 2:11 ` Guang Lei Li
1 sibling, 1 reply; 8+ messages in thread
From: Hien Nguyen @ 2005-11-04 1:11 UTC (permalink / raw)
To: Guang Lei Li; +Cc: systemtap
[-- Attachment #1: Type: text/plain, Size: 2394 bytes --]
Hi,
After looking deep into this issue I found out the problem is in the
stpd. The sptd unloads the systemtap module a little too early before
the return probes have a chance to fire their handlers.
If you have the systemtap src tree try this tempory_fix.patch. I will
file a bugzilla tomorrow.
Hien.
Guang Lei Li wrote:
>Hi,
>
> I met some difficulties when dealing with the return probe on a
>multi-processor system(Power5 System, 4 CPU).
>
> This is the stap script I used:
>
>global counter
>
>function info()
>%{
> struct task_struct *cur = current;
> _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid,
>cur->thread_info->cpu);
>%}
>
>probe kernel.function("sys_read")
>{
> if(pid() == target())
> {
> counter--
> info()
> log("pid:".string(pid())." target:".string(target())."entry")
> }
>}
>
>probe kernel.function("sys_read").return
>{
> if(pid() == target())
> {
> counter++
> info()
> log("pid:".string(pid())." target:".string(target())."return")
> }
>}
>
>probe begin
>{
> counter=100
>}
>
>probe end
>{
> log("counter: ".string(counter))
>}
>
>then I run:
> stap -g a.stp -c "ls > a"
>
>The output:
>
>root:/root/temp>stap -g b.stp -c "ls > a"
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>counter: 95
>
> It seemed that the return probe didn't work for me.
> I tried the same script on a uni-processor x86 system, it worked fine.
>
> And I also tried to write a simple c application which will open a file,
>and read some data from this file. I run it:
> stap -g b.stp -c "./a.out"
> It gave the output like:
>
>...
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>....
>
>|3881|3881|3|0|pid:3881 target:3881entry
>
>|3881|3881|3|0|pid:3881 target:3881return
>counter: 33
>
> You can see that there are still some return probes not be executed at
>all(if all are executed, the counter should be 100).
>
> Could anybody give me a hint about this problem?
>
>Best Regards,
>
>Li Guanglei
>
>
>
[-- Attachment #2: temporary_fix.patch --]
[-- Type: text/x-patch, Size: 527 bytes --]
--- src.old/runtime/stpd/librelay.c 2005-10-19 12:35:35.000000000 -0700
+++ src-20051029/runtime/stpd/librelay.c 2005-11-03 17:06:51.000000000 -0800
@@ -729,11 +729,16 @@
case STP_START:
{
struct transport_start *t = (struct transport_start *)data;
+ unsigned int mywait= 0xffffffff;
dbug("probe_start() returned %d\n", t->pid);
+
if (t->pid < 0)
cleanup_and_exit(0);
else if (target_cmd)
kill (target_pid, SIGUSR1);
+ while(mywait> 0) {
+ mywait--;
+ }
break;
}
default:
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: return probe not executed on SMP system
2005-11-04 1:11 ` Hien Nguyen
@ 2005-11-04 2:11 ` Guang Lei Li
2005-11-04 5:03 ` Hien Nguyen
0 siblings, 1 reply; 8+ messages in thread
From: Guang Lei Li @ 2005-11-04 2:11 UTC (permalink / raw)
To: Hien Nguyen; +Cc: systemtap
Hien Nguyen <hien@us.ibm.com> wrote on 2005-11-04 09:10:53:
> Hi,
> After looking deep into this issue I found out the problem is in the
> stpd. The sptd unloads the systemtap module a little too early before
> the return probes have a chance to fire their handlers.
>
> If you have the systemtap src tree try this tempory_fix.patch. I will
> file a bugzilla tomorrow.
>
> Hien.
>
I applied your patch, and I also raise the maxaction from 1 to 10, then
it worked really well, all return probed are caught.
But the default setting of maxaction to 1 is too low for a multi-processor
system. It will cause a lot of misses of kretprobe(I counter it on Power5
system for a while, about 2000 kprobes are caught, but only about 400
kretprobe was caught, so about 1600 kretprobes are missed)
I read the "kernel/kprobes.c", and found:
/* Pre-allocate memory for max kretprobe instances */
if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPT
rp->maxactive = max(10, 2 * NR_CPUS);
#else
rp->maxactive = NR_CPUS;
#endif
}
Is it appropriate to just delete "if (rp->maxactive <= 0)", to change the
above block as:
#ifdef CONFIG_PREEMPT
rp->maxactive = max(10, 2 * NR_CPUS);
#else
rp->maxactive = NR_CPUS;
#endif
so that kretprobe will never miss on a multi-processor system?
Thanks for your help.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: return probe not executed on SMP system
2005-11-04 2:11 ` Guang Lei Li
@ 2005-11-04 5:03 ` Hien Nguyen
2005-11-04 5:34 ` Guang Lei Li
0 siblings, 1 reply; 8+ messages in thread
From: Hien Nguyen @ 2005-11-04 5:03 UTC (permalink / raw)
To: Guang Lei Li; +Cc: systemtap
Guang Lei Li wrote:
>I applied your patch, and I also raise the maxaction from 1 to 10, then
>it worked really well, all return probed are caught.
>
>
Do you mean maxactive? maxactive is not default to 1.
>But the default setting of maxaction to 1 is too low for a multi-processor
>system. It will cause a lot of misses of kretprobe(I counter it on Power5
>system for a while, about 2000 kprobes are caught, but only about 400
>kretprobe was caught, so about 1600 kretprobes are missed)
>
>I read the "kernel/kprobes.c", and found:
>
>/* Pre-allocate memory for max kretprobe instances */
>if (rp->maxactive <= 0) {
>#ifdef CONFIG_PREEMPT
> rp->maxactive = max(10, 2 * NR_CPUS);
>#else
> rp->maxactive = NR_CPUS;
>#endif
> }
>
>
By default maxactive is set to a value as in the #ifdef #endif clause
above
>Is it appropriate to just delete "if (rp->maxactive <= 0)", to change the
>above block as:
>
>
No, you should not make that change. When register a return probe, user
has an option to change maxactive.
We need to implement the systemtap's interface to make that change in
the future.
Hien.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: return probe not executed on SMP system
2005-11-04 5:03 ` Hien Nguyen
@ 2005-11-04 5:34 ` Guang Lei Li
2005-11-04 6:06 ` Hien Nguyen
2005-11-04 6:15 ` Hien Nguyen
0 siblings, 2 replies; 8+ messages in thread
From: Guang Lei Li @ 2005-11-04 5:34 UTC (permalink / raw)
To: Hien Nguyen; +Cc: systemtap
Hien Nguyen <hien@us.ibm.com> wrote on 2005-11-04 13:03:28:
> Guang Lei Li wrote:
>
> >I applied your patch, and I also raise the maxaction from 1 to 10, then
> >it worked really well, all return probed are caught.
> >
> >
> Do you mean maxactive? maxactive is not default to 1.
Thanks for your answer.
Yes, I mean maxactive, not maxaction.
I grep the systemTap source code, and found in line 2590 of tapsets.cxx:
o->newline() << probe_name << ".maxactive = 1;";
So systemTap set the maxactive to 1. Am I right?
>
> >But the default setting of maxaction to 1 is too low for a
multi-processor
> >system. It will cause a lot of misses of kretprobe(I counter it on
Power5
> >system for a while, about 2000 kprobes are caught, but only about 400
> >kretprobe was caught, so about 1600 kretprobes are missed)
> >
> >I read the "kernel/kprobes.c", and found:
> >
> >/* Pre-allocate memory for max kretprobe instances */
> >if (rp->maxactive <= 0) {
> >#ifdef CONFIG_PREEMPT
> > rp->maxactive = max(10, 2 * NR_CPUS);
> >#else
> > rp->maxactive = NR_CPUS;
> >#endif
> > }
> >
> >
>
> By default maxactive is set to a value as in the #ifdef #endif clause
> above
>
> >Is it appropriate to just delete "if (rp->maxactive <= 0)", to change
the
> >above block as:
> >
> >
> No, you should not make that change. When register a return probe, user
> has an option to change maxactive.
> We need to implement the systemtap's interface to make that change in
> the future.
>
for systemTap, the kretprobe would never change the maxactive because
systemTap
has already set maxactive to 1.
We could implement a systemTap's interface to do this. But why not let
kretprobe
automatically detects and sets the real value of maxactive?
> Hien.
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2005-11-04 6:15 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-03 14:01 return probe not executed on SMP system Guang Lei Li
2005-11-03 16:58 ` Hien Nguyen
2005-11-04 1:11 ` Hien Nguyen
2005-11-04 2:11 ` Guang Lei Li
2005-11-04 5:03 ` Hien Nguyen
2005-11-04 5:34 ` Guang Lei Li
2005-11-04 6:06 ` Hien Nguyen
2005-11-04 6:15 ` Hien Nguyen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).