public inbox for systemtap@sourceware.org
 help / color / mirror / Atom feed
* return probe not executed on SMP system
@ 2005-11-03 14:01 Guang Lei Li
  2005-11-03 16:58 ` Hien Nguyen
  2005-11-04  1:11 ` Hien Nguyen
  0 siblings, 2 replies; 8+ messages in thread
From: Guang Lei Li @ 2005-11-03 14:01 UTC (permalink / raw)
  To: systemtap

Hi,

  I met some difficulties when dealing with the return probe on a 
multi-processor system(Power5 System, 4 CPU).

  This is the stap script I used:

global counter

function info()
%{
  struct task_struct *cur = current;
  _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid, 
cur->thread_info->cpu);
%}

probe kernel.function("sys_read")
{
  if(pid() == target())
  {
    counter--
    info()
    log("pid:".string(pid())." target:".string(target())."entry")
  }
}

probe kernel.function("sys_read").return
{
  if(pid() == target())
  {
    counter++
    info()
    log("pid:".string(pid())." target:".string(target())."return")
  }
}

probe begin
{
  counter=100
}

probe end
{
  log("counter: ".string(counter))
}

then I run:
  stap -g a.stp -c "ls > a"

The output:

root:/root/temp>stap -g b.stp -c "ls > a"

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry
counter: 95

 It seemed that the return probe didn't work for me.
 I tried the same script on a uni-processor x86 system, it worked fine.

 And I also tried to write a simple c application which will open a file, 
and read some data from this file. I run it:
   stap -g b.stp -c "./a.out"
 It gave the output like:

...
|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881return

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881return

|3881|3881|0|0|pid:3881 target:3881entry
....

|3881|3881|3|0|pid:3881 target:3881entry

|3881|3881|3|0|pid:3881 target:3881return
counter: 33

 You can see that there are still some return probes not be executed at 
all(if all are executed, the counter should be 100).

 Could anybody give me a hint about this problem?

Best Regards,

Li Guanglei

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: return probe not executed on SMP system
  2005-11-03 14:01 return probe not executed on SMP system Guang Lei Li
@ 2005-11-03 16:58 ` Hien Nguyen
  2005-11-04  1:11 ` Hien Nguyen
  1 sibling, 0 replies; 8+ messages in thread
From: Hien Nguyen @ 2005-11-03 16:58 UTC (permalink / raw)
  To: Guang Lei Li; +Cc: systemtap

You probably run out out kretprobe instances. When the return probe runs 
out kretprobe instances (maxactive), it just skips and increase an 
internal counter "nmissed".

Systemtap does not have the interface to change kretprobe->maxactive (in 
this case increase the maxactive would help) or examine kretprobe->nmissed.

Hien.

Guang Lei Li wrote:

>Hi,
>
>  I met some difficulties when dealing with the return probe on a 
>multi-processor system(Power5 System, 4 CPU).
>
>  This is the stap script I used:
>
>global counter
>
>function info()
>%{
>  struct task_struct *cur = current;
>  _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid, 
>cur->thread_info->cpu);
>%}
>
>probe kernel.function("sys_read")
>{
>  if(pid() == target())
>  {
>    counter--
>    info()
>    log("pid:".string(pid())." target:".string(target())."entry")
>  }
>}
>
>probe kernel.function("sys_read").return
>{
>  if(pid() == target())
>  {
>    counter++
>    info()
>    log("pid:".string(pid())." target:".string(target())."return")
>  }
>}
>
>probe begin
>{
>  counter=100
>}
>
>probe end
>{
>  log("counter: ".string(counter))
>}
>
>then I run:
>  stap -g a.stp -c "ls > a"
>
>The output:
>
>root:/root/temp>stap -g b.stp -c "ls > a"
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>counter: 95
>
> It seemed that the return probe didn't work for me.
> I tried the same script on a uni-processor x86 system, it worked fine.
>
> And I also tried to write a simple c application which will open a file, 
>and read some data from this file. I run it:
>   stap -g b.stp -c "./a.out"
> It gave the output like:
>
>...
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>....
>
>|3881|3881|3|0|pid:3881 target:3881entry
>
>|3881|3881|3|0|pid:3881 target:3881return
>counter: 33
>
> You can see that there are still some return probes not be executed at 
>all(if all are executed, the counter should be 100).
>
> Could anybody give me a hint about this problem?
>
>Best Regards,
>
>Li Guanglei
>
>  
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: return probe not executed on SMP system
  2005-11-03 14:01 return probe not executed on SMP system Guang Lei Li
  2005-11-03 16:58 ` Hien Nguyen
@ 2005-11-04  1:11 ` Hien Nguyen
  2005-11-04  2:11   ` Guang Lei Li
  1 sibling, 1 reply; 8+ messages in thread
From: Hien Nguyen @ 2005-11-04  1:11 UTC (permalink / raw)
  To: Guang Lei Li; +Cc: systemtap

[-- Attachment #1: Type: text/plain, Size: 2394 bytes --]

Hi,
After looking deep into this issue I found out the problem is in the 
stpd. The sptd unloads the systemtap module a little too early before 
the return probes have a chance to fire their handlers.

If you have the systemtap src tree try this tempory_fix.patch. I will 
file a bugzilla tomorrow.

Hien.



Guang Lei Li wrote:

>Hi,
>
>  I met some difficulties when dealing with the return probe on a 
>multi-processor system(Power5 System, 4 CPU).
>
>  This is the stap script I used:
>
>global counter
>
>function info()
>%{
>  struct task_struct *cur = current;
>  _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid, 
>cur->thread_info->cpu);
>%}
>
>probe kernel.function("sys_read")
>{
>  if(pid() == target())
>  {
>    counter--
>    info()
>    log("pid:".string(pid())." target:".string(target())."entry")
>  }
>}
>
>probe kernel.function("sys_read").return
>{
>  if(pid() == target())
>  {
>    counter++
>    info()
>    log("pid:".string(pid())." target:".string(target())."return")
>  }
>}
>
>probe begin
>{
>  counter=100
>}
>
>probe end
>{
>  log("counter: ".string(counter))
>}
>
>then I run:
>  stap -g a.stp -c "ls > a"
>
>The output:
>
>root:/root/temp>stap -g b.stp -c "ls > a"
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>counter: 95
>
> It seemed that the return probe didn't work for me.
> I tried the same script on a uni-processor x86 system, it worked fine.
>
> And I also tried to write a simple c application which will open a file, 
>and read some data from this file. I run it:
>   stap -g b.stp -c "./a.out"
> It gave the output like:
>
>...
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>....
>
>|3881|3881|3|0|pid:3881 target:3881entry
>
>|3881|3881|3|0|pid:3881 target:3881return
>counter: 33
>
> You can see that there are still some return probes not be executed at 
>all(if all are executed, the counter should be 100).
>
> Could anybody give me a hint about this problem?
>
>Best Regards,
>
>Li Guanglei
>
>  
>


[-- Attachment #2: temporary_fix.patch --]
[-- Type: text/x-patch, Size: 527 bytes --]

--- src.old/runtime/stpd/librelay.c	2005-10-19 12:35:35.000000000 -0700
+++ src-20051029/runtime/stpd/librelay.c	2005-11-03 17:06:51.000000000 -0800
@@ -729,11 +729,16 @@
 		case STP_START: 
 		{
 			struct transport_start *t = (struct transport_start *)data;
+			unsigned int mywait= 0xffffffff;
 			dbug("probe_start() returned %d\n", t->pid);
+
 			if (t->pid < 0)
 				cleanup_and_exit(0);
 			else if (target_cmd)
 				kill (target_pid, SIGUSR1);
+				while(mywait> 0) {
+					mywait--;
+				}
 			break;
 		}
 		default:

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: return probe not executed on SMP system
  2005-11-04  1:11 ` Hien Nguyen
@ 2005-11-04  2:11   ` Guang Lei Li
  2005-11-04  5:03     ` Hien Nguyen
  0 siblings, 1 reply; 8+ messages in thread
From: Guang Lei Li @ 2005-11-04  2:11 UTC (permalink / raw)
  To: Hien Nguyen; +Cc: systemtap

Hien Nguyen <hien@us.ibm.com> wrote on 2005-11-04 09:10:53:

> Hi,
> After looking deep into this issue I found out the problem is in the 
> stpd. The sptd unloads the systemtap module a little too early before 
> the return probes have a chance to fire their handlers.
> 
> If you have the systemtap src tree try this tempory_fix.patch. I will 
> file a bugzilla tomorrow.
> 
> Hien.
> 
I applied your patch, and I also raise the maxaction from 1 to 10, then
it worked really well, all return probed are caught.

But the default setting of maxaction to 1 is too low for a multi-processor
system. It will cause a lot of misses of kretprobe(I counter it on Power5
system for a while, about 2000 kprobes are caught, but only about 400
kretprobe was caught, so about 1600 kretprobes are missed)

I read the "kernel/kprobes.c", and found:

/* Pre-allocate memory for max kretprobe instances */
if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPT
                rp->maxactive = max(10, 2 * NR_CPUS);
#else
                rp->maxactive = NR_CPUS;
#endif
        }

Is it appropriate to just delete "if (rp->maxactive <= 0)", to change the 
above block as:

#ifdef CONFIG_PREEMPT
     rp->maxactive = max(10, 2 * NR_CPUS);
#else
     rp->maxactive = NR_CPUS;
#endif

so that kretprobe will never miss on a multi-processor system?

Thanks for your help.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: return probe not executed on SMP system
  2005-11-04  2:11   ` Guang Lei Li
@ 2005-11-04  5:03     ` Hien Nguyen
  2005-11-04  5:34       ` Guang Lei Li
  0 siblings, 1 reply; 8+ messages in thread
From: Hien Nguyen @ 2005-11-04  5:03 UTC (permalink / raw)
  To: Guang Lei Li; +Cc: systemtap

Guang Lei Li wrote:

>I applied your patch, and I also raise the maxaction from 1 to 10, then
>it worked really well, all return probed are caught.
>  
>
Do you mean maxactive? maxactive is not default to 1.

>But the default setting of maxaction to 1 is too low for a multi-processor
>system. It will cause a lot of misses of kretprobe(I counter it on Power5
>system for a while, about 2000 kprobes are caught, but only about 400
>kretprobe was caught, so about 1600 kretprobes are missed)
>
>I read the "kernel/kprobes.c", and found:
>
>/* Pre-allocate memory for max kretprobe instances */
>if (rp->maxactive <= 0) {
>#ifdef CONFIG_PREEMPT
>                rp->maxactive = max(10, 2 * NR_CPUS);
>#else
>                rp->maxactive = NR_CPUS;
>#endif
>        }
>  
>

By default maxactive is set to a value as in the #ifdef   #endif clause 
above

>Is it appropriate to just delete "if (rp->maxactive <= 0)", to change the 
>above block as:
>  
>
No, you should not make that change. When register a return probe, user 
has an option to change maxactive.
We need to implement the systemtap's interface to make that change in 
the future.

Hien.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: return probe not executed on SMP system
  2005-11-04  5:03     ` Hien Nguyen
@ 2005-11-04  5:34       ` Guang Lei Li
  2005-11-04  6:06         ` Hien Nguyen
  2005-11-04  6:15         ` Hien Nguyen
  0 siblings, 2 replies; 8+ messages in thread
From: Guang Lei Li @ 2005-11-04  5:34 UTC (permalink / raw)
  To: Hien Nguyen; +Cc: systemtap

Hien Nguyen <hien@us.ibm.com> wrote on 2005-11-04 13:03:28:

> Guang Lei Li wrote:
> 
> >I applied your patch, and I also raise the maxaction from 1 to 10, then
> >it worked really well, all return probed are caught.
> > 
> >
> Do you mean maxactive? maxactive is not default to 1.

Thanks for your answer.

Yes, I mean maxactive, not maxaction.

I grep the systemTap source code, and found in line 2590 of tapsets.cxx:

 o->newline() << probe_name << ".maxactive = 1;";

So systemTap set the maxactive to 1. Am I right?

> 
> >But the default setting of maxaction to 1 is too low for a 
multi-processor
> >system. It will cause a lot of misses of kretprobe(I counter it on 
Power5
> >system for a while, about 2000 kprobes are caught, but only about 400
> >kretprobe was caught, so about 1600 kretprobes are missed)
> >
> >I read the "kernel/kprobes.c", and found:
> >
> >/* Pre-allocate memory for max kretprobe instances */
> >if (rp->maxactive <= 0) {
> >#ifdef CONFIG_PREEMPT
> >                rp->maxactive = max(10, 2 * NR_CPUS);
> >#else
> >                rp->maxactive = NR_CPUS;
> >#endif
> >        }
> > 
> >
> 
> By default maxactive is set to a value as in the #ifdef   #endif clause 
> above
> 
> >Is it appropriate to just delete "if (rp->maxactive <= 0)", to change 
the 
> >above block as:
> > 
> >
> No, you should not make that change. When register a return probe, user 
> has an option to change maxactive.
> We need to implement the systemtap's interface to make that change in 
> the future.
> 
for systemTap, the kretprobe would never change the maxactive because 
systemTap
has already set maxactive to 1.

We could implement a systemTap's interface to do this. But why not let 
kretprobe
automatically detects and sets the real value of maxactive?

> Hien.
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: return probe not executed on SMP system
  2005-11-04  5:34       ` Guang Lei Li
@ 2005-11-04  6:06         ` Hien Nguyen
  2005-11-04  6:15         ` Hien Nguyen
  1 sibling, 0 replies; 8+ messages in thread
From: Hien Nguyen @ 2005-11-04  6:06 UTC (permalink / raw)
  To: Guang Lei Li; +Cc: systemtap

Guang Lei Li wrote:

>
>I grep the systemTap source code, and found in line 2590 of tapsets.cxx:
>
> o->newline() << probe_name << ".maxactive = 1;";
>
>So systemTap set the maxactive to 1. Am I right?
>
>  
>
I did not know that. Systemtap should not set maxactive=1, it should be 
0 instead.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: return probe not executed on SMP system
  2005-11-04  5:34       ` Guang Lei Li
  2005-11-04  6:06         ` Hien Nguyen
@ 2005-11-04  6:15         ` Hien Nguyen
  1 sibling, 0 replies; 8+ messages in thread
From: Hien Nguyen @ 2005-11-04  6:15 UTC (permalink / raw)
  To: Guang Lei Li; +Cc: systemtap


>I grep the systemTap source code, and found in line 2590 of tapsets.cxx:
>
> o->newline() << probe_name << ".maxactive = 1;";
>
>So systemTap set the maxactive to 1. Am I right?
>
>  
>
Confirm! You are right, systemtap set maxactive=1. It should set it to 0 
or do not set it at all so kprobes can set it to the default value.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2005-11-04  6:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-03 14:01 return probe not executed on SMP system Guang Lei Li
2005-11-03 16:58 ` Hien Nguyen
2005-11-04  1:11 ` Hien Nguyen
2005-11-04  2:11   ` Guang Lei Li
2005-11-04  5:03     ` Hien Nguyen
2005-11-04  5:34       ` Guang Lei Li
2005-11-04  6:06         ` Hien Nguyen
2005-11-04  6:15         ` Hien Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).