From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 17901 invoked by alias); 4 Nov 2005 01:11:00 -0000
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Post:
List-Help: ,
Sender: systemtap-owner@sourceware.org
Received: (qmail 17836 invoked by uid 22791); 4 Nov 2005 01:10:56 -0000
Message-ID: <436AB51D.9040307@us.ibm.com>
Date: Fri, 04 Nov 2005 01:11:00 -0000
From: Hien Nguyen
User-Agent: Mozilla Thunderbird 1.0.6-1.1.fc4 (X11/20050720)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Guang Lei Li
CC: systemtap@sources.redhat.com
Subject: Re: return probe not executed on SMP system
References:
In-Reply-To:
Content-Type: multipart/mixed; boundary="------------020707070300070805010509"
X-SW-Source: 2005-q4/txt/msg00133.txt.bz2

This is a multi-part message in MIME format.
--------------020707070300070805010509
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 2394

Hi,

After looking deeper into this issue, I found that the problem is in
stpd. The stpd unloads the systemtap module a little too early, before
the return probes have had a chance to fire their handlers. If you have
the systemtap src tree, try the attached temporary_fix.patch. I will
file a bugzilla tomorrow.

Hien.

Guang Lei Li wrote:

>Hi,
>
>  I met some difficulties when dealing with the return probe on a
>multi-processor system(Power5 System, 4 CPU).
>
>  This is the stap script I used:
>
>global counter
>
>function info()
>%{
>    struct task_struct *cur = current;
>    _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid,
>cur->thread_info->cpu);
>%}
>
>probe kernel.function("sys_read")
>{
>    if(pid() == target())
>    {
>        counter--
>        info()
>        log("pid:".string(pid())." target:".string(target())."entry")
>    }
>}
>
>probe kernel.function("sys_read").return
>{
>    if(pid() == target())
>    {
>        counter++
>        info()
>        log("pid:".string(pid())." target:".string(target())."return")
>    }
>}
>
>probe begin
>{
>    counter=100
>}
>
>probe end
>{
>    log("counter: ".string(counter))
>}
>
>then I run:
>  stap -g a.stp -c "ls > a"
>
>The output:
>
>root:/root/temp>stap -g b.stp -c "ls > a"
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>counter: 95
>
>  It seemed that the return probe didn't work for me.
>  I tried the same script on a uni-processor x86 system, it worked fine.
>
>  And I also tried to write a simple c application which will open a file,
>and read some data from this file. I run it:
>  stap -g b.stp -c "./a.out"
>  It gave the output like:
>
>...
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>....
>
>|3881|3881|3|0|pid:3881 target:3881entry
>
>|3881|3881|3|0|pid:3881 target:3881return
>counter: 33
>
>  You can see that there are still some return probes not be executed at
>all(if all are executed, the counter should be 100).
>
>  Could anybody give me a hint about this problem?
>
>Best Regards,
>
>Li Guanglei
>
>
>

--------------020707070300070805010509
Content-Type: text/x-patch;
 name="temporary_fix.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="temporary_fix.patch"
Content-length: 527

--- src.old/runtime/stpd/librelay.c	2005-10-19 12:35:35.000000000 -0700
+++ src-20051029/runtime/stpd/librelay.c	2005-11-03 17:06:51.000000000 -0800
@@ -729,11 +729,16 @@
 	case STP_START:
 	{
 		struct transport_start *t = (struct transport_start *)data;
+		unsigned int mywait= 0xffffffff;
 		dbug("probe_start() returned %d\n", t->pid);
+
 		if (t->pid < 0)
 			cleanup_and_exit(0);
 		else if (target_cmd)
 			kill (target_pid, SIGUSR1);
+		while(mywait> 0) {
+			mywait--;
+		}
 		break;
 	}
 	default:

--------------020707070300070805010509--
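
A note on reading the counter values in the quoted report: the script sets
counter to 100 in "probe begin", decrements it on every sys_read entry and
increments it on every return, so the final value shows how many entries
were never matched by a return handler. The C sketch below only illustrates
that arithmetic. The helper name unmatched_entries() and the COUNTER_START
macro are made up for this note and are not part of SystemTap or stpd.

```c
/* Initial value assigned in the script's "probe begin" handler. */
#define COUNTER_START 100L

/*
 * Hypothetical helper, for illustration only.
 *
 * The entry probe does counter-- and the return probe does counter++, so
 *     final = COUNTER_START - entries + returns
 * and the number of entries without a matching return is
 *     entries - returns = COUNTER_START - final.
 */
long unmatched_entries(long final_counter)
{
    return COUNTER_START - final_counter;
}
```

For the first run, unmatched_entries(95) gives 5, which agrees with the five
"entry" lines and no "return" lines in the output. For the second run,
unmatched_entries(33) gives 67. Because that output is abbreviated, only
this net figure can be derived from it.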