public inbox for systemtap@sourceware.org
* Instrumenting context switches
@ 2006-11-30  1:35 Perry Cheng
  2006-11-30  1:41 ` Frank Ch. Eigler
  2006-11-30 18:22 ` Martin Hunt
  0 siblings, 2 replies; 9+ messages in thread
From: Perry Cheng @ 2006-11-30  1:35 UTC (permalink / raw)
  To: systemtap

Using some example code, I tried to instrument context switching by adding 
a probe to the function __switch_to.  Some documentation had suggested that 
certain versions of systemtap could not handle instrumenting 
context_switch.  In the past I have gotten this to work, but now the use of 
_stp_printf seems to cause the machine to freeze hard.  If I leave out the 
printing of switchCount but leave in the increment, things work fine.  Is 
this a known problem, or a known historical problem, and if so, what are the 
workarounds?


probe kernel.function("__switch_to")
{
        doSwitchTo(gettimeofday_us(), $prev_p, $next_p);
}

function doSwitchTo(timeus:long, prev:long, next:long)
%{
        _stp_printf("SWITCHCOUNT = %ld\n", switchCount);  <---- BAD LINE
        switchCount++;
}%




Perry


* Re: Instrumenting context switches
  2006-11-30  1:35 Instrumenting context switches Perry Cheng
@ 2006-11-30  1:41 ` Frank Ch. Eigler
  2006-11-30 18:22 ` Martin Hunt
  1 sibling, 0 replies; 9+ messages in thread
From: Frank Ch. Eigler @ 2006-11-30  1:41 UTC (permalink / raw)
  To: Perry Cheng; +Cc: systemtap


> [...]
>         doSwitchTo(gettimeofday_us(), $prev_p, $next_p);
> [...]

It may be that this problem is due to the recent rewrite of
gettimeofday_us.  That code contains bits like preempt_disable() and
_enable(), even though the equivalent (interrupt disabling) should
already be done within probe context.  In particular, I wonder if
changing the latter to preempt_enable_no_resched() might improve the
situation.

- FChE


* Re: Instrumenting context switches
  2006-11-30  1:35 Instrumenting context switches Perry Cheng
  2006-11-30  1:41 ` Frank Ch. Eigler
@ 2006-11-30 18:22 ` Martin Hunt
  2006-11-30 22:01   ` Perry Cheng
  1 sibling, 1 reply; 9+ messages in thread
From: Martin Hunt @ 2006-11-30 18:22 UTC (permalink / raw)
  To: Perry Cheng; +Cc: systemtap

On Wed, 2006-11-29 at 18:25 -0500, Perry Cheng wrote:

> probe kernel.function("__switch_to")
> {
>         doSwitchTo(gettimeofday_us(), $prev_p, $next_p);
> }
> 
> function doSwitchTo(timeus:long, prev:long, next:long)
> %{
>         _stp_printf("SWITCHCOUNT = %ld\n", switchCount);  <---- BAD LINE
>         switchCount++;
> }%

Obviously the code fragment above is not exactly what you are using to
reproduce the bug. (You can't use keywords as parameter names,
uninitialized switchCount, "}%", etc).  I tried the following and did
not see any problems.  Can you give more details (arch, OS, etc) on how
to reproduce?

%{
	long switchCount = 1000000;
%}

function doSwitchTo (t:long, p:long, n:long) %{
	_stp_printf("SWITCHCOUNT = %ld\n", switchCount); 
	switchCount++;
%}

probe kernel.function("__switch_to")
{
        doSwitchTo(gettimeofday_us(), $prev_p, $next_p);
}




* Re: Instrumenting context switches
  2006-11-30 18:22 ` Martin Hunt
@ 2006-11-30 22:01   ` Perry Cheng
  0 siblings, 0 replies; 9+ messages in thread
From: Perry Cheng @ 2006-11-30 22:01 UTC (permalink / raw)
  To: Martin Hunt, systemtap

The following even simpler program also dies, which rules out gettimeofday 
and access to a global variable as possible causes.  The crash happens on 
both an Intel-686 and an AMD-686 machine, each running a modified version of 
2.6.16.  I don't know the details of the modifications, but they are 
generally used to support real-time features and include the hrt and rt-prio 
patches.  The src is at ftp://linuxpatch.ncsa.uiuc.edu/rt-linux/rhel4u2/R1-iFix1.

If I replace __switch_to with context_switch or finish_task_switch, the 
failure still occurs.  However, if I switch to set_task_comm, then things 
seem ok.

probe kernel.function("__switch_to")
{
        foobar()
}

function foobar()
%{
        _stp_printf("foobar\n");
%}

Perry
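
A possible next step for narrowing this down (a sketch only, not something
tried in this thread): print for just the first few context switches and
then exit, to see whether the freeze depends on the sheer volume of output
or happens on any single print from this probe point.  The counter name
"hits" and the limit of 10 are arbitrary choices, and script-level printf
is used here only so that guru mode is not required.

```systemtap
global hits

probe kernel.function("__switch_to")
{
        # Print only for the first 10 hits, then unload the probe.
        if (hits++ < 10)
                printf("foobar\n")
        else
                exit()
}
```

If the machine still freezes with this version, output volume is unlikely
to be the cause.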



* Re: Instrumenting context switches
       [not found] <OFBBDFEA36.CA571100-ON85257237.00006111-85257237.0000676B@mck.us.ray.com>
@ 2006-12-01  3:14 ` Dave Sperry
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Sperry @ 2006-12-01  3:14 UTC (permalink / raw)
  To: perryche; +Cc: systemtap

Hi Perry,
I did not have to do anything to tell stap where to find the symbols.
One thing you can check is that the debug symbols are located where the
src/README file suggests.

I also had to modify the tapsets.cxx file to fix some rt enums:

--- src_orig/tapsets.cxx	2006-11-17 15:35:47.000000000 -0500
+++ src/tapsets.cxx	2006-11-19 19:09:02.000000000 -0500
@@ -4332,13 +4332,13 @@ hrtimer_derived_probe_group::emit_module
 
   s.op->newline() << "for (i=0; i<" << probes.size() << "; i++) {";
   s.op->newline(1) << "struct stap_hrtimer_probe* stp = & stap_hrtimer_probes [i];";
-  s.op->newline() << "hrtimer_init (& stp->hrtimer, CLOCK_MONOTONIC, HRTIMER_REL);";
-  s.op->newline() << "stp->hrtimer.function = & enter_hrtimer_probe;";
+  s.op->newline() << "hrtimer_init (& stp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);";
+  s.op->newline() << "stp->hrtimer.function = (void *)(& enter_hrtimer_probe);";
   // There is no hrtimer field to identify *this* (i-th) probe handler
   // callback.  So instead we'll deduce it at entry time.
   s.op->newline() << "(void) hrtimer_start (& stp->hrtimer, ";
   emit_interval (s.op);
-  s.op->line() << ", HRTIMER_REL);";
+  s.op->line() << ", HRTIMER_MODE_REL);";
   // Note: no partial failure rollback is needed: hrtimer_start only
   // "fails" if the timer was already active, which cannot be.
   s.op->newline(-1) << "}"; // for loop

The other thing I do when things behave strangely is flush the systemtap
cache:
"rm -rf /root/.systemtap/cache/*"

Dave



* Re: Instrumenting context switches
  2006-12-01  0:25 ` Dave Sperry
@ 2006-12-01  3:12   ` Perry Cheng
  0 siblings, 0 replies; 9+ messages in thread
From: Perry Cheng @ 2006-12-01  3:12 UTC (permalink / raw)
  To: dave_sperry, systemtap

Hi Dave,

I was using version 0.4 and have upgraded to version 0.5.11 like you 
(mine's from today though and not 11/20). 
However, the new stap can't find the probe points.  I suspect that it is 
not locating the kernel symbols file. 
Do you know how to get the info out of the old /usr/bin/stap and give it 
to the new /usr/local/bin/stap? 


Perry



* Re: Instrumenting context switches
       [not found] <OFB78608CD.65116F14-ON85257236.0073959D-85257236.00738AAF@mck.us.ray.com>
@ 2006-12-01  0:25 ` Dave Sperry
  2006-12-01  3:12   ` Perry Cheng
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Sperry @ 2006-12-01  0:25 UTC (permalink / raw)
  To: perryche; +Cc: systemtap

Perry,
   I had no problem running your systemtap scripts on my AMD-686 SMP box
running the same kernel as you list below. 

I did a stap perry.stp -vvvv -g &>perryFoo.txt and it worked just fine.
You can see the log at:

http://toomanyprojects.org:2000/outbound/perry/perryFoo.txt


The version of systemtap I used is:

SystemTap translator/driver (version 0.5.11 built 2006-11-20)
(Using Red Hat elfutils 0.124 libraries.)
Copyright (C) 2005-2006 Red Hat, Inc. and others
This is free software; see the source for copying conditions.


Dave


* RE: Instrumenting context switches
@ 2006-11-30 18:10 Stone, Joshua I
  0 siblings, 0 replies; 9+ messages in thread
From: Stone, Joshua I @ 2006-11-30 18:10 UTC (permalink / raw)
  To: Frank Ch. Eigler, Perry Cheng; +Cc: systemtap

On Wednesday, November 29, 2006 4:47 PM, Frank Ch. Eigler wrote:
> It may be that this problem is due to the recent rewrite of
> gettimeofday_us.  That code contains bits like preempt_disable() and
> _enable(), even though the equivalent (interrupt disabling) should
> already be done within probe context.  In particular, I wonder if
> changing the latter to preempt_enable_no_resched() might improve the
> situation.

I'll take a look at whether some of the locking in the time subsystem
can go away, under the assumption that probes are always
interrupt-disabled.  I took a very conservative approach, so I'm sure
some of that is overkill.  However, the preempt_enable vs. _no_resched
shouldn't really cause a problem, because preempt_schedule checks for
irqs_disabled() anyway.

I think this is a red herring for Perry though.  He mentions that simply
taking out his _stp_printf statement makes things work, so the
gettimeofday_us is still being called in the working case.

A wild guess -- the stack is transitioning in __switch_to, so might it
be that _stp_printf is running out-of-bounds somehow?


Josh
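
One possible workaround, sketched here as an assumption rather than
something tested in this thread: since the increment alone reportedly works
and only the print freezes, keep the __switch_to handler down to a counter
update and do the printing from a timer probe instead.  The global name
"switches" and the one-second interval are illustrative only.

```systemtap
global switches

# Only count in the context-switch path; produce no output here.
probe kernel.function("__switch_to")
{
        switches++
}

# Report the running total once a second, outside the switch path.
probe timer.ms(1000)
{
        printf("SWITCHCOUNT = %d\n", switches)
}
```

This version uses no embedded C, so it should not need -g.  Whether it
avoids the freeze on the patched real-time kernel is untested.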


* RE: Instrumenting context switches
@ 2006-11-30 15:31 Stone, Joshua I
  0 siblings, 0 replies; 9+ messages in thread
From: Stone, Joshua I @ 2006-11-30 15:31 UTC (permalink / raw)
  To: Perry Cheng, systemtap

On Wednesday, November 29, 2006 3:26 PM, Perry Cheng wrote:
> Using some example code, I tried to instrument context switching by
> adding a probe to the function __switch_to.  Some documentation had
> suggested that certain versions of systemtap could not handle
> instrumenting context_switch.

The problems with resolving the context_switch function have been fixed
for a while, but there are still sometimes issues accessing the parameters
of inline functions.  See bugzilla #1155:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=1155

Using __switch_to is not an option on all platforms -- on IA64 it is a
macro, and on x86_64 it is blacklisted (bz2086).


Josh

