public inbox for systemtap@sourceware.org
* [Bug runtime/6030] New: stap, staprun, stapio interaction quirks: stale module left unloaded
From: ananth at in dot ibm dot com @ 2008-04-04 11:19 UTC (permalink / raw)
  To: systemtap

A normal ^C to terminate a stap script induces the following signals:
sender sh signal SIGCHLD receiver stap
sender sshd signal SIGINT receiver stapio
sender sshd signal SIGINT receiver staprun
sender sshd signal SIGINT receiver stap
sender stapio signal SIGUSR2 receiver stapio
sender stapio signal SIGCHLD receiver staprun
sender staprun signal 0x20 receiver staprun
sender staprun signal SIGCHLD receiver stap
sender rm signal SIGCHLD receiver stap
sender stap signal SIGCHLD receiver bash

However, kill -9 on stap and then on staprun leads to an orphaned module:
kill -9 to <stap_pid>
sender bash signal SIGKILL receiver stap
sender stap signal SIGCHLD receiver bash

then kill -9 to <staprun_pid>
sender bash signal SIGKILL receiver staprun
sender staprun signal SIGCHLD receiver init

then kill -9 to <stapio_pid>
sender bash signal SIGKILL receiver stapio
sender stapio signal SIGCHLD receiver init

The module is left loaded (it is never unloaded), although its probes are unregistered.

However, if we first do a kill -9 <stapio_pid>, things are fine:
sender bash signal SIGKILL receiver stapio
sender stapio signal SIGCHLD receiver staprun
sender staprun signal 0x20 receiver staprun
sender staprun signal SIGCHLD receiver stap
sender rm signal SIGCHLD receiver stap
sender stap signal SIGCHLD receiver bash

Similar issues are seen with a kill -SIGINT to <stap_pid> or <staprun_pid>.

There needs to be a clean way of killing all three processes, along with
unloading the module, when one of the processes is killed.

-- 
           Summary: stap, staprun, stapio interaction quirks: stale module
                    left unloaded
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: ananth at in dot ibm dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=6030

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: fche at redhat dot com @ 2008-04-04 11:46 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-04-04 11:18 -------
We cannot reasonably recover from a kill -9 on staprun.  That is the only
privileged process that can unload the module.  It may have been hand-started,
leaving none of our code available to watch over it and restart it.  Teaching
stap to re-fork staprun another time is perhaps possible, but in your
"kill -9 stap staprun" scenario, that wouldn't help either.

Note when the module detects its user-space partner process going away, it
carries out shutdown as far as it can: deregisters probes, frees memory.
A module cannot unload itself AFAIK.  So while it is an orphan, it merely
sits in memory, and probably blocks a repeat invocation.  It's not that bad.

Perhaps it's time to reconsider including a script such as
http://sourceware.org/ml/systemtap/2008-q1/msg00051.html
in the distribution.  Or a cron-driven systemtap module cleaner/unloader.
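
The msg00051 script is not reproduced in this thread. As a hypothetical illustration of the cron-driven cleaner idea, the sketch below reads text in the /proc/modules format (name, size, reference count, dependencies, state, address) and lists modules whose names start with stap_ and whose reference count is 0. The stap_ prefix (inferred from the stap_<hash>_<n>.ko names seen later in this thread) and the helper names are assumptions; the unload step itself, which needs root, is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 and copy the module name into `name` if
   `line` (one line in /proc/modules format) describes a module whose name
   starts with "stap_" (assumed SystemTap naming) and whose refcount is 0. */
static int is_idle_stap_module(const char *line, char *name, size_t namelen)
{
    char buf[256];
    int refcount;

    /* name, size (skipped), refcount; later fields are ignored.  A refcount
       printed as "-" makes the match fail, so such lines are skipped. */
    if (sscanf(line, "%255s %*lu %d", buf, &refcount) != 2)
        return 0;
    if (strncmp(buf, "stap_", 5) != 0 || refcount != 0)
        return 0;
    if (strlen(buf) >= namelen)
        return 0;
    strcpy(name, buf);
    return 1;
}

/* Write the name of every idle stap_* module found in `in` to `out`, one
   per line, and return how many were found.  Actually unloading them is
   left to the caller (it requires root). */
static int list_idle_stap_modules(FILE *in, FILE *out)
{
    char line[512];
    char name[256];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (is_idle_stap_module(line, name, sizeof name)) {
            fprintf(out, "%s\n", name);
            count++;
        }
    }
    return count;
}
```

On a live system `in` would be the stream returned by fopen("/proc/modules", "r").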

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale  module left unloaded
From: Ananth N Mavinakayanahalli @ 2008-04-04 11:47 UTC (permalink / raw)
  To: sourceware-bugzilla; +Cc: systemtap

> We cannot reasonably recover from a kill -9 on staprun.  That is the only
> privileged process that can unload the module.  It may have been hand-started,
> leaving none of our code available to watch over it and restart it.  Teaching
> stap to re-fork staprun another time is perhaps possible, but in your
> "kill -9 stap staprun" scenario, that wouldn't help either.

Agreed. SIGKILL is a box case. SIGINT/SIGHUP can be handled though.

> Perhaps it's time to reconsider including a script such as
> http://sourceware.org/ml/systemtap/2008-q1/msg00051.html
> in the distribution.  Or a cron-driven systemtap module cleaner/unloader.

Yeah. That'd work too.


* Re: [Bug runtime/6030] stap, staprun, stapio interaction quirks:  stale  module left unloaded
From: Martin Hunt @ 2008-04-04 15:43 UTC (permalink / raw)
  To: ananth; +Cc: sourceware-bugzilla, systemtap


On Fri, 2008-04-04 at 17:16 +0530, Ananth N Mavinakayanahalli wrote:
> > We cannot reasonably recover from a kill -9 on staprun.  That is the only
> > privileged process that can unload the module.  It may have been hand-started,
> > leaving none of our code available to watch over it and restart it.  Teaching
> > stap to re-fork staprun another time is perhaps possible, but in your
> > "kill -9 stap staprun" scenario, that wouldn't help either.
> 
> Agreed. SIGKILL is a box case. SIGINT/SIGHUP can be handled though.
> 
> > Perhaps it's time to reconsider including a script such as
> > http://sourceware.org/ml/systemtap/2008-q1/msg00051.html
> > in the distribution.  Or a cron-driven systemtap module cleaner/unloader.
> 
> Yeah. That'd work too.

cron wouldn't work well. If you wanted to run the same script you would
not be able to until cron had triggered the cleanup.  As a user, I find
stuff like that extremely annoying.

I still prefer the solution I described here
http://sources.redhat.com/bugzilla/show_bug.cgi?id=5716

"I think a simpler, more secure approach would be to simply separate the
module removal and build it as a standalone suid program.  It would
check the user was in stapusr or stapdev, verify the module that was
requested to be unloaded was a systemtap module, then unload it.  That
allows staprun to do some quick setup, load the module, then drop all
capabilities (if we use them), fork stapio, and exit. Stapio would exec
the module unloader when it got ^C or an exit message from the module."

So staprun goes away and cannot be killed. SIGKILL on stapio would still
leave the module installed, of course. So then you would run the
unloader by hand. What would we call it? staprm?  Or it could just be
staprun with a flag to unload instead of load.

We could add an option to remove all systemtap modules.

Is there a simple way to tell which systemtap scripts are running and
who owns them?  We could easily add that too.
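
A minimal sketch of the permission check such a standalone unloader might run before removing anything. It assumes that "in stapusr or stapdev" means the invoking user's real GID or supplementary groups (not the setuid owner's), and that a SystemTap module can be recognised by a stap_ name prefix; a prefix alone is a weak test, so treat this as illustrative only. in_group() and may_unload() are hypothetical names.

```c
#include <grp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if the invoking user belongs to the named group, judged by the
   real GID and the supplementary group list (not the effective IDs a
   setuid binary runs with). */
static int in_group(const char *group_name)
{
    struct group *gr = getgrnam(group_name);
    if (gr == NULL)
        return 0;                       /* no such group on this system */

    gid_t gid = gr->gr_gid;
    if (getgid() == gid)
        return 1;

    int n = getgroups(0, NULL);         /* ask how many groups there are */
    if (n <= 0)
        return 0;
    gid_t *list = malloc((size_t) n * sizeof *list);
    if (list == NULL)
        return 0;
    n = getgroups(n, list);

    int found = 0;
    for (int i = 0; i < n; i++) {
        if (list[i] == gid) {
            found = 1;
            break;
        }
    }
    free(list);
    return found;
}

/* Hypothetical policy check: only stap_* modules, and only for members of
   stapusr or stapdev. */
static int may_unload(const char *module_name)
{
    if (strncmp(module_name, "stap_", 5) != 0)
        return 0;
    return in_group("stapusr") || in_group("stapdev");
}
```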

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: ananth at in dot ibm dot com @ 2008-04-08 14:13 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From ananth at in dot ibm dot com  2008-04-08 12:57 -------
Subject: Re:  stap, staprun, stapio interaction quirks:
	stale module left unloaded

On Fri, Apr 04, 2008 at 03:43:15PM -0000, hunt at redhat dot com wrote:
> 
> "I think a simpler, more secure approach would be to simply separate the
> module removal and build it as a standalone suid program.  It would
> check the user was in stapusr or stapdev, verify the module that was
> requested to be unloaded was a systemtap module, then unload it.  That
> allows staprun to do some quick setup, load the module, then drop all
> capabilities (if we use them), fork stapio, and exit. Stapio would exec
> the module unloader when it got ^C or an exit message from the module."
> 
> So staprun goes away and cannot be killed. SIGKILL on stapio would still
> leave the module installed, of course. So then you would run the
> unloader by hand. What would we call it? staprm?  Or it could just be
> staprun with a flag to unload instead of load.

Yes, this sounds like a simpler solution.

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: fche at redhat dot com @ 2008-04-08 16:19 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-04-08 14:12 -------
(In reply to comment #3)
> cron wouldn't work well. If you wanted to run the same script you would
> not be able to until cron had triggered the cleanup.  As a user, I find
> stuff like that extremely annoying.

Remember, the scenario here is that the user killed his own stap*
processes with kill -9.  Such extreme annoyance is self-induced.

I recall we came across some runes that perform pseudo-renaming of a
compiled module before insmod time.  If that can be made to work
(as a part of staprun presumably), then re-executing the same probe
several times concurrently would be possible.

Another simple possibility is for staprun to attempt to *unload* a
module with the same name that it's about to load - to clean up an
orphaned zombie.  (It'd use sys_delete_module(2) with O_NONBLOCK.)
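
A minimal sketch of that pre-load cleanup, assuming the module name equals the .ko file's base name without the suffix. glibc declares no wrapper for delete_module(2) in its headers, so the sketch calls it through syscall(2). With O_NONBLOCK the kernel fails immediately with EWOULDBLOCK instead of waiting if the module is still in use, and ENOENT means no module of that name was loaded. The call needs CAP_SYS_MODULE. The function names are hypothetical; this is not staprun's actual code.

```c
#define _GNU_SOURCE                     /* for the syscall() declaration */
#include <errno.h>
#include <fcntl.h>                      /* O_NONBLOCK */
#include <string.h>
#include <sys/syscall.h>                /* SYS_delete_module */
#include <unistd.h>                     /* syscall() */

/* Derive a module name from a .ko path, e.g.
   /tmp/stap5z1Ybp/stap_..._240.ko -> stap_..._240 (assumed naming). */
static int module_name_from_path(const char *path, char *out, size_t outlen)
{
    const char *base = strrchr(path, '/');
    size_t len;

    base = base ? base + 1 : path;
    len = strlen(base);
    if (len > 3 && strcmp(base + len - 3, ".ko") == 0)
        len -= 3;                       /* drop the ".ko" suffix */
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, base, len);
    out[len] = '\0';
    return 0;
}

/* Try to remove a stale module of the same name before loading a fresh
   copy.  Returns 0 if one was removed or none was loaded, -1 otherwise
   (errno set, e.g. EWOULDBLOCK if still in use, EPERM if unprivileged). */
static int remove_stale_module(const char *name)
{
    if (syscall(SYS_delete_module, name, O_NONBLOCK) == 0)
        return 0;                       /* a stale copy was removed */
    if (errno == ENOENT)
        return 0;                       /* nothing by that name was loaded */
    return -1;
}
```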


> I still prefer the solution I described here
> http://sources.redhat.com/bugzilla/show_bug.cgi?id=5716
> 
> "I think a simpler, more secure approach would be to simply separate the
> module removal and build it as a standalone suid program.   [...]

Perhaps, but is it useful & necessary enough to overcome the natural
skepticism toward adding anything setuid?

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: anithra at linux dot vnet dot ibm dot com @ 2008-07-07 20:32 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From anithra at linux dot vnet dot ibm dot com  2008-07-07 20:31 -------
(In reply to comment #5)
> (In reply to comment #3)
> > cron wouldn't work well. If you wanted to run the same script you would
> > not be able to until cron had triggered the cleanup.  As a user, I find
> > stuff like that extremely annoying.
> 
> Remember, the scenario here is that the user killed his own stap*
> processes with kill -9.  Such extreme annoyance is self-induced.
> 
> I recall we came across some runes that perform pseudo-renaming of a
> compiled module before insmod time.  If that can be made to work
> (as a part of staprun presumably), then reexecuting the same probe
> several times concurrently would be possible.
> 
> Another simple possibility is for staprun to attempt to *unload* a
> module with the same name that it's about to load - to clean up an
> orphaned zombie.  (It'd use sys_delete_module(2) with O_NONBLOCK.)
> 
> 
> > I still prefer the solution I described here
> > http://sources.redhat.com/bugzilla/show_bug.cgi?id=5716
> > 
> > "I think a simpler, more secure approach would be to simply separate the
> > module removal and build it as a standalone suid program.   [...]
> 
> Perhaps, but is it useful & necessary enough to overcome the natural
> skepticism toward adding anything setuid?
> 

Due to the above sequence of signals, when a user attempts to kill the stap
process by sending a signal (not counting -9), the stap process terminates when
pending_interrupts > 1, leaving stapio as a zombie. I'm not sure why it was
designed this way, but I'm facing problems because of this when terminating a
stap process from the stapgui.
The stapgui execs stap processes upon a user's request, but it is not possible
to terminate these processes, as we do not know the pid of the stapio process.
We only have the pid of the stap process, and terminating that would leave
stapio running as a zombie and the module still loaded.
This can be fixed by passing the signal to the process group (staprun/stapio)
when pending_interrupts > 1. Then, when stap terminates, so will
staprun/stapio.
Also, do we need to wait for pending_interrupts to be > 1 even after the script
has started execution (i.e., pass 5)?

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: fche at redhat dot com @ 2008-07-08 16:37 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-07-08 16:36 -------
One way to prevent a zombied stapio/staprun is to have stap
issue a
    signal (SIGCHLD, SIG_IGN)

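A minimal sketch of what this does, assuming Linux/POSIX.1-2001 semantics: with SIGCHLD set to SIG_IGN, children that exit are reaped automatically and never become zombies. It also bears on whether stap would be left waiting forever: a waiting parent is not stuck, because waitpid() blocks only until the child exits and then fails with ECHILD instead of reporting an exit status. The function name is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return 1 if an exited child was reaped automatically because SIGCHLD
   was set to SIG_IGN (waitpid() then fails with ECHILD). */
static int children_are_autoreaped(void)
{
    void (*old)(int) = signal(SIGCHLD, SIG_IGN);
    if (old == SIG_ERR)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        signal(SIGCHLD, old);
        return -1;
    }
    if (pid == 0)
        _exit(0);                       /* stand-in for stapio/staprun exiting */

    int status;
    pid_t r = waitpid(pid, &status, 0); /* returns after the child is gone */
    int err = errno;
    signal(SIGCHLD, old);               /* restore the previous disposition */
    return r == -1 && err == ECHILD;
}
```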

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: anithra at linux dot vnet dot ibm dot com @ 2008-07-08 19:10 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From anithra at linux dot vnet dot ibm dot com  2008-07-08 19:09 -------
(In reply to comment #7)
> One way to prevent a zombied stapio/staprun is to have stap
> issue a
>     signal (SIGCHLD, SIG_IGN)
> 
> 

Do you think we need to wait for two interrupts (i.e., check for
pending_interrupts > 1) even in pass 5?

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: anithra at linux dot vnet dot ibm dot com @ 2008-07-10 16:44 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From anithra at linux dot vnet dot ibm dot com  2008-07-10 16:43 -------
The signal(SIGCHLD, SIG_IGN) would result in the stap process ignoring the
SIGCHLD signal. Won't this result in the stap process running forever, waiting
for an explicit signal? In the case of the stapgui, I want to terminate the stap
process, and along with it the stapio process.

E.g.:
root     28391 24712  3 20:51 pts/1    00:00:00 stap  /counter.stp
root     28595 28391 11 20:51 pts/1    00:00:00 /usr/local/libexec/systemtap/stapio /tmp/stap5z1Ybp/stap_eab4dea519753fb568fa80466b98c4e3_240.ko

Now if I run 'kill -SIGTERM 28391', that stops the stap process, but leaves 28595
running, as the signal is not passed to the child.
If any tool/application wishes to invoke systemtap through a program or as part
of a script, the script would only have the process id of the immediate child
forked, in this case the stap process. It would not be able to terminate the
process and unload the module correctly. This is the problem I have with the stapgui.

I was planning to change stap's signal handler to pass the signal to the
process group before exiting. Is there any reason why we would not want to do
that?

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: fche at redhat dot com @ 2008-07-10 17:05 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-07-10 17:04 -------
(In reply to comment #9)
> The signal(SIGCHLD, SIG_IGN) would result in the stap process ignoring the
> SIGCHLD signal. Won't this result in the stap process running forever, waiting
> for an explicit signal? In the case of the stapgui, I want to terminate the stap
> process, and along with it the stapio process.

Could you use kill (- stap_pid, SIGTERM) so you broadcast the signal
to the process group?

> I was planning to change stap's signal handler to pass the signal to the
> process group before exiting. Is there any reason why we would not want to do
> that?

I'm not sure that is legal from within a signal handler proper, but as a
part of stap's exit, one could do a kill (0, SIGTERM).

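A minimal sketch of the group-wide signal, assuming stap leads its own process group (as it does when a job-control shell starts it). Note that kill(2) takes the pid first and the signal second: kill(-pgid, sig) signals every member of process group pgid, and kill(0, sig) signals the caller's own group. POSIX also lists kill() among the async-signal-safe functions, so calling it inside a handler is permitted. Here a forked child stands in for stap and a grandchild for stapio; signal_group() and demo_group_kill() are hypothetical names.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signal every process in the group led by `leader` (negative pid). */
static int signal_group(pid_t leader, int sig)
{
    return kill(-leader, sig);
}

/* Fork a stand-in "stap" that leads its own process group and forks a
   stand-in "stapio"; then end both with one group-wide SIGTERM.
   Returns 1 if the leader died from SIGTERM after the pipe saw EOF, which
   happens only once both holders of the write end have exited. */
static int demo_group_kill(void)
{
    int fds[2];
    char c;
    int status;

    if (pipe(fds) != 0)
        return -1;

    pid_t leader = fork();
    if (leader < 0)
        return -1;
    if (leader == 0) {
        setpgid(0, 0);                  /* become a process-group leader */
        signal(SIGTERM, SIG_DFL);       /* make sure SIGTERM is fatal */
        close(fds[0]);
        pid_t io = fork();
        if (io < 0)
            _exit(1);
        if (io == 0) {                  /* stand-in "stapio", same group */
            if (write(fds[1], "r", 1) != 1)   /* report that we exist */
                _exit(2);
            for (;;)
                pause();
        }
        for (;;)
            pause();
    }

    setpgid(leader, leader);            /* repeat in the parent to avoid a race */
    close(fds[1]);

    if (read(fds[0], &c, 1) != 1)       /* wait until "stapio" is running */
        return -1;
    signal_group(leader, SIGTERM);

    while (read(fds[0], &c, 1) > 0)     /* EOF: both write ends are closed */
        ;
    close(fds[0]);

    if (waitpid(leader, &status, 0) != leader)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM;
}
```
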
* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: anithra at linux dot vnet dot ibm dot com @ 2008-07-10 17:45 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From anithra at linux dot vnet dot ibm dot com  2008-07-10 17:44 -------
(In reply to comment #10)

> Could you use kill (- stap_pid, SIGTERM) so you broadcast the signal
> to the process group?

I tried kill(0, SIGTERM) and pkill -TERM -g with the group ID from
getpgid(0); both seem to work fine. I was not sure if there was any design
reason behind not passing on the signals. Your reply answers that question :).

There is a second part to this problem, in that I'm having to kill the stap
process twice from my program to make it exit. This is due to the
pending_interrupts > 1 check. This is not really a major bother, but it would
be nice if we didn't have to send two signals. Do you think it would be OK to
remove this check (pending_interrupts > 1) if execution is currently in pass 5?
It should be straightforward to modify the code to do so, but again, is there
any reason why we should retain this check after the translation phases are
over?

* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
From: fche at redhat dot com @ 2008-07-10 18:00 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-07-10 18:00 -------
> There is a second part to this problem, in that i'm having to kill the stap
> process twice from my program to exit.

This should only occur if the first signal didn't get through to the whole
process group.  If stapio received the first signal, it should shut down in
a timely manner, and let stap shut down cleanly too within a few seconds.

The second signal should only be necessary if something broke with handling the
first one - it's sort of a user-level patience timeout measure that should not be
necessary for mechanical use.

* Re: [Bug runtime/6030] stap, staprun, stapio interaction quirks:  stale module left unloaded
From: anithra @ 2008-07-17 13:28 UTC (permalink / raw)
  To: systemtap

fche at redhat dot com wrote:
> ------- Additional Comments From fche at redhat dot com  2008-07-10 17:04 -------
> (In reply to comment #9)
>   
>> The signal(SIGCHLD, SIG_IGN) would result in the stap process ignoring the
>> SIGCHLD signal. Won't this result in the stap process running forever, waiting
>> for an explicit signal? In the case of the stapgui, I want to terminate the stap
>> process, and along with it the stapio process.
>>     
>
> Could you use kill (- stap_pid, SIGTERM) so you broadcast the signal
> to the process group?
>
>   
Hi Frank,

kill(-stap_pid, SIGTERM) solves the problem from the StapGUI point of
view. I'm able to terminate both the stap and stapio processes and thereby
unload the module.
But the behaviour of the stap process still doesn't seem correct. If a
signal is sent to the stap process, it terminates, leaving stapio running
as a zombie and the module still loaded.
I'm attaching a patch that will solve the problem. Here I'm just
passing the signal received by the stap process to the process group,
so that stapio can unload the module and terminate.

Regards,
Anithra

---
 main.cxx |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: systemtap/main.cxx
===================================================================
--- systemtap.orig/main.cxx
+++ systemtap/main.cxx
@@ -260,7 +260,7 @@ printscript(systemtap_session& s, ostrea
 int pending_interrupts;

 extern "C"
-void handle_interrupt (int /* sig */)
+void handle_interrupt (int  sig)
 {
   pending_interrupts ++;
   if (pending_interrupts > 1) // XXX: should be configurable? time-based?
@@ -268,6 +268,7 @@ void handle_interrupt (int /* sig */)
       char msg[] = "Too many interrupts received, exiting.\n";
       int rc = write (2, msg, sizeof(msg)-1);
      if (rc) {/* Do nothing; we don't care if our last gasp went out. */ ;}
+      kill(0,sig);
       _exit (1);
    }
 }

* Re: [Bug runtime/6030] stap, staprun, stapio interaction quirks:  stale module left unloaded
From: Frank Ch. Eigler @ 2008-07-17 17:48 UTC (permalink / raw)
  To: anithra; +Cc: systemtap

anithra <anithra@linux.vnet.ibm.com> writes:

> [...]  kill(-stap_pid, SIGTERM) solves the problem from the StapGUI
> point of view. I'm able to terminate both the stap and stapio processes
> and thereby unload the module.

Good.

> But the behaviour of the stap process still doesn't seem correct. If
> a signal is sent to the stap process it terminates leaving stapio
> running as a zombie and the module still loaded.

I still don't quite see how this scenario would happen.  If stapio is
a zombie, then this means that it has exited.  As a part of exiting,
stapio attempts to exec "staprun -d MODULE" to unload the module.  If
a module is left loaded, something other than the stap process's own
signal processing must have been involved.
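The exit path described above also explains the original kill -9 report: the exec only happens for signals the process can catch. The sketch below assumes a stapio-like process that waits for SIGINT/SIGTERM with sigwait() and then execs an unloader as "<unloader> -d <module>". wait_then_unload() and run_exit_path() are hypothetical names, not stapio's real implementation. In the usage shown, `true` stands in for staprun, so nothing is actually unloaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stapio-like exit path.  SIGINT/SIGTERM must already be
 * blocked (inherited from the parent) so sigwait() can take them
 * synchronously; then the process replaces itself with the unloader.
 * SIGKILL can never be caught or waited for, so it bypasses all this. */
void wait_then_unload(const char *unloader, const char *module)
{
    sigset_t set;
    int sig;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    if (sigwait(&set, &sig) == 0) {
        sigprocmask(SIG_UNBLOCK, &set, NULL);  /* don't leak the mask */
        execlp(unloader, unloader, "-d", module, (char *) NULL);
    }
    _exit(127);                                /* sigwait or exec failed */
}

/* Fork a child running wait_then_unload(), send it `sig`, and return
 * its wait status (-1 on failure). */
int run_exit_path(const char *unloader, const char *module, int sig)
{
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    sigprocmask(SIG_BLOCK, &set, &old);        /* child inherits this */

    pid_t pid = fork();
    if (pid == 0)
        wait_then_unload(unloader, module);
    sigprocmask(SIG_SETMASK, &old, NULL);      /* restore in the parent */
    if (pid < 0)
        return -1;

    kill(pid, sig);
    int status = -1;
    waitpid(pid, &status, 0);
    return status;
}
```

Sending SIGTERM lets the child reach the exec, so its status is that of the unloader. Sending SIGKILL, as in the original report, terminates it before any cleanup can run, which is one way a module can be left loaded.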

(By the way, there is no such thing as "running as a zombie".  Zombie
processes are unsightly but otherwise consume essentially no
resources.)
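The zombie point can be checked directly on Linux. A zombie has already exited and keeps only its process-table entry, holding its pid and exit status, until the parent collects it with wait(). The sketch below assumes Linux's /proc/<pid>/stat format. proc_state() and zombie_state_demo() are hypothetical helpers. waitid() with WNOWAIT waits for the child to exit without reaping it, so the child can be observed in state Z before waitpid() releases it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (Linux-only): return the state letter from
 * /proc/<pid>/stat ('R', 'S', 'Z', ...), or '?' if it cannot be read.
 * The comm field may contain spaces or ')', so use the last ')'. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long) pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *rp = strrchr(buf, ')');
    return (rp && rp[1] == ' ' && rp[2]) ? rp[2] : '?';
}

/* Fork a child that exits at once, report its state while it is an
 * unreaped zombie, then reap it.  Returns the observed state letter. */
char zombie_state_demo(void)
{
    signal(SIGCHLD, SIG_DFL);        /* children must not be auto-reaped */
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);

    siginfo_t info;
    /* Block until the child has exited, but leave it unreaped. */
    if (waitid(P_PID, (id_t) pid, &info, WEXITED | WNOWAIT) != 0)
        return '?';
    char state = proc_state(pid);
    waitpid(pid, NULL, 0);           /* release the process-table entry */
    return state;
}
```

The child does nothing between its exit and the reap. All that remains of it is the entry reported as Z.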

- FChE


* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
  2008-04-04 11:19 [Bug runtime/6030] New: stap, staprun, stapio interaction quirks: stale module left unloaded ananth at in dot ibm dot com
                   ` (11 preceding siblings ...)
  2008-07-10 18:00 ` fche at redhat dot com
@ 2008-07-17 18:55 ` fche at redhat dot com
  2008-07-23 14:41 ` anithra at linux dot vnet dot ibm dot com
  2008-07-23 14:49 ` fche at redhat dot com
  14 siblings, 0 replies; 20+ messages in thread
From: fche at redhat dot com @ 2008-07-17 18:55 UTC
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-07-17 18:54 -------
Please check out the effect of commit 82737be.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING


http://sourceware.org/bugzilla/show_bug.cgi?id=6030

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
  2008-04-04 11:19 [Bug runtime/6030] New: stap, staprun, stapio interaction quirks: stale module left unloaded ananth at in dot ibm dot com
                   ` (12 preceding siblings ...)
  2008-07-17 18:55 ` fche at redhat dot com
@ 2008-07-23 14:41 ` anithra at linux dot vnet dot ibm dot com
  2008-07-23 14:49 ` fche at redhat dot com
  14 siblings, 0 replies; 20+ messages in thread
From: anithra at linux dot vnet dot ibm dot com @ 2008-07-23 14:41 UTC
  To: systemtap


------- Additional Comments From anithra at linux dot vnet dot ibm dot com  2008-07-23 14:40 -------
(In reply to comment #13)
> Please check out the effect of commit 82737be.

Yes, that fixes it.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=6030


* [Bug runtime/6030] stap, staprun, stapio interaction quirks: stale module left unloaded
  2008-04-04 11:19 [Bug runtime/6030] New: stap, staprun, stapio interaction quirks: stale module left unloaded ananth at in dot ibm dot com
                   ` (13 preceding siblings ...)
  2008-07-23 14:41 ` anithra at linux dot vnet dot ibm dot com
@ 2008-07-23 14:49 ` fche at redhat dot com
  14 siblings, 0 replies; 20+ messages in thread
From: fche at redhat dot com @ 2008-07-23 14:49 UTC
  To: systemtap


------- Additional Comments From fche at redhat dot com  2008-07-23 14:48 -------
Thanks for testing.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED


http://sourceware.org/bugzilla/show_bug.cgi?id=6030


end of thread (latest message: 2008-07-23 14:49 UTC)

Thread overview: 20+ messages
2008-04-04 11:19 [Bug runtime/6030] New: stap, staprun, stapio interaction quirks: stale module left unloaded ananth at in dot ibm dot com
2008-04-04 11:46 ` [Bug runtime/6030] " fche at redhat dot com
2008-04-04 11:47   ` Ananth N Mavinakayanahalli
2008-04-04 15:43     ` Martin Hunt
2008-04-04 14:20 ` ananth at in dot ibm dot com
2008-04-04 21:10 ` hunt at redhat dot com
2008-04-08 14:13 ` ananth at in dot ibm dot com
2008-04-08 16:19 ` fche at redhat dot com
2008-07-07 20:32 ` anithra at linux dot vnet dot ibm dot com
2008-07-08 16:37 ` fche at redhat dot com
2008-07-08 19:10 ` anithra at linux dot vnet dot ibm dot com
2008-07-10 16:44 ` anithra at linux dot vnet dot ibm dot com
2008-07-10 17:05 ` fche at redhat dot com
2008-07-17 13:28   ` anithra
2008-07-17 17:48     ` Frank Ch. Eigler
2008-07-10 17:45 ` anithra at linux dot vnet dot ibm dot com
2008-07-10 18:00 ` fche at redhat dot com
2008-07-17 18:55 ` fche at redhat dot com
2008-07-23 14:41 ` anithra at linux dot vnet dot ibm dot com
2008-07-23 14:49 ` fche at redhat dot com
