* [Bug runtime/10575] occasional stapio hangs for -c CMD
2009-08-29 20:27 [Bug runtime/10575] New: occasional stapio hangs for -c CMD fche at redhat dot com
@ 2009-08-31 17:03 ` fche at redhat dot com
2009-08-31 17:40 ` jistone at redhat dot com
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: fche at redhat dot com @ 2009-08-31 17:03 UTC (permalink / raw)
To: systemtap
------- Additional Comments From fche at redhat dot com 2009-08-31 17:03 -------
In one scenario, the initial SIGUSR1 sent to the target_cmd-executing
stapio process appears to be lost (either not received, or sent before
the child program was listening for it, or perhaps not sent at all?!).
--
http://sourceware.org/bugzilla/show_bug.cgi?id=10575
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: jistone at redhat dot com @ 2009-08-31 17:40 UTC (permalink / raw)
To: systemtap
------- Additional Comments From jistone at redhat dot com 2009-08-31 17:40 -------
(In reply to comment #1)
> In one scenario, the initial SIGUSR1 sent to the target_cmd-executing
> stapio process appears to be lost (either not received, or sent before
> the child program was listening for it, or perhaps not sent at all?!).
Do your scripts have lots of output? It could be related to #10189, where
STP_START gets lost in the overflow...
We also fork the child process before setting up signals, so we wouldn't see the
SIGCHLD if it died too soon (i.e. before the SIGUSR1/exec stuff, though that
would be an abnormal termination). The fix here is to prepare for SIGCHLD
before starting the child.
Another race I see is if the main process sent the SIGUSR1 before the child had
set up its handler -- this would cause the child to abort. We should get a
SIGCHLD in this case though, so while not desirable, it wouldn't cause your
hang. We should probably block SIGUSR1 before forking anyway.
There's a tighter race between the child's calls to sigaction-ignore-SIGUSR1 and
then pause -- the signal could be lost in-between. I believe sigsuspend would
handle this more atomically.
* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: mjw at redhat dot com @ 2009-09-08 9:51 UTC (permalink / raw)
To: systemtap
------- Additional Comments From mjw at redhat dot com 2009-09-08 09:51 -------
I am seeing the opposite, -c doesn't hang, but seems to fail to see the process
run at all. This seems to be caused by the workaround in mainloop.c for PR6964.
Maybe related, maybe not?
* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: mjw at redhat dot com @ 2009-09-08 9:54 UTC (permalink / raw)
To: systemtap
------- Additional Comments From mjw at redhat dot com 2009-09-08 09:54 -------
(In reply to comment #3)
> I am seeing the opposite, -c doesn't hang, but seems to fail to see the process
> run at all. This seems to be caused by the workaround in mainloop.c for PR6964.
> Maybe related, maybe not?
Forgot to add: this seems to be the cause of spurious testsuite failures where
lots of user space programs are run and probed for a short time with -c, such
as sdt.exp or exelib.exp. Those fail occasionally with "0 matches", but it is
hard to replicate by hand.
* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: dsmith at redhat dot com @ 2009-10-13 14:06 UTC (permalink / raw)
To: systemtap
------- Additional Comments From dsmith at redhat dot com 2009-10-13 14:05 -------
Commit ba9abf3 should improve this situation: it uses sigsuspend() instead of
pause(), which avoids the pause()-based race condition Josh mentioned in
comment #2.
On the 2.6.31-tip kernel, I was seeing consistent failures from the
cmd_parse.exp testcase without this fix (but the specific test failures within
that testcase were random). With commit ba9abf3, I get consistent passes with
cmd_parse.exp on that kernel.
However, because of the intermittent nature of this problem, it is possible
there are still other fixes to be made. So, we'll leave this open for now.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|systemtap at sources dot    |dsmith at redhat dot com
                   |redhat dot com              |
             Status|NEW                         |ASSIGNED