public inbox for systemtap@sourceware.org
* [Bug runtime/10575] New: occasional stapio hangs for -c CMD
From: fche at redhat dot com @ 2009-08-29 20:27 UTC (permalink / raw)
  To: systemtap

Intermittently, "stap .... -c FOOO" appears to hang, even for a normally
short-lived FOOO.  I don't know whether in these cases FOOO fails to
start, or whether stapio fails to notice it ending, but
something is occasionally broken.

-- 
           Summary: occasional stapio hangs for -c CMD
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: runtime
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: fche at redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=10575

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: fche at redhat dot com @ 2009-08-31 17:03 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From fche at redhat dot com  2009-08-31 17:03 -------
In one scenario, the initial SIGUSR1 sent to the target_cmd-executing
stapio process appears to be lost (either not received, or sent before
the child program was listening for it, or perhaps not sent at all?!).


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10575


* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: jistone at redhat dot com @ 2009-08-31 17:40 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From jistone at redhat dot com  2009-08-31 17:40 -------
(In reply to comment #1)
> In one scenario, the initial SIGUSR1 sent to the target_cmd-executing
> stapio process appears to be lost (either not received, or sent before
> the child program was listening for it, or perhaps not sent at all?!).

Do your scripts have lots of output?  It could be related to #10189, where
STP_START gets lost in the overflow...

We also fork the child process before setting up signals, so we wouldn't see the
SIGCHLD if it died too soon (i.e. before the SIGUSR1/exec stuff, but that
would be an abnormal termination).  The fix here is to prepare for SIGCHLD
before starting the child.
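
A minimal sketch of that fix, assuming a POSIX system; it is not stapio's actual code, and run_short_lived_child, on_sigchld and child_exited are hypothetical names. The SIGCHLD handler is installed before fork(), so even a child that exits immediately is noticed. Under the default disposition (ignore), a SIGCHLD generated before the handler exists would simply be discarded.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the SIGCHLD handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    child_exited = 1;
}

/* Hypothetical helper: installs the SIGCHLD handler *before* fork(), then
 * starts a child that exits immediately.  Returns 1 if the handler saw the
 * exit, 0 if not, -1 on error; the child's exit code goes to *code. */
static int run_short_lived_child(int *code)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;

    /* Step 1: be ready for SIGCHLD before the child exists. */
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    /* Step 2: only now start the child. */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(7);               /* a target that dies "too soon" */

    int wstatus;
    while (waitpid(pid, &wstatus, 0) < 0)
        if (errno != EINTR)
            return -1;
    *code = WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
    return child_exited;
}
```

On Linux, where a pending SIGCHLD is delivered before waitpid() returns to user space, run_short_lived_child(&code) should return 1 with code set to 7. Moving the sigaction() call after fork() reopens the window in which the child's exit can go unnoticed.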

Another race I see is if the main process sent the SIGUSR1 before the child had
set up its handler; this would terminate the child, since that is SIGUSR1's
default action.  We should get a SIGCHLD in this case though, so while not
desirable, it wouldn't cause your hang.  We should probably block SIGUSR1
before forking anyway.
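
A sketch of blocking SIGUSR1 before forking, under the same POSIX assumptions; early_sigusr1_survives and the 100 ms delay are hypothetical. The parent signals immediately while the child is still "setting up". Because the blocked mask is inherited across fork(), the default terminate action cannot fire, and the signal stays pending until the child collects it (here with sigwait()).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo.  Returns 0 if the child survived an early SIGUSR1 and
 * received it, 1 otherwise, -1 on error. */
static int early_sigusr1_survives(void)
{
    sigset_t usr1, old;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);

    /* Block SIGUSR1 *before* fork(); the child inherits this mask. */
    if (sigprocmask(SIG_BLOCK, &usr1, &old) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        /* Child: pretend handler setup is slow, so the signal arrives early. */
        struct timespec delay = { 0, 100 * 1000 * 1000 };  /* 100 ms */
        nanosleep(&delay, NULL);

        /* The early SIGUSR1 is still pending; sigwait() consumes it. */
        int sig = 0;
        if (sigwait(&usr1, &sig) != 0 || sig != SIGUSR1)
            _exit(1);
        _exit(0);
    }

    /* Parent: restore its own mask, then signal without waiting. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    if (kill(pid, SIGUSR1) < 0)
        return -1;

    int wstatus;
    while (waitpid(pid, &wstatus, 0) < 0)
        if (errno != EINTR)
            return -1;

    /* A child killed by SIGUSR1 would report WIFSIGNALED instead. */
    return (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0) ? 0 : 1;
}
```

Without the sigprocmask() call before fork(), the same sequence would usually kill the child with SIGUSR1 during its sleep.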

There's a tighter race between the child's calls to sigaction-ignore-SIGUSR1 and
then pause: if the signal arrives in between, it is handled before pause()
starts and the child then sleeps forever.  I believe sigsuspend would
handle this more atomically.
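
The lost-wakeup race and the sigsuspend() alternative can be sketched as follows, again assuming POSIX and using hypothetical names (wait_for_go, go_handshake, got_go). The _exit(42) stands in for exec'ing the target command; this illustrates the technique rather than reproducing stapio's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_go = 0;

static void on_sigusr1(int sig)
{
    (void)sig;
    got_go = 1;
}

/* Wait for SIGUSR1 with no lost-wakeup window.
 *
 * Racy pattern:   sigaction(SIGUSR1, ...);   <- signal may land here
 *                 pause();                   <- then sleeps forever
 *
 * Here SIGUSR1 stays blocked while the flag is tested, and sigsuspend()
 * unblocks it and sleeps in one atomic step. */
static void wait_for_go(void)
{
    sigset_t usr1, old, waitmask;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_BLOCK, &usr1, &old);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    waitmask = old;
    sigdelset(&waitmask, SIGUSR1);
    while (!got_go)
        sigsuspend(&waitmask);      /* returns after the handler has run */

    sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Hypothetical -c style handshake.  SIGUSR1 is blocked before fork() so an
 * early signal is held pending.  Returns the child's exit code, or -1. */
static int go_handshake(void)
{
    sigset_t usr1, old;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &usr1, &old) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0) {
        wait_for_go();
        _exit(42);                  /* stand-in for exec'ing the target */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    kill(pid, SIGUSR1);             /* may arrive before the child is ready */

    int wstatus;
    while (waitpid(pid, &wstatus, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

go_handshake() should return 42 however the parent's kill() and the child's setup interleave. The key design point is that SIGUSR1 is only ever unblocked inside sigsuspend() itself, so the flag test and the sleep cannot be split by the signal.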

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10575


* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: mjw at redhat dot com @ 2009-09-08  9:51 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2009-09-08 09:51 -------
I am seeing the opposite, -c doesn't hang, but seems to fail to see the process
run at all. This seems to be caused by the workaround in mainloop.c for PR6964.
Maybe related, maybe not?

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10575


* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: mjw at redhat dot com @ 2009-09-08  9:54 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From mjw at redhat dot com  2009-09-08 09:54 -------
(In reply to comment #3)
> I am seeing the opposite, -c doesn't hang, but seems to fail to see the process
> run at all. This seems to be caused by the workaround in mainloop.c for PR6964.
> Maybe related, maybe not?

Forgot to add: this seems to be the cause of spurious testsuite failures where
lots of user space programs are run and probed for a short time with -c, such
as sdt.exp or exelib.exp.  Those occasionally fail with "0 matches", but the
problem is hard to replicate by hand.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=10575


* [Bug runtime/10575] occasional stapio hangs for -c CMD
From: dsmith at redhat dot com @ 2009-10-13 14:06 UTC (permalink / raw)
  To: systemtap


------- Additional Comments From dsmith at redhat dot com  2009-10-13 14:05 -------
Commit ba9abf3 should improve this situation by avoiding the pause()-based race
condition Josh mentioned in comment #2: I've added code to use sigsuspend()
instead of pause().

On the 2.6.31-tip kernel, I was seeing consistent failures from the
cmd_parse.exp testcase without this fix (but the specific test failures within
that testcase were random).  With commit ba9abf3, I get consistent passes with
cmd_parse.exp on that kernel.

However, because of the intermittent nature of this problem, it is possible
there are still other fixes to be made.  So, we'll leave this open for now.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|systemtap at sources dot    |dsmith at redhat dot com
                   |redhat dot com              |
             Status|NEW                         |ASSIGNED


http://sourceware.org/bugzilla/show_bug.cgi?id=10575

