[Bug dyninst/15443] New: deal with mutatees that die during our handlers

From: jistone at redhat dot com @ 2013-05-07 19:40 UTC (permalink / raw)
  To: systemtap

http://sourceware.org/bugzilla/show_bug.cgi?id=15443

             Bug #: 15443
           Summary: deal with mutatees that die during our handlers
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dyninst
        AssignedTo: systemtap@sourceware.org
        ReportedBy: jistone@redhat.com
    Classification: Unclassified

Let's assume for the moment that our runtime is perfect and never causes the
mutatee to die.  But what happens if a threaded mutatee exits (by signal or by
choice) or execs while one of its threads is in one of our probe handlers?
At a minimum, I expect that thread's context mutex will be left locked forever.
Much more could be left in an inconsistent state too.

(I've been trying to debug some weird issues during testsuite runs, and while
I'm not certain this is the root cause, it does seem to be a real possibility.)

Maybe we could try to capture all exit/exec paths and "quiesce" other threads
(at least as far as our state is concerned).  I suspect that this would require
heroic effort though, and would probably still be imperfect (e.g. SIGKILL is
absolute).

For mutexes, there is pthread_mutexattr_setrobust(), which we should probably
use.  This will at least report EOWNERDEAD, and from there we can decide
whether recovery is possible.  That decision is probably different for each
mutex-protected area we have, e.g. a context struct whose lock owner died can
probably be repurposed, but a dead owner on the transport lock seems worse.
But even treating EOWNERDEAD as a fatal error would be better than just hanging.

For rwlocks, I see no equivalent of setrobust().  These are used for global
variables, so we should probably just add timeouts.  (Not a trylock-wait-retry
loop as in the kernel; I think a plain pthread_rwlock_timed[rd|wr]lock() is
fine.)
