From: Mark Wielaard <mjw@redhat.com>
To: Josh Stone <jistone@redhat.com>
Cc: systemtap@sourceware.org
Subject: Re: Making the transport layer more robust
Date: Tue, 16 Aug 2011 13:23:00 -0000
Message-ID: <1313500965.3393.5.camel@springer.wildebeest.org>
In-Reply-To: <4E4965B3.6080700@redhat.com>
On Mon, 2011-08-15 at 11:30 -0700, Josh Stone wrote:
> On 08/12/2011 10:43 AM, Mark Wielaard wrote:
> > commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> > Author: Mark Wielaard <mjw@redhat.com>
> > Date: Fri Aug 12 19:34:20 2011 +0200
> >
> > Remove _stp_ctl_work_timer from module transport layer.
> >
> > The _stp_ctl_work_timer would trigger every 20ms to check whether
> > there were cmd messages queued, but not announced yet and to
> > check the _stp_exit_flag was set.
> >
> > This commit makes all control messages announce themselves and
> > check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
> > is still possibly delayed since the messages are just pushed on
> > a wait queue).
>
> This has unfortunately left open an opportunity for deadlock. The
> kernel wake_up infrastructure takes a spinlock on the wait queue. If
> the probe context happens to fire while that lock is held, either via a
> direct probe on something called by wake_up or indirectly via NMI, then
> the handler must not call anything that would attempt the same lock.
> But this commit is triggering a wake_up on ctl prints, and commit
> a85c8aff triggers the same on exit().
>
> For example, __wake_up_common is called with a lock held, and then
> either of these will cause a deadlock:
>
> probe kernel.function("__wake_up_common") { warn(pp()) }
>
> probe kernel.function("__wake_up_common") { exit() }
>
> This issue in general is very similar to PR2525. We must take care not
> to call any blocking code from arbitrary probe context.
Thanks for catching that. I am surprised none of our tests triggered
this. I added a nasty testcase based on the above example and reverted
most of the above two commits, reintroducing the timer on the kernel side.
Luckily we can still keep the poll/select implementation, so at least we
won't be busy polling on the user side. I also tried to explicitly
document all the "safe" places in the patch.
commit fc67febc6733e5803e6883a3757abda6268a953a
Author: Mark Wielaard <mjw@redhat.com>
Date: Tue Aug 16 14:31:29 2011 +0200
Reintroduce timer for transport cmd channel, don't wake_up unconditionally.
Revert parts of commit a85c8a "runtime/io.c: Explicitly signal setting of
_stp_exit_flag" and commit 46ac9e "Remove _stp_ctl_work_timer from module
transport layer". Introduce a new test wake_up.exp that shows a deadlock
when sending cmd messages and waking up the reader immediately.
Rename _stp_ctl_write to _stp_ctl_send, which can be called from
everywhere. Rename the old _stp_ctl_send to _stp_ctl_send_notify, which
can be called from user context in the transport layer itself (this will
immediately notify any readers). Document all places that use
_stp_ctl_send_notify directly to clarify why that is safe.
See http://sourceware.org/ml/systemtap/2011-q3/msg00163.html
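To make the split concrete, here is a rough user-space model of the idea, not the actual runtime code. Every name in it (ctl_channel, model_send, model_send_notify, model_timer_tick, model_wake_reader) is made up for illustration, and a counter stands in for the real wake_up call. It only shows the shape: the "send" path queues and marks the channel pending without waking anyone, while the "send_notify" path and the periodic timer are the only places that wake the reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these are not SystemTap runtime names. */
#define QUEUE_CAP 8

struct ctl_channel {
    int    msgs[QUEUE_CAP];
    size_t count;    /* number of queued messages */
    bool   pending;  /* queued, but the reader has not been woken yet */
    int    wakeups;  /* stands in for wake_up() calls on the wait queue */
};

/* Stand-in for wake_up(). In the kernel this takes the wait-queue
 * spinlock, so it must not be reached from arbitrary probe context. */
static void model_wake_reader(struct ctl_channel *ch)
{
    ch->wakeups++;
    ch->pending = false;
}

/* Analogue of the "send" path: queue only, never wake.
 * This is the variant that would be safe to call from a probe handler.
 * (A real queue needs its own probe-safe locking; omitted here.) */
static int model_send(struct ctl_channel *ch, int msg)
{
    assert(ch != NULL);
    if (ch->count == QUEUE_CAP)
        return -1;               /* queue full, message dropped */
    ch->msgs[ch->count++] = msg;
    ch->pending = true;
    return 0;
}

/* Analogue of the "send_notify" path: queue and wake immediately.
 * Only for callers known to be in a safe (user) context. */
static int model_send_notify(struct ctl_channel *ch, int msg)
{
    int rc = model_send(ch, msg);
    if (rc == 0)
        model_wake_reader(ch);
    return rc;
}

/* Analogue of the periodic timer: runs outside probe context and
 * wakes the reader only if something was queued since the last wake. */
static void model_timer_tick(struct ctl_channel *ch)
{
    if (ch->pending)
        model_wake_reader(ch);
}
```

In this model a message queued with model_send sits until the next model_timer_tick, which is the delivery delay the timer reintroduces in exchange for never taking the wait-queue lock from probe context.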
Cheers,
Mark