public inbox for systemtap@sourceware.org
From: Mark Wielaard <mjw@redhat.com>
To: Josh Stone <jistone@redhat.com>
Cc: systemtap@sourceware.org
Subject: Re: Making the transport layer more robust
Date: Tue, 16 Aug 2011 13:23:00 -0000
Message-ID: <1313500965.3393.5.camel@springer.wildebeest.org>
In-Reply-To: <4E4965B3.6080700@redhat.com>

On Mon, 2011-08-15 at 11:30 -0700, Josh Stone wrote:
> On 08/12/2011 10:43 AM, Mark Wielaard wrote:
> > commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> > Author: Mark Wielaard <mjw@redhat.com>
> > Date:   Fri Aug 12 19:34:20 2011 +0200
> > 
> >     Remove _stp_ctl_work_timer from module transport layer.
> >     
> >     The _stp_ctl_work_timer would trigger every 20ms to check whether
> >     there were cmd messages queued, but not announced yet and to
> >     check the _stp_exit_flag was set.
> >     
> >     This commit makes all control messages announce themselves and
> >     check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
> >     is still possibly delayed since the messages are just pushed on
> >     a wait queue).
> 
> This has unfortunately left open an opportunity for deadlock.  The
> kernel wake_up infrastructure takes a spinlock on the wait queue.  If
> the probe context happens to fire while that lock is held, either via a
> direct probe on something called by wake_up or indirectly via NMI, then
> the handler must not call anything that would attempt the same lock.
> But this commit is triggering a wake_up on ctl prints, and commit
> a85c8aff triggers the same on exit().
> 
> For example, __wake_up_common is called with a lock held, and then
> either of these will cause a deadlock:
> 
>   probe kernel.function("__wake_up_common") { warn(pp()) }
> 
>   probe kernel.function("__wake_up_common") { exit() }
> 
> This issue in general is very similar to PR2525.  We must take care not
> to call any blocking code from arbitrary probe context.

Thanks for catching that. I am surprised none of our tests triggered
this. I added a nasty testcase based on the above example and reverted
most of the above two commits, reintroducing the timer on the kernel side
(luckily we can still keep the poll/select implementation, so at least we
won't be busy polling on the user side). I also tried to explicitly
document all the "safe" places in the patch.

commit fc67febc6733e5803e6883a3757abda6268a953a
Author: Mark Wielaard <mjw@redhat.com>
Date:   Tue Aug 16 14:31:29 2011 +0200

  Reintroduce timer for transport cmd channel, don't wake_up unconditionally.
    
  Revert parts of commit a85c8a "runtime/io.c: Explicitly signal setting of
  _stp_exit_flag" and commit 46ac9e "Remove _stp_ctl_work_timer from module
  transport layer". Introduce a new test wake_up.exp that shows a deadlock
  when sending cmd messages and waking up the reader immediately.
    
  Renamed _stp_ctl_write to _stp_ctl_send, which can be called from
  everywhere. Rename _stp_ctl_send to _stp_ctl_send_notify that can be
  called from user context in the transport layer itself (this will
  immediately notify any readers). Document all places that use
  _stp_ctl_send_notify directly to clarify why that is safe.
    
  See http://sourceware.org/ml/systemtap/2011-q3/msg00163.html
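
To illustrate the split described in the commit message, here is a
minimal user-space sketch of the idea. It is not the runtime code: the
names ctl_channel, ctl_wake_reader, ctl_send, ctl_send_notify,
ctl_timer_tick and ctl_read_all are hypothetical stand-ins, the queue is
a plain array without the locking a real queue would need, and the
wake-up is modelled as a counter. It only shows that a send from
arbitrary context queues without waking, while the periodic timer (or
an explicit notify from a known-safe context) does the wake-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the cmd channel; NOT the SystemTap runtime code. */
#define CTL_QUEUE_MAX 16

struct ctl_channel {
    int queue[CTL_QUEUE_MAX]; /* queued message types */
    size_t count;             /* messages queued but not yet read */
    unsigned wakeups;         /* how often the reader was woken */
};

/* Stand-in for wake_up(). In the kernel this takes the wait queue
 * spinlock, which is why it must not run from arbitrary probe context. */
static void ctl_wake_reader(struct ctl_channel *ch)
{
    ch->wakeups++;
}

/* Callable from anywhere: only queues the message, never wakes. */
static bool ctl_send(struct ctl_channel *ch, int type)
{
    assert(ch != NULL);
    if (ch->count == CTL_QUEUE_MAX)
        return false;         /* queue full, message dropped */
    ch->queue[ch->count++] = type;
    return true;
}

/* Only for contexts known to be safe: queue, then wake immediately. */
static bool ctl_send_notify(struct ctl_channel *ch, int type)
{
    if (!ctl_send(ch, type))
        return false;
    ctl_wake_reader(ch);
    return true;
}

/* Periodic timer callback: announce pending messages or a set exit flag. */
static void ctl_timer_tick(struct ctl_channel *ch, bool exit_flag)
{
    if (ch->count > 0 || exit_flag)
        ctl_wake_reader(ch);
}

/* Reader side: drain the queue and report how many messages were read. */
static size_t ctl_read_all(struct ctl_channel *ch)
{
    size_t n = ch->count;
    ch->count = 0;
    return n;
}
```

In this model a message sent with ctl_send() stays unannounced until the
next ctl_timer_tick(), which is the delivery delay the timer approach
trades for never taking the wait queue lock from probe context.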

Cheers,

Mark

