From: Josh Stone
To: Mark Wielaard
CC: systemtap@sourceware.org
Date: Mon, 15 Aug 2011 18:30:00 -0000
Subject: Re: Making the transport layer more robust

On 08/12/2011 10:43 AM, Mark Wielaard wrote:
> commit 46ac9ed5bad86641e552bee4e42a2d973ffc12d0
> Author: Mark Wielaard
> Date:   Fri Aug 12 19:34:20 2011 +0200
>
>     Remove _stp_ctl_work_timer from module transport layer.
>
>     The _stp_ctl_work_timer would trigger every 20ms to check whether
>     there were cmd messages queued, but not announced yet and to
>     check the _stp_exit_flag was set.
>
>     This commit makes all control messages announce themselves and
>     check the _stp_exit_flag in the _stp_ctl_read_cmd loop (delivery
>     is still possibly delayed since the messages are just pushed on
>     a wait queue).

Unfortunately, this leaves open an opportunity for deadlock. The
kernel's wake_up infrastructure takes a spinlock on the wait queue. If
a probe happens to fire while that lock is held, either directly via a
probe on something called by wake_up or indirectly via an NMI, then the
handler must not call anything that would try to take the same lock.
But this commit triggers a wake_up on ctl prints, and commit a85c8aff
does the same on exit().

For example, __wake_up_common is called with the wait queue lock held,
so either of these probes will deadlock:

  probe kernel.function("__wake_up_common") { warn(pp()) }
  probe kernel.function("__wake_up_common") { exit() }

In general, this issue is very similar to PR2525: we must take care not
to call any blocking code from arbitrary probe context.

Thanks,
Josh