Subject: Re: [Bug uprobes/11672] utrace_report_syscall_exit crash
From: Mark Wielaard
To: Roland McGrath
Cc: systemtap@sources.redhat.com
In-Reply-To: <20100614084019.7F188408C2@magilla.sf.frob.com>
References: <20100607130056.11672.mjw@redhat.com> <20100609143920.1986.qmail@sourceware.org> <20100614084019.7F188408C2@magilla.sf.frob.com>
Date: Mon, 14 Jun 2010 14:49:00 -0000
Message-ID: <1276507085.4140.27.camel@springer.wildebeest.org>

On Mon, 2010-06-14 at 01:40 -0700, Roland McGrath wrote:
> > (__stp_utrace_attach): Loop on -ERESTARTSYS after utrace_barrier.
> > (__stp_utrace_task_finder_target_quiesce): Likewise.
>
> That is a busy-wait loop, because -ERESTARTSYS means "signal is pending"
> and that stays true without any blocking until you let it get back to user
> mode (or almost, i.e. signal handling). I don't really understand the
> context of where these functions get called. I'm guessing it is in some
> control thread belonging to stapio or something like that. In that case,
> the signal pending is a SIGINT killing stapio or something like that I
> suppose. Is that the case?

This is called from the task finder cleanup code and the utrace probe
exit code before the module tries to unload. There could be some signals
involved, since we might unload the module by forking, executing and
waiting for a child process, staprun -d, to do it (this might happen
when the stap process gets a ^C). It also happens when a script calls
exit(), or when someone explicitly calls rmmod on us. A precise
description can be found in runtime/transport/transport.txt (SHUTDOWN
AND UNLOADING).

> This is a relatively "safe" busy-wait. utrace_barrier will return 0 when
> the tracee is all clear, not short-circuit because of signal_pending()
> unless it actually has to block. It's waiting for the tracee on the other
> CPU to complete your callback, which should be pretty quick. But it's
> still a busy-wait that suddenly chews CPU in a spurt, and all quite fragile.

Busy-waiting is bad, so if there is an alternative that would be nice.
All we need is that, if we get an -EINPROGRESS after a utrace_control
UTRACE_DETACH, we can wait until we are sure any pending handlers have
finished and the detach has fully succeeded.

Thanks,

Mark
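
For reference, the detach-then-wait pattern discussed above can be sketched as a small userspace simulation. This is only an illustration under stated assumptions, not the SystemTap runtime code: struct fake_tracee, fake_detach() and fake_barrier() are hypothetical stand-ins for the tracee, for utrace_control with UTRACE_DETACH, and for utrace_barrier, and ERESTARTSYS is defined locally because it is kernel-internal and not exported to userspace.

```c
#include <assert.h>
#include <errno.h>

/* ERESTARTSYS is kernel-internal and absent from userspace <errno.h>;
 * 512 is the value used inside the Linux kernel. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical model of a tracee whose callback may still be running
 * while the caller has a signal pending. */
struct fake_tracee {
    int callback_ticks_left; /* barrier attempts until the callback ends */
    int barrier_calls;       /* how many times the barrier was invoked */
};

/* Stand-in for utrace_control(..., UTRACE_DETACH): reports -EINPROGRESS
 * while a callback may still be in flight. */
static int fake_detach(struct fake_tracee *t)
{
    return t->callback_ticks_left > 0 ? -EINPROGRESS : 0;
}

/* Stand-in for utrace_barrier(): returns 0 once the tracee is clear,
 * and -ERESTARTSYS when it would have to block with a signal pending. */
static int fake_barrier(struct fake_tracee *t)
{
    t->barrier_calls++;
    if (t->callback_ticks_left > 0) {
        t->callback_ticks_left--;
        return -ERESTARTSYS;
    }
    return 0;
}

/* Detach, and if a callback was in progress, retry the barrier for as
 * long as it reports -ERESTARTSYS. This is the busy-wait under
 * discussion: nothing in the loop blocks. */
static int detach_and_wait(struct fake_tracee *t)
{
    int rc;

    assert(t);
    rc = fake_detach(t);
    if (rc == -EINPROGRESS) {
        do {
            rc = fake_barrier(t);
        } while (rc == -ERESTARTSYS);
    }
    return rc;
}
```

In this model the loop ends only because the simulated callback finishes after a fixed number of attempts. In the kernel the loop spins for as long as the tracee's callback is still running, which is why it burns CPU in a short burst.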