Subject: Re: Making the transport layer more robust
From: Mark Wielaard
To: systemtap@sourceware.org
Date: Wed, 20 Jul 2011 08:29:00 -0000
Message-ID: <1311150558.9144.45.camel@springer.wildebeest.org>
In-Reply-To: <1311087764.9144.42.camel@springer.wildebeest.org>

On Tue, 2011-07-19 at 17:02 +0200, Mark Wielaard wrote:
> On Tue, 2011-07-19 at 10:58 +0200, Mark Wielaard wrote:
> > pr10854.exp acts
> > strangely on rhel5; it seems fine on f14. It just sits there waiting
> > to reap staprun, which will never happen since it tries to pkill it at
> > the same time. That could be because the startup/exit of
> > staprun/stapio is much more robust now, but I don't fully understand
> > the expect spawn, catch, wait logic. Maybe it is some strange bug in
> > the rhel5 expect? Maybe I changed some expectation of
> > staprun/stapio/module interaction? Any help understanding the expect
> > logic would be appreciated.
>
> I think I narrowed this down to the following commit:
>
> commit 5c854d7ca64df766c581c9ed7ff81e04c9d1fa4d
> Author: Chris Meek
> Date:   Wed Jul 13 10:31:47 2011 -0400
>
>     PR12890: Renaming modules in Staprun
>
> It is somewhat hard to say, though, because it doesn't always fail. But
> I have never seen it fail before this commit.
>
> I'm still trying to understand the real issue and the testcase, so all
> help is appreciated.

Frank seems to have fixed it by changing the testcase as follows:

commit 49909b5572bc61c03cc80ef94f6d00dc5bbf665d
Author: Frank Ch. Eigler
Date:   Tue Jul 19 13:52:58 2011 -0400

    resolve PR12890 vs PR10854 bunfight

    The PR10854 test case uses a tight loop of staprun and a nested
    loop of pkills, written in a way that counts on staprun's pre-PR12890
    "insert; unload; retry insert" module-handling heuristic.  With this
    heuristic gone (and error messages properly generated), the PR10854
    test case goes woozy and hangs in the while { ... pkill ... } tcl loop.
    Now we don't loop in there any more.

The test now passes on all my setups.

Cheers,

Mark