Message-ID: <4648C9B1.30307@redhat.com>
Date: Mon, 14 May 2007 20:43:00 -0000
From: William Cohen
To: David Wilder
CC: Quentin Barnes, systemtap@sources.redhat.com
Subject: Re: testsuite and hardcoded timeouts
References: <20070511191420.GA12285@urbana.css.mot.com> <4644DB46.1070705@redhat.com> <46489CF5.6010705@us.ibm.com>
In-Reply-To: <46489CF5.6010705@us.ibm.com>

David Wilder wrote:
>
> I ran into this issue on s390.
> When a timeout occurs, the test could simply produce a warning
> message and then restart the timer, allowing the timeout to be
> restarted, say, 4 or 5 times before finally reporting a failure.
> Then if something breaks, the test will still report a failure.
> On slower systems the test would still pass. If a system/test
> normally passes with one or two restarts of the timer, and then
> something changes and it starts taking 3 or 4 restarts, we will
> know that investigation is needed.
>

You might luck out with the caching helping the later attempts skip
some of the phases of the translator and avoid those times on the
later runs. However, restarting 4 or 5 times is probably not going to
help that much if the time required to generate the module is way
larger than the timeout.

The timeout is there to make sure that forward progress is made on the
testing. We would prefer to have a test fail in a reasonable amount of
time than to have it hang for an unreasonable amount of time and not
get any results at all. The translator internals are pretty much a
black box to the testing harness, so the timer is used to judge when
the test isn't making forward progress. Too bad there isn't an
equivalent of a watchdog for the testing harness, e.g. if the test is
making forward progress, leave the test be.

-Will
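
A rough sketch of the watchdog idea above, for illustration only. It is
written in Python rather than in the testsuite's own harness language,
and it assumes that "forward progress" can be observed as new output
lines from the test command, which may not hold for a translator that
is treated as a black box. The names run_with_watchdog, idle_timeout
and hard_limit are hypothetical. The idle timer is reset whenever the
test prints something, and an optional overall cap still guarantees
that a chatty but stuck test ends in a bounded time.

```python
import queue
import subprocess
import threading
import time


def run_with_watchdog(cmd, idle_timeout, hard_limit=float("inf")):
    """Run cmd and treat each output line as a sign of forward progress.

    Returns (status, lines), where status is one of:
      "ok"         - the command exited with status 0
      "failed"     - the command exited with a non-zero status
      "stalled"    - no output for idle_timeout seconds; command killed
      "hard_limit" - total run time reached hard_limit; command killed
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()

    def pump():
        # Forward every output line to the watchdog loop.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: the command closed its output

    threading.Thread(target=pump, daemon=True).start()

    seen = []
    start = last_progress = time.monotonic()
    while True:
        now = time.monotonic()
        if now - last_progress >= idle_timeout:
            status = "stalled"
            break
        if now - start >= hard_limit:
            status = "hard_limit"
            break
        # Sleep only until the nearer of the two deadlines.
        wait = min(idle_timeout - (now - last_progress),
                   hard_limit - (now - start))
        try:
            line = lines.get(timeout=wait)
        except queue.Empty:
            continue  # re-check both deadlines at the top of the loop
        if line is None:
            return ("ok" if proc.wait() == 0 else "failed"), seen
        seen.append(line.rstrip("\n"))
        last_progress = time.monotonic()  # progress seen: reset idle timer

    proc.kill()
    proc.wait()
    return status, seen
```

With something like this, a slow machine only fails a test when it goes
quiet for longer than idle_timeout, not because the whole run exceeded
one fixed number. It only helps if the command being run reports its
phases as it goes; a silent long-running phase would still look like a
stall.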