public inbox for systemtap@sourceware.org
* testsuite and hardcoded timeouts
@ 2007-05-11 19:14 Quentin Barnes
  2007-05-11 21:08 ` William Cohen
  0 siblings, 1 reply; 10+ messages in thread
From: Quentin Barnes @ 2007-05-11 19:14 UTC (permalink / raw)
  To: systemtap

I mentioned this issue as an aside and asked about it a month ago,
but I don't think it got a response at the time.

In porting the Systemtap testsuite to an embedded ARM platform
(~350MHz CPU with 64MB of RAM, running with an NFS root and swap), I
found many of the existing hardcoded timeout parameters are way too
short.  Several of them I had to at least triple, if not increase by
a larger factor (sometimes 6x-15x), to get them to pass.

How do we want to deal with this portability problem of hardcoded
timeouts?


There are a few ways I can think of to address this:

1) Let the hardcoded numbers stay, but up them large enough to
   handle the slowest platform we might ever run on.  If that's
   still not slow enough someday, up them some more when the time
   comes.

This has the advantage of simplicity, but can greatly slow down
suite runs on faster processors when tests do get stuck.

2) Ban all standalone hardcoded timeouts, replacing them with an
   expression involving a base constant and a multiplier.

This is not the cleanest because some tests are slow due to I/O
bandwidth or paging where others are slow due to CPU limitations.
But it does have the advantage that if someone is having timeout
issues, they can up the multiplier value and rerun to see if the
problem goes away without having to edit all sorts of wrapper
scripts and tests.

If we go with a multiplier, it could be set automatically by reading
/proc/cpuinfo and taking a stab at it based on the machine's BogoMIPS
or MHz rating.  We'd still need a way for a user to straightforwardly
tweak it manually beyond that.
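A rough sketch of what such a heuristic might look like as a testsuite .exp fragment.  The 1000-BogoMIPS baseline, the `timeout_factor` variable, and the `TIMEOUT_FACTOR` override are all invented here for illustration; nothing like this exists in the testsuite today:

```tcl
# Sketch only: scale a timeout multiplier off the machine's BogoMIPS.
# The baseline of 1000 and all names here are hypothetical.
set timeout_factor 1
if {![catch {open /proc/cpuinfo r} fd]} {
    while {[gets $fd line] >= 0} {
        if {[regexp -nocase {bogomips\s*:\s*([0-9.]+)} $line -> bogo]} {
            # slower than the baseline -> larger factor, at least 1
            set timeout_factor [expr {int(ceil(1000.0 / $bogo))}]
            if {$timeout_factor < 1} { set timeout_factor 1 }
            break
        }
    }
    close $fd
}
# The manual override mentioned above, via a hypothetical env variable:
if {[info exists env(TIMEOUT_FACTOR)]} {
    set timeout_factor $env(TIMEOUT_FACTOR)
}
```

Tests would then use, e.g., `-timeout [expr {30 * $timeout_factor}]` instead of a bare number.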

Unfortunately, I don't understand the Systemtap testsuite framework
yet well enough to make specific suggestions.

Thoughts?

Quentin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: testsuite and hardcoded timeouts
  2007-05-11 19:14 testsuite and hardcoded timeouts Quentin Barnes
@ 2007-05-11 21:08 ` William Cohen
  2007-05-14 16:50   ` David Wilder
  0 siblings, 1 reply; 10+ messages in thread
From: William Cohen @ 2007-05-11 21:08 UTC (permalink / raw)
  To: Quentin Barnes; +Cc: systemtap

Quentin Barnes wrote:
> I mentioned this issue as an aside and asked about it a month ago,
> but I don't think it got a response at the time.
> 
> In porting the Systemtap testsuite to an embedded ARM platform
> (~350MHz CPU with 64MB and running using an NFS root and swap), I
> found many of the existing hardcoded timeout parameters are way too
> short.  Several of them I had to at least triple if not increase by
> a larger factor, sometimes 6x-15x to get them to pass.
> 
> How do we want to deal with this portability problem of hardcoded
> timeouts?
> 
> 
> There are a few ways I can think of to address this:
> 
> 1) Let the hardcoded numbers stay, but up them large enough to
>   handle the slowest platform we might ever run on.  If that's
>   still not slow enough someday, up them some more when the time
>   comes.
> 
> This has the advantage of simplicity, but can greatly slow down
> suite runs on faster processors when tests do get stuck.
> 
> 2) Ban all standalone hardcoded timeouts replacing them with an
>   expression involving a multiplier and/or a constant and a
>   multiplier.
> 
> This is not the cleanest because some tests are slow due to I/O
> bandwidth or paging where others are slow due to CPU limitations.
> But it does have the advantage that if someone is having timeout
> issues, they can up the multiplier value and rerun to see if the
> problem goes away without having to edit all sorts of wrapper
> scripts and tests.
> 
> If we go with a multiplier, the multiplier could be set
> automatically by reading the cpuinfo and taking a stab at it based
> on the machine's BogoMIPS or MHz.  We'd still need a way to have a user
> straightforwardly tweak it beyond that manually.
> 
> Unfortunately, I don't understand the Systemtap testsuite framework
> yet well enough to make specific suggestions.
> 
> Thoughts?
> 
> Quentin

Hi Quentin,

I have some machines regularly downloading CVS snapshots of systemtap and 
running the tests. I have encountered the same problem, particularly on a slow 
Pentium III machine, and I have increased some of the timeouts as a result. 
However, the problem is that we don't know how long some of the tests take to 
run. In addition to the processor speed, the kernel/debuginfo could affect the 
time required to build/install the tests.

I don't have good solutions to this problem. However, it might be good to start 
listing the tests that are "too slow."  People running probes might be okay with 
a script taking a little time to get started, but they might not be so patient 
when it takes minutes for the script to translate and start running. Running the 
tests by hand with "-v" to get information about which phases the time is being 
spent in would be helpful.

-Will


* Re: testsuite and hardcoded timeouts
  2007-05-11 21:08 ` William Cohen
@ 2007-05-14 16:50   ` David Wilder
  2007-05-14 20:43     ` William Cohen
  0 siblings, 1 reply; 10+ messages in thread
From: David Wilder @ 2007-05-14 16:50 UTC (permalink / raw)
  To: William Cohen; +Cc: Quentin Barnes, systemtap

William Cohen wrote:

> Quentin Barnes wrote:
>
>> I mentioned this issue as an aside and asked about it a month ago,
>> but I don't think it got a response at the time.
>>
>> In porting the Systemtap testsuite to an embedded ARM platform
>> (~350MHz CPU with 64MB and running using an NFS root and swap), I
>> found many of the existing hardcoded timeout parameters are way too
>> short.  Several of them I had to at least triple if not increase by
>> a larger factor, sometimes 6x-15x to get them to pass.
>>
>> How do we want to deal with this portability problem of hardcoded
>> timeouts?
>>
>>
>> There are a few ways I can think of to address this:
>>
>> 1) Let the hardcoded numbers stay, but up them large enough to
>>   handle the slowest platform we might ever run on.  If that's
>>   still not slow enough someday, up them some more when the time
>>   comes.
>>
>> This has the advantage of simplicity, but can greatly slow down
>> suite runs on faster processors when tests do get stuck.
>>
>> 2) Ban all standalone hardcoded timeouts replacing them with an
>>   expression involving a multiplier and/or a constant and a
>>   multiplier.
>>
>> This is not the cleanest because some tests are slow due to I/O
>> bandwidth or paging where others are slow due to CPU limitations.
>> But it does have the advantage that if someone is having timeout
>> issues, they can up the multiplier value and rerun to see if the
>> problem goes away without having to edit all sorts of wrapper
>> scripts and tests.
>>
>> If we go with a multiplier, the multiplier could be set
>> automatically by reading the cpuinfo and taking a stab at it based
>> on the machine's BogoMIPS or MHz.  We'd still need a way to have a user
>> straightforwardly tweak it beyond that manually.
>>
>> Unfortunately, I don't understand the Systemtap testsuite framework
>> yet well enough to make specific suggestions.
>>
>> Thoughts?
>>
>> Quentin
>
>
> Hi Quentin,
>
> I have some machines regularly downloading cvs snapshots of systemtap 
> and running the tests. I have encountered the same problem, 
> particularly on the slow pentium III machine. I have increased some of 
> the timeouts as a result of this. However, the problem is we don't 
> know how long some of the tests take to run. In addition to the 
> processor speed the kernel/debuginfo could affect the time required to 
> build/install the tests.
>
> I don't have good solutions to this problem. However, it might be good 
> to start listing the tests that are "too slow."  People running probe 
> might be okay with a script taking a little time to get started, but 
> they might not be so patient when it takes minutes for the script to 
> translate and start running. Running them by hand with the "-v" to get 
> information about which phases time is being spent would be helpful.
>
> -Will

I ran into this issue on s390.  When a timeout occurs, the test could 
simply produce a warning message and then restart the timer, allowing 
the timeout to be restarted, say, 4 or 5 times before finally reporting 
a failure.  Then if something breaks, the test will still report a 
failure, while on a slower system the test would still pass.  If a 
system/test normally passes with one or two restarts of the timer, and 
then something changes and it starts taking 3 or 4 restarts, we will 
know that investigation is needed.

-- 
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
dwilder@us.ibm.com
(503)578-3789


* Re: testsuite and hardcoded timeouts
  2007-05-14 16:50   ` David Wilder
@ 2007-05-14 20:43     ` William Cohen
  2007-05-14 21:01       ` David Wilder
  0 siblings, 1 reply; 10+ messages in thread
From: William Cohen @ 2007-05-14 20:43 UTC (permalink / raw)
  To: David Wilder; +Cc: Quentin Barnes, systemtap

David Wilder wrote:

> 
> I ran into this issue on s390.   When a time out occurs if the test 
> would simply produce a warning message then restarts the timer, allowing 
> the timeout to be restarted say 4 or 5 times before finally reporting a 
> failure.   Then if something breaks the test will still report a 
> failure.  On slower system the test would still pass.  If a  system/test 
> normally passes with one or two restarts of the timer then something 
> changes and it starts taking 3 or 4 restarts we will know that 
> investigation is needed.
> 


You might luck out, with caching helping the later attempts skip some of the 
phases of the translator and avoid those costs on the later runs. However, 
restarting 4 or 5 times is probably not going to help that much if the time 
required to generate the module is far larger than the timeout.

The timeout is there to make sure that forward progress is made on the testing. 
We would prefer to have the test fail in a reasonable amount of time rather than 
have a test hang for an unreasonable amount of time and not get any results at 
all. The translator internals are pretty much a black box to the testing 
harness, so the timer is used to judge when the test isn't making forward 
progress. Too bad there isn't an equivalent of a watchdog for the testing 
harness, e.g. if the test is making forward progress, leave the test be.

-Will


* Re: testsuite and hardcoded timeouts
  2007-05-14 20:43     ` William Cohen
@ 2007-05-14 21:01       ` David Wilder
  2007-05-15 22:35         ` Quentin Barnes
  0 siblings, 1 reply; 10+ messages in thread
From: David Wilder @ 2007-05-14 21:01 UTC (permalink / raw)
  To: William Cohen; +Cc: Quentin Barnes, systemtap

William Cohen wrote:

> David Wilder wrote:
>
>>
>> I ran into this issue on s390.   When a time out occurs if the test 
>> would simply produce a warning message then restarts the timer, 
>> allowing the timeout to be restarted say 4 or 5 times before finally 
>> reporting a failure.   Then if something breaks the test will still 
>> report a failure.  On slower system the test would still pass.  If a  
>> system/test normally passes with one or two restarts of the timer 
>> then something changes and it starts taking 3 or 4 restarts we will 
>> know that investigation is needed.
>>
>
>
> You might luck out with the caching helping the later attempts skip 
> some of the phases of the translator and avoid those times on the 
> later runs. However, restarting 4 or 5 times is probably not going to 
> help that much if the time required to generate the module is way 
> larger than the time out.


I was not thinking that the expiration of the timeout would restart 
generating the module.  It would just warn the user that the test is 
taking longer than expected.  So the purpose of the timer is just to 
print "hey, I am taking too long".  The real timeout that would cause 
the test to fail happens after 4 or 5 warning messages have been 
printed.  This way the user is given a heads-up that something may be 
wrong before waiting out a timeout that is long enough for even the 
slowest system to normally complete the test.
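In expect terms, that idea might look something like the sketch below.  The 60-second interval, the count of 5, and the messages are invented for illustration; `pass`, `fail`, and `verbose` are the usual dejagnu reporting procs:

```tcl
# Sketch only: a "soft" timeout that warns a few times before failing.
set warnings 0
expect {
    -timeout 60
    -re {^systemtap ending probe\r\n} { pass "$test" }
    timeout {
        incr warnings
        if {$warnings < 5} {
            verbose -log "warning: $test still running after [expr {$warnings * 60}]s"
            exp_continue   ;# restarts the 60s timer and keeps waiting
        } else {
            fail "$test (gave up after $warnings timeout intervals)"
        }
    }
}
```

Note that `exp_continue` restarts the timeout timer by default, which is exactly the restart behavior described above.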

>
> The timeout is there to make sure that forward progress is made on the 
> testing. We would prefer to have the test fail in a reasonable amount 
> of time than to have a test hang for an unreasonable amount of time 
> and not get any results at all. The translator internals are pretty 
> much a black box to the testing harness, so the timer is used to judge 
> when the test isn't making forward progress. Too bad there 
> couldn't be an equivalent to a watchdog for the testing harness, e.g. 
> if the test is making forward progress, leave the test be.
>
> -Will



-- 
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA 
dwilder@us.ibm.com
(503)578-3789


* Re: testsuite and hardcoded timeouts
  2007-05-14 21:01       ` David Wilder
@ 2007-05-15 22:35         ` Quentin Barnes
  2007-05-15 22:47           ` Frank Ch. Eigler
  0 siblings, 1 reply; 10+ messages in thread
From: Quentin Barnes @ 2007-05-15 22:35 UTC (permalink / raw)
  To: David Wilder; +Cc: William Cohen, systemtap

>>You might luck out with the caching helping the later attempts skip 
>>some of the phases of the translator and avoid those times on the 
>>later runs. However, restarting 4 or 5 times is probably not going to 
>>help that much if the time required to generate the module is way 
>>larger than the time out.
>
>I was not thinking that the expiration of the time out would restart 
>generating the module.   Just warn the user that the test is taking 
>longer than expected.   So the purpose of the timer is just to print  
>"hey, I am taking too long".  The real timeout that would cause the test 
>to fail happens after 4 or 5 warning messages have been printed.  This 
>way the user is given a heads up that something may be wrong before 
>waiting for a timeout that is long enough for even the slowest system to 
>normally complete the test.

Ah, maybe there is some middle ground here.  Instead of putting the
effort into figuring out some portable method for dynamic timeouts,
just make the behavior on a timeout user-settable -- the default
behavior stays fatal as it is now, while the alternative just outputs
an advisory warning and does an "exp_continue".

Would this be a reasonable change for now?

If so, I could code up this work as part of my ARM Systemtap port
and post it to the list.

>>The timeout is there to make sure that forward progress is made on the 
>>testing. We would prefer to have the test fail in a reasonable amount 
>>of time than to have a test hang for an unreasonable amount of time 
>>and not get any results at all. The translator internals are pretty 
>>much a black box to the testing harness, so the timer is used to judge 
>>when the the test isn't making forward progress. Too bad there 
>>couldn't be an equivalent to a watchdog for the testing harness, e.g. 
>>if the test is making forward progress, leave the test be.
>>
>>-Will
>
>-- 
>David Wilder
>IBM Linux Technology Center
>Beaverton, Oregon, USA 
>dwilder@us.ibm.com
>(503)578-3789

Quentin


* Re: testsuite and hardcoded timeouts
  2007-05-15 22:35         ` Quentin Barnes
@ 2007-05-15 22:47           ` Frank Ch. Eigler
  2007-05-16  0:42             ` Quentin Barnes
  2007-05-18  0:36             ` Quentin Barnes
  0 siblings, 2 replies; 10+ messages in thread
From: Frank Ch. Eigler @ 2007-05-15 22:47 UTC (permalink / raw)
  To: Quentin Barnes; +Cc: David Wilder, William Cohen, systemtap

Quentin Barnes <qbarnes@urbana.css.mot.com> writes:

> [...]
> Ah, maybe there is some middle ground here.  Instead of putting the
> effort into figuring out some portable method for dynamic timeouts,
> just change the behavior for a timeout to be user-settable [...]

It can be even easier than that.  dejagnu's "timeout" Tcl variable is
exactly the default timeout duration in seconds.  The .exp files under
testsuite/config or even testsuite/lib could set this global variable
based on the "ishost" predicate - leave it alone for i686, double it
for s390x, decuple (!) it for arm.  Then we just need to police the
test cases to avoid messing with this value.
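A testsuite/lib fragment along these lines might look like the sketch below; the triplet patterns and the factors are guesses, not measured values:

```tcl
# Sketch: pick the default expect timeout by host architecture.
# dejagnu's ishost matches the configure-style host triplet.
set timeout 30                        ;# i686 baseline (assumed)
if {[ishost "s390x-*-*"]} {
    set timeout [expr {$timeout * 2}]
} elseif {[ishost "arm*-*-*"]} {
    set timeout [expr {$timeout * 10}]
}
```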

- FChE


* Re: testsuite and hardcoded timeouts
  2007-05-15 22:47           ` Frank Ch. Eigler
@ 2007-05-16  0:42             ` Quentin Barnes
  2007-05-16 19:03               ` William Cohen
  2007-05-18  0:36             ` Quentin Barnes
  1 sibling, 1 reply; 10+ messages in thread
From: Quentin Barnes @ 2007-05-16  0:42 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: David Wilder, William Cohen, systemtap

>Quentin Barnes <qbarnes@urbana.css.mot.com> writes:
>
>> [...]
>> Ah, maybe there is some middle ground here.  Instead of putting the
>> effort into figuring out some portable method for dynamic timeouts,
>> just change the behavior for a timeout to be user-settable [...]
>
>It can be even easier than that.  dejagnu's "timeout" tcl variable is
>exactly the default timeout duration in seconds.

The "timeout" variable is an expect feature.  It is already set
in stap_run.exp and stap_run2.exp, but timeouts are also manually
specified in various expect statements sprinkled through the
testsuite.  Those are the ones that cause me the most headaches.
Otherwise, tinkering with just two files would be trivial.

>The .exp files under
>testsuite/config or even testsuite/lib could set this global variable
>based on the "ishost" predicate - leave it for i686, double it for
>s390x, decuple (!) it for arm.  Then we just need to police the test
>cases to avoid messing with this value.

It's not that simple.  For example, my setup is really, really slow
because it is using an NFS-mounted root and swap with a small amount
of RAM.  Another ARM system could easily run 5x-10x faster than mine
with just more memory or a real hard disk.

Rather than create an "ishost" rule, I suggested that what would
probably be better is to use the MHz or BogoMIPS number from
/proc/cpuinfo.  But even that's a heuristic because it only takes
into account the CPU speed, not the overall system speed, which can
be choked by I/O limitations.


What I'd like to know is if it is really necessary to have fatal
timeouts.  How often does running the test suite truly hang up
where the timeout feature gets it unstuck?

I've found that if my system has taken too long, it's due to a bug
and the kernel is no longer stable.  However, I don't work on the
stap translator.  I suspect bugs in it are what cause recoverable
test hang-ups.

>- FChE

Quentin


* Re: testsuite and hardcoded timeouts
  2007-05-16  0:42             ` Quentin Barnes
@ 2007-05-16 19:03               ` William Cohen
  0 siblings, 0 replies; 10+ messages in thread
From: William Cohen @ 2007-05-16 19:03 UTC (permalink / raw)
  To: Quentin Barnes; +Cc: Frank Ch. Eigler, David Wilder, systemtap

Quentin Barnes wrote:
>> Quentin Barnes <qbarnes@urbana.css.mot.com> writes:
>>
>>> [...]
>>> Ah, maybe there is some middle ground here.  Instead of putting the
>>> effort into figuring out some portable method for dynamic timeouts,
>>> just change the behavior for a timeout to be user-settable [...]
>>
>> It can be even easier than that.  dejagnu's "timeout" tcl variable is
>> exactly the default timeout duration in seconds.
> 
> The "timeout" variable is an expect feature.  It is already set
> in stap_run.exp and stap_run2.exp, but timeouts are also manually
> specified in various expect statements sprinkled through the
> testsuite.  Those are the ones that cause me the most headaches.
> Otherwise, tinkering with just two files would be trivial.
> 
>> The .exp files under
>> testsuite/config or even testsuite/lib could set this global variable
>> based on the "ishost" predicate - leave it for i686, double it for
>> s390x, decuple (!) it for arm.  Then we just need to police the test
>> cases to avoid messing with this value.
> 
> It's not that simple.  For example, my setup is really, really slow
> because it is using NFS mounted root and swap with a small amount of
> RAM.  Another ARM system could run easily 5x-10x faster than mine
> with just more memory or a real hard disk.
> 
> Rather than create an "ishost" rule, I suggested that what would
> probably be better is to use the MHz or BogoMIPS number from
> /proc/cpuinfo.  But even that's a heuristic because it only takes
> into account the CPU speed, not the system speed that can be choked 
> due to I/O limitations.

Seems like it would make more sense to have an environment variable from which 
the test timeouts are computed. Make all the tests use that value. It should be 
fairly simple to grep for the explicit timeout settings and fix those.
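Such a knob might look something like this; the `STAP_TIMEOUT_FACTOR` variable and the `test_timeout` helper are invented names for illustration:

```tcl
# Sketch: one environment variable from which every timeout is computed.
proc test_timeout {base} {
    global env
    set factor 1
    if {[info exists env(STAP_TIMEOUT_FACTOR)]} {
        set factor $env(STAP_TIMEOUT_FACTOR)
    }
    return [expr {$base * $factor}]
}
# Each expect clause would then use:  -timeout [test_timeout 30]
```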

> What I'd like to know is if it is really necessary to have fatal
> timeouts.  How often does running the test suite truly hang up
> where the timeout feature gets it unstuck?
> 
> I've found that if my system has taken too long, it's due to a bug
> and the kernel is no longer stable.  However, I don't work on the
> stap translator.  I suspect bugs in it are what causes recoverable
> test hang ups to exist.

It is possible that a probe doesn't fire to cause a systemtap script to exit. 
In that case things could be hung. We really do need the explicit timeouts to 
move on, since we want coverage from the tests. Better to give up on a test 
taking way, way too long, FAIL it, and get the rest of the test run than to 
get hung up on that one problem test.

-Will


* Re: testsuite and hardcoded timeouts
  2007-05-15 22:47           ` Frank Ch. Eigler
  2007-05-16  0:42             ` Quentin Barnes
@ 2007-05-18  0:36             ` Quentin Barnes
  1 sibling, 0 replies; 10+ messages in thread
From: Quentin Barnes @ 2007-05-18  0:36 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: systemtap

On Tue, May 15, 2007 at 06:47:34PM -0400, Frank Ch. Eigler wrote:
>Quentin Barnes <qbarnes@urbana.css.mot.com> writes:
>
>> [...]
>> Ah, maybe there is some middle ground here.  Instead of putting the
>> effort into figuring out some portable method for dynamic timeouts,
>> just change the behavior for a timeout to be user-settable [...]
>
>It can be even easier than that.  dejagnu's "timeout" tcl variable is
>exactly the default timeout duration in seconds.  The .exp files under
>testsuite/config or even testsuite/lib could set this global variable
>based on the "ishost" predicate - leave it for i686, double it for
>s390x, decuple (!) it for arm.  Then we just need to police the test
>cases to avoid messing with this value.

Ah, ha!  I _finally_ cracked why I was having so many erratic
timeout problems with the test suite on ARM!  It is because of
existing use of "set timeout" in the test suite scripts.

I kept having timeout failures, so I kept bumping up the numbers,
and rerunning that subsection.  The problems would often go away.
Then they'd come back. Then I'd bump the timeout values up higher,
and they'd go away again.  However, when I looked at the timing
information from the tests when they'd finally pass, the numbers I
had bumped them up to made no sense at all.

What was going on was that tests using the Tcl procedures "stap_run"
and "stap_run2" would get sporadic timeouts based on whatever the
global variable "timeout" happened to be set to last.  It is set in
the stap_run2.exp file, but the variable is a global setting.  The
next time it is set by any script anywhere, that becomes the new
global setting that stap_run and stap_run2 happen to use, not the
value set in the stap_run2.exp file.
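The trap in miniature (the file roles and proc name here are generic stand-ins, not the actual testsuite code):

```tcl
# "set timeout" is a single process-global in expect, shared by every
# script the harness sources.
set timeout 20            ;# a library .exp file sets this at file scope

proc wait_for_pass {} {
    # No -timeout flag here, so this expect reads the *current* global
    # "timeout" -- whatever the last sourced script set it to -- not
    # the 20 from the file that defined this proc.
    expect {
        -re {^Pass 5:} { return 1 }
        timeout        { return 0 }
    }
}

set timeout 600           ;# some later test raises the global...
wait_for_pass             ;# ...and this now waits up to 600s, not 20s
```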

Once I figured out this was the source of the erratic timeout
failures, I was able to tweak timeout values back to something much
more sane and still get tests to pass on ARM.

What I would recommend is that every expect statement that uses a
"timeout" clause inside a Tcl procedure _always_ specify its own
"-timeout" parameter, never relying on the global setting.  Would
this be a reasonable change?

Expect statements at the global level outside of procedures can
continue to use the global timeout setting, but even then I'd
recommend against it.

Now that the timeouts for ARM are coming back down to reasonable
values, I think other platforms could live with them too without too
much trouble.  I'm still looking into this though.

I've included an experimental patch that makes the changes I've
recommended with adding "-timeout" parameters (and fixing the $pass4
problem mentioned below).  I have not adjusted any of these values
for ARM though, but kept the timeout's original value for comparison
to the current code base.  Sometimes there was no clear indication
of what the timeout should be, so I put one in that is probably
reasonable.

I also found and fixed what appeared to be an intermittent problem
where tests run individually would occasionally blow up with the
error:
=============
ERROR: can't read "pass4": no such variable
    while executing
"verbose -log "metric:\t$TEST_NAME $pass1$pass2$pass3$pass4$pass5""
=============

I figured out this would happen when the first test run happened to
be in the cache.  Any other time a test with a cached module would
be encountered, it would just happen to work since $pass4 was still
set to some old value.

This bug arose because the stap_run and stap_run2 rule for detecting
cached modules was broken.  When "Pass 3: using cached ..." was
matched, the rule's greedy pattern would swallow the "Pass 4: using
cached ..." output as well, leaving the variable unset.  This was
also a problem in overload.exp.

Note that these changes are very fresh and I couldn't test them
directly on ARM since they are untweaked.  I wouldn't recommend
integrating them into the mainline without testing them on a couple
of different platforms first.


Comments?

>- FChE

Quentin


diff -uprN systemtap-20070512-ref/testsuite/lib/stap_run2.exp systemtap-20070512/testsuite/lib/stap_run2.exp
--- systemtap-20070512-ref/testsuite/lib/stap_run2.exp	2007-05-07 13:31:59.000000000 -0500
+++ systemtap-20070512/testsuite/lib/stap_run2.exp	2007-05-17 18:45:55.000000000 -0500
@@ -8,8 +8,6 @@
 #
 # global result_string must be set to the expected output
 
-set timeout 20
-
 proc stap_run2 { TEST_NAME args } {
   # zap the srcdir prefix
   set test_file_name $TEST_NAME
@@ -28,9 +26,10 @@ proc stap_run2 { TEST_NAME args } {
   eval spawn $cmd
 
   expect {
+    -timeout 20
     -re {^Pass\ ([1234]):[^\r]*\ in\ ([0-9]+)usr/([0-9]+)sys/([0-9]+)real\ ms.\r\n}
     {set pass$expect_out(1,string) "\t$expect_out(2,string)\t$expect_out(3,string)\t$expect_out(4,string)"; exp_continue}
-    -re {^Pass\ ([34]): using cached .+\r\n}
+    -re {^Pass\ ([34]): using cached [^\r]+\r\n}
     {set pass$expect_out(1,string) "\t0\t0\t0"; exp_continue}
     -re {^Pass 5: starting run.\r\n} {exp_continue}
     -ex $output {
diff -uprN systemtap-20070512-ref/testsuite/lib/stap_run.exp systemtap-20070512/testsuite/lib/stap_run.exp
--- systemtap-20070512-ref/testsuite/lib/stap_run.exp	2007-05-07 13:31:59.000000000 -0500
+++ systemtap-20070512/testsuite/lib/stap_run.exp	2007-05-17 18:52:10.000000000 -0500
@@ -31,12 +31,13 @@ proc stap_run { TEST_NAME {LOAD_GEN_FUNC
     }
     eval spawn $cmd
     expect {
+	-timeout 30
 	-re {^Pass\ ([1234]):[^\r]*\ in\ ([0-9]+)usr/([0-9]+)sys/([0-9]+)real\ ms.\r\n}
 	{set pass$expect_out(1,string) "\t$expect_out(2,string)\t$expect_out(3,string)\t$expect_out(4,string)"; exp_continue}
-	-re {^Pass\ ([34]): using cached .+\r\n}
+	-re {^Pass\ ([34]): using cached [^\r]+\r\n}
 	{set pass$expect_out(1,string) "\t0\t0\t0"; exp_continue}
 	-re {^Pass 5: starting run.\r\n} {exp_continue}
-	-timeout 30 -re "^systemtap starting probe\r\n" {
+	-re "^systemtap starting probe\r\n" {
 	    pass "$TEST_NAME startup"
 	    if {$LOAD_GEN_FUNCTION != ""} then {
 		#run the interesting test here
@@ -53,6 +54,7 @@ proc stap_run { TEST_NAME {LOAD_GEN_FUNC
 	    set output "^systemtap ending probe\r\n$OUTPUT_CHECK_STRING$"
 
 	    expect {
+		-timeout 20
 		-re  $output {
 		    pass "$TEST_NAME shutdown and output"
 		    expect {
diff -uprN systemtap-20070512-ref/testsuite/lib/systemtap.exp systemtap-20070512/testsuite/lib/systemtap.exp
--- systemtap-20070512-ref/testsuite/lib/systemtap.exp	2007-05-07 13:31:59.000000000 -0500
+++ systemtap-20070512/testsuite/lib/systemtap.exp	2007-05-17 18:48:21.000000000 -0500
@@ -82,6 +82,7 @@ proc stap_run_batch {args} {
     }
 
     expect { 
+	-timeout -1
         -re {[^\r]*\r} { verbose -log $expect_out(0,string); exp_continue } 
         eof { }
         timeout { exp_continue } 
diff -uprN systemtap-20070512-ref/testsuite/systemtap.base/alternatives.exp systemtap-20070512/testsuite/systemtap.base/alternatives.exp
--- systemtap-20070512-ref/testsuite/systemtap.base/alternatives.exp	2007-05-07 13:31:59.000000000 -0500
+++ systemtap-20070512/testsuite/systemtap.base/alternatives.exp	2007-05-17 18:57:04.000000000 -0500
@@ -26,6 +26,7 @@ proc stap_run_alternatives {args} {
     verbose -log "starting $args"
     eval spawn $args
     expect { 
+	-timeout 20
 	-re {semantic error: .+ \(alternatives: [a-zA-Z_]}
 	    { set alternatives_found 1 }
         -re {[^\r]*\r} { verbose -log $expect_out(0,string); exp_continue } 
diff -uprN systemtap-20070512-ref/testsuite/systemtap.base/overload.exp systemtap-20070512/testsuite/systemtap.base/overload.exp
--- systemtap-20070512-ref/testsuite/systemtap.base/overload.exp	2007-05-07 13:31:59.000000000 -0500
+++ systemtap-20070512/testsuite/systemtap.base/overload.exp	2007-05-17 19:18:30.000000000 -0500
@@ -25,8 +25,9 @@ proc stap_run_overload { TEST_NAME EXPEC
     set cmd [concat {stap -v} $args]
     eval spawn $cmd
     expect {
-	-re {^Pass\ [1234]: .+real\ ms.\r\n} {exp_continue}
-	-re {^Pass\ ([34]): using cached .+\r\n} {exp_continue}
+	-timeout 30
+	-re {^Pass\ [1234]: [^\r]+real\ ms.\r\n} {exp_continue}
+	-re {^Pass\ ([34]): using cached [^\r]+\r\n} {exp_continue}
 	-re {^Pass 5: starting run.\r\n} {exp_continue}
 	-re {ERROR: probe overhead exceeded threshold\r\n} {
 	    if {$EXPECT_OVERLOAD} {
@@ -35,7 +36,7 @@ proc stap_run_overload { TEST_NAME EXPEC
 		fail "$TEST_NAME unexpected overload"
 	    }
 	}
-	-timeout 30 -re "^systemtap starting probe\r\n" {
+	-re "^systemtap starting probe\r\n" {
 	    send "\003"
 
 	    expect {
diff -uprN systemtap-20070512-ref/testsuite/systemtap.maps/foreach_fail.exp systemtap-20070512/testsuite/systemtap.maps/foreach_fail.exp
--- systemtap-20070512-ref/testsuite/systemtap.maps/foreach_fail.exp	2006-10-17 17:10:58.000000000 -0500
+++ systemtap-20070512/testsuite/systemtap.maps/foreach_fail.exp	2007-05-17 18:59:10.000000000 -0500
@@ -7,6 +7,7 @@ if {![installtest_p]} { untested $test; 
 
 spawn stap  $srcdir/$subdir/$test.stp
 expect {
+  -timeout 20
   timeout { 
     fail "$test timed out" }
   eof { 
diff -uprN systemtap-20070512-ref/testsuite/systemtap.stress/conversions.exp systemtap-20070512/testsuite/systemtap.stress/conversions.exp
--- systemtap-20070512-ref/testsuite/systemtap.stress/conversions.exp	2007-05-07 13:31:59.000000000 -0500
+++ systemtap-20070512/testsuite/systemtap.stress/conversions.exp	2007-05-17 19:03:17.000000000 -0500
@@ -9,6 +9,7 @@ foreach value {0 0xffffffff 0xffffffffff
     set errs 0
     verbose -log "exp $test $errs"
     expect {
+	-timeout 20
         -re {ERROR[^\r\n]*\r\n} { incr errs; exp_continue }
         -re {WARNING[^\r\n]*\r\n} { incr errs; exp_continue }
         eof { }
diff -uprN systemtap-20070512-ref/testsuite/systemtap.stress/whitelist.exp systemtap-20070512/testsuite/systemtap.stress/whitelist.exp
--- systemtap-20070512-ref/testsuite/systemtap.stress/whitelist.exp	2007-05-07 13:31:59.000000000 -0500
+++ systemtap-20070512/testsuite/systemtap.stress/whitelist.exp	2007-05-17 19:32:15.000000000 -0500
@@ -301,8 +301,8 @@ proc whitelist_run { TEST_NAME {LOAD_GEN
     catch {eval spawn $cmd}
     set stap_id $spawn_id
     set failed 1
-    set timeout 600
     expect {
+	-timeout 1800
 	-i $stap_id -re {^Pass\ ([1234]):\ [^\r]*\r\n} {
             set error_msg "pass$expect_out(1,string)";
             exp_continue
@@ -313,7 +313,7 @@ proc whitelist_run { TEST_NAME {LOAD_GEN
             send -i $stap_id "\003"
             exp_continue
         }
-        -timeout 1800 -re {Pass\ 5:\ run\ completed} {
+        -re {Pass\ 5:\ run\ completed} {
             set failed 0
         }
 	-re {parse\ error|semantic\ error} { set detail "$expect_out(0,string)" }
@@ -327,7 +327,6 @@ proc whitelist_run { TEST_NAME {LOAD_GEN
 
 proc runbenchs {} {
     global benchs
-    set timeout 900
     set runningcount 0
 
     foreach bench $benchs {
@@ -346,6 +345,7 @@ proc runbenchs {} {
 
     while {$runningcount > 0} {
     	expect {
+		-timeout 900
     		-i $idlist -re {LTP\ Version:\ LTP-([0-9])+\r\n$} {
     			set from $expect_out(spawn_id)
     			lappend benchres($from) $expect_out(buffer)


end of thread, other threads:[~2007-05-18  0:36 UTC | newest]

Thread overview: 10+ messages
2007-05-11 19:14 testsuite and hardcoded timeouts Quentin Barnes
2007-05-11 21:08 ` William Cohen
2007-05-14 16:50   ` David Wilder
2007-05-14 20:43     ` William Cohen
2007-05-14 21:01       ` David Wilder
2007-05-15 22:35         ` Quentin Barnes
2007-05-15 22:47           ` Frank Ch. Eigler
2007-05-16  0:42             ` Quentin Barnes
2007-05-16 19:03               ` William Cohen
2007-05-18  0:36             ` Quentin Barnes
