* Mauve wishlist
From: Thomas Fitzsimmons @ 2006-03-17 16:27 UTC
To: classpath; +Cc: mauve-discuss

Hi,

Anthony Balkissoon has expressed interest in improving Mauve, so we'd
like to know what the best things to work on would be.  Here are two
items on my list:

- A web reporting facility that generates JAPI-style bar graphs.
  Ideally the graphs would represent a code-coverage analysis, but I
  have no idea how feasible that is using free tools.

- A framework for testing VM invocations and the tools' command-line
  interfaces -- in other words, a framework that can exec commands.
  This may be best done as an independent module, separate from Mauve.

There is also lots of room for improvement in how Mauve tests are
selected and run.  I'm hoping someone who better understands Mauve's
design will elaborate.

Tom
* Re: Mauve wishlist
From: David Daney @ 2006-03-17 21:06 UTC
To: Thomas Fitzsimmons; +Cc: classpath, mauve-discuss

Thomas Fitzsimmons wrote:
[...]
> There is also lots of room for improvement in how Mauve tests are
> selected and run.  I'm hoping someone who better understands Mauve's
> design will elaborate.

I would like to see a way to partition the different tests.  Sometimes
Mauve will not build for gcj (and probably other compilers as well)
because just a single file has some problem.  It would be nice to
still be able to run the majority of the tests in that situation.

I have also thought about running different groups of tests in their
own ClassLoader, but I cannot really think of a good reason to do it
right now.

David Daney
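[Editor's note: running each test group in its own class loader, as
David suggests, might look roughly like the sketch below.  This is
hypothetical illustration, not Mauve code; the class and method names
are invented, and the classpath handling is an assumption.]

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedGroups {
    /**
     * Load a test class through a fresh URLClassLoader so that static
     * state (caches, singletons) set by one test group cannot leak
     * into the next.  testClasspath points at the compiled tests.
     */
    static Class<?> loadFresh(String testClass, URL[] testClasspath)
            throws ClassNotFoundException {
        // A null parent would force full isolation; using the system
        // loader as parent keeps the harness classes shared.
        ClassLoader loader =
            new URLClassLoader(testClasspath,
                               ClassLoader.getSystemClassLoader());
        return loader.loadClass(testClass);
    }

    public static void main(String[] args) throws Exception {
        // Demonstration: classes on the system classpath resolve via
        // parent delegation, even with an empty URL list here.
        Class<?> c = loadFresh("java.util.ArrayList", new URL[0]);
        System.out.println(c.getName());
    }
}
```

One fresh loader per group would also make David's partitioning point
easier: a group whose classes fail to load is skipped, while the other
groups still run.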
* Re: Mauve wishlist
From: Michael Koch @ 2006-03-18 8:15 UTC
To: David Daney; +Cc: Thomas Fitzsimmons, classpath, mauve-discuss

On Fri, Mar 17, 2006 at 01:06:32PM -0800, David Daney wrote:
[...]
> I would like to see a way to partition the different tests.  Sometimes
> mauve will not build for gcj (and probably other compilers as well)
> because just a single file has some problem.  It would be nice to still
> be able to run the majority of the tests in this situation.

That is what batch_run can do today.

Cheers,
Michael

--
http://www.worldforge.org/
* Re: Mauve wishlist
From: Audrius Meskauskas @ 2006-03-17 22:34 UTC
To: Mauve discuss

1. My biggest problem has been tests that hang.  A failing test is
just a failure report, but a hanging test blocks the whole testing
process and has to be removed manually.  The problem is difficult to
solve, but maybe each test could run in a separate thread, and the
threads of normally terminated tests could be reused.  JUnit, by the
way, has the same problem.

2. The failure reports could probably include the line where the
harness check failed.  The stack trace of a newly constructed
exception could be used to get the line information.  The uncaught
exception reports could also provide more information, not just
"uncaught exception", as they do now.

Regards,
Audrius Meskauskas
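[Editor's note: both of Audrius's ideas -- a per-test timeout thread,
and line numbers taken from a freshly constructed throwable -- can be
sketched in a few lines.  This is hypothetical illustration, not Mauve
code; the class and method names are invented.]

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HarnessSketch {
    /** Point 1: run one test body with a timeout so a hang cannot
        block the whole run. */
    static String runWithTimeout(Callable<String> testBody, long seconds)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> result = pool.submit(testBody);
        try {
            return result.get(seconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);          // interrupt the hanging test
            return "TIMEOUT";
        } finally {
            pool.shutdownNow();
        }
    }

    /** Point 2: the caller's line number, via the stack trace of a
        newly constructed exception. */
    static int callerLine() {
        // Element 0 is this frame; element 1 is the caller's frame.
        return new Throwable().getStackTrace()[1].getLineNumber();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(() -> "PASS", 2));
        System.out.println(runWithTimeout(() -> {
            Thread.sleep(60_000);         // simulated hang
            return "PASS";
        }, 1));
        System.out.println(callerLine() > 0);
    }
}
```

Reusing worker threads across tests, as Audrius suggests, would fall
out of using a longer-lived thread pool instead of one executor per
test.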
* Re: Mauve wishlist
From: Arnaud Vandyck @ 2006-03-20 10:53 UTC
To: classpath; +Cc: mauve-discuss

Thomas Fitzsimmons wrote:
> Hi,
[...]
> There is also lots of room for improvement in how Mauve tests are
> selected and run.  I'm hoping someone who better understands Mauve's
> design will elaborate.

I had a very quick look at TestNG [0] and I think it could be a good
approach: there would be no need to change the Mauve test classes,
just invoke the tests with TestNG.  There is an Eclipse plugin that
displays the running tests like JUnit does.

TestNG uses a tag feature like the one we have in Mauve (JDK1.1,
JDK1.2, ...), and we can add other groups like swing, nio, etc.  Those
tags can be annotations or javadoc comments (with @, and I think they
are processed with XDoclet).

As I understand it (I did not read all the documentation), the big
advantage Mauve could gain by adopting TestNG (vs. JUnit?) is that we
should not have to rewrite all the test cases!

[0] http://testng.org/doc/

--
Arnaud Vandyck
Java Trap: http://www.gnu.org/philosophy/java-trap.html
* Re: Mauve wishlist
From: Anthony Balkissoon @ 2006-03-20 16:51 UTC
To: classpath, mauve-discuss

On Fri, 2006-03-17 at 11:32 -0500, Thomas Fitzsimmons wrote:
> Hi,
>
> Anthony Balkissoon has expressed interest in improving Mauve so we'd
> like to know what would be the best things to work on.

Another suggestion that Tom Fitzsimmons had was to change the way we
count the number of tests.  Counting each invocation of the test()
method rather than each call to harness.check() has two benefits:

1) a constant number of tests, regardless of exceptions being thrown
   or which if-else branch is taken;

2) a more realistic number of tests, accurately reflecting the extent
   of our testing.

For point 1), this will help us see whether we are making progress.
Right now a Mauve run might say we have 113 fails out of 13200 tests,
and a later run could say 200 fails out of 34000 tests.  Is this an
improvement?  Hard to say.  But if we count each call to test() as one
test, and also detect hanging tests, then we should have a constant
number of tests in each run and will be able to say whether the
changes made have a positive impact on Mauve test results.

Of course, if one particular test file makes 1000 calls to
harness.check() and only one of them fails, it is not helpful to
report only that the entire test failed.  So the output will have to
pinpoint which call to harness.check() failed (and preferably give a
line number).  The downside is that the results will be overly
pessimistic, because any failing harness.check() trumps all the
passing harness.check() calls and the test is reported as a failure.

What do people have to say about this idea?

--Tony
* Re: Mauve wishlist
From: David Gilbert @ 2006-03-21 16:58 UTC
To: Anthony Balkissoon; +Cc: classpath, mauve-discuss

Hi,

Anthony Balkissoon wrote:
> Another suggestion that Tom Fitzsimmons had was to change the way we
> count the number of tests.  Counting each invocation of the test()
> method rather than each call to harness.check() has two benefits:

I think that would be a backward step (I like the detail that Mauve
provides, especially when testing on subsets while developing on GNU
Classpath).  On the other hand, you can achieve this result without
losing the current detail -- for example, see my recent JUnit patch
(not committed yet).  It effectively gives a pass/fail per test() call
when you run via JUnit, without losing the ability to run in the usual
Mauve way (counting check() results).

> 1) constant number of tests, regardless of exceptions being thrown or
> which if-else branch is taken

Mauve does have a design flaw in that it can be tricky to
automatically assign a unique identifier to each check(), and this
makes it hard to compare two Mauve runs (say the latest Classpath CVS
vs. the last release, or Classpath vs. JDK 1.5 -- both of which would
be interesting).  We can work around that by ensuring that all the
tests run linearly (no if-else branches -- I've written a large number
of tests this way and not found it to be a limitation, but I don't
know what lurks in the depths of the older Mauve tests).  There is
still the problem that an exception thrown during a test means some
checks don't get run, but a new Mauve comparison report (not yet
developed, although I've done a little experimenting with it) could
highlight those.

> 2) more realistic number of tests, to accurately reflect the extent
> of our testing

I think the absolute number is meaningless however you count the
tests, so I don't see this as an advantage.  Test coverage reports are
what we need to get some insight into the extent of our testing.

> For point 1) this will help us see if we are making progress.  Right
> now a Mauve run might say we have 113 fails out of 13200 tests and
> then a later run could say 200 fails out of 34000 tests.  Is this an
> improvement?  Hard to say.

I have done a little work on a comparison report that shows the
differences between two runs of the same set of Mauve tests,
classifying each check as follows:

Type 1 (Normal):      passes on run A and run B;
Type 2 (Regression):  passes on run A, fails on run B;
Type 3 (Improvement): fails on run A, passes on run B;
Type 4 (Bad):         fails on run A, fails on run B.

In a comparison of JDK 1.5 vs. Classpath, Type 4 hints that the check
itself is buggy.  This is a work in progress, and I don't have any
code to show anyone yet, but it is an approach that I think can be
made to work.  To make it work, each check has to be uniquely
identified -- I did this using the checkpoint and check index within a
test(), so here it is important that if-else branches in the tests
can't result in checks being skipped.  This is the case for most of
the javax.swing.* tests, but I can't speak for some of the older Mauve
tests.

> But if we count each call to test() as a test, and also detect
> hanging tests, then we should have a constant number of tests in each
> run and will be able to say if changes made have a positive impact on
> Mauve test results.

You'll lose the ability to distinguish between an existing failure
where (say) 1 out of 72 checks fails and, after some clever patch, 43
out of 72 checks fail -- the new system reports both as one test
failure.

Regards,

Dave
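[Editor's note: David's four-way classification could be sketched
roughly as below.  This is hypothetical code, not the work-in-progress
report he describes; the maps keyed by a unique check identifier are
an assumption about how such a report might store its data.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ComparisonReport {
    enum Type { NORMAL, REGRESSION, IMPROVEMENT, BAD }

    /**
     * Classify each check by comparing its pass/fail status in run A
     * against run B.  Keys are unique check identifiers, e.g.
     * "ClassName:checkpoint:index"; values are true for a pass.
     */
    static Map<String, Type> classify(Map<String, Boolean> runA,
                                      Map<String, Boolean> runB) {
        Map<String, Type> report = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> e : runA.entrySet()) {
            Boolean b = runB.get(e.getKey());
            if (b == null) continue;      // check never ran in run B
            if (e.getValue())
                report.put(e.getKey(), b ? Type.NORMAL : Type.REGRESSION);
            else
                report.put(e.getKey(), b ? Type.IMPROVEMENT : Type.BAD);
        }
        return report;
    }
}
```

The whole scheme hinges on the identifiers being stable between runs,
which is exactly the linearity requirement David raises.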
* Re: Mauve wishlist
From: Tom Tromey @ 2006-03-21 22:24 UTC
To: David Gilbert; +Cc: Anthony Balkissoon, classpath, mauve-discuss

>>>>> "David" == David Gilbert <david.gilbert@object-refinery.com> writes:

>> Another suggestion that Tom Fitzsimmons had was to change the way we
>> count the number of tests.  Counting each invocation of the test()
>> method rather than each call to harness.check() has two benefits:

David> We can work around that by ensuring that all the tests run
David> linearly (no if-else branches -- I've written a large number of
David> tests this way and not found it to be a limitation, but I don't
David> know what lurks in the depths of the older Mauve tests).  There
David> is still the problem that an exception being thrown during a
David> test means some checks don't get run, but a new Mauve comparison
David> report (not yet developed, although I've done a little
David> experimenting with it) could highlight those.

I've always tried to write tests the way you suggest, but the
exception problem turns out to be a real one, preventing test
stability in some cases.  One thing I like about the current proposal
is that it automates test stability -- the only possible failure modes
are a test hanging or the VM crashing.

As for having more granular information: we can still print a message
when a check() fails.  A command-line option to the test harness could
control this, for instance.  I think we don't want to print just a
plain 'FAIL'; we want some explanation, and the detailed information
could go there.

Tom
* Re: Mauve wishlist
From: Bryce McKinlay @ 2006-03-21 23:08 UTC
To: David Gilbert; +Cc: Anthony Balkissoon, classpath, mauve-discuss

David Gilbert wrote:
> Mauve does have a design flaw where it can be tricky to automatically
> assign a unique identifier to each check(), and this makes it hard to
> compare two Mauve runs (say a test of the latest Classpath CVS vs the
> last release, or the Classpath vs JDK 1.5 -- both of which would be
> interesting).

Right.  We all understand the problem -- it's just the solution that
we need to agree on. :)

> I think the absolute number is meaningless however you count the
> tests, so I don't see this as an advantage.

Yes, numbers alone are meaningless, but with the current design all
the results are meaningless without a lot of context.  The real issue
is having a simple way to uniquely identify each test case for the
purpose of identifying regressions.  This becomes fundamentally much
easier when one test() method corresponds to one test case.

It is not reasonable to expect test-case developers to ensure that all
tests "run linearly".  Exceptions can potentially be thrown at any
time, so to guarantee linearity, every check() call would need to be
wrapped with a try/catch.

> You'll lose the ability to distinguish between an existing failure
> where (say) 1 out of 72 checks fails and, after some clever patch, 43
> out of 72 checks fail -- the new system reports both as one test
> failure.

This is a valid concern.  However, we would still track exactly which
check() calls fail, so that in the event of a test failure a full
diagnostic can be provided.  In addition, we can still count the total
number of check() calls executed, for statistical purposes.

If the reduced test-case granularity does prove problematic in some
cases -- say a test() where a small number of checks fail for "hard to
fix" problems and thus should be "xfailed" (to use gcc/dejagnu lingo)
-- then that test case should probably be split.  Alternatively, to
avoid splitting things across multiple test classes, we could add
JUnit-style support for multiple test() methods in a single test
class.

Bryce
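[Editor's note: Bryce's last idea -- JUnit-style support for multiple
test methods in one class -- could be implemented with reflection, as
in this hypothetical sketch.  The "test" name prefix and the runner
class are invented for illustration, not something Mauve defines.]

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class MultiTestRunner {
    /** Invoke every public no-arg method whose name starts with
        "test", recording one pass/fail result per method. */
    static List<String> runAll(Object testInstance) throws Exception {
        List<String> results = new ArrayList<>();
        for (Method m : testInstance.getClass().getMethods()) {
            if (m.getName().startsWith("test")
                    && m.getParameterCount() == 0) {
                try {
                    m.invoke(testInstance);
                    results.add(m.getName() + ": PASS");
                } catch (Exception e) {
                    // The real cause is wrapped in an
                    // InvocationTargetException by invoke().
                    results.add(m.getName() + ": FAIL (" + e.getCause() + ")");
                }
            }
        }
        return results;
    }

    // Example test class with two independent test cases.
    public static class SampleTests {
        public void testAddition() { /* checks would go here */ }
        public void testFailure() { throw new IllegalStateException("boom"); }
    }
}
```

Each method then gets its own stable identifier (the method name),
which is the property the thread is after.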
* Re: Mauve wishlist
From: David Gilbert @ 2006-03-22 11:12 UTC
To: Bryce McKinlay; +Cc: Anthony Balkissoon, classpath, mauve-discuss

Bryce McKinlay wrote:
> It is not reasonable to expect test case developers ensure that all
> tests "run linearly".  Exceptions can potentially be thrown at any
> time, so to ensure linearity, every check() call would need wrapped
> with a try/catch.

If there is a linear sequence of checks in a test() method, and an
(unexpected) exception causes the test() method to exit early (after,
say, 3 checks out of 10), I don't consider that to be non-linear.  If
we are comparing run A to run B, and 10 checks complete in run A but
only 3 complete in run B, we can safely assume that checks 4 to 10
were not completed in run B, and report that.

The majority of test() methods in Mauve are written that way, so I
don't think it is an unreasonable requirement, especially if it means
we can develop better comparison/regression reporting on top of the
existing TestHarness.

Regards,

Dave
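[Editor's note: the "safely assume checks 4 to 10 were not completed"
step amounts to a prefix comparison of the two runs' ordered check
sequences.  A hypothetical sketch, with invented names:]

```java
import java.util.ArrayList;
import java.util.List;

public class TruncationReport {
    /**
     * Given the ordered check identifiers completed in run A and in
     * run B for one linear test() method, report the checks that run
     * B never reached.  Assumes (per the linearity argument) that run
     * B's sequence is a prefix of run A's.
     */
    static List<String> notReached(List<String> runA, List<String> runB) {
        List<String> missing = new ArrayList<>();
        for (int i = runB.size(); i < runA.size(); i++) {
            missing.add(runA.get(i));
        }
        return missing;
    }
}
```

If-else branches would break the prefix assumption, which is why the
linearity requirement matters for this kind of report.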