Re: Mauve wishlist - David Gilbert

public inbox for mauve-discuss@sourceware.org

From: David Gilbert <david.gilbert@object-refinery.com>
To: Anthony Balkissoon <abalkiss@redhat.com>
Cc: classpath@gnu.org, mauve-discuss@sources.redhat.com
Subject: Re: Mauve wishlist
Date: Tue, 21 Mar 2006 16:58:00 -0000
Message-ID: <4420336F.4070602@object-refinery.com>
In-Reply-To: <1142873502.3112.16.camel@tony.toronto.redhat.com>

Hi,

Anthony Balkissoon wrote:

>On Fri, 2006-03-17 at 11:32 -0500, Thomas Fitzsimmons wrote:
>  
>
>>Hi,
>>
>>Anthony Balkissoon has expressed interest in improving Mauve so we'd
>>like to know what would be the best things to work on.
>>
>
>Another suggestion that Tom Fitzsimmons had was to change the way we
>count the number of tests.  Counting each invocation of the test()
>method rather than each call to harness.check() has two benefits:
>  
>
I think that would be a backward step (I like the detail that Mauve 
provides, especially when testing on subsets while developing on GNU 
Classpath). 

On the other hand, you can achieve this result without losing the 
current detail - for example, see my recent JUnit patch (not committed 
yet) - it effectively gives a pass/fail per test() call when you run via 
JUnit, without losing the ability to run in the usual Mauve way 
(counting check() results).
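
As a rough illustration of keeping both granularities at once, here is a minimal sketch. DualCountHarness and its methods are hypothetical names invented for this example; this is neither Mauve's TestHarness nor the JUnit patch mentioned above.

```java
// Hypothetical sketch: not Mauve's TestHarness and not the JUnit patch.
class DualCountHarness {
    int checksPassed;
    int checksFailed;
    int testsPassed;
    int testsFailed;
    private boolean currentTestFailed;

    // Call before each test() invocation.
    void beginTest() {
        currentTestFailed = false;
    }

    // Records one check; any failing check marks the enclosing test() as failed.
    void check(boolean result) {
        if (result) {
            checksPassed++;
        } else {
            checksFailed++;
            currentTestFailed = true;
        }
    }

    // Call after each test() invocation; pass true if it ended with an exception.
    void endTest(boolean threwException) {
        if (currentTestFailed || threwException) {
            testsFailed++;
        } else {
            testsPassed++;
        }
    }

    // Reports the coarse per-test() count and the detailed per-check() count together.
    String summary() {
        return testsFailed + " of " + (testsPassed + testsFailed) + " tests failed; "
            + checksFailed + " of " + (checksPassed + checksFailed) + " checks failed";
    }
}
```

The per-test() total stays constant from run to run, while the per-check() total keeps the detail.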

>1) constant number of tests, regardless of exceptions being thrown or
>which if-else branch is taken
>
Mauve does have a design flaw where it can be tricky to automatically 
assign a unique identifier to each check(), and this makes it hard to 
compare two Mauve runs (say a test of the latest Classpath CVS vs the 
last release, or Classpath vs JDK 1.5 - both of which would be 
interesting).

We can work around that by ensuring that all the tests run linearly (no 
if-else branches - I've written a large number of tests this way and not 
found it to be a limitation, but I don't know what lurks in the depths 
of the older Mauve tests). 
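
To make the difference concrete, here is a small sketch contrasting a branching test with a linear one. CheckLog and LinearVersusBranching are hypothetical stand-ins written for this example, not Mauve classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder standing in for a test harness.
class CheckLog {
    final List<Boolean> results = new ArrayList<>();

    void check(boolean result) {
        results.add(result);
    }
}

class LinearVersusBranching {
    // Branching style: the number of checks depends on the value under test.
    static void branching(CheckLog log, String value) {
        if (value != null) {
            log.check(value.length() == 3);
            log.check(value.startsWith("a"));
        } else {
            log.check(false);
        }
    }

    // Linear style: always exactly two checks, in the same order.
    static void linear(CheckLog log, String value) {
        log.check(value != null && value.length() == 3);
        log.check(value != null && value.startsWith("a"));
    }
}
```

With the linear version, the second check is always "check number two", whatever the value under test turns out to be.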

There is still the problem that an exception being thrown during a test 
means some checks don't get run, but a new Mauve comparison report (not 
yet developed, although I've done a little experimenting with it) could 
highlight those.
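
One way such a report might highlight skipped checks is a plain set difference over check identifiers. This is only a sketch: MissingChecks is a hypothetical name, and it assumes each run has already been reduced to a set of identifier strings.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: assumes each run is available as a set of check identifiers.
class MissingChecks {
    // Returns the identifiers recorded in the reference run but absent
    // from the other run, in sorted order.
    static Set<String> missingFrom(Set<String> referenceRun, Set<String> otherRun) {
        Set<String> missing = new TreeSet<>(referenceRun);
        missing.removeAll(otherRun);
        return missing;
    }
}
```

Any identifier returned here is a check that ran in one run but never ran in the other, for example because an exception cut the test() short.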

>2) more realistic number of tests, to accurately reflect the extent of
>our testing
>
I think the absolute number is meaningless however you count the tests, 
so I don't see this as an advantage.  Test coverage reports are what we 
need to get some insight into the extent of our testing.

>For point 1) this will help us see if we are making progress.  Right now
>a Mauve run might say we have 113 fails out of 13200 tests and then a
>later run could say 200 fails out of 34000 tests.  Is this an
>improvement?  Hard to say.  
>
I have done a little bit of work on a comparison report to show the 
differences between two runs of the same set of Mauve tests, classifying 
them as follows:

Type 1 (Normal):       Passes on run A, passes on run B;
Type 2 (Regression):   Passes on run A, fails on run B;
Type 3 (Improvement):  Fails on run A, passes on run B;
Type 4 (Bad):          Fails on run A, fails on run B.

In a comparison of JDK 1.5 vs Classpath, Type 4 hints that the check 
itself is buggy, since it fails even on the reference implementation.  
This is a work in progress, and I don't have any code to show anyone 
yet, but it is an approach that I think can be made to work.

To make it work, each check has to be uniquely identified - I did this 
using the checkpoint and check index within a test(), so here it is 
important that if-else branches in the tests can't result in checks 
being skipped.  This is the case for most of the javax.swing.* tests, 
but I can't speak for some of the older Mauve tests.
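
A minimal sketch of building such an identifier follows. CheckIdentifier and the format of the identifier string are assumptions made for this example. The message does not say whether the index restarts at each checkpoint; here it is assumed to run across the whole test().

```java
// Hypothetical sketch: the class name and identifier format are assumptions.
class CheckIdentifier {
    private final String testName;
    private String checkpoint = "";
    private int index;

    CheckIdentifier(String testName) {
        this.testName = testName;
    }

    // Records the most recent checkpoint name.
    void checkPoint(String name) {
        checkpoint = name;
    }

    // Returns the identifier for the next check. The index counts checks
    // since the start of test() and is not reset at a checkpoint (an assumption).
    String nextId() {
        index++;
        return testName + " [" + checkpoint + "] #" + index;
    }
}
```

The identifiers only line up between two runs if both runs execute the same checks in the same order, which is why skipped checks matter.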

>But if we count each call to test() as a
>test, and also detect hanging tests, then we should have a constant
>number of tests in each run and will be able to say if changes made have
>a positive impact on Mauve test results.  
>
You'll lose the ability to distinguish between an existing failure where 
(say) 1 out of 72 checks fails and the situation after some clever patch 
where 43 out of 72 checks fail: the new system reports both as 1 test 
failure.
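
The loss of information can be shown with the numbers from this example. TestOutcome is a hypothetical type used only for illustration.

```java
// Hypothetical type: holds the check-level result of one test() call.
class TestOutcome {
    final int failedChecks;
    final int totalChecks;

    TestOutcome(int failedChecks, int totalChecks) {
        this.failedChecks = failedChecks;
        this.totalChecks = totalChecks;
    }

    // Coarse view: the test() call failed if any check failed.
    boolean testFailed() {
        return failedChecks > 0;
    }

    // Detailed view: how many checks failed out of how many ran.
    String detail() {
        return failedChecks + " of " + totalChecks + " checks failed";
    }
}
```

With new TestOutcome(1, 72) and new TestOutcome(43, 72), testFailed() is true in both cases, and only detail() tells them apart.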

Regards,

Dave
