* Mauve wishlist
From: Thomas Fitzsimmons @ 2006-03-17 16:27 UTC
To: classpath; +Cc: mauve-discuss

Hi,

Anthony Balkissoon has expressed interest in improving Mauve, so we'd
like to know what the best things to work on would be.  Here are two
items on my list:

- A web reporting facility that generates JAPI-style bar graphs.
  Ideally the graphs would represent a code-coverage analysis, but I
  have no idea how feasible that is using free tools.

- A framework for testing VM invocations and the tools' command-line
  interfaces -- in other words, a framework that can exec commands.
  This may be best done as an independent module, separate from Mauve.

There is also lots of room for improvement in how Mauve tests are
selected and run.  I'm hoping someone who better understands Mauve's
design will elaborate.

Tom
* Re: Mauve wishlist
From: David Daney @ 2006-03-17 21:06 UTC
To: Thomas Fitzsimmons; +Cc: classpath, mauve-discuss

Thomas Fitzsimmons wrote:
[...]
> There is also lots of room for improvement in how Mauve tests are
> selected and run.  I'm hoping someone who better understands Mauve's
> design will elaborate.

I would like to see a way to partition the different tests.  Sometimes
Mauve will not build for gcj (and probably other compilers as well)
because just a single file has some problem.  It would be nice to
still be able to run the majority of the tests in that situation.

I have also thought about running different groups of tests in their
own ClassLoader, but I cannot really think of a good reason to do it
right now.

David Daney
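[Editor's note: running each test group in its own class loader, as
David suggests, might look roughly like the sketch below.  This is
hypothetical illustration, not Mauve code; the class and method names
are invented, and the classpath handling is an assumption.]

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedGroups {
    /**
     * Load a test class through a fresh URLClassLoader so that static
     * state (caches, singletons) set by one test group cannot leak
     * into the next.  testClasspath points at the compiled tests.
     */
    static Class<?> loadFresh(String testClass, URL[] testClasspath)
            throws ClassNotFoundException {
        // A null parent would force full isolation; using the system
        // loader as parent keeps the harness classes shared.
        ClassLoader loader =
            new URLClassLoader(testClasspath,
                               ClassLoader.getSystemClassLoader());
        return loader.loadClass(testClass);
    }

    public static void main(String[] args) throws Exception {
        // Demonstration: classes on the system classpath resolve via
        // parent delegation, even with an empty URL list here.
        Class<?> c = loadFresh("java.util.ArrayList", new URL[0]);
        System.out.println(c.getName());
    }
}
```

One fresh loader per group would also make David's partitioning point
easier: a group whose classes fail to load is skipped, while the other
groups still run.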
* Re: Mauve wishlist
From: Michael Koch @ 2006-03-18 8:15 UTC
To: David Daney; +Cc: Thomas Fitzsimmons, classpath, mauve-discuss

On Fri, Mar 17, 2006 at 01:06:32PM -0800, David Daney wrote:
[...]
> I would like to see a way to partition the different tests.  Sometimes
> mauve will not build for gcj (and probably other compilers as well)
> because just a single file has some problem.  It would be nice to still
> be able to run the majority of the tests in this situation.

That is what batch_run can do today.

Cheers,
Michael

--
http://www.worldforge.org/
* Re: Mauve wishlist
From: Audrius Meskauskas @ 2006-03-17 22:34 UTC
To: Mauve discuss

1. My biggest problem has been tests that hang.  A failing test is
just a failure report, but a hanging test blocks the whole testing
process and has to be removed manually.  The problem is difficult to
solve, but maybe each test could run in a separate thread, and the
threads of normally terminated tests could be reused.  JUnit, by the
way, has the same problem.

2. The failure reports could probably include the line where the
harness check failed.  The stack trace of a newly constructed
exception could be used to get the line information.  The uncaught
exception reports could also provide more information, not just
"uncaught exception", as they do now.

Regards,
Audrius Meskauskas
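[Editor's note: both of Audrius's ideas -- a per-test timeout thread,
and line numbers taken from a freshly constructed throwable -- can be
sketched in a few lines.  This is hypothetical illustration, not Mauve
code; the class and method names are invented.]

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HarnessSketch {
    /** Point 1: run one test body with a timeout so a hang cannot
        block the whole run. */
    static String runWithTimeout(Callable<String> testBody, long seconds)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> result = pool.submit(testBody);
        try {
            return result.get(seconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);          // interrupt the hanging test
            return "TIMEOUT";
        } finally {
            pool.shutdownNow();
        }
    }

    /** Point 2: the caller's line number, via the stack trace of a
        newly constructed exception. */
    static int callerLine() {
        // Element 0 is this frame; element 1 is the caller's frame.
        return new Throwable().getStackTrace()[1].getLineNumber();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(() -> "PASS", 2));
        System.out.println(runWithTimeout(() -> {
            Thread.sleep(60_000);         // simulated hang
            return "PASS";
        }, 1));
        System.out.println(callerLine() > 0);
    }
}
```

Reusing worker threads across tests, as Audrius suggests, would fall
out of using a longer-lived thread pool instead of one executor per
test.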
* Re: Mauve wishlist
From: Arnaud Vandyck @ 2006-03-20 10:53 UTC
To: classpath; +Cc: mauve-discuss

Thomas Fitzsimmons wrote:
> Hi,
[...]
> There is also lots of room for improvement in how Mauve tests are
> selected and run.  I'm hoping someone who better understands Mauve's
> design will elaborate.

I had a very quick look at TestNG [0] and I think it could be a good
approach: there would be no need to change the Mauve test classes,
just invoke the tests with TestNG.  There is an Eclipse plugin that
displays the running tests like JUnit does.

TestNG uses a tag feature like the one we have in Mauve (JDK1.1,
JDK1.2, ...), and we can add other groups like swing, nio, etc.  Those
tags can be annotations or javadoc comments (with @, and I think they
are processed with XDoclet).

As I understand it (I did not read all the documentation), the big
advantage Mauve could gain by adopting TestNG (vs. JUnit?) is that we
should not have to rewrite all the test cases!

[0] http://testng.org/doc/

--
Arnaud Vandyck
Java Trap: http://www.gnu.org/philosophy/java-trap.html
* Re: Mauve wishlist
From: Anthony Balkissoon @ 2006-03-20 16:51 UTC
To: classpath, mauve-discuss

On Fri, 2006-03-17 at 11:32 -0500, Thomas Fitzsimmons wrote:
> Hi,
>
> Anthony Balkissoon has expressed interest in improving Mauve so we'd
> like to know what would be the best things to work on.

Another suggestion that Tom Fitzsimmons had was to change the way we
count the number of tests.  Counting each invocation of the test()
method rather than each call to harness.check() has two benefits:

1) a constant number of tests, regardless of exceptions being thrown
   or which if-else branch is taken;

2) a more realistic number of tests, accurately reflecting the extent
   of our testing.

For point 1), this will help us see whether we are making progress.
Right now a Mauve run might say we have 113 fails out of 13200 tests,
and a later run could say 200 fails out of 34000 tests.  Is this an
improvement?  Hard to say.  But if we count each call to test() as one
test, and also detect hanging tests, then we should have a constant
number of tests in each run and will be able to say whether the
changes made have a positive impact on Mauve test results.

Of course, if one particular test file makes 1000 calls to
harness.check() and only one of them fails, it is not helpful to
report only that the entire test failed.  So the output will have to
pinpoint which call to harness.check() failed (and preferably give a
line number).  The downside is that the results will be overly
pessimistic, because any failing harness.check() trumps all the
passing harness.check() calls and the test is reported as a failure.

What do people have to say about this idea?

--Tony
* Re: Mauve wishlist
From: David Gilbert @ 2006-03-21 16:58 UTC
To: Anthony Balkissoon; +Cc: classpath, mauve-discuss

Hi,

Anthony Balkissoon wrote:
> Another suggestion that Tom Fitzsimmons had was to change the way we
> count the number of tests.  Counting each invocation of the test()
> method rather than each call to harness.check() has two benefits:

I think that would be a backward step (I like the detail that Mauve
provides, especially when testing on subsets while developing on GNU
Classpath).  On the other hand, you can achieve this result without
losing the current detail -- for example, see my recent JUnit patch
(not committed yet).  It effectively gives a pass/fail per test() call
when you run via JUnit, without losing the ability to run in the usual
Mauve way (counting check() results).

> 1) constant number of tests, regardless of exceptions being thrown or
> which if-else branch is taken

Mauve does have a design flaw in that it can be tricky to
automatically assign a unique identifier to each check(), and this
makes it hard to compare two Mauve runs (say the latest Classpath CVS
vs. the last release, or Classpath vs. JDK 1.5 -- both of which would
be interesting).  We can work around that by ensuring that all the
tests run linearly (no if-else branches -- I've written a large number
of tests this way and not found it to be a limitation, but I don't
know what lurks in the depths of the older Mauve tests).  There is
still the problem that an exception thrown during a test means some
checks don't get run, but a new Mauve comparison report (not yet
developed, although I've done a little experimenting with it) could
highlight those.

> 2) more realistic number of tests, to accurately reflect the extent
> of our testing

I think the absolute number is meaningless however you count the
tests, so I don't see this as an advantage.  Test coverage reports are
what we need to get some insight into the extent of our testing.

> For point 1) this will help us see if we are making progress.  Right
> now a Mauve run might say we have 113 fails out of 13200 tests and
> then a later run could say 200 fails out of 34000 tests.  Is this an
> improvement?  Hard to say.

I have done a little work on a comparison report that shows the
differences between two runs of the same set of Mauve tests,
classifying each check as follows:

Type 1 (Normal):      passes on run A and run B;
Type 2 (Regression):  passes on run A, fails on run B;
Type 3 (Improvement): fails on run A, passes on run B;
Type 4 (Bad):         fails on run A, fails on run B.

In a comparison of JDK 1.5 vs. Classpath, Type 4 hints that the check
itself is buggy.  This is a work in progress, and I don't have any
code to show anyone yet, but it is an approach that I think can be
made to work.  To make it work, each check has to be uniquely
identified -- I did this using the checkpoint and check index within a
test(), so here it is important that if-else branches in the tests
can't result in checks being skipped.  This is the case for most of
the javax.swing.* tests, but I can't speak for some of the older Mauve
tests.

> But if we count each call to test() as a test, and also detect
> hanging tests, then we should have a constant number of tests in each
> run and will be able to say if changes made have a positive impact on
> Mauve test results.

You'll lose the ability to distinguish between an existing failure
where (say) 1 out of 72 checks fails and, after some clever patch, 43
out of 72 checks fail -- the new system reports both as one test
failure.

Regards,

Dave
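[Editor's note: David's four-way classification could be sketched
roughly as below.  This is hypothetical code, not the work-in-progress
report he describes; the maps keyed by a unique check identifier are
an assumption about how such a report might store its data.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ComparisonReport {
    enum Type { NORMAL, REGRESSION, IMPROVEMENT, BAD }

    /**
     * Classify each check by comparing its pass/fail status in run A
     * against run B.  Keys are unique check identifiers, e.g.
     * "ClassName:checkpoint:index"; values are true for a pass.
     */
    static Map<String, Type> classify(Map<String, Boolean> runA,
                                      Map<String, Boolean> runB) {
        Map<String, Type> report = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> e : runA.entrySet()) {
            Boolean b = runB.get(e.getKey());
            if (b == null) continue;      // check never ran in run B
            if (e.getValue())
                report.put(e.getKey(), b ? Type.NORMAL : Type.REGRESSION);
            else
                report.put(e.getKey(), b ? Type.IMPROVEMENT : Type.BAD);
        }
        return report;
    }
}
```

The whole scheme hinges on the identifiers being stable between runs,
which is exactly the linearity requirement David raises.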
* Re: Mauve wishlist
From: Tom Tromey @ 2006-03-21 22:24 UTC
To: David Gilbert; +Cc: Anthony Balkissoon, classpath, mauve-discuss

>>>>> "David" == David Gilbert <david.gilbert@object-refinery.com> writes:

>> Another suggestion that Tom Fitzsimmons had was to change the way we
>> count the number of tests.  Counting each invocation of the test()
>> method rather than each call to harness.check() has two benefits:

David> We can work around that by ensuring that all the tests run
David> linearly (no if-else branches -- I've written a large number of
David> tests this way and not found it to be a limitation, but I don't
David> know what lurks in the depths of the older Mauve tests).  There
David> is still the problem that an exception being thrown during a
David> test means some checks don't get run, but a new Mauve comparison
David> report (not yet developed, although I've done a little
David> experimenting with it) could highlight those.

I've always tried to write tests the way you suggest, but the
exception problem turns out to be a real one, preventing test
stability in some cases.  One thing I like about the current proposal
is that it automates test stability -- the only possible failure modes
are a test hanging or the VM crashing.

As for having more granular information: we can still print a message
when a check() fails.  A command-line option to the test harness could
control this, for instance.  I think we don't want to print just a
plain 'FAIL'; we want some explanation, and the detailed information
could go there.

Tom
* Re: Mauve wishlist
From: Bryce McKinlay @ 2006-03-21 23:08 UTC
To: David Gilbert; +Cc: Anthony Balkissoon, classpath, mauve-discuss

David Gilbert wrote:
> Mauve does have a design flaw where it can be tricky to automatically
> assign a unique identifier to each check(), and this makes it hard to
> compare two Mauve runs (say a test of the latest Classpath CVS vs the
> last release, or the Classpath vs JDK 1.5 -- both of which would be
> interesting).

Right.  We all understand the problem -- it's just the solution that
we need to agree on. :)

> I think the absolute number is meaningless however you count the
> tests, so I don't see this as an advantage.

Yes, numbers alone are meaningless, but with the current design all
the results are meaningless without a lot of context.  The real issue
is having a simple way to uniquely identify each test case for the
purpose of identifying regressions.  This becomes fundamentally much
easier when one test() method corresponds to one test case.

It is not reasonable to expect test-case developers to ensure that all
tests "run linearly".  Exceptions can potentially be thrown at any
time, so to guarantee linearity, every check() call would need to be
wrapped with a try/catch.

> You'll lose the ability to distinguish between an existing failure
> where (say) 1 out of 72 checks fails and, after some clever patch, 43
> out of 72 checks fail -- the new system reports both as one test
> failure.

This is a valid concern.  However, we would still track exactly which
check() calls fail, so that in the event of a test failure a full
diagnostic can be provided.  In addition, we can still count the total
number of check() calls executed, for statistical purposes.

If the reduced test-case granularity does prove problematic in some
cases -- say a test() where a small number of checks fail for "hard to
fix" problems and thus should be "xfailed" (to use gcc/dejagnu lingo)
-- then that test case should probably be split.  Alternatively, to
avoid splitting things across multiple test classes, we could add
JUnit-style support for multiple test() methods in a single test
class.

Bryce
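[Editor's note: Bryce's last idea -- JUnit-style support for multiple
test methods in one class -- could be implemented with reflection, as
in this hypothetical sketch.  The "test" name prefix and the runner
class are invented for illustration, not something Mauve defines.]

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class MultiTestRunner {
    /** Invoke every public no-arg method whose name starts with
        "test", recording one pass/fail result per method. */
    static List<String> runAll(Object testInstance) throws Exception {
        List<String> results = new ArrayList<>();
        for (Method m : testInstance.getClass().getMethods()) {
            if (m.getName().startsWith("test")
                    && m.getParameterCount() == 0) {
                try {
                    m.invoke(testInstance);
                    results.add(m.getName() + ": PASS");
                } catch (Exception e) {
                    // The real cause is wrapped in an
                    // InvocationTargetException by invoke().
                    results.add(m.getName() + ": FAIL (" + e.getCause() + ")");
                }
            }
        }
        return results;
    }

    // Example test class with two independent test cases.
    public static class SampleTests {
        public void testAddition() { /* checks would go here */ }
        public void testFailure() { throw new IllegalStateException("boom"); }
    }
}
```

Each method then gets its own stable identifier (the method name),
which is the property the thread is after.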
* Re: Mauve wishlist
From: David Gilbert @ 2006-03-22 11:12 UTC
To: Bryce McKinlay; +Cc: Anthony Balkissoon, classpath, mauve-discuss

Bryce McKinlay wrote:
> It is not reasonable to expect test case developers ensure that all
> tests "run linearly".  Exceptions can potentially be thrown at any
> time, so to ensure linearity, every check() call would need wrapped
> with a try/catch.

If there is a linear sequence of checks in a test() method, and an
(unexpected) exception causes the test() method to exit early (after,
say, 3 checks out of 10), I don't consider that to be non-linear.  If
we are comparing run A to run B, and 10 checks complete in run A but
only 3 complete in run B, we can safely assume that checks 4 to 10
were not completed in run B, and report that.

The majority of test() methods in Mauve are written that way, so I
don't think it is an unreasonable requirement, especially if it means
we can develop better comparison/regression reporting on top of the
existing TestHarness.

Regards,

Dave
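[Editor's note: the "safely assume checks 4 to 10 were not completed"
step amounts to a prefix comparison of the two runs' ordered check
sequences.  A hypothetical sketch, with invented names:]

```java
import java.util.ArrayList;
import java.util.List;

public class TruncationReport {
    /**
     * Given the ordered check identifiers completed in run A and in
     * run B for one linear test() method, report the checks that run
     * B never reached.  Assumes (per the linearity argument) that run
     * B's sequence is a prefix of run A's.
     */
    static List<String> notReached(List<String> runA, List<String> runB) {
        List<String> missing = new ArrayList<>();
        for (int i = runB.size(); i < runA.size(); i++) {
            missing.add(runA.get(i));
        }
        return missing;
    }
}
```

If-else branches would break the prefix assumption, which is why the
linearity requirement matters for this kind of report.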