public inbox for systemtap@sourceware.org
* towards zero FAIL for make installcheck
@ 2011-12-01 14:14 Mark Wielaard
  2011-12-01 19:01 ` Dave Brolley
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Wielaard @ 2011-12-01 14:14 UTC (permalink / raw)
  To: systemtap

Hi,

Today I was happily surprised by the following test results:

Host: Linux toonder.wildebeest.org 3.1.2-1.fc16.x86_64 #1 SMP Tue Nov 22
09:00:57 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Snapshot: version 1.7/0.152 commit release-1.6-522-g6ba4d29
GCC: 4.6.2 [gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1)]
Distro: Fedora release 16 (Verne)

# of expected passes            3124
# of unexpected failures        1
# of unexpected successes       8
# of expected failures          245
# of unknown successes          1
# of known failures             42
# of untested testcases         958
# of unsupported tests          2

So only one FAIL was reported for this run:
FAIL: semok/thirtynine.stp

and 8 unexpected successes were reported:
XPASS: semko/nodwf01.stp
XPASS: semko/nodwf02.stp
XPASS: semko/nodwf03.stp
XPASS: semko/nodwf04.stp
XPASS: semko/nodwf05.stp
XPASS: semko/nodwf06.stp
XPASS: semko/nodwf07.stp
XPASS: semko/nodwf09.stp

I haven't investigated those yet, but only 9 results that could point
to serious issues is a good thing (at least compared to a few weeks ago,
when we would have had tens of such issues).

As a bit of unexpected good news, there was one unknown success:
KPASS: cast-scope-m32 (PRMS 13420)

which I know does fail on some other architectures.

As you can see, the low number of FAILs (unexpected failures) is
compensated by a high number of KFAILs (known failures) and UNTESTED
(untested testcases). The idea behind that is that we would like to see
FAILs only for things that used to PASS and now start FAILing because of
some regression. So, if you hack a bit, run make installcheck, and see
some FAILs, you know you should investigate them.

When you write new tests, or fix some old tests, please follow these
rough guidelines (a small sketch of what the calls look like in a .exp
file follows the list):

- PASS (expected pass), use pass "<testname> <specifics>" to indicate
  that something worked as expected.
- FAIL (unexpected failure), use fail "<testname> <specifics>" to
  indicate that something that was expected to work didn't.
- XFAIL (expected failure), use xfail "<testname> <specifics>" or
  setup_xfail "<arch-triplet>" followed by a normal pass/fail, to
  indicate something is expected to fail (so this isn't something bad
  or unexpected, but often it makes sense to invert the test and just
  use PASS instead).
- XPASS (unexpected success), this is generated when you use setup_xfail
  "<arch-triplet>" and then the test results in a pass "<testname>
  <specifics>". This indicates a problem. The test was expected to
  XFAIL, but unexpectedly passed instead (so this is something bad and
  unexpected, should not happen).
- KFAIL (known failure), use kfail "<testname> <specifics>" or
  setup_kfail "<arch-triplet> <bug number>" followed by a normal
  pass/fail, to indicate something is known to fail and has a
  corresponding bug number in systemtap bugzilla on sourceware.org.
  (so this is something bad, but we know about it and should fix it;
   the bug report will contain more information about why it is
   currently failing.)
- KPASS (unknown success), this is generated when you use setup_kfail
  and then the test results in a pass. This indicates that a bug might
  have been fixed (or you were just lucky). Check the corresponding
  bug report to see if this test should really just pass now or whether
  it is just dumb luck this time.
- UNTESTED (untested testcase), use untested "<testname> <specifics>" to
  indicate that the test could have been run, but wasn't because
  something was missing to complete the test (for example a previous
  test failed on which this test depended).
- UNSUPPORTED (unsupported testcase), use unsupported "<testname>
  <specifics>" to indicate that the test just isn't supported on this
  setup (for example if it is for a syscall not available on this
  architecture).
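
To make these concrete, here is a rough sketch of what such calls could
look like in a testsuite .exp file. This is only an illustration, not an
excerpt from the real testsuite: the test name, probe script and target
triplets are made up, the bug number is just reused from the cast-scope
example above, and the exact setup_xfail/setup_kfail argument handling
should be double-checked against the DejaGnu documentation.

  # hypothetical sketch, not real testsuite code
  set test "mytest"

  # PASS/FAIL: report whether stap elaborated the script as expected
  set ok [expr {![catch {exec stap -p2 -e {probe begin { exit() }}} err]}]
  if {$ok} { pass "$test (elaboration)" } else { fail "$test (elaboration)" }

  # XFAIL/XPASS: the next pass/fail on the given target is recorded as
  # an expected failure (or an unexpected success if it passes after all)
  setup_xfail "arm*-*-*"
  if {$ok} { pass "$test (-m32)" } else { fail "$test (-m32)" }

  # KFAIL/KPASS: same idea, but tied to a sourceware bugzilla entry;
  # the next result becomes KFAIL (or KPASS) on the given target
  setup_kfail "*-*-*" 13420
  if {$ok} { pass "$test (scope)" } else { fail "$test (scope)" }

  # UNTESTED: the test could have run, but a prerequisite was missing
  untested "$test (kernel debuginfo not found)"

  # UNSUPPORTED: the test can never work on this setup
  unsupported "$test (syscall not available on this architecture)"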

Hope these guidelines make a little sense and will help us spot
regressions earlier. If we can make sure that the unexpected failures
stay at zero (or very low) it will be much easier to be sure your
patches are correct (or at least don't introduce regressions).

Cheers,

Mark

* Re: towards zero FAIL for make installcheck
  2011-12-01 14:14 towards zero FAIL for make installcheck Mark Wielaard
@ 2011-12-01 19:01 ` Dave Brolley
  2011-12-01 19:06   ` Mark Wielaard
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Brolley @ 2011-12-01 19:01 UTC (permalink / raw)
  To: systemtap

On 12/01/2011 06:57 AM, Mark Wielaard wrote:
> As you can see the low number of FAILs (unexpected failures) is
> compensated by a high number of KFAILs (known failures) and UNTESTED
The recent spike in the number of UNTESTED was me reducing the number of 
tests run by the unprivileged_embedded_C.exp test. I simply changed it 
to test on a fraction of the tapset functions in our library and to 
report the rest as UNTESTED. The test still covers all variants of no 
embedded C/unprivileged/myproc-unprivileged.

It was an arbitrary decision on my part to report the skipped functions 
as UNTESTED. If it is preferred, I could change it to simply ignore them.
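
Just to illustrate the two options, a made-up sketch (not the actual
unprivileged_embedded_C.exp code; the selection rule and the
tapset_functions variable are invented):

  set i 0
  foreach function $tapset_functions {
      # only exercise every tenth function; the rest are skipped
      if {[incr i] % 10 != 0} {
          # current behaviour: count each skipped function as UNTESTED ...
          untested "unprivileged embedded C: $function"
          # ... dropping the untested call and just continuing would
          # silently ignore it instead
          continue
      }
      # run the real unprivileged/embedded-C checks on $function here
  }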

Dave

* Re: towards zero FAIL for make installcheck
  2011-12-01 19:01 ` Dave Brolley
@ 2011-12-01 19:06   ` Mark Wielaard
  0 siblings, 0 replies; 3+ messages in thread
From: Mark Wielaard @ 2011-12-01 19:06 UTC (permalink / raw)
  To: Dave Brolley; +Cc: systemtap

On Thu, 2011-12-01 at 10:41 -0500, Dave Brolley wrote:
> On 12/01/2011 06:57 AM, Mark Wielaard wrote:
> > As you can see the low number of FAILs (unexpected failures) is
> > compensated by a high number of KFAILs (known failures) and UNTESTED
> The recent spike in the number of UNTESTED was me reducing the number of 
> tests run by the unprivileged_embedded_C.exp test. I simply changed it 
> to test on a fraction of the tapset functions in our library and to 
> report the rest as UNTESTED. The test still covers all variants of no 
> embedded C/unprivileged/myproc-unprivileged.
> 
> It was an arbitrary decision on my part to report the skipped functions 
> as UNTESTED. If it is preferred, I could change it to simply ignore them.

How exactly "would be nice to run" tests fit in is somewhat unclear.
In theory the difference between UNTESTED and UNSUPPORTED is that the
former could be made to work (but some setup issue or an earlier test
failure prevented it), while the latter means the test can never be run
successfully on this setup/arch. But there is no clear indicator for
"EXPECTED UNTESTED" (because the user said so). I think ignoring them
is slightly more helpful, since then you can more easily see how many
"real" UNTESTED tests there were.

Thanks,

Mark
