From: "Serhei Makarov" <serhei@serhei.io>
To: "Serhei Makarov" <serhei@serhei.io>, Bunsen <bunsen@sourceware.org>
Subject: Re: bunsen (re)design discussion #1: testrun & branch identifiers, testrun representation
Date: Wed, 09 Mar 2022 17:25:49 -0500	[thread overview]
Message-ID: <cbefae08-eb8f-4e9b-9e05-14e9f0644712@www.fastmail.com> (raw)
In-Reply-To: <320ed3c9-2612-4f64-bb1a-6a791bef4168@www.fastmail.com>

* #1c Details of DejaGNU name/outcome/subtest

There are two options for the format of the 'testcases' array.

*Option 1:* PASS subtests are combined into a single PASS entry for each .exp.

Example: from a DejaGNU sum file containing:

    01 Running foo.exp...
    02 PASS: rotate gears
    03 PASS: clean widgets (widget no.1,2,3)
    04 PASS: polish tea kettle
    05 Running bar.exp...
    06 PASS: greet guests
    07 FAIL: serve casserole (casserole was burnt)
    08 oven temperature was 1400 degrees F
    09 XFAIL: guests are upset 3/5 stars
    10 "You are a so-so host."
    11 PASS: clean house after guests depart

The Bunsen testcase entries corresponding to this sum file would be:
- {name:'foo.exp', outcome:'PASS',
  origin_sum:'project.sum:01-04'}
- {name:'bar.exp', outcome:'FAIL',
  subtest:'serve casserole (casserole was burnt)',
  origin_sum:'project.sum:07-08'}
- {name:'bar.exp', outcome:'XFAIL',
  subtest:'guests are upset 3/5 stars',
  origin_sum:'project.sum:09-10'}

The current testrun format, as extensively tested with the SystemTap buildbots,
combines PASS subtests into a single entry for each .exp (with 'subtest' field
omitted), but stores FAIL subtests as separate entries. When working with
portions of the testsuite that don't contain failures, this significantly
reduces the size of the JSON that needs to be processed.
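
As a concrete illustration, here is a minimal parsing sketch in Python
for the .sum excerpt above. This is not Bunsen's actual parser; the
function and regex names are hypothetical, and real DejaGNU outcome
handling is richer than this:

    import re

    RESULT_RE = re.compile(r'^(PASS|XPASS|FAIL|XFAIL|KFAIL|UNTESTED|UNSUPPORTED|ERROR): (.*)$')
    RUNNING_RE = re.compile(r'^Running (\S+\.exp)')

    def parse_sum_option1(lines, sumfile='project.sum'):
        testcases = []
        cur_exp, cur_start, failures = None, None, []
        trailing = False  # True while detail lines still belong to the last failure

        def flush(end):
            if cur_exp is None:
                return
            if failures:
                testcases.extend(failures)  # record only the non-PASS subtests
            else:
                # no failures at all: fold the whole .exp into one PASS entry
                testcases.append({'name': cur_exp, 'outcome': 'PASS',
                                  'origin_sum': f'{sumfile}:{cur_start:02d}-{end:02d}'})

        for n, line in enumerate(lines, start=1):
            m = RUNNING_RE.match(line)
            if m:
                flush(n - 1)
                cur_exp, cur_start, failures, trailing = m.group(1), n, [], False
                continue
            m = RESULT_RE.match(line)
            if m and m.group(1) != 'PASS':
                failures.append({'name': cur_exp, 'outcome': m.group(1),
                                 'subtest': m.group(2),
                                 'origin_sum': f'{sumfile}:{n:02d}'})
                trailing = True
            elif m:  # a PASS subtest: folded into the combined entry, not stored
                trailing = False
            elif trailing and line.strip():
                # detail line (e.g. the oven temperature above): extend the
                # origin_sum range of the preceding non-PASS entry
                start = failures[-1]['origin_sum'].split(':')[1].split('-')[0]
                failures[-1]['origin_sum'] = f'{sumfile}:{start}-{n:02d}'
        flush(len(lines))
        return testcases

Run on the eleven lines above, this yields exactly the three entries shown.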

The reason this format works is that the set of subtest messages across
different DejaGNU runs is extremely inconsistent. Therefore, we define the
'state' of an entire .exp testcase as 'the set of FAIL subtests produced by
this testcase' and compare testruns at the .exp level accordingly.
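
In code, that per-.exp state and comparison might look like the following
sketch. The field names follow the example entries above; the helper names
are illustrative, not Bunsen's actual API; XFAIL is counted as part of the
failing state here since it is stored as a separate entry:

    from collections import defaultdict

    def exp_state(testcases):
        # map each .exp name to its set of non-PASS (outcome, subtest) pairs
        state = defaultdict(set)
        for tc in testcases:
            state[tc['name']]  # ensure an all-PASS .exp still appears
            if tc['outcome'] != 'PASS':
                state[tc['name']].add((tc['outcome'], tc.get('subtest', '')))
        return state

    def changed_exps(baseline, latest):
        # yield .exp names whose failure set differs between two testruns
        s0, s1 = exp_state(baseline), exp_state(latest)
        for name in sorted(set(s0) | set(s1)):
            if s0.get(name, set()) != s1.get(name, set()):
                yield name, s0.get(name, set()), s1.get(name, set())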

[Q] We _assume_ that the 'absence' of a PASS subtest in a testrun is not a
problem, and consider it the testsuite's responsibility to explicitly signal a
FAIL when something doesn't work. Does this assumption match how projects
actually use DejaGNU?

Note that the PASS subtests for bar.exp were dropped in the above example.
If the set of failures had been empty, we would have marked the entirety of
bar.exp as a 'PASS'; since the set is nonempty, we record only the 'FAIL'
subtests in it.

*Option 2:* PASS subtests are stored separately just like FAIL subtests.
In this case, every subtest line in the testrun creates a corresponding
entry in the 'testcases' array.

keiths requested this mode for the GDB logs.

It takes up a lot more space. Although the strong de-duplication of Bunsen's
storage format may make the on-disk size a moot issue, the extra entries still
slow down batch jobs working with bulk quantities of testcase data.

But it allows test results to be diffed to detect the 'absence of a PASS' as a problem.
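
A minimal sketch of such a diff, assuming Option-2 style entries (the helper
name is hypothetical):

    def missing_passes(baseline, latest):
        # (name, subtest) pairs that PASSed in the baseline run but are
        # absent from the latest run entirely
        seen = {(tc['name'], tc.get('subtest')) for tc in latest}
        return [(tc['name'], tc.get('subtest'))
                for tc in baseline
                if tc['outcome'] == 'PASS'
                and (tc['name'], tc.get('subtest')) not in seen]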

In principle, I don't see a reason why Bunsen couldn't continue supporting
either mode, with some careful attention paid to the API in the various
analysis scripts. (The analysis scripts contain a collection of helper
functions for working with passes, fails, diffs, etc., which has been
gradually evolving into a proper library.)
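
For instance, the library could expose accessors that behave sensibly for
both modes, roughly along these lines (names hypothetical; policy decisions,
e.g. whether XFAIL counts as a failure, belong to the library, not to this
sketch):

    def iter_failures(testcases):
        # failure entries are stored the same way in both modes
        for tc in testcases:
            if tc['outcome'] != 'PASS':
                yield tc

    def has_subtest_detail(testcases):
        # True for Option-2 testruns, which keep per-subtest PASS entries;
        # Option-1 testruns omit the 'subtest' field on PASS entries
        return any(tc['outcome'] == 'PASS' and 'subtest' in tc
                   for tc in testcases)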

* #1d Applying the testcase format scheme to config.log

fche suggested including the yes/no answers from configure in the parsed
testcase data. I find this idea very intriguing, especially because changes in
autoconf tests can correlate with regressions caused by the environment.

Applying the scheme to config.log:
- name = "config.log"
- subtest = "checking for/whether VVV"
- outcome = yes (PASS) or no (FAIL)

This should probably be stored with both PASS and FAIL subtests 'separate'.
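
A minimal parsing sketch, assuming autoconf's usual config.log layout in
which a 'configure:NNNN: checking ...' line is eventually answered by a
'configure:NNNN: result: ...' line (names hypothetical; answers other than
yes/no, e.g. 'checking for gcc... gcc', are skipped here):

    import re

    CHECK_RE = re.compile(r'^configure:\d+: (checking .*)$')
    ANSWER_RE = re.compile(r'^configure:\d+: result: (.*)$')

    def parse_config_log(lines):
        testcases, pending = [], None
        for line in lines:
            m = CHECK_RE.match(line)
            if m:
                pending = m.group(1)
                continue
            m = ANSWER_RE.match(line)
            if m and pending is not None:
                answer = m.group(1).strip()
                if answer in ('yes', 'no'):  # only yes/no map cleanly to PASS/FAIL
                    testcases.append({'name': 'config.log',
                                      'outcome': 'PASS' if answer == 'yes' else 'FAIL',
                                      'subtest': pending})
                pending = None
        return testcases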

[Q] Where should the parsed config.log be stored?
- in the 'testcases' field of same testrun?
  -> analysis scripts must know how to treat 'config.log' FAIL
  entries and ignore them (as 'not real failures') if necessary.  
  In short, 'config.log' FAIL entries are relevant to diffing/bisection for a
  known problem, but not relevant when reporting regressions.
- in a different field of same testrun? e.g. 'config'?
  -> analysis scripts will ignore 'config' unless explicitly coded to look at
  it (this option is sketched below, after the list).
- in a testrun in a separate project? (e.g. 'systemtap' -> 'systemtap-config')
  -> this is similar to the gcc buildbot case, where one testsuite run
  will create testruns in several 'projects' (e.g. 'gcc','ld','gas',...)
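
To make the 'different field' option concrete, a testrun could look roughly
like this (purely illustrative contents):

    testrun = {
        'project': 'systemtap',
        'testcases': [
            {'name': 'foo.exp', 'outcome': 'PASS', 'origin_sum': 'project.sum:01-04'},
            # ... ordinary DejaGNU results ...
        ],
        'config': [
            {'name': 'config.log', 'outcome': 'PASS',
             'subtest': 'checking whether make sets $(MAKE)'},
            {'name': 'config.log', 'outcome': 'FAIL',
             'subtest': 'checking for libdw'},
        ],
    }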

[Q] In analysis scripts such as show_testcases, how to show changes in large
testcases more granularly (e.g. check.exp, bpf.exp)? A brainstorm:
- Add an option split_subtests which will try to show a single grid for every
  subtest
- Scan the history for subtest strings, possibly with common prefix. Try to
  reduce the set / combine subtest strings with identical history (a sketch of
  this step follows the list)
- Generate a grid view for each subtest we've identified this way
- In the CGI HTML view, for each .exp testcase of the grid view without
  split_subtests, add a link to the split_subtests view of that particular .exp
  - This would be much better than the current mcermak-derived option to show
    subtests when a '+' table cell is clicked. The HTML is lighter-weight and
    the history of separate subtests is clearly visible.
- Possibly: identify the testcases which require more granularity
  (e.g. they are always failing, only the number of failures keeps changing)
  and expand them automatically in the top level grid view.
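
The 'combine subtest strings with identical history' step could work roughly
like this sketch (names hypothetical): subtests whose outcome sequence across
the scanned runs is identical collapse into one group:

    from collections import defaultdict

    def group_by_history(runs, exp_name):
        # runs: list of 'testcases' arrays, oldest first
        subtests = {tc.get('subtest') for run in runs
                    for tc in run if tc['name'] == exp_name}
        groups = defaultdict(list)
        for subtest in subtests:
            # outcome of this subtest in each run, 'ABSENT' if not recorded
            hist = tuple(
                next((tc['outcome'] for tc in run
                      if tc['name'] == exp_name
                      and tc.get('subtest') == subtest),
                     'ABSENT')
                for run in runs)
            groups[hist].append(subtest)
        return groups  # {outcome-history tuple: [subtest, ...]}

Each grid view would then show one row per history group, labelled with the
subtests that share it.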

[Q] For someone testing projects on a company-internal setup, how do we
extract a 'safe' subset of data that can be shared with the public?

- Option 1: analysis results only (e.g. grid views without subtests are
  guaranteed-safe)

- Option 2: testrun data but not testlogs (includes subtest strings; these may
  or may not be safe)

- Option 3: testrun data with scrubbed subtest strings (replace several FAIL
  outcomes with one testcase entry whose subtest says 'N failures')

Note: Within the 'Makefile' scheme, the scrubbing could be handled by an analysis
script that produces project 'systemtap-sourceware' from 'systemtap'.
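
A scrubbing pass along the lines of Option 3 could be sketched like this
(names hypothetical; it also drops 'subtest' and 'origin_sum' from the kept
entries, since those are exactly the fields that may leak internal details):

    from collections import defaultdict

    def scrub_testcases(testcases):
        fail_counts = defaultdict(int)
        scrubbed = []
        for tc in testcases:
            if tc['outcome'] == 'FAIL':
                fail_counts[tc['name']] += 1
            else:
                # keep the entry, minus potentially sensitive fields
                scrubbed.append({'name': tc['name'], 'outcome': tc['outcome']})
        for name, n in fail_counts.items():
            scrubbed.append({'name': name, 'outcome': 'FAIL',
                             'subtest': f'{n} failures'})
        return scrubbed

The scrubbed testrun keeps per-.exp failure counts diffable while hiding the
subtest strings themselves.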
