Re: bunsen (re)design discussion #1: testrun & branch identifiers, testrun representation

From: "Frank Ch. Eigler" <fche@redhat.com>
To: Serhei Makarov <serhei@serhei.io>
Cc: Bunsen <bunsen@sourceware.org>
Subject: Re: bunsen (re)design discussion #1: testrun & branch identifiers, testrun representation
Date: Thu, 10 Mar 2022 13:04:14 -0500
Message-ID: <20220310180414.GC28310@redhat.com>
In-Reply-To: <320ed3c9-2612-4f64-bb1a-6a791bef4168@www.fastmail.com>

Hi -

> Every testrun in the Bunsen repo should have a unique user-visible
> identifier. [...]

Reminder: "testrun" = "analysis data derived from a set of log files
stored in a git branch nearby".

> [Q] Should the bunsen_commit_id be deprecated entirely as a unique
> identifier? We could conceivably allow a single set of testlogs to
> define several testruns for the *same* project, which would break the
> uniqueness property for <project>+<bunsen_commit_id>

Given that we're reconsidering use of git for purposes of storing this
analysis data, there is no obvious hash for it.  OTOH, given that it
is the result of analysis, we have a lot of rich data to query by.  I
suspect we don't need much of a standard identifier scheme at all.  An
arbitrary nickname for query shortcutting could be a field associated
with the "testrun" object itself.

> - To accommodate fche's fine-grained branching idea, the
>   branch names in the Bunsen repo can be based on the unique ID.

(I was talking about fine-grained branching *for the test logs*, not
for the derived analysis data.)

> All of this machinery lets us specify natural-looking commands like:
> $ bunsen show systemtap/2022-02-13/12
>   -> show testrun
> $ bunsen show systemtap/2022-02-13/12:systemtap.log*
>   -> show testlog
> $ bunsen show systemtap/2022-02-13/12:systemtap.log:400-500
>   -> show subset of lines in a testlog
> $ bunsen ls systemtap/2022-02
>   -> list testruns matching a partial ID
> [...]

> Overall, though, the branch name format can be fairly free-form and
> different commit_logs scripts could even use different formats. The
> only design goals are to adequately split the testlogs into a fairly
> small number of fairly small branches (i.e. divide O(n*m) testlogs
> into O(n) branches of O(m) testlogs), and to allow 'bunsen pull'
> invocations to identify a meaningful subset of branches via wildcard
> or regex.

Noting we switched to talking about testlogs - the raw test result log
files that are archived in git.  These (rather than analysis data
stored somewhere else) are indeed worth pushing & pulling &
transporting between bunsen installations.  I suspect that imposing a
naming standard here is not necessary either.  As long as we
standardize what bunsen consumes from these repos (every tag?  every
commit?  every branch?  every ref matching a given regex?), a human or
other tooling could decide their favorite naming convention.
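
A minimal sketch of the "every ref matching a given regex" option,
assuming the ref names have already been listed (for example with
git for-each-ref). The function name and the example ref names are
hypothetical, not an existing bunsen interface:

```python
import re

def select_refs(ref_names, pattern):
    """Return the refs whose full name matches the given regex.

    ref_names: iterable of full ref names, e.g. as printed by
               `git for-each-ref --format='%(refname)'`.
    pattern:   the one thing the consumer standardizes; the naming
               convention behind it is left to the human or tool
               that created the refs.
    """
    rx = re.compile(pattern)
    return [name for name in ref_names if rx.fullmatch(name)]

# Hypothetical ref names, only to illustrate the filtering.
example_refs = [
    "refs/heads/systemtap/2022-02",
    "refs/heads/systemtap/2022-03",
    "refs/heads/gdb/2022-02",
    "refs/tags/v1",
]
selected = select_refs(example_refs, r"refs/heads/systemtap/.*")
```

Under this approach the same function covers "every tag" or "every
branch" just by changing the pattern.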

> [...]
> #1: Testrun Schema, very quick take1
> 
> version fields: these identify the version of the project being tested
> - source_commit: commit id in the project source repo
> - [Q] source_branch: branch being tested in the project source repo
>   - [...]
> - package_nvr: version id of a downstream package
>   - [Q] Should this be the full nvr (e.g. 'systemtap-4.4-6.el7') or
>     just the version number ('4.4-6.el7')?
> - version: fallback field if the project uses some other type of versioning scheme
>   - [Q] By default, we don’t know how to sort this. I could extend the
>     analysis libraries to load the original parsing module for the
>     project (e.g. systemtap.parse_dejagnu) and check for a method to
>     extract the version sort key.
> 
> configuration fields: identify the system configuration on which the project was tested
> - arch: hardware architecture
> - [...]
>   - 'target_board' is the only one that isn't covered by existing fields?

This list, and especially the "... and other fields ..." part, suggests
to me that these don't need to be tightly specified, but can function as
a set of key/value tuples that the parsers would produce as results.  Yeah,
a conventional set is great for user convenience and for building
other analysis on top of the initial parse values.  But the list of
key names can be left open in the tooling core, so queries can use any
set of them.  i.e., the schema for "testrun" analysis objects could be:

   (id, keyword, value)

and then a user can find testruns by combination of keywords having
particular values (or range or pattern).  That can map
straightforwardly to a relational filter (select query) if that's how
we end up storing these things.
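
A minimal sketch of that mapping, assuming SQLite as the store (the
storage choice is not settled here). The table name testrun_kv and the
helper names are hypothetical; values are stored as text and matched
with SQLite GLOB patterns:

```python
import sqlite3

def make_kv_db():
    """Create an in-memory database holding (id, keyword, value) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE testrun_kv (id INTEGER, keyword TEXT, value TEXT)")
    return db

def add_testrun(db, testrun_id, fields):
    """Store whatever keywords a parser produced; the key set is open."""
    db.executemany(
        "INSERT INTO testrun_kv (id, keyword, value) VALUES (?, ?, ?)",
        [(testrun_id, key, str(value)) for key, value in fields.items()],
    )

def find_testruns(db, **criteria):
    """Return sorted ids of testruns matching every keyword=pattern pair."""
    if not criteria:
        rows = db.execute("SELECT id FROM testrun_kv")
        return sorted({row[0] for row in rows})
    clauses, params = [], []
    for keyword, pattern in criteria.items():
        clauses.append(
            "SELECT id FROM testrun_kv WHERE keyword = ? AND value GLOB ?"
        )
        params += [keyword, pattern]
    # INTERSECT keeps only the ids that satisfy all criteria.
    rows = db.execute(" INTERSECT ".join(clauses), params)
    return sorted({row[0] for row in rows})
```

For example, find_testruns(db, arch="x86_64", package_nvr="systemtap-4.4*")
would select by a combination of an exact value and a pattern, without
the core needing to know either keyword in advance.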

> testcases: an array of objects as follows 
> - name - DejaGNU .exp name
> - outcome - DejaGNU outcome code e.g. PASS, FAIL, KPASS, KFAIL, ...
> - subtest - DejaGNU subtest, very very very freeform
> - origin_sum - cursor (subset of lines) in the dejagnu .sum file
> - origin_log - cursor (subset of lines) in the dejagnu .log file

OK, so that could be a separate relational table of testrun analysis
data, derived from the testlog and related to the above set.
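
As a sketch of how such a table could relate to the key/value set,
again assuming SQLite: the table and column names below follow the
quoted field list, while the helper names and the text form of the
origin cursors are assumptions made only for illustration:

```python
import sqlite3

def make_testcase_db():
    """Create the key/value table plus a related per-testcase table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE testrun_kv (id INTEGER, keyword TEXT, value TEXT);
        CREATE TABLE testcase (
            testrun_id INTEGER,  -- relates to testrun_kv.id
            name       TEXT,     -- DejaGNU .exp name
            outcome    TEXT,     -- PASS, FAIL, KPASS, KFAIL, ...
            subtest    TEXT,     -- freeform
            origin_sum TEXT,     -- cursor into the .sum file (format assumed)
            origin_log TEXT      -- cursor into the .log file (format assumed)
        );
    """)
    return db

def add_testcase(db, testrun_id, name, outcome, subtest,
                 origin_sum=None, origin_log=None):
    db.execute(
        "INSERT INTO testcase VALUES (?, ?, ?, ?, ?, ?)",
        (testrun_id, name, outcome, subtest, origin_sum, origin_log),
    )

def failures_where(db, keyword, value):
    """List (name, subtest) of FAILs in testruns having keyword = value."""
    rows = db.execute(
        "SELECT t.name, t.subtest FROM testcase t "
        "JOIN testrun_kv kv ON kv.id = t.testrun_id "
        "WHERE kv.keyword = ? AND kv.value = ? AND t.outcome = 'FAIL' "
        "ORDER BY t.name, t.subtest",
        (keyword, value),
    )
    return rows.fetchall()
```

The join is the only coupling between the two tables, so the testcase
rows can be regenerated from the testlog without touching the
key/value tuples.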

> [Q] (!!) The 'store PASS subtests separately' vs 'consolidate PASS
> subtests' format tradeoff is a complex one [...]

To the extent this represents an optimization, I'd tend to think of it
as a policy decision that the dejagnu parser configuration should make.
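
One way to picture that, as a sketch only: the parser takes a boolean
setting and either keeps every PASS subtest or folds them into one
record per .exp file. The function name, the consolidate_pass setting,
and the pass_count key are all hypothetical:

```python
def apply_pass_policy(testcases, consolidate_pass):
    """Apply the PASS-storage policy chosen in the parser configuration.

    testcases: list of dicts with at least 'name' and 'outcome' keys.
    consolidate_pass: if False, return the records unchanged; if True,
        replace the PASS records of each .exp name with a single record
        carrying a count, leaving all other outcomes untouched.
    """
    if not consolidate_pass:
        return [dict(tc) for tc in testcases]
    result = []
    pass_index = {}  # .exp name -> position of its consolidated record
    for tc in testcases:
        if tc["outcome"] != "PASS":
            result.append(dict(tc))
            continue
        if tc["name"] not in pass_index:
            pass_index[tc["name"]] = len(result)
            result.append({"name": tc["name"], "outcome": "PASS",
                           "pass_count": 0})
        result[pass_index[tc["name"]]]["pass_count"] += 1
    return result
```

Either setting yields records that fit the same testcase list, so
downstream queries do not need to know which policy was in effect
beyond checking for the count.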

> bookkeeping information: where is the relevant data located in the Bunsen Git repo?
> - project: the project being tested
> - testrun_id: unique identifier of the testrun within the project
> - bunsen_testruns_branch: branch where the parsed testrun JSON is stored
> - bunsen_testlogs_branch: branch where the testlogs are stored
> - bunsen_commit_id: commit ID of the commit where the testlogs are stored

(Not sure how much of this needs to be formally separated, vs. just
basic key/value tuples associated with the testrun vs. already
represented otherwise.)

> - pass_count
> - fail_count
>    - [Q] These are self-explanatory, but not as full-featured as
>      DejaGNU’s summary of all outcome codes. I could remove this
>      entirely, or add an outcome_counts field containing a map
>      {‘PASS’: …, ‘FAIL’: …, ‘KFAIL’: …, …}

These could also go as additional key/value tuples into the testrun.
(Prefix their names with "dejagnu-" to identify the analysis/parse
tool that created them.)
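
A short sketch of what the parser could emit, assuming the same
(id, keyword, value) tuple shape. The exact key spelling, such as
"dejagnu-pass_count", is an assumption; only the "dejagnu-" prefix
comes from the suggestion above:

```python
from collections import Counter

def outcome_tuples(testrun_id, outcomes, prefix="dejagnu-"):
    """Turn a list of outcome codes into prefixed key/value tuples.

    Covers every outcome code that actually occurs, so it subsumes
    both pass_count/fail_count and a full outcome_counts map.
    """
    counts = Counter(outcomes)
    return [
        (testrun_id, f"{prefix}{outcome.lower()}_count", str(count))
        for outcome, count in sorted(counts.items())
    ]
```

Values are emitted as strings here only to match a text-valued tuple
store; a store with typed values could keep them as integers.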

- FChE
