Date: Wed, 09 Mar 2022 17:25:49 -0500
From: "Serhei Makarov"
To: "Serhei Makarov", Bunsen <bunsen@sourceware.org>
Subject: Re: bunsen (re)design discussion #1: testrun & branch identifiers, testrun representation

* #1c Details of DejaGNU name/outcome/subtest

There are two options for the format of the 'testcases' array.
*Option 1:* PASS subtests are combined into a single PASS entry for each .exp.

Example: given a dejagnu sum file containing:

01 Running foo.exp...
02 PASS: rotate gears
03 PASS: clean widgets (widget no.1,2,3)
04 PASS: polish tea kettle
05 Running bar.exp...
06 PASS: greet guests
07 FAIL: serve casserole (casserole was burnt)
08 oven temperature was 1400 degrees F
09 XFAIL: guests are upset 3/5 stars
10 "You are a so-so host."
11 PASS: clean house after guests depart

the corresponding Bunsen testcase entries would be:

- {name:'foo.exp', outcome:'PASS', origin_sum:'project.sum:01-04'}
- {name:'bar.exp', outcome:'FAIL', subtest:'serve casserole (casserole was burnt)', origin_sum:'project.sum:07-08'}
- {name:'bar.exp', outcome:'XFAIL', subtest:'guests are upset 3/5 stars', origin_sum:'project.sum:09-10'}

The current testrun format, as extensively tested with the SystemTap buildbots, combines PASS subtests into a single entry for each .exp (with the 'subtest' field omitted), but stores FAIL subtests as separate entries. When working with portions of the testsuite that don't contain failures, this significantly reduces the size of the JSON that needs to be processed.

The reasoning for why this format works is that the set of subtest messages across different DejaGNU runs is extremely inconsistent. Therefore, we define the 'state' of an entire .exp testcase as 'the set of FAIL subtests produced by this testcase' and compare testruns at the .exp level accordingly.

[Q] We _assume_ that the 'absence' of a PASS subtest in a testrun is not a problem, and consider it the testsuite's responsibility to explicitly signal a FAIL when something doesn't work. Is this assumption accurate to how projects use DejaGNU?

Note that the PASS subtests for bar.exp were dropped in the above example. If bar.exp's set of failures had been empty, we would have marked the entirety of bar.exp as a 'PASS'; since the set of failures is nonempty, we record only the non-PASS subtests in the set.
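To make the Option 1 folding concrete, here is a rough Python sketch. This is not Bunsen's actual parser; the function name `parse_sum`, the regexes, and the outcome list are illustrative assumptions based only on the example above.

```python
import re

# Outcome keywords recognized on DejaGNU .sum lines (illustrative subset).
OUTCOMES = {"PASS", "FAIL", "XPASS", "XFAIL", "KPASS", "KFAIL",
            "UNTESTED", "UNRESOLVED", "UNSUPPORTED"}

def parse_sum(lines, sumfile="project.sum"):
    """Fold .sum lines into Option-1 testcase entries: one combined PASS
    entry per .exp, separate entries for each non-PASS subtest."""
    entries, fails = [], []
    exp = exp_start = last = cur = None

    def span(a, b):
        return "%s:%02d-%02d" % (sumfile, a, b)

    def flush():
        nonlocal fails, cur
        if exp is None:
            return
        if not fails:  # no failures: the whole .exp collapses to one PASS
            entries.append({"name": exp, "outcome": "PASS",
                            "origin_sum": span(exp_start, last)})
        else:          # failures present: keep them, drop the PASS subtests
            for f in fails:
                f["origin_sum"] = span(f.pop("_start"), f.pop("_end"))
            entries.extend(fails)
        fails, cur = [], None

    for n, line in enumerate(lines, 1):
        m = re.match(r"Running (\S+\.exp)\.\.\.", line)
        if m:
            flush()
            exp, exp_start, last = m.group(1), n, n
            continue
        m = re.match(r"([A-Z]+): (.*)", line)
        if m and m.group(1) in OUTCOMES:
            last = n
            if m.group(1) == "PASS":
                cur = None
            else:
                cur = {"name": exp, "outcome": m.group(1),
                       "subtest": m.group(2), "_start": n, "_end": n}
                fails.append(cur)
        else:
            # free-form continuation line: extend the preceding entry's range
            last = n
            if cur:
                cur["_end"] = n
    flush()
    return entries
```

Running this over the eleven example lines reproduces the three entries listed above, including the origin_sum line ranges.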
*Option 2:* PASS subtests are stored separately, just like FAIL subtests. In this case, every subtest line in the testrun creates a corresponding entry in the 'testcases' array. keiths requested this mode for the GDB logs.

This mode takes up a lot more space; although the strong de-duplication of Bunsen's storage format may make the storage cost a moot issue, it still slows down batch jobs working with bulk quantities of testcase data. But it allows test results to be diffed to detect the 'absence of a PASS' as a problem.

In principle, I don't see a reason why Bunsen couldn't continue supporting either mode, with some careful attention paid to the API in the various analysis scripts. (The analysis scripts contain a collection of helper functions for working with passes, fails, diffs, etc., which has been gradually evolving into a proper library.)

* #1d Applying the testcase format scheme to config.log

The idea here is to include the yes/no answers from configure in the parsed testcase data. This is an idea from fche I find very intriguing, especially because changes in autoconf tests can correlate with regressions caused by the environment.

Applying the scheme to config.log:
- name = "config.log"
- subtest = "checking for/whether VVV"
- outcome = yes (PASS) or no (FAIL)

This should probably be stored with both PASS and FAIL subtests kept 'separate'.

[Q] Where should the parsed config.log be stored?
- In the 'testcases' field of the same testrun? -> analysis scripts must know how to treat 'config.log' FAIL entries and ignore them (as 'not real failures') if necessary. In short, 'config.log' FAIL entries are relevant to diffing/bisection for a known problem, but not relevant when reporting regressions.
- In a different field of the same testrun, e.g. 'config'? -> analysis scripts will ignore 'config' unless explicitly coded to look at it.
- In a testrun in a separate project? (e.g.
'systemtap' -> 'systemtap-config') -> this is similar to the gcc buildbot case, where one testsuite run will create testruns in several 'projects' (e.g. 'gcc', 'ld', 'gas', ...).

[Q] In analysis scripts such as show_testcases, how do we show changes in large testcases more granularly (e.g. check.exp, bpf.exp)? A brainstorm:
- Add an option split_subtests which will try to show a single grid for every subtest.
- Scan the history for subtest strings, possibly with a common prefix. Try to reduce the set / combine subtest strings with identical history.
- Generate a grid view for each subtest we've identified this way.
- In the CGI HTML view, for each .exp testcase of the grid view without split_subtests, add a link to the split_subtests view of that particular .exp.
- This would be much better than the current mcermak-derived option to show subtests when a '+' table cell is clicked. The HTML is lighter-weight, and the history of separate subtests is clearly visible.
- Possibly: identify the testcases which require more granularity (e.g. they are always failing, and only the number of failures keeps changing) and expand them automatically in the top-level grid view.

[Q] For someone testing projects on a company-internal setup, how do we extract a 'safe' subset of data that can be shared with the public?
- Option 1: analysis results only (e.g. grid views without subtests are guaranteed safe).
- Option 2: testrun data but not testlogs (this includes subtest strings, which may or may not be safe).
- Option 3: testrun data with scrubbed subtest strings (replace several FAIL outcomes with one testcase entry whose subtest says 'N failures').

Note: Within the 'Makefile' scheme, the scrubbing could be handled by an analysis script that produces project 'systemtap-sourceware' from 'systemtap'.
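To illustrate what the Option 3 scrubbing pass might look like, here is a hedged Python sketch. `scrub_testcases` is a hypothetical helper, not an existing Bunsen analysis script, and the choice to also strip 'subtest' from non-FAIL entries is my own conservative assumption, not something stated above.

```python
from collections import Counter

def scrub_testcases(testcases):
    """Option-3 scrubbing sketch: replace each .exp's FAIL entries with a
    single count-only entry ('N failures'), and drop the potentially
    sensitive 'subtest' strings from all other entries as a precaution."""
    scrubbed, fail_counts, order = [], Counter(), []
    for tc in testcases:
        if tc.get("outcome") == "FAIL":
            if tc["name"] not in fail_counts:
                order.append(tc["name"])   # remember first-seen .exp order
            fail_counts[tc["name"]] += 1
        else:
            # keep the entry, minus its free-form subtest string
            scrubbed.append({k: v for k, v in tc.items() if k != "subtest"})
    for name in order:
        scrubbed.append({"name": name, "outcome": "FAIL",
                         "subtest": "%d failures" % fail_counts[name]})
    return scrubbed
```

An analysis script along these lines could read testruns from the internal project and write the scrubbed copies into a public-facing project, as in the 'systemtap' -> 'systemtap-sourceware' note above.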