public inbox for bunsen@sourceware.org
* [RFC] GDB sanity test
@ 2022-04-19 19:01 Keith Seitz
  2022-04-19 20:33 ` Serhei Makarov
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Seitz @ 2022-04-19 19:01 UTC
  To: bunsen

This RFC is meant to start a discussion about testing. As bunsen
starts to mature and more people rely on it, it is imperative that
contributions don't randomly break other projects.

In that vein, this is a basic sanity test that I've written for GDB
which imports a single gdb.{log,sum} from a test fixture (.tar.xz).
It double-checks all results and test counts.

To do this, I've written a custom gdb test script that outputs every
valid test outcome and ran it on a Fedora 33 VM.

I'm only including the actual test file, not the glue code
(tests/core.py) necessary to make it work.

Does this look like an acceptable approach?

Keith
---
 tests/bunsen/test_init.py                     |  11 +++
 tests/core.py                                 |  43 +++++++++
 .../test_gdb_sanity/gdb-sanity.tar.xz         | Bin 0 -> 2120 bytes
 .../fixtures/test_gdb_sanity/test_outcomes.c  |  24 +++++
 .../test_gdb_sanity/test_outcomes.exp         |  50 +++++++++++
 tests/gdb/test_gdb_sanity.py                  |  82 ++++++++++++++++++
 6 files changed, 210 insertions(+)
 create mode 100644 tests/bunsen/test_init.py
 create mode 100644 tests/core.py
 create mode 100644 tests/gdb/fixtures/test_gdb_sanity/gdb-sanity.tar.xz
 create mode 100644 tests/gdb/fixtures/test_gdb_sanity/test_outcomes.c
 create mode 100644 tests/gdb/fixtures/test_gdb_sanity/test_outcomes.exp
 create mode 100644 tests/gdb/test_gdb_sanity.py

diff --git a/tests/gdb/test_gdb_sanity.py b/tests/gdb/test_gdb_sanity.py
new file mode 100644
index 0000000..1ce65be
--- /dev/null
+++ b/tests/gdb/test_gdb_sanity.py
@@ -0,0 +1,82 @@
+# A quick sanity test that ensures that bunsen can import a really
+# basic log/sum test run which contains all test outcomes.
+
+import os.path
+from tests.core import setup_bunsen_repo, import_results
+
+TEST_FIXTURE = 'tests/gdb/fixtures/test_gdb_sanity/gdb-sanity.tar.xz'
+PATH_STRING = '/home/fedora33/work/gdb/fsf/virgin/linux/gdb/testsuite'
+
+def assert_test_name(test, subtest):
+    assert test['name'] == 'gdb.gdb/test_outcomes.exp'
+    assert test['subtest'] == subtest
+
+def print_test(test):
+    print(f'{test["outcome"]}: {test["name"]}: {test["subtest"]}')
+
+def test_gdb_sanity(tmp_path):
+    # Initialize a new bunsen instance.
+    instance = setup_bunsen_repo(tmp_path)
+    assert instance is not None
+
+    # Make sure the fixture for this test exists.
+    fixture = os.path.join(os.getcwd(), TEST_FIXTURE)
+    assert os.path.exists(fixture)
+
+    # Import the results.
+    commits = import_results(fixture, instance)
+    assert commits is not None
+    assert len(commits) == 1
+
+    # Print out the commit info.
+    canonical, commit = commits[0]
+    print(f'successfully imported test fixture, commit {commit}')
+    print(f'(canonically {canonical})')
+
+    # Get all test results and info strings.
+    testrun = instance.testrun(commit)
+    all_tests = testrun.testcases
+    info = testrun.get_info_strings()
+
+    # Verify metadata.
+    assert info["branch"] == 'f33'
+    assert info["architecture"] == 'x86_64'
+    assert info['target_board'] == 'native-gdbserver/-m32'
+    assert testrun.source_commit == '1234567890123456789012345678901234567890'
+    assert info['version'] == '13.0.50.20220412-git'
+
+    # Verify the tests.  First the easy ones with one result each.
+    for outcome in ('FAIL', 'XPASS', 'XFAIL', 'KPASS', 'KFAIL',
+                    'UNTESTED', 'UNRESOLVED', 'UNSUPPORTED'):
+        test = [t for t in all_tests if t['outcome'] == outcome]
+        assert len(test) == 1
+        test = test[0]
+        print_test(test)
+        assert_test_name(test, 'expect ' + outcome)
+
+    # PATH should also have one result.
+    test = [t for t in all_tests if t['outcome'] == 'PATH']
+    assert len(test) == 1
+    print_test(test[0])
+    #assert_test_name(test[0], PATH_STRING)
+
+    # PASS should have 3 results.  One of these will be a duplicate, one will
+    # contain only a path name (which elicits GDB's 'PATH' outcome).
+    passes = [t for t in all_tests if t['outcome'] == 'PASS']
+    assert len(passes) == 3
+
+    # Verify the PASS result resulting in a PATH outcome.
+    path = [t for t in passes if t['subtest'].startswith('expect PATH')]
+    assert len(path) == 1
+    path = path[0]
+    print_test(path)
+    assert_test_name(path, 'expect PATH ' + PATH_STRING)
+
+    # Now verify the remaining two PASS results, which have duplicate "subtest"
+    # names.  The parser will append " <<2>>" for the duplicate to keep subtest
+    # names unique.
+    passes.remove(path)
+    print_test(passes[0])
+    assert_test_name(passes[0], 'expect PASS')
+    print_test(passes[1])
+    assert_test_name(passes[1], 'expect PASS <<2>>')
\ No newline at end of file
-- 
2.35.1
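
The final comments in the test describe how the parser keeps subtest names
unique by appending " <<2>>" to a duplicate. A minimal sketch of that idea
follows. It is not the bunsen parser's actual code: the function name
uniquify_subtests is hypothetical, and the suffixes for a third or later
repeat (" <<3>>" and so on) are an assumption extrapolated from the one case
the test checks.

```python
from collections import defaultdict


def uniquify_subtests(subtests):
    """Return subtest names with ' <<N>>' appended to repeats (N >= 2).

    Hypothetical helper: only the ' <<2>>' form is confirmed by the test;
    higher counts are an assumption.
    """
    seen = defaultdict(int)
    unique = []
    for name in subtests:
        seen[name] += 1
        count = seen[name]
        # The first occurrence keeps its name; later ones get a suffix.
        unique.append(name if count == 1 else f'{name} <<{count}>>')
    return unique
```

Given the names 'expect PASS', 'expect FAIL', 'expect PASS' in that order,
the second 'expect PASS' becomes 'expect PASS <<2>>', which is the name the
last assertion in the test looks for.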



* Re: [RFC] GDB sanity test
  2022-04-19 19:01 [RFC] GDB sanity test Keith Seitz
@ 2022-04-19 20:33 ` Serhei Makarov
  2022-04-20 15:32   ` Keith Seitz
  0 siblings, 1 reply; 5+ messages in thread
From: Serhei Makarov @ 2022-04-19 20:33 UTC
  To: Keith Seitz, Bunsen

On Tue, Apr 19, 2022, at 3:01 PM, Keith Seitz via Bunsen wrote:
> This RFC is meant to start a discussion about testing. As bunsen
> starts to mature and more people rely on it, it is imperative that
> contributions don't randomly break other projects.
>
> In that vein, this is a basic sanity test that I've written for GDB
> which imports a single gdb.{log,sum} from a test fixture (.tar.xz).
> It double-checks all results and test counts.
Correct, & I have other ideas for operations to test
(e.g. duplicate-add, list testruns, update, delete) based on your
example.

Any other operations that come to mind as a priority to test?

> To do this, I've written a custom gdb test script that outputs every
> valid test outcome and ran it on a Fedora 33 VM.
>
> I'm only including the actual test file, not the glue code
> (tests/core.py) necessary to make it work.
It looks good to me. I'm fine if you commit it and I add some SystemTap variants
subject to the following caveats/questions:

- Perhaps the test data would be better committed as plaintext rather than as a
  binary .tar.xz? When needed, a tar.xz can be prepared with
  tar cJf ARCHIVE.tar.xz /path/to/sample/data/*

- We'll want to run the same sequence of test operations on different data sets,
  so I would need to rework your fixture to allow that (e.g. perhaps
  expected result counts / etc. could be specified in the fixture and the 
  checking procedure generalized). Similar fixtures could be created
  for more realistic SystemTap and GDB data (to test the parser's edge cases),
  and perhaps for other completely-synthetic data (where I can encode
  some regressions that we would want the downstream analysis to capture).

- Since there can be Bunsen setups operating on private data, ideally
  I'd like to make sure they can set up their own fixtures to run this
  testsuite against and check for breakage, the same way that Bunsen
  itself can invoke local scripts stored in .bunsen/scripts-whatever/.
  Pretty sure a way to do that can be concocted with these standard Python
  testing tools.
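
One way the "expected result counts specified in the fixture" idea could look
is sketched below. This is only an illustration under stated assumptions: the
helper check_outcome_counts and the GDB_SANITY_EXPECTED dictionary are
hypothetical names, and the only thing taken from the patch is that each
testcase can be indexed with 'outcome'. The counts are the ones the posted
sanity test asserts.

```python
from collections import Counter


def check_outcome_counts(testcases, expected_counts):
    """Return a list of mismatch messages; empty when everything agrees."""
    actual = Counter(t['outcome'] for t in testcases)
    problems = []
    # Look at every outcome seen on either side so that unexpected
    # outcomes are reported as well as missing ones.
    for outcome in sorted(set(actual) | set(expected_counts)):
        want = expected_counts.get(outcome, 0)
        got = actual.get(outcome, 0)
        if want != got:
            problems.append(f'{outcome}: expected {want}, got {got}')
    return problems


# Hypothetical per-fixture expectations; these could equally be loaded
# from a data file stored next to the fixture.
GDB_SANITY_EXPECTED = {
    'PASS': 3, 'FAIL': 1, 'XPASS': 1, 'XFAIL': 1, 'KPASS': 1, 'KFAIL': 1,
    'UNTESTED': 1, 'UNRESOLVED': 1, 'UNSUPPORTED': 1, 'PATH': 1,
}
```

With something like this, the checking procedure stays the same for every
data set, and a SystemTap or private fixture only needs to supply its own
dictionary of expected counts.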

Couple of comments on the code:

> diff --git a/tests/gdb/test_gdb_sanity.py b/tests/gdb/test_gdb_sanity.py
> new file mode 100644
> index 0000000..1ce65be
> --- /dev/null
> +++ b/tests/gdb/test_gdb_sanity.py
> @@ -0,0 +1,82 @@
> +# A quick sanity test that ensures that bunsen can import a really
> +# basic log/sum test run which contains all test outcomes.
> +
> +import os.path
> +from tests.core import setup_bunsen_repo, import_results
Out of curiosity, is that glue code calling out to shell operations,
or is it invoking methods in class Bunsen directly?

> +TEST_FIXTURE = 'tests/gdb/fixtures/test_gdb_sanity/gdb-sanity.tar.xz'
> +PATH_STRING = '/home/fedora33/work/gdb/fsf/virgin/linux/gdb/testsuite'
If the data is synthetic (and even if it isn't), we could do a search and replace
to change it to something more generic and less revealing of your work setup :)


* Re: [RFC] GDB sanity test
  2022-04-19 20:33 ` Serhei Makarov
@ 2022-04-20 15:32   ` Keith Seitz
  2022-04-21 15:18     ` Frank Ch. Eigler
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Seitz @ 2022-04-20 15:32 UTC
  To: Serhei Makarov, Bunsen

On 4/19/22 13:33, Serhei Makarov wrote:
> On Tue, Apr 19, 2022, at 3:01 PM, Keith Seitz via Bunsen wrote:
>> This RFC is meant to start a discussion about testing. As bunsen
>> starts to mature and more people rely on it, it is imperative that
>> contributions don't randomly break other projects.
>>
>> In that vein, this is a basic sanity test that I've written for GDB
>> which imports a single gdb.{log,sum} from a test fixture (.tar.xz).
>> It double-checks all results and test counts.
> Correct, & I have other ideas for operations to test
> (e.g. duplicate-add, list testruns, update, delete) based on your
> example.
> 
> Any other operations that come to mind as a priority to test?

My priority right now is correctness testing, starting with, e.g.,
the summarize script. After that my next step is to test the log
annotation and see if that is working as expected.

>> To do this, I've written a custom gdb test script that outputs every
>> valid test outcome and ran it on a Fedora 33 VM.
>>
>> I'm only including the actual test file, not the glue code
>> (tests/core.py) necessary to make it work.
> It looks good to me. I'm fine if you commit it and I add some SystemTap variants
> subject to the following caveats/questions:
> 
> - Perhaps the test data would be better committed as plaintext rather than as a
>    binary .tar.xz? When needed, a tar.xz can be prepared with
>    tar cJf ARCHIVE.tar.xz /path/to/sample/data/*

I can do that.

> - We'll want to run the same sequence of test operations on different data sets,
>    so I would need to rework your fixture to allow that (e.g. perhaps
>    expected result counts / etc. could be specified in the fixture and the
>    checking procedure generalized). Similar fixtures could be created
>    for more realistic SystemTap and GDB data (to test the parser's edge cases),
>    and perhaps for other completely-synthetic data (where I can encode
>    some regressions that we would want the downstream analysis to capture).

Yeah, that is something that I've only cursorily thought about as a generalized
solution. This sanity test (which really is sort of a basic gdb parser test) is,
IMO, okay to hardcode the expected results. I agree we will want some infrastructure
to facilitate writing tests.

I guess if we rely on DejaGNU .log/.sum files as test data, we could write up
a function to grok the results at the end of the .sum file to compare with what
the actual bunsen parsers read. That is essentially how/why I wrote summarize.py.
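
A rough sketch of such a function, for illustration only: it reads the
"# of ..." lines from the summary block of a .sum file and maps them to
outcome names. The name read_sum_counts is hypothetical, and the label
strings in LABEL_TO_OUTCOME are written from memory of typical DejaGnu
output, so they are an assumption that should be checked against a real .sum
file. Labels that are not in the table are ignored.

```python
import re

# Matches lines such as "# of expected passes\t\t3".
SUMMARY_LINE = re.compile(r'^# of (.+?)\s+(\d+)\s*$')

# Assumed mapping from DejaGnu summary labels to outcome names.
LABEL_TO_OUTCOME = {
    'expected passes': 'PASS',
    'unexpected failures': 'FAIL',
    'unexpected successes': 'XPASS',
    'expected failures': 'XFAIL',
    'known failures': 'KFAIL',
    'unknown successes': 'KPASS',
    'untested testcases': 'UNTESTED',
    'unresolved testcases': 'UNRESOLVED',
    'unsupported tests': 'UNSUPPORTED',
}


def read_sum_counts(sum_text):
    """Return {outcome: count} from the summary lines of a .sum file."""
    counts = {}
    for line in sum_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match is None:
            continue
        label, number = match.groups()
        outcome = LABEL_TO_OUTCOME.get(label)
        if outcome is not None:
            # If the file holds several summary blocks, the last one wins.
            counts[outcome] = int(number)
    return counts
```

The resulting dictionary can then be compared with the per-outcome counts
that the bunsen parser produced for the same file.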

Any other ideas how we might want to do this?

> - Since there can be Bunsen setups operating on private data, ideally
>    I'd like to make sure they can set up their own fixtures to run this
>    testsuite against and check for breakage, the same way that Bunsen
>    itself can invoke local scripts stored in .bunsen/scripts-whatever/.
>    Pretty sure a way to do that can be concocted with these standard Python
>    testing tools.

The one obvious missing bit is being able to easily invoke tests. Right now,
I just rely on "python3 -m pytest tests" to run all tests. Maybe a makefile
or shell script to do this would be appropriate.
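
As one possible shape for such a runner, here is a small Python sketch that
builds the same "python3 -m pytest tests" command and lets a site add its own
test directories, which would also cover the private-fixture case quoted
above. The environment variable BUNSEN_EXTRA_TESTS and the function
pytest_command are hypothetical names invented for this example.

```python
import os
import sys


def pytest_command(base_dir='tests', environ=None):
    """Build the argument list for running pytest on all test directories."""
    environ = os.environ if environ is None else environ
    # BUNSEN_EXTRA_TESTS is a hypothetical variable holding extra test
    # directories, separated the same way as PATH entries.
    extra = environ.get('BUNSEN_EXTRA_TESTS', '')
    extra_dirs = [d for d in extra.split(os.pathsep) if d]
    return [sys.executable, '-m', 'pytest', base_dir, *extra_dirs]
```

A wrapper script could pass the returned list to subprocess.call() and exit
with its return code, so that a makefile target or CI job only has to invoke
that one script.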

> Couple of comments on the code:
> 
>> diff --git a/tests/gdb/test_gdb_sanity.py b/tests/gdb/test_gdb_sanity.py
>> new file mode 100644
>> index 0000000..1ce65be
>> --- /dev/null
>> +++ b/tests/gdb/test_gdb_sanity.py
>> @@ -0,0 +1,82 @@
>> +# A quick sanity test that ensures that bunsen can import a really
>> +# basic log/sum test run which contains all test outcomes.
>> +
>> +import os.path
>> +from tests.core import setup_bunsen_repo, import_results
> Out of curiosity, is that glue code calling out to shell operations,
> or is it invoking methods in class Bunsen directly?

It is 100% python. Between wrapping python API calls in a shell script for
CLI usage and wrapping shell scripts as API calls for python, I prefer the
former, and that's what I've implemented.

Writing python libraries as shell script wrappers is beyond my pain
threshold. I am writing python-based utilities/web services, not a cron job
or ancient (and insecure) CGI script.

>> +TEST_FIXTURE = 'tests/gdb/fixtures/test_gdb_sanity/gdb-sanity.tar.xz'
>> +PATH_STRING = '/home/fedora33/work/gdb/fsf/virgin/linux/gdb/testsuite'
> If the data is synthetic (and even if it isn't), we could do a search and replace
> to change it to something more generic and less revealing of your work setup :)

This reveals nothing other than how the VM I used to generate the data is set up.

I'll continue hacking at this a bit until I'm satisfied that things are genuinely
useful. I've discovered a few minor issues with the gdb parser that I'd like to
correct, and that will affect this test.

Keith



* Re: [RFC] GDB sanity test
  2022-04-20 15:32   ` Keith Seitz
@ 2022-04-21 15:18     ` Frank Ch. Eigler
  2022-04-21 16:31       ` Keith Seitz
  0 siblings, 1 reply; 5+ messages in thread
From: Frank Ch. Eigler @ 2022-04-21 15:18 UTC
  To: Keith Seitz; +Cc: Serhei Makarov, Bunsen

Hi -

> [...]
> I guess if we rely on DejaGNU .log/.sum files as test data, we could write up
> a function to grok the results at the end of the .sum file to compare with what
> the actual bunsen parsers read. That is essentially how/why I wrote summarize.py.
> 
> Any other ideas how we might want to do this?

If this kind of ongoing sanity checking were valuable -- comparing
final .sum "# of FOOBAR" counts from dejagnu to parser output data,
then the parser could do that itself every time it runs, as a
confirmatory self-check.
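
A minimal sketch of what such a self-check might look like, assuming the
parser already has the list of outcomes it produced and a dictionary of the
totals read from the end of the .sum file. The function name self_check and
both argument shapes are hypothetical; nothing here is existing bunsen code.

```python
import warnings
from collections import Counter


def self_check(parsed_outcomes, summary_counts):
    """Warn about outcomes whose parsed count differs from the .sum total.

    Returns {outcome: (summary_count, parsed_count)} for each mismatch.
    """
    parsed = Counter(parsed_outcomes)
    mismatches = {}
    # Only outcomes that the summary reports are checked; parser-specific
    # outcomes with no summary line are left alone.
    for outcome, expected in summary_counts.items():
        found = parsed[outcome]
        if found != expected:
            mismatches[outcome] = (expected, found)
            warnings.warn(
                f'{outcome}: .sum reports {expected}, parser found {found}')
    return mismatches
```

Emitting a warning instead of raising keeps an import from failing outright
on a truncated or hand-edited .sum file, while still leaving a trace that
the counts disagreed.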


> [...]
> The one obvious missing bit is being able to easily invoke tests. Right now,
> I just rely on "python3 -m pytest tests" to run all tests. Maybe a makefile
> or shell script to do this would be appropriate.

In the fche/bunsenql branch, there is now autoconf/automake machinery that
lets you run "configure; make; make check" and run the bulk of your tests
in the automake style.


- FChE



* Re: [RFC] GDB sanity test
  2022-04-21 15:18     ` Frank Ch. Eigler
@ 2022-04-21 16:31       ` Keith Seitz
  0 siblings, 0 replies; 5+ messages in thread
From: Keith Seitz @ 2022-04-21 16:31 UTC
  To: Frank Ch. Eigler; +Cc: Serhei Makarov, Bunsen

On 4/21/22 08:18, Frank Ch. Eigler wrote:
>> Any other ideas how we might want to do this?
> 
> If this kind of ongoing sanity checking were valuable -- comparing
> final .sum "# of FOOBAR" counts from dejagnu to parser output data,
> then the parser could do that itself every time it runs, as a
> confirmatory self-check.

Yeah, that's not a horrible idea... We still need some sort of sanity
or CI test, but it would likely be much simpler. Like just include
{project}.{log,sum} files for gdb and systemtap (and ...) and run
those as tests.
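
To make the "{project}.{log,sum}" idea concrete, a sketch of how a test
harness might find those files is shown below. It assumes, purely for
illustration, that all the files sit in one fixture directory and are named
after their project; find_log_sum_pairs is a hypothetical helper.

```python
from pathlib import Path


def find_log_sum_pairs(fixture_dir):
    """Map each project name to its (log, sum) paths under fixture_dir."""
    pairs = {}
    for sum_path in sorted(Path(fixture_dir).glob('*.sum')):
        log_path = sum_path.with_suffix('.log')
        # Skip a .sum file that has no matching .log next to it.
        if log_path.is_file():
            pairs[sum_path.stem] = (log_path, sum_path)
    return pairs
```

The items of the returned dictionary could be handed to
pytest.mark.parametrize so that each project's files become one test case.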

>> The one obvious missing bit is being able to easily invoke tests. Right now,
>> I just rely on "python3 -m pytest tests" to run all tests. Maybe a makefile
>> or shell script to do this would be appropriate.
> 
> In the fche/bunsenql branch, there is now autoconf/automake machinery that
> lets you run "configure; make; make check" and run the bulk of your tests
> in the automake style.

I saw, thanks!

So the question I have is: if I want to progress my goals w/bunsen, what
should I be doing now to help out, if anything, and where should I be
doing it (bunsenql, master)?

Keith

